Gray List Filtering (GLF)
A form of email filtering that hits spam at its Achilles heel
By: Dave McMahon, Perneus Inc.
Created: July 31, 2005
Email that is sent from one place to another over the Internet is
transmitted by a Mail Transfer Agent (MTA) using a standard, open
protocol called Simple Mail Transfer Protocol (SMTP).
All mail servers, regardless of vendor, use SMTP to communicate with
domains outside their own. All movement of mail between domains is done
through communication between MTAs using SMTP, not directly by the
client application. For example, Microsoft Exchange is an MTA, while
Outlook is an email client. By domains I mean harmonixmusic.com,
perneus.com, aol.com, and so on.
Gray List Filtering (GLF) works by using a feature of SMTP that allows a
receiving MTA to tell a sending MTA that it is busy and that the sender
should try sending the email again at a later time.
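The "busy, try later" signal rides on SMTP's three-digit reply codes, where the first digit tells the sender how to react: 2xx means success, 4xx means a temporary failure, and 5xx means a permanent one. As a rough sketch only (the function name and the returned labels are invented for illustration and are not part of any mail server's API), a sending MTA's reaction could be modeled like this:

```python
def classify_reply(code: int) -> str:
    """Map an SMTP reply code to the action a sending MTA would take.

    The labels returned here are illustrative names, not standard terms.
    """
    if not 200 <= code <= 599:
        raise ValueError(f"not an SMTP reply code: {code}")
    actions = {
        2: "delivered",    # positive completion: the message was accepted
        3: "continue",     # positive intermediate: the server expects more input
        4: "retry_later",  # transient failure: keep the message queued and retry
        5: "bounce",       # permanent failure: give up and notify the sender
    }
    return actions[code // 100]


print(classify_reply(421))  # retry_later
print(classify_reply(250))  # delivered
```

GLF relies entirely on the "retry_later" branch: a well-behaved MTA keeps the message in its queue and tries again, while software that ignores replies never comes back.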
When a mail message is received from a remote MTA, the receiving MTA
(running GLF) makes note of the sender (someone@perneus.com), the
recipient (someone@harmonixmusic.com) and the IP address (66.92.85.252)
of the sending MTA. The receiving MTA looks up this 3-tuple in a local
(in-memory) database to see if there is a match. If there is a match and
the tuple is whitelisted (allowed to pass), the mail travels on its way
unimpeded. If there is no entry in the local database, the receiving MTA
inserts the tuple into its local database, along with the time it
received the message, and returns a temporary-failure SMTP code (421),
which tells the sender that the receiver is busy right now and to retry
sending the message later. This "I am busy" message is a very common
occurrence for MTAs that handle large domains and is considered "normal"
operation. Most users never see any hint of this normal activity, though
they might occasionally see a mail message arrive a few minutes late.
When a legitimate sending MTA does as instructed and tries again later,
the tuple for the mail message will be in the receiving MTA's local
database and the tuple is now "whitelisted", so the message passes
without interruption. GLF then tags the database entry with the message
time and date and starts a whitelist count-down timer. Any message that
matches a whitelisted tuple and arrives before the timer expires passes
through unimpeded, and the timer is reset to its initial value.
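The lookup-and-timer logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual GLF implementation: the class and method names are hypothetical, the 5-minute minimum retry delay is an assumption (the text only says the sender must try again "later"), and tuples that are never retried are simply left in the table.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (envelope sender, recipient, IP address of the sending MTA)
Triplet = Tuple[str, str, str]


@dataclass
class Entry:
    first_seen: float                          # when the tuple was first recorded
    whitelisted_until: Optional[float] = None  # None while the tuple is pending


class GrayList:
    """In-memory gray list keyed by the 3-tuple (hypothetical sketch)."""

    def __init__(self, min_retry_delay: float = 300.0,
                 whitelist_ttl: float = 3 * 86400.0) -> None:
        self.min_retry_delay = min_retry_delay  # assumption: 5 minutes
        self.whitelist_ttl = whitelist_ttl      # 3 days, configurable
        self.entries: Dict[Triplet, Entry] = {}

    def check(self, sender: str, recipient: str, ip: str, now: float) -> str:
        """Return "accept" or "defer" (defer = temporary-failure reply)."""
        key = (sender, recipient, ip)
        entry = self.entries.get(key)

        if entry is None:
            # Unknown tuple: record it with the arrival time and defer.
            self.entries[key] = Entry(first_seen=now)
            return "defer"

        if entry.whitelisted_until is None:
            # Pending tuple: only a retry that waited long enough passes.
            if now - entry.first_seen < self.min_retry_delay:
                return "defer"
        elif now > entry.whitelisted_until:
            # Whitelist timer expired: start the process over.
            self.entries[key] = Entry(first_seen=now)
            return "defer"

        # Valid retry or still whitelisted: (re)start the count-down timer.
        entry.whitelisted_until = now + self.whitelist_ttl
        return "accept"


gl = GrayList()
msg = ("someone@perneus.com", "someone@harmonixmusic.com", "66.92.85.252")
print(gl.check(*msg, now=0))    # defer  (first sighting)
print(gl.check(*msg, now=600))  # accept (retry after 10 minutes)
```

Times are passed in as plain seconds so the logic can be exercised without waiting; a real filter would use the system clock and would likely expire stale pending entries as well.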
As an example, if I send mail to someone@harmonixmusic.com for the first
time, my MTA will be told to try again in 5 minutes. If I am a
legitimate sender with a normal MTA, it will dutifully resend the
message after waiting, and the harmonixmusic.com MTA will accept it and
deliver it. The harmonixmusic.com MTA will then set the (configurable)
whitelist count-down timer to 3 days. If I send mail from the same tuple
to someone@harmonixmusic.com within 3 days, it will go straight through
and be delivered without delay, and the harmonixmusic.com MTA will reset
my whitelist timer to 3 days. As long as I keep sending messages from
the same tuple before the count-down timer expires, each message will be
delivered without delay and I will get another 3 days. If I exceed the 3
days, we start the process over again and my MTA will get the "Try again
later" message.
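To make the timer arithmetic in this example concrete, the sketch below replays a series of send attempts for a single tuple. The function name and the chosen send times are hypothetical; the 5-minute retry delay and 3-day timer are the values used in the example.

```python
MINUTE = 60
DAY = 24 * 60 * MINUTE


def replay(send_times, retry_delay=5 * MINUTE, whitelist_ttl=3 * DAY):
    """Return [(time, outcome)] for one tuple, given attempt times in seconds."""
    first_seen = None  # time of the attempt that created the pending entry
    expires = None     # end of the whitelist window, None while pending
    results = []
    for t in sorted(send_times):
        if expires is not None and t <= expires:
            outcome = "accept"            # still inside the whitelist window
        elif expires is None and first_seen is not None:
            # Pending entry: the retry passes once it has waited long enough.
            outcome = "accept" if t - first_seen >= retry_delay else "defer"
        else:
            first_seen, expires = t, None  # new or expired: start over
            outcome = "defer"
        if outcome == "accept":
            expires = t + whitelist_ttl    # reset the count-down timer
        results.append((t, outcome))
    return results


attempts = [
    0,                   # first message ever: deferred
    5 * MINUTE,          # retry: accepted, timer set to 3 days
    2 * DAY,             # inside the window: accepted, timer reset
    4 * DAY,             # accepted only because the timer was reset on day 2
    8 * DAY,             # more than 3 days of silence: deferred again
    8 * DAY + 5 * MINUTE,  # retry: accepted
]
for t, outcome in replay(attempts):
    print(f"day {t / DAY:5.2f}: {outcome}")
```

The day-4 message shows why resetting the timer matters: it arrives more than 3 days after the first accepted message, yet passes because the day-2 message restarted the count-down.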
So, why does GLF work to substantially reduce spam? Spammers cannot
send mail from a fixed IP address (one member of the 3-tuple) because
it's too easy to block a particular IP or series of IPs. They "hide"
their actual source IPs by using compromised relay MTAs and a
fire-and-forget technique: they send a bunch of messages through a bunch
of compromised relay MTAs and hope they arrive. What GLF does is force
the spammer to keep track of the SMTP transaction (resource intensive)
and to reveal their real IP. Since they depend on low or no incremental
cost, they can't afford to get tied up listening to the response from
the receiving MTA and setting timers to try again after a specified
period. Most importantly, to have this two-way conversation, they must
reveal their true IP.
Users of GLF report going from 100-1500 spam messages a day to one or
two. The one or two are from known IP addresses and can then be easily
blocked. We have it running on a machine that filters about 15 domains
for various customers and have not had any complaints about mail not
being delivered. My personal domains, which were getting about 100 spam
messages a day, now get none. Again, no missed email complaints. We also
filter our company's email (perneus.com and perneus.net) without
complaint.
GLF works best against fringe advertising and does not work against
legitimate companies sending bulk email. If you are trying to stop the
Vicodin and Viagra ads, GLF is superb. If you are trying to stop Verizon
ads, it does not help.
GLF may be defeated by spammers, but it substantially raises the bar for
doing so. GLF also reduces the load on your MTA, since the MTA receives
only the sender, recipient and connection details before deferring the
message. Spam filters such as Bayesian filters require you to receive
and analyze the entire message.
(C) Copyright Birds-Eye.Net, All rights reserved.
It is against the law to reproduce this content or any portion of it in any form without the explicit written permission of Birds-Eye Network Services, LLC. Federal copyright law (17 USC 504) makes it illegal, punishable with fines up to $100,000 per violation plus attorney's fees.