Spam Filtering for Mail Exchangers:

Glossary

These are definitions for some of the words and terms that are used throughout this document.

B

Bayesian Filters

A filter that assigns a probability of spam based on the recurrence of words (or, more recently, word constellations/phrases) between messages.

You initially train the filter by feeding it known junk mail (spam) and known legitimate mail (ham). A Bayesian score is then assigned to each word (or phrase) in each message, indicating whether this particular word or phrase occurs most commonly in ham or in spam. The word, along with its score, is stored in a Bayesian index.
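The training and scoring steps described above can be sketched roughly as follows. This is a minimal illustration, not the algorithm of any particular filter: the function names, the whitespace tokenizer, the add-one smoothing, and the neutral score of 0.5 for unknown words are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into words (deliberately naive)."""
    return text.lower().split()


def train(spam_messages, ham_messages):
    """Build a word -> spam score index from known spam and known ham."""
    # Count in how many messages of each kind a word occurs.
    spam_counts = Counter(w for m in spam_messages for w in set(tokenize(m)))
    ham_counts = Counter(w for m in ham_messages for w in set(tokenize(m)))
    n_spam, n_ham = len(spam_messages), len(ham_messages)
    index = {}
    for word in set(spam_counts) | set(ham_counts):
        # Add-one smoothing (an assumption) keeps scores away from 0 and 1.
        p_word_in_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_in_ham = (ham_counts[word] + 1) / (n_ham + 2)
        index[word] = p_word_in_spam / (p_word_in_spam + p_word_in_ham)
    return index


def spam_probability(index, message):
    """Combine the per-word scores of one message into a single probability."""
    log_odds = 0.0
    for word in set(tokenize(message)):
        score = index.get(word, 0.5)  # words never seen in training are neutral
        log_odds += math.log(score) - math.log(1.0 - score)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With an index trained on a handful of messages, a word seen only in spam scores above 0.5, and a word seen only in ham scores below 0.5. Padding a spam message with words that score low pulls the combined probability down.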

Such filters may catch indicators that may be missed by human programmers trying to manually create keyword-based filters. At the very least, they automate this task.

Bayesian word indexes are most certainly specific to the language in which they received training. Moreover, they are specific to individual users. Thus, they are perhaps more suitable for individual content filters (e.g. in Mail User Agents) than they are for system-wide, SMTP-time filtering.

Moreover, spammers have developed techniques to defeat simple Bayesian filters, by including random dictionary words and/or short stories in their messages. This decreases the spam probability assigned by a Bayesian filter, and in the long run, degrades the quality of the Bayesian index.

C

Collateral Damage

Blocking of a legitimate sender host due to an entry in a DNS blocklist.

Some blocklists (like SPEWS) routinely list the entire IP address space of an ISP if they feel the ISP is not responsive to abuse complaints, thereby affecting all its customers.

D

Domain Name System

(abbrev: DNS) The de facto standard for obtaining information about internet domain names. Examples of such information include the IP addresses of a domain's servers (so-called A records), the designation of its incoming mail exchangers (MX records), generic server information (SRV records), and miscellaneous text information (TXT records).

DNS is a hierarchical, distributed system; each domain name is associated with a set of one or more DNS servers that provide information about that domain - including delegation of name service for its subdomains.

For instance, the top-level domain "org" is operated by The Public Interest Registry; its DNS servers delegate queries for the domain name "tldp.org" to specific name servers for The Linux Documentation Project. In turn, TLDP's name server (actually operated by UNC) may or may not delegate queries for third-level names, such as "www.tldp.org".

DNS lookups are usually performed by forwarding name servers, such as those provided by an Internet Service Provider (e.g. via DHCP).

Delivery Status Notification

(abbrev: DSN) A message automatically created by an MTA or MDA, to inform the sender of an original message (usually included in the DSN) about its status. For instance, DSNs may inform the sender of the original message that it could not be delivered due to a temporary or permanent problem, and/or whether or not and for how long delivery attempts will continue.

Delivery Status Notifications are sent with an empty Envelope Sender address.

E

Envelope Sender

The e-mail address given as sender of a message during the SMTP transaction, using the MAIL FROM: command. This may be different from the address provided in the "From:" header of the message itself.

One special case is Delivery Status Notifications (bounced messages, return receipts, vacation messages, etc.). For such mails, the Envelope Sender is empty. This is to prevent Mail Loops, and generally to be able to distinguish these from "regular" mails.
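As a rough illustration, the Envelope Sender can be pulled out of the MAIL FROM: command line, and an empty address then marks the message as a Delivery Status Notification or similar. The function names and the regular expression below are assumptions made for this sketch; a real MTA parses the command far more strictly.

```python
import re

# Matches e.g. "MAIL FROM:<user@example.org> SIZE=1234" or "MAIL FROM:<>".
MAIL_FROM = re.compile(r"^MAIL FROM:\s*<([^>]*)>", re.IGNORECASE)


def envelope_sender(command):
    """Return the address between the angle brackets of a MAIL FROM: command."""
    match = MAIL_FROM.match(command)
    if match is None:
        raise ValueError("not a MAIL FROM: command: %r" % command)
    return match.group(1)


def has_null_sender(command):
    """True if the Envelope Sender is empty, as it is for a DSN."""
    return envelope_sender(command) == ""
```

Note that this looks only at the SMTP dialogue. The "From:" header inside the message is not consulted, and may well contain a different address.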

F

False Negative

Junk mail (spam, virus, malware) that is misclassified as legitimate mail (and consequently, not filtered out).

False Positive

Legitimate mail that is misclassified as junk (and consequently, blocked).

J

Joe Job

A spam designed to look like it came from someone else's valid address, often in a malicious attempt to generate complaints from third parties and/or cause other damage to the owner of that address.

M

Mail Delivery Agent

(abbrev: MDA) Software that runs on the machine where a user's mailbox is located, to deliver mail into that mailbox. Often, that delivery is performed directly by the Mail Transport Agent (MTA), which then serves a secondary role as an MDA. Examples of separate Mail Delivery Agents include: Deliver, Procmail, Cyrmaster and/or Cyrdeliver (from the Cyrus IMAP suite).

Mail Loop

A situation where one automated message triggers another, which directly or indirectly triggers the first message over again, and so on.

Imagine a mailing list where one of the subscribers is the address of the list itself. This situation is often dealt with by the list server adding an "X-Loop:" line in the message header, and not processing mails that already have one.
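The "X-Loop:" safeguard might look roughly like this on the list server side. The list address and both function names are hypothetical; only the header name comes from the description above.

```python
from email.message import EmailMessage

LIST_ADDRESS = "list@example.org"  # hypothetical address of the mailing list


def should_process(message, list_address=LIST_ADDRESS):
    """Return False if this list has already handled the message once."""
    seen = message.get_all("X-Loop", [])
    return list_address not in seen


def mark_processed(message, list_address=LIST_ADDRESS):
    """Add an X-Loop: line before the message is sent on to the subscribers."""
    message["X-Loop"] = list_address
```

A message that comes back to the list after distribution already carries the list's own "X-Loop:" line, so `should_process()` returns False and the loop stops there.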

Another equivalent term is Ringing.

Mail Transport Agent

(abbrev: MTA) Software that runs on a mail server, such as the mail exchanger(s) of an internet domain, to send mail to and receive mail from other hosts. Popular MTAs include: Sendmail, Postfix, Exim, Smail.

Mail User Agent

(abbrev: MUA; a.k.a. Mail Reader) User software to access, download, read, and send mail. Examples include Microsoft Outlook/Outlook Express, Apple Mail.app, Mozilla Thunderbird, Ximian Evolution.

Mail Exchanger

(abbrev: MX) A machine dedicated to (sending and/or) receiving mail for an internet domain.

The DNS zone information for an internet domain normally contains a list of Fully Qualified Domain Names that act as incoming mail exchangers for that domain. Each such listing is called an "MX record", and it also contains a number indicating its "priority" among several "MX records". The listing with the lowest number has the first priority, and is considered the "primary mail exchanger" for that domain.
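To make the ordering concrete, here is a small sketch that sorts a set of MX records so that the primary mail exchanger comes first. The records are given as (number, host name) pairs; the function name and the host names in the usage note are made up for the example.

```python
def order_mail_exchangers(mx_records):
    """Return the host names of (number, host) MX pairs, lowest number first."""
    return [host for number, host in sorted(mx_records)]
```

For a hypothetical domain with the records `(20, "backup.example.org")` and `(10, "mail.example.org")`, the first element of the result is "mail.example.org", which is the primary mail exchanger.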

Micropayment Schemes

(a.k.a. sender pay schemes). The sender of a message expends some machine resources to create a virtual postage stamp for each recipient of a message - usually by solving a mathematical challenge that requires a large number of memory read/write operations, but is relatively insensitive to CPU speed. This stamp is then added to the headers of the message, and the recipient would validate the stamp through a much simpler decoding operation.

The idea is that because the message requires a postage stamp for every recipient address, spamming hundreds or thousands of users at once would be prohibitively "expensive".
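The "expensive to create, cheap to validate" property can be illustrated with a small hash-based sketch. Be aware that this swaps in a different kind of challenge than the one described above: it is CPU-bound (repeated SHA-256 hashing) rather than memory-bound, simply because that is easier to show in a few lines. The stamp format, the function names and the difficulty values are all assumptions made for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the number of zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_stamp(recipient, difficulty=12):
    """Sender side: try counters until the stamp's hash has enough zero bits."""
    for counter in count():
        stamp = "%s:%d" % (recipient, counter)
        digest = hashlib.sha256(stamp.encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return stamp


def verify_stamp(stamp, recipient, difficulty=12):
    """Recipient side: one hash is enough to check the stamp."""
    if not stamp.startswith(recipient + ":"):
        return False  # stamp was minted for a different recipient
    digest = hashlib.sha256(stamp.encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

Each extra bit of difficulty roughly doubles the average work for the sender, while the recipient's check stays at a single hash. Since the stamp includes the recipient address, it cannot be reused for another recipient.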

Two such systems are:

O

Open Proxy

A proxy which openly accepts TCP/IP connections from anywhere, and forwards them anywhere.

These are typically exploited by spammers and viruses, which use them to conceal their own IP address, and/or to more effectively distribute transmission loads across several hosts and networks.

P

Proxy

A machine that acts on behalf of someone else. It may forward e.g. HTTP requests or TCP/IP connections, usually to or from the internet. For instance, companies - or sometimes entire countries - often use "Web Proxy Servers" to filter outgoing HTTP requests from their internal network. This may or may not be transparent to the end user.

R

Ratware

Mass-mailing viruses and e-mail software used by spammers, specifically designed to deliver large amounts of mail in a very short time.

Most ratware implementations incorporate only as much SMTP client code as strictly necessary to deliver mail in the best-case scenario. They provide false or inaccurate information in the SMTP dialogue with the receiving host. They do not wait for responses from the receiver before issuing commands, and disconnect if no response has been received in a few seconds. They do not follow normal retry mechanisms in case of temporary failures.

Relay

A machine that forwards e-mail, usually to or from the internet. One example of a relay is the "smarthost" that an ISP provides to its customers for sending outgoing mail.

S

Spam Trap

An e-mail address that is seeded to address-harvesting robots via public locations, then used to feed collaborative tools such as DNS Blacklists and Junk Mail Signature Repositories.

Mails sent to these addresses are normally spam or malware. However, some of it will be collateral spam, i.e. Delivery Status Notifications sent to faked sender addresses. Thus, unless the spam trap has safeguards in place to disregard such messages, the resulting tool may not be completely reliable.

Z

Zombie Host

A machine with an internet connection that is infected by a mass-mailing virus or worm. Such machines invariably run a flavor of the Microsoft® Windows® operating system, and are almost always in "residential" IP address blocks. Their owners either do not know or do not care that the machines are infected, and often, their ISP will not take any actions to shut them down.

Fortunately, there are various DNS blocklists, such as "dul.dnsbl.sorbs.net", that incorporate such "residential" address blocks. You should be able to use these blocklists to reject incoming mail. Legitimate mail from residential users should normally go through their ISP's "smarthost".
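A DNS blocklist is queried by reversing the octets of the connecting host's IPv4 address and looking up the result as a host name under the blocklist's zone. The sketch below shows that convention. The function names are made up for the example, and whether the zone named above still answers queries from your resolver is an assumption you should verify yourself.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="dul.dnsbl.sorbs.net"):
    """Return the DNS name to look up for an IPv4 address in a blocklist zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return "%s.%s" % (reversed_octets, zone)


def is_listed(ip, zone="dul.dnsbl.sorbs.net"):
    """True if the blocklist returns an address record for this host."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name: the address is not listed
    return True
```

For the address 192.0.2.99, the name to look up is "99.2.0.192.dul.dnsbl.sorbs.net". In practice your MTA performs this lookup for you at SMTP time; the code only illustrates what happens behind that configuration option.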
