Subj : Re: Pre-filtering Mail (WAS: Re: Do M$ usually spam people? ^_~)
To   : comp.os.linux
From : ibuprofin
Date : Tue Aug 24 2004 06:47 pm

In article <cge0lk$7k0$1@s1.uklinux.net>, Paul Nolan wrote:
>Even though it is totally harmless to me (and probably most of the rest
>of this ng), it's still quite annoying - the email takes quite a while
>to download (currently I've got 5 copies under various names and addresses)

Could be quite a bit worse. At the beginning of July, I was seeing _WELL_
over 150 spam mails AND another 20 plus worms each day.

>Is it possible to put a pre-fetch filter so they don't even get
>downloaded?

Yes, but not with your run-of-the-mill mail client.

>(I'm using Thunderbird, but I would switch to Evolution - I'm heading that
>way anyway as something is wrong with Debian's Enigmail)

I don't think either of them is going to do it.

This isn't for the faint of heart.  The basic steps are these:

  Connect to your ISP's POP or IMAP server on the appropriate port.
  Authenticate as required.
  Send RFC 1939 (POP3) commands (look at LIST, STAT, DELE and RETR).
  Send QUIT to exit.
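
A rough sketch of what that dialogue looks like on the wire, in Python
because it is easy to read. It only formats command lines and picks apart
server replies the way RFC 1939 lays them out; it does not open a
connection. The helper names (command, parse_status, parse_stat) and the
"someuser"/"secret" placeholders are invented for illustration.

```python
def command(verb, *args):
    """Format one POP3 command line; RFC 1939 commands end in CRLF."""
    return (" ".join([verb, *map(str, args)]) + "\r\n").encode("ascii")


def parse_status(line):
    """Split a reply into (ok, rest); replies start with +OK or -ERR."""
    text = line.decode("ascii", "replace").rstrip("\r\n")
    status, _, rest = text.partition(" ")
    return status == "+OK", rest


def parse_stat(line):
    """Parse a STAT reply such as b'+OK 2 320' into (messages, octets)."""
    ok, rest = parse_status(line)
    if not ok:
        raise ValueError("server refused STAT: " + rest)
    count, octets = rest.split()[:2]
    return int(count), int(octets)


# The order of a minimal session. Credentials are placeholders.
SESSION = [
    command("USER", "someuser"),
    command("PASS", "secret"),
    command("STAT"),
    command("LIST", 1),
    command("QUIT"),
]
```

Each entry in SESSION is what would be written to the socket (or piped
into netcat); every reply should be checked with parse_status before
sending the next command.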

I wrote a script that used netcat to handle the connection. After
authenticating, it sent a STAT command to find out how many messages were
waiting, then LIST <message_number> to see how big each message
was. Later, I revamped things considerably, and used the TOP command
to grab the headers and first N lines of the body.  This was fed to a
fairly lengthy filter to identify spam, worms, and so on. After reviewing
that information, I used the DELE command to delete unwanted crap from
the server (and RETR to retrieve desired stuff). It was definitely
worth it _for me_ because it reduced my delivered spam load to fewer
than 3 a week.
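
The loop described above could be sketched roughly like this. This is not
the get.spam script: it swaps netcat for Python's standard poplib module,
and it uses one multi-line LIST instead of STAT plus a LIST per message,
purely to keep it short. The function names, the is_junk callback and the
host and credentials passed to run() are all hypothetical.

```python
import poplib  # standard-library POP3 client


def prefilter(pop, is_junk, body_lines=10):
    """Delete junk on the server; return (kept, deleted) message numbers.

    pop is a logged-in poplib.POP3 object (or anything with the same
    list/top/dele methods); is_junk(preview_text, size) is your filter.
    """
    _resp, listing, _octets = pop.list()      # entries look like b"1 120"
    kept, deleted = [], []
    for entry in listing:
        num, size = (int(x) for x in entry.decode("ascii").split()[:2])
        # TOP fetches the headers plus the first body_lines of the body.
        _resp, lines, _octets = pop.top(num, body_lines)
        preview = b"\n".join(lines).decode("latin-1")
        if is_junk(preview, size):
            pop.dele(num)                     # only marked until QUIT
            deleted.append(num)
        else:
            kept.append(num)
    return kept, deleted


def run(host, user, password, is_junk):
    """Hypothetical driver: connect, filter, and commit with QUIT."""
    pop = poplib.POP3(host)
    pop.user(user)
    pop.pass_(password)
    try:
        return prefilter(pop, is_junk)
    finally:
        pop.quit()
```

Messages left in "kept" stay on the server for the normal mail client to
download. The deletions are only carried out when the session ends with
QUIT, so a dropped connection leaves everything in place.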

[compton /spam.filter]$ wc get.spam killspam
    381    1794   12060 get.spam
     33     145    1003 killspam
    414    1939   13063 total
[compton ~]$

Obviously, I'm not going to waste the bandwidth here showing 400+ lines
of shell script, never mind that I don't want spammers seeing what filter
triggers I'm using (spammers do dumb things, and I don't want to point
out to them exactly how stupid they are - they _might_ learn something).

I repeat:  This isn't for the faint of heart. You really do need to read
and understand the appropriate RFCs. RFC1939 is for POP. Actually
writing the filter (which was included in the 'get.spam' script above)
meant knowing what the mail headers actually look like, and having a
fair amount of skill in shell scripting and regular expressions.
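
To give a feel for what such a filter involves, here is a minimal sketch
in Python. The triggers are made up for illustration and are NOT the ones
from get.spam; real rules have to come from looking at your own mail. It
assumes the input is the text returned by TOP (headers, a blank line,
then the first few body lines), and the name junk_reason is hypothetical.

```python
import re
from email.parser import HeaderParser

# Hypothetical triggers, for illustration only.
HEADER_RULES = [
    ("Subject", re.compile(r"\b(viagra|refinance)\b", re.IGNORECASE)),
]
BODY_RULES = [
    # An executable attachment name, as an example of a worm trigger.
    ("executable attachment",
     re.compile(r'name="?[^"\r\n]*\.(pif|scr|exe)\b', re.IGNORECASE)),
]


def junk_reason(preview):
    """Return a short reason if the TOP output looks like junk, else None."""
    # HeaderParser parses the headers and leaves the body as plain text.
    msg = HeaderParser().parsestr(preview)
    for header, pattern in HEADER_RULES:
        for value in msg.get_all(header, []):
            if pattern.search(str(value)):
                return "header " + header
    body = msg.get_payload()
    for label, pattern in BODY_RULES:
        if pattern.search(body):
            return "body " + label
    return None
```

Returning a reason rather than a bare yes/no makes it possible to review
why something was flagged before anything is deleted.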

I should mention that I also found several applications that would do
filtering on the ISP's mail server, though they didn't meet my needs.
Unfortunately, that was about a year ago, and I've now forgotten the names
of those programs. I found them through Google. Search on the keywords
"filter" and "POP".  You would still need to know what your desired mail
AND unwanted junk look like, and EXACTLY how they differ.

        Old guy
