bmf (Bayesian Mail Filter) 0.9.4 fork + patches
git clone git://git.codemadness.org/bmf

                bmf -- Bayesian Mail Filter

About bmf
=========

bmf is a mail filter that uses the Bayesian technique described in Paul
Graham's article "A Plan for Spam".  It aims to be faster, smaller, and more
versatile than similar applications.  It is written in ANSI C and uses POSIX
functions, so in theory it runs on any POSIX system.
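
For readers new to the approach, the per-token estimate from Graham's
article can be sketched in C as follows.  This illustrates the formula in
"A Plan for Spam"; it is not code from bmf or bogofilter.  The function name
token_spamicity() and its parameters are hypothetical, and the constants
(good counts doubled, at least 5 occurrences, rare tokens treated like
unseen words at 0.4, results clamped to [0.01, 0.99]) come from the article,
not from bmf's sources.

```c
/* Per-token spam probability after Paul Graham's "A Plan for Spam".
 * Illustrative sketch only; bmf's own code may use other constants. */
double token_spamicity(unsigned long good, unsigned long bad,
                       unsigned long ngood_msgs, unsigned long nbad_msgs)
{
    double g = 2.0 * (double)good;   /* good counts weigh double */
    double b = (double)bad;
    double rg, rb, p;

    if (g + b < 5.0)
        return 0.4;                  /* too rare: treat like an unseen word */

    /* frequency per message in each corpus, capped at 1 */
    rg = ngood_msgs ? g / (double)ngood_msgs : 0.0;
    rb = nbad_msgs ? b / (double)nbad_msgs : 0.0;
    if (rg > 1.0)
        rg = 1.0;
    if (rb > 1.0)
        rb = 1.0;
    if (rg + rb == 0.0)
        return 0.4;                  /* no corpus sizes to compare against */

    p = rb / (rg + rb);
    if (p < 0.01)                    /* never completely certain */
        p = 0.01;
    if (p > 0.99)
        p = 0.99;
    return p;
}
```

For instance, a token seen 10 times in 100 spams and never in 100 good
messages scores 0.99, while one seen 5 times in good mail and 10 times in
spam (same corpus sizes) scores 0.5.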

bmf provides several features that are not available in other filters:

(1) Independence from external programs and libraries.  Tokens are held in
memory in simple vectors, so no heavyweight external data-structure library
is needed.  On disk, the tokens are stored in plain-text "flat" files.

(2) Efficient processing.  Input data is parsed by a handcrafted parser that
is under 3% of the size of the equivalent code generated by flex.  No portion
of the input is ever copied, and all I/O and memory allocation are done in
large chunks.  Updated token lists are merged and written in one step.
Hashing is being considered for the next version to improve lookup speed.

(3) Simple and elegant implementation.  No heavyweight, copy-intensive MIME
decoding routines are used.  Decoding of quoted-printable text for selected
MIME types is being considered for the next version.
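
Items (1) and (2) can be illustrated together with a small C sketch.  The
assumptions here are not taken from bmf's sources: a token is a view
(pointer plus length) into the caller's input buffer, so the parser never
copies input, and a token list is a plain vector sorted by word that is
searched with the standard bsearch().  The names token_t, tokcount_t,
next_token() and lookup(), and the token alphabet (letters, digits, and the
characters - ' $), are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A token is a view into the caller's input buffer: no bytes are copied. */
typedef struct {
    const char *ptr;
    size_t len;
} token_t;

/* One entry of an in-memory token list (hypothetical layout). */
typedef struct {
    const char *word;          /* NUL-terminated */
    unsigned long count;
} tokcount_t;

/* Assumed token alphabet: letters, digits and the characters - ' $ */
static int is_tokchar(unsigned char c)
{
    return isalnum(c) || c == '-' || c == '\'' || c == '$';
}

/* Find the next token at or after *pos in buf[0..n-1].
 * Returns 1 and fills *tok, or returns 0 at end of input. */
int next_token(const char *buf, size_t n, size_t *pos, token_t *tok)
{
    size_t i = *pos, start;

    while (i < n && !is_tokchar((unsigned char)buf[i]))
        i++;                   /* skip separators */
    if (i >= n) {
        *pos = n;
        return 0;
    }
    start = i;
    while (i < n && is_tokchar((unsigned char)buf[i]))
        i++;
    tok->ptr = buf + start;    /* point into buf instead of copying */
    tok->len = i - start;
    *pos = i;
    return 1;
}

/* bsearch() comparator: order a token (which never contains NUL) against
 * a NUL-terminated word the same way strcmp() would. */
static int cmp_key(const void *key, const void *elem)
{
    const token_t *t = key;
    const tokcount_t *e = elem;
    int c = strncmp(t->ptr, e->word, t->len);

    if (c != 0)
        return c;
    return e->word[t->len] == '\0' ? 0 : -1;   /* token is a prefix */
}

/* Look a token up in a vector sorted by strcmp() order of word.
 * Returns its count, or 0 if the token is not in the list. */
unsigned long lookup(const tokcount_t *v, size_t n, const token_t *tok)
{
    const tokcount_t *hit = bsearch(tok, v, n, sizeof *v, cmp_key);

    return hit ? hit->count : 0;
}
```

Because a token is only a pointer and a length, the comparison runs directly
on the input buffer and no NUL-terminated copy is ever made.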

Note: the core filter function comes from esr's bogofilter v0.6 (available at
http://sourceforge.net/projects/bogofilter/), with bug fixes applied.
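
The core filter step combines per-token probabilities into one score for the
message.  As a sketch of the combining rule from Graham's article (not code
copied from bogofilter v0.6 or bmf), the function below keeps the 15
probabilities furthest from the neutral 0.5 and combines them as
prod(p) / (prod(p) + prod(1 - p)).  The names combine(), by_interest() and
MAX_INTERESTING are hypothetical.

```c
#include <stdlib.h>

#define MAX_INTERESTING 15   /* Graham combines the 15 most telling tokens */

/* qsort() comparator: probabilities furthest from 0.5 sort first. */
static int by_interest(const void *a, const void *b)
{
    double da = *(const double *)a - 0.5;
    double db = *(const double *)b - 0.5;

    if (da < 0.0)
        da = -da;
    if (db < 0.0)
        db = -db;
    return (da < db) - (da > db);
}

/* Combine per-token probabilities p[0..n-1], each strictly between 0 and 1,
 * into one spam probability for the message.  Reorders p in place. */
double combine(double *p, size_t n)
{
    double prod = 1.0;       /* product of p       */
    double inv = 1.0;        /* product of (1 - p) */
    size_t i;

    qsort(p, n, sizeof *p, by_interest);
    if (n > MAX_INTERESTING)
        n = MAX_INTERESTING;
    for (i = 0; i < n; i++) {
        prod *= p[i];
        inv *= 1.0 - p[i];
    }
    return prod / (prod + inv);
}
```

Graham's article treats a message as spam when this value exceeds 0.9; the
cutoff bmf itself uses may differ.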

For the most recent version of this software, see:

        http://sourceforge.net/projects/bmf/

How to integrate bmf
====================

The following procmail recipes invoke bmf for each incoming email and place
spam into $MAILDIR/spam.  The first sample runs bmf in its normal mode of
operation and uses its exit status (0 means spam) to file the message; the
second runs bmf as a filter ("bmf -p"), which adds an X-Spam-Status header
that a later recipe matches.

        ### begin sample one ###
        # Invoke bmf and use return code to filter spam in one step
        :0HB
        * ? bmf
        | formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

        ### begin sample two ###
        # Invoke bmf as a filter
        :0 fw
        | bmf -p

        # Filter spam
        :0:
        * ^X-Spam-Status: Yes
        $MAILDIR/spam

The following maildrop equivalents were suggested by Christian Kurz.

        ### begin sample one ###
        # Invoke bmf and use return code to filter spam in one step
        exception {
                `bmf`
                if ( $RETURNCODE == 0 )
                        to $MAILDIR/spam
        }

        ### begin sample two ###
        # Invoke bmf as a filter
        exception {
                xfilter "bmf -p"
                if (/^X-Spam-Status: Yes/)
                        to $MAILDIR/spam
        }

With bmf in your procmail or maildrop scripts as suggested above, every
incoming email is registered as either spam or non-spam.  To reverse a wrong
registration and so train bmf, the following mutt macros may be useful:

  macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<enter-command>set wait_key\n<save-message>=spam\n"
  macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<enter-command>set wait_key\n"
  macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<enter-command>set wait_key\n<save-message>=inbox\n"

These macros bind the following keys, overriding any existing bindings:

  <Esc>d = de-register as non-spam, register as spam, and move to spam folder.
  <Esc>t = test for spamicity.
  <Esc>u = de-register as spam, register as non-spam, and move to inbox folder.

Alternatively, if you use Gnus, you can add the following lines to your
.gnus file to achieve a similar result:

(defun spam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -S")
  (gnus-summary-move-article 1 "nnml:Spam"))

(defun notspam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -N")
  (gnus-summary-move-article 1 "nnml:inbox"))

(add-hook
  'gnus-sum-load-hook
  (lambda nil
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-o") 'spam)
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-p") 'notspam)))

How to train bmf
================

First, please keep in mind that bmf "learns" how to recognize spam from the
input that you give it.  It works best if you give it exactly the email that
you receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your current
    and/or saved emails.  It is fairly easy to write a small shell script
    that passes all of your normal email to "bmf -n" and all of your spam
    to "bmf -s".  Note that if you do not use the mbox storage format, you
    MUST invoke bmf exactly once per email.  Using "cat * | bmf -n" will NOT
    work properly, because bmf then sees the entire input as one big email.

  - If you already use SpamAssassin, you can use it to train bmf for a
    couple of days or weeks.  If SpamAssassin tags a message as spam, run it
    through "bmf -s"; if not, run it through "bmf -n".  This can be
    automated with procmail or maildrop recipes.
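
The once-per-email rule for non-mbox storage can be sketched in C with
POSIX fork() and execvp(): each file is opened as standard input for a
separate run of the command, which is what a small training script has to
do.  The function name train_each() is hypothetical, and counting a run as
successful when it exits with status 0 is an assumption about bmf's exit
codes in training mode, not something this README states.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd (for example {"bmf", "-n", NULL}) once per file, with that file
 * as standard input, so each email is a separate invocation.
 * Returns the number of runs that exited with status 0 (an assumption
 * about what "success" means for the command). */
size_t train_each(char *const cmd[], char *const paths[], size_t n)
{
    size_t i, ok = 0;

    for (i = 0; i < n; i++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            break;                        /* cannot fork: stop early */
        if (pid == 0) {                   /* child */
            int fd = open(paths[i], O_RDONLY);

            if (fd < 0 || dup2(fd, STDIN_FILENO) < 0)
                _exit(127);
            if (fd != STDIN_FILENO)
                close(fd);
            execvp(cmd[0], cmd);
            _exit(127);                   /* exec failed */
        }
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

To register a directory of saved non-spam files, for example, pass
{"bmf", "-n", NULL} as cmd and the file names as paths; spam would use
{"bmf", "-s", NULL} instead.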

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word counts.

Final words
===========

Thanks for trying bmf.  If you have any problems, comments, or suggestions,
please direct them to the bmf mailing list, bmf-user@lists.sourceforge.net.

                                                        Tom Marshall
                                                        20 Oct 2002