/usr/share/doc/spambayes/contrib/nway.py is in spambayes 1.1b1-4.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
#!/usr/bin/env python
"""
Demonstration of n-way classification possibilities.

Usage: %(prog)s [ -h ] tag=db ...

    -h - print this message and exit.

All args are of the form 'tag=db' where 'tag' is the tag to be given in the
X-Spambayes-Classification: header. A single message is read from stdin and
a modified message sent to stdout. The message is compared against each
database in turn. If its score reaches the spam threshold when scored
against a particular database, an X-Spambayes-Classification header is added
and the modified message is written to stdout. If none of the comparisons
yields a definite classification, the message is written with an
'X-Spambayes-Classification: unsure' header.

Training is left up to the user. In general, you want to train so that a
message in a particular category will score above your spam threshold when
checked against that category's training database. For example, suppose you
have the following mbox formatted files: python, music, family, cars. If
you wanted to create a training database for each of them you could execute
this series of sb_mboxtrain.py commands:

    sb_mboxtrain.py -f -d python.db -s python -g music -g family -g cars
    sb_mboxtrain.py -f -d music.db -g python -s music -g family -g cars
    sb_mboxtrain.py -f -d family.db -g python -g music -s family -g cars
    sb_mboxtrain.py -f -d cars.db -g python -g music -g family -s cars

You'd then compare messages using a %(prog)s command like this:

    %(prog)s python=python.db music=music.db family=family.db cars=cars.db

Normal usage (at least as I envisioned it) would be to run the program via
procmail or something similar. You'd then have a .procmailrc file which
looked something like this:

    :0 fw:sb.lock
    | %(prog)s spam=spam.db python=python.db music=music.db ...

    :0
    * ^X-Spambayes-Classification: spam
    spam

    :0
    * ^X-Spambayes-Classification: python
    python

    :0
    * ^X-Spambayes-Classification: music
    music

    ...

    :0
    * ^X-Spambayes-Classification: unsure
    unsure

Note that I've not tried this (yet). It should simplify the logic in a
.procmailrc file and probably classify messages better than writing more
convoluted procmail rules.
"""
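import getopt
import sys
import os

from spambayes import hammie, mboxutils, Options

prog = os.path.basename(sys.argv[0])

def usage():
    # The module docstring doubles as the help text; %(prog)s is filled in
    # from the module globals.
    print >> sys.stderr, __doc__ % globals()

def main(args):
    opts, args = getopt.getopt(args, "h")

    for opt, arg in opts:
        if opt == '-h':
            usage()
            return 0

    # Read a single message from stdin and drop any stale classification.
    msg = mboxutils.get_message(sys.stdin)
    try:
        del msg["X-Spambayes-Classification"]
    except KeyError:
        pass

    # Score the message against each database in command-line order and
    # stop at the first one whose score reaches the spam cutoff.
    for pair in args:
        tag, db = pair.split('=', 1)
        h = hammie.open(db, True, 'r')
        score = h.score(msg)
        if score >= Options.options["Categorization", "spam_cutoff"]:
            msg["X-Spambayes-Classification"] = "%s; %.2f" % (tag, score)
            break
    else:
        # for/else: runs only when no database triggered the break above.
        msg["X-Spambayes-Classification"] = "unsure"

    # Keep the leading "From " line only if the input message had one.
    sys.stdout.write(msg.as_string(unixfrom=(msg.get_unixfrom()
                                             is not None)))
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))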
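The classification loop in main() is a first-match search: databases are tried in command-line order, and the first one whose score reaches the cutoff supplies the tag, even if a later database would have scored higher. The sketch below isolates that logic so it can be run without SpamBayes installed. The names `parse_pairs`, `classify`, and `fake_scores` are hypothetical, the dictionary lookup stands in for `hammie.open(db, True, 'r').score(msg)`, and the cutoff of 0.9 is an assumed value rather than one read from `Options.options`.

```python
def parse_pairs(args):
    """Split each 'tag=db' argument into a (tag, db) tuple, keeping order."""
    pairs = []
    for arg in args:
        # Split on the first '=' only, so a db path may itself contain '='.
        tag, db = arg.split("=", 1)
        pairs.append((tag, db))
    return pairs


def classify(pairs, score_fn, cutoff=0.9):
    """Return the header value for the first db whose score reaches cutoff.

    score_fn is a stand-in for scoring the message against one database;
    cutoff is an assumed threshold, not a value taken from SpamBayes.
    """
    for tag, db in pairs:
        score = score_fn(db)
        if score >= cutoff:
            return "%s; %.2f" % (tag, score)
    # No database produced a definite classification.
    return "unsure"


# Hypothetical scores for one message against three databases.
fake_scores = {"spam.db": 0.12, "python.db": 0.97, "music.db": 0.99}

pairs = parse_pairs(["spam=spam.db", "python=python.db", "music=music.db"])
result = classify(pairs, fake_scores.get)
print(result)  # python; 0.97
```

Because the search stops at the first match, the order of the tag=db arguments matters: here `music.db` scores higher but is never consulted. Listing the categories you most want to catch first (for example spam) is one way to make that ordering deliberate.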