SLUG Mailing List Archives
Re: [SLUG] Meaning of Nonsense in s p a m
- To: Nick Croft <nicko@xxxxxxxxxxx>, slug@xxxxxxxxxxx
- Subject: Re: [SLUG] Meaning of Nonsense in s p a m
- From: "Mark A. Bell" <m487396@xxxxxxxxxxxxxx>
- Date: Sat, 14 Jun 2003 03:03:29 -0700 (PDT)
--- Nick Croft wrote:
> Just wondering if someone could enlighten me as to the meaning,
> purpose or
> origin of those strange nonsense words in spam. I'm busy refining
> & bogofilter atm and I see a lot of those words.
It's got to do with fooling Bayesian statistical filters, I think. See
this article for a description:
I guess what normally happens is that if a message is identified as
spam, its keywords are added to the filter's database of
spam-keywords. These messages make sure that a bunch of nonsense is
added to the database at the same time, eventually clogging it up with
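
To make that guess a bit more concrete, here is a toy sketch in Python. It is not how bogofilter or any real filter is implemented: the ToyFilter class, its method names, and the nonsense tokens are all made up for illustration, and the per-token score is a deliberately simplified ratio.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real filters tokenize far more carefully."""
    return text.lower().split()


class ToyFilter:
    """Hypothetical keyword database: counts how many spam/ham messages contain each token."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        # Count each token once per message.
        tokens = set(tokenize(text))
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_spam_prob(self, token):
        # Fraction of spam containing the token vs. fraction of ham containing it.
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5  # never seen: treat as neutral
        return s / (s + h)


f = ToyFilter()
# A spam message padded with nonsense words (invented here).
f.train("cheap viagra xqzvt blorfle", is_spam=True)
f.train("meeting notes for the slug list", is_spam=False)

# The nonsense tokens now sit in the spam table alongside the real keywords.
print(sorted(f.spam))
print(f.token_spam_prob("xqzvt"))
```

Every distinct nonsense word in a message flagged as spam becomes another entry in the spam table, which is the "clogging" effect guessed at above.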
That's just a guess - I'm no expert on this stuff. Surely it's easy
enough to filter out the nonsense, and _then_ run the message through
If it was me, I'd be looking for _collocations_ of keywords (e.g.
'prescription' only if it co-occurs within a given span with 'viagra'
or 'valium'...). But like I said, this isn't my field.
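
Something like the following is what I have in mind for the collocation check. It is only a minimal sketch: the function name cooccurs, the default span of 5 tokens, and the sample messages are my own assumptions, not taken from any existing filter.

```python
def cooccurs(tokens, word, partners, span=5):
    """Return True if `word` appears within `span` tokens of any word in `partners`."""
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        # Look at up to `span` tokens on each side of this occurrence.
        lo = max(0, i - span)
        window = tokens[lo:i] + tokens[i + 1:i + 1 + span]
        if any(t in partners for t in window):
            return True
    return False


spammy = "no prescription needed for cheap viagra today".split()
benign = "please renew my prescription at the pharmacy".split()
partners = {"viagra", "valium"}

print(cooccurs(spammy, "prescription", partners))
print(cooccurs(benign, "prescription", partners))
```

With this approach 'prescription' on its own would not count against a message; it would only count when one of the partner words turns up nearby.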
Are there any filters that attempt to scan for grammatical patterns,
like for example imperatives "Visit our exciting web-site!" or clauses
without a finite verb "Free viagra content here!"? Or do they all just
work at the vocabulary level?
mark a. bell