SLUG Mailing List Archives

Re: [SLUG] Meaning of Nonsense in s p a m


Talking of which, does anyone know any good or interesting approaches to
identifying these junk strings?

A checksum-based spam filter (e.g. Vipul's Razor) could be modified to
checksum only the recognized words in an email: anything not found in a
standard wordlist would be stripped before the checksum was generated.
This would help for a while, and I'd be interested to hear about anything
out there, but spammers could defeat it easily enough by switching to
tacking on half a dozen common words selected at random.
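
A rough sketch of the wordlist idea in Python, to make it concrete. This is
not how Vipul's Razor actually computes its signatures; the tiny WORDLIST,
the tokenizer and the function name recognized_checksum are all made up for
illustration, and a real wordlist would be something like a system
dictionary file.

```python
import hashlib
import re

# Hypothetical stand-in for a standard wordlist (a real one would be far larger).
WORDLIST = {"buy", "cheap", "pills", "now", "click", "here", "the", "best", "offer"}

def recognized_checksum(body, wordlist=WORDLIST):
    """Checksum only the tokens that appear in the wordlist, in order."""
    tokens = re.findall(r"[a-z]+", body.lower())
    kept = [t for t in tokens if t in wordlist]  # junk strings are dropped here
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()

# Same message with different random junk appended: checksums match.
a = "Buy cheap pills now! xqzvk wplrt"
b = "Buy cheap pills now! jhgfd mnbvc"
print(recognized_checksum(a) == recognized_checksum(b))

# Same message padded with real words instead: checksum changes.
c = "Buy cheap pills now! offer the"
print(recognized_checksum(a) == recognized_checksum(c))
```

The third message shows the weakness described above: padding with common
dictionary words survives the stripping step and changes the checksum.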

I presume that algorithms have been developed for detecting copyright
violations which measure the percentage overlap between different pieces
of text, but I'm far from clear on how you could do that efficiently.
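
One common way to put a number on "percentage overlap" is to compare sets of
overlapping word n-grams (often called shingles) using Jaccard similarity.
The sketch below assumes 3-word shingles and a simple tokenizer; the function
names shingles and overlap are illustrative only. It compares one pair of
texts at a time, so it does not by itself answer the efficiency question;
sketching techniques such as MinHash are the usual way to scale this kind of
comparison to many documents.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)

x = "buy cheap pills now from our online store today"
y = "buy cheap pills now from our online store today xqzvk"
z = "the meeting is moved to thursday afternoon next week"

print(overlap(x, y))  # high: only the trailing junk token differs
print(overlap(x, z))  # no shared shingles
```

Unlike an exact checksum, a score like this degrades gradually when a few
words are appended, so a threshold (say, flag anything above some chosen
value) could be used instead of requiring an exact match.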

Anyone have any pointers?

Andrew



On Sun, 15 Jun 2003 mlh@xxxxxxxxxx wrote:

> Date: Sun, 15 Jun 2003 12:25:16 +1000
> From: mlh@xxxxxxxxxx
> To: slug@xxxxxxxxxxx
> Subject: Re: [SLUG] Meaning of Nonsense in  s p a m
>
>
> I've always presumed that it was some type
> of cookie; in the obvious way, to validate the
> email, but also if you complained to the isp
> they would be able to know who complained if
> the isp then showed the complaint to the
> spammer.
>
>
> But yeah, trying to fool filters is probably the
> main purpose.
>
> Matt
>
>
>

--

No added Sugar.  Not tested on animals.  If irritation occurs,
discontinue use.

-------------------------------------------------------------------
Andrew McNaughton           In Sydney
                            Working on a Product Recommender System
andrew@xxxxxxxxxxx
Mobile: +61 422 753 792     http://staff.scoop.co.nz/andrew/cv.doc