Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed topublic posts

From: Chris Ricker
Date: Fri Jan 23 2004 - 10:40:05 EST


On Thu, 22 Jan 2004, David S. Miller wrote:

> On Thu, 22 Jan 2004 14:58:54 -0800 (PST)
> Linus Torvalds <torvalds@xxxxxxxx> wrote:
>
> > Have you played with Markov chains? What happens is that you don't just
> > build up a list of words and their likelihood of being spam or ham, you
> > build up a list of word _combinations_ and the likelihood of one
> > particular word following another one.
> >
> > That's how a lot of the "random phrase" generators on the web work.
> >
> > They can be absolutely hilarious, exactly because the sentences they
> > generate actually _almost_ make sense. Sometimes you get an almost
> > readable story, but one that reads like somebody having a bad trip and his
> > reality just shifted 90 degrees. (Usually the best stories come if the
> > training material is coherent, which email sadly usually isn't).
>
> This reminds me of the literary work done by William S. Burrough and his
> infamous "cut ups". Similar result for the reader.

jwz's DadaDodo program <http://www.jwz.org/dadadodo/> was inspired by
Burroughs' cut ups, and it does this sort of Markov chain-based
generation....

I'm not so sure that Markov chain analysis for spam filtering will work well
in practice for email, though, particularly when dealing with international
email. I don't write / read German or French well, but I can write in them
enough that a German / French speaker can figure out what I'm trying to say,
maybe. My German in particular is bad enough that it probably wouldn't fit
Markov chain models of how someone fluent in high German would write,
though. Similarly, look at some of the fractured English emails that appear
on this list. I can understand them, but a Markov model relaxed enough to
allow them will also include a lot of random spam....

later,
chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/