<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Spam stats</title>
	<atom:link href="http://bit-player.org/2008/spam-stats/feed/" rel="self" type="application/rss+xml" />
	<link>http://bit-player.org/2008/spam-stats</link>
	<description>An amateur's outlook on computation and mathematics.</description>
	<pubDate>Mon, 01 Dec 2008 20:50:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: اموزش</title>
		<link>http://bit-player.org/2008/spam-stats#comment-1744</link>
		<dc:creator>اموزش</dc:creator>
		<pubDate>Fri, 25 Jul 2008 15:41:34 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=148#comment-1744</guid>
		<description>VERY GOOD</description>
		<content:encoded><![CDATA[<p>VERY GOOD</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: بدن ساز</title>
		<link>http://bit-player.org/2008/spam-stats#comment-1704</link>
		<dc:creator>بدن ساز</dc:creator>
		<pubDate>Sat, 21 Jun 2008 09:46:33 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=148#comment-1704</guid>
		<description>Thanks</description>
		<content:encoded><![CDATA[<p>Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jim Ward</title>
		<link>http://bit-player.org/2008/spam-stats#comment-1699</link>
		<dc:creator>Jim Ward</dc:creator>
		<pubDate>Fri, 06 Jun 2008 13:00:46 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=148#comment-1699</guid>
		<description>Speaking of science and sex, Mary Roach has a new book out, "Bonk". Her previous books, "Stiff" and "Spook" were pretty good. I only have one recommendation for a group theory book ...</description>
		<content:encoded><![CDATA[<p>Speaking of science and sex, Mary Roach has a new book out, &#8220;Bonk&#8221;. Her previous books, &#8220;Stiff&#8221; and &#8220;Spook&#8221; were pretty good. I only have one recommendation for a group theory book &#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jim Ward</title>
		<link>http://bit-player.org/2008/spam-stats#comment-1698</link>
		<dc:creator>Jim Ward</dc:creator>
		<pubDate>Fri, 06 Jun 2008 12:44:57 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=148#comment-1698</guid>
		<description>Re: The "mistaken for birdshit" link, I was listening to Feynman's "What Do You Care What Other People Think?" yesterday and I was interested to learn that in his day physics was divided between experimentalists and theorists. The experimentalists would discover some odd fact, and then go to the theorists for an explanation. It looks like June 1963 was the exact point when the experimentalists and theorists caught up with each other. Today the theorists have lapped the experimentalists, which explains the slow progress of physics today. Too much theory, not enough data.

I also learned of tang and clevis joints, in case you want to write "Physics in the Bedroom". De Sade has beaten you to "Philosophy".</description>
		<content:encoded><![CDATA[<p>Re: The &#8220;mistaken for birdshit&#8221; link, I was listening to Feynman&#8217;s &#8220;What Do You Care What Other People Think?&#8221; yesterday and I was interested to learn that in his day physics was divided between experimentalists and theorists. The experimentalists would discover some odd fact, and then go to the theorists for an explanation. It looks like June 1963 was the exact point when the experimentalists and theorists caught up with each other. Today the theorists have lapped the experimentalists, which explains the slow progress of physics today. Too much theory, not enough data.</p>
<p>I also learned of tang and clevis joints, in case you want to write &#8220;Physics in the Bedroom&#8221;. De Sade has beaten you to &#8220;Philosophy&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brian</title>
		<link>http://bit-player.org/2008/spam-stats#comment-1696</link>
		<dc:creator>brian</dc:creator>
		<pubDate>Thu, 05 Jun 2008 19:42:00 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=148#comment-1696</guid>
		<description>I was hoping you would ask. It's such a fascinating story!

I counted those emails by hand, scanning 2,715 subject lines in chronological order. The first non-English message I came to was in Russian, so I listed that category first. Next came a message in an Asian language. Do you begin to see the pattern? The first German email happened to arrive &lt;em&gt;after&lt;/em&gt; the first Italian one but &lt;em&gt;before&lt;/em&gt; the first Spanish one. (The "unknown" category is something I split off from the Asian group after the fact; I'm pretty sure those messages are in some form of Japanese.)

By the way, I never want to do this kind of counting again. I had tried to automate the language labelling by extracting strings such as "charset=iso-2022-jp" (which designates a Japanese character encoding). The results were totally bogus. Most of the messages have no such encoding designators, and many of them have misleading markers, or malformed ones.

Next time -- if there is a next time -- I'm going to write a little &lt;em&gt;n&lt;/em&gt;-gram language recognizer. Or else figure out how to abuse the Google language tools and have them do the job for me.</description>
		<content:encoded><![CDATA[<p>I was hoping you would ask. It&#8217;s such a fascinating story!</p>
<p>I counted those emails by hand, scanning 2,715 subject lines in chronological order. The first non-English message I came to was in Russian, so I listed that category first. Next came a message in an Asian language. Do you begin to see the pattern? The first German email happened to arrive <em>after</em> the first Italian one but <em>before</em> the first Spanish one. (The &#8220;unknown&#8221; category is something I split off from the Asian group after the fact; I&#8217;m pretty sure those messages are in some form of Japanese.)</p>
<p>By the way, I never want to do this kind of counting again. I had tried to automate the language labelling by extracting strings such as &#8220;charset=iso-2022-jp&#8221; (which designates a Japanese character encoding). The results were totally bogus. Most of the messages have no such encoding designators, and many of them have misleading markers, or malformed ones.</p>
<p>Next time &#8212; if there is a next time &#8212; I&#8217;m going to write a little <em>n</em>-gram language recognizer. Or else figure out how to abuse the Google language tools and have them do the job for me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Barry Cipra</title>
		<link>http://bit-player.org/2008/spam-stats#comment-1695</link>
		<dc:creator>Barry Cipra</dc:creator>
		<pubDate>Thu, 05 Jun 2008 16:32:28 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=148#comment-1695</guid>
		<description>Just curious:  How did you choose the ordering of the wedges in your pie charts?  In particular, why place German between two Romance languages?</description>
		<content:encoded><![CDATA[<p>Just curious:  How did you choose the ordering of the wedges in your pie charts?  In particular, why place German between two Romance languages?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
