July brought quite an impressive spam storm, which dumped 10,738 messages on me.
That’s a record for my inbox, well beyond the spike of 7,506 messages received last October. The mean number of messages over the two-year period shown is 3,867; the standard deviation is 2,190.
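To put the two spikes in perspective, here is a small illustrative Python sketch. It measures each spike in units of the standard deviation and also computes the ratio of standard deviation to mean (the coefficient of variation). It uses only the summary figures quoted above, not the underlying monthly counts. Monthly spam totals are unlikely to follow a normal distribution, so treat the z-scores as a rough yardstick rather than a probability statement.

```python
# Summary figures quoted in the post (monthly spam counts, two-year window).
MEAN = 3867
STD = 2190

SPIKES = {"July": 10738, "last October": 7506}

def z_score(count: float, mean: float = MEAN, std: float = STD) -> float:
    """Number of standard deviations by which a monthly count exceeds the mean."""
    return (count - mean) / std

for month, count in SPIKES.items():
    print(f"{month}: {count:,} messages, z = {z_score(count):.2f}")

# Coefficient of variation: how large the typical fluctuation is relative to the mean.
cv = STD / MEAN
print(f"coefficient of variation = {cv:.2f}")
```

By this measure July sits a little more than three standard deviations above the mean, and the October spike sits about one and two-thirds above it. The typical month-to-month swing is more than half the mean.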
I’m intrigued by the amount of noise in this signal. The magnitude of the fluctuations suggests to me that somewhere in the spam economy there is a small-number bottleneck. Maybe there are only a few high-volume spammers in the world, so that when one of them goes on vacation, the overall volume sinks dramatically. Or maybe there are only a limited number of customers willing to pay for big spam mailings, so that the renewal or cancellation of a single contract can have a noticeable impact. Or it could be that there are only a few major lists of harvested email addresses; when the scraper misses one of my mailboxes, I see a big change.
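A toy model makes the small-number intuition a little more concrete. The model is a hypothetical assumption, not something measured. First, if spam arrived as a large number of independent messages (a Poisson process), the standard deviation of a monthly count with mean 3,867 would be only about √3,867 ≈ 62, far below the observed 2,190. Second, suppose instead that all the spam comes from N equally prolific senders, and that each sender is independently active in a given month with probability p. The monthly total is then proportional to a binomial count, whose coefficient of variation is √((1 − p)/(Np)). Solving for N gives a rough "effective number" of senders. The equal volumes, the independence, and the choice p = 0.5 are all illustrative assumptions.

```python
import math

MEAN = 3867  # monthly mean quoted in the post
STD = 2190   # monthly standard deviation quoted in the post

# Baseline: for independently arriving messages (Poisson), std = sqrt(mean).
poisson_std = math.sqrt(MEAN)

def effective_senders(cv: float, p: float = 0.5) -> float:
    """Hypothetical toy model: N equal senders, each active in a month with
    probability p, independently. Total ~ V * Binomial(N, p), so
    CV = sqrt((1 - p) / (N * p)); rearranged, N = (1 - p) / (p * CV**2)."""
    return (1 - p) / (p * cv ** 2)

observed_cv = STD / MEAN
print(f"Poisson std ~ {poisson_std:.0f} vs observed {STD}")
print(f"effective senders (p = 0.5) ~ {effective_senders(observed_cv):.1f}")
```

Under these assumptions the observed volatility corresponds to roughly three effective senders. The estimate depends strongly on p (a smaller p gives a larger N), so it points toward "few" rather than counting actual spammers.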
A bottleneck of this kind is not the only possible explanation. In some other areas with high volatility (the stock market, for example), the apparent cause is not a small population of agents but strong correlations between agents, who all follow the same signs and signals. I suppose that might happen in the spam market, too, with broader economic trends affecting everyone in the same way, but somehow it seems less likely.
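A standard variance identity shows why correlation can mimic a small population. Suppose N agents each contribute a quantity with mean μ and standard deviation σ, and every pair of agents has the same correlation ρ. The total then has variance Nσ² + N(N − 1)ρσ², so its coefficient of variation is (σ/μ)·√(1/N + (N − 1)ρ/N). As N grows, this tends to (σ/μ)·√ρ rather than to zero. The sketch below evaluates that formula. The values of N, ρ, and the per-agent ratio σ/μ are purely illustrative and are not estimates for the spam market.

```python
import math

def total_cv(n: int, rho: float, agent_cv: float = 1.0) -> float:
    """Coefficient of variation of the sum of n identically distributed agents
    with common pairwise correlation rho (assumed >= 0 here, which keeps the
    variance valid). agent_cv is sigma/mu for one agent: an illustrative value."""
    return agent_cv * math.sqrt(1 / n + (n - 1) * rho / n)

for n in (10, 1_000, 100_000):
    independent = total_cv(n, rho=0.0)
    correlated = total_cv(n, rho=0.3)
    print(f"N={n:>7}: independent CV={independent:.3f}, correlated CV={correlated:.3f}")
```

With ρ = 0 the relative fluctuations shrink like 1/√N. With ρ = 0.3 they level off near √0.3 ≈ 0.55 of the per-agent value, however many agents there are.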
The graph below breaks down the monthly totals according to which of my various email addresses the spam targeted. Again the numbers seem to be bouncing around pretty wildly. For example, the July spike mostly came from my address here at bit-player, but the peak last October was dominated by an address at amsci.org.
Of the seven addresses I monitor, five are openly published on the web, and thus I shouldn’t be surprised that they attract their share of spam. But the other two addresses have never been published, and one of them I have never used or even handed out to friends. Those obscure addresses are getting about 1,000 spams a month in total.
One feature of my spam that doesn’t seem to fluctuate much from month to month is the proportion written in Russian or other languages that use the Cyrillic alphabet. The fraction has hovered near one-half for the past year. I have a hard time imagining a model that produces such linguistic stability along with volatility in other dimensions.
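For readers who want to track a similar statistic, here is one plausible way to compute a Cyrillic fraction. It is a hypothetical sketch and not necessarily how the counts above were made. It assumes each message has already been decoded to Unicode text, since raw spam often arrives in encodings such as KOI8-R or as MIME-encoded headers. It labels a message "Cyrillic" when at least half of its letters fall in the Unicode Cyrillic (U+0400 to U+04FF) or Cyrillic Supplement (U+0500 to U+052F) blocks. The 0.5 threshold and the function names are illustrative choices.

```python
def cyrillic_share(text: str) -> float:
    """Fraction of alphabetic characters in the Cyrillic or Cyrillic
    Supplement Unicode blocks (U+0400 to U+052F)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyr = sum(1 for ch in letters if "\u0400" <= ch <= "\u052f")
    return cyr / len(letters)

def is_cyrillic(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule: call a message Cyrillic if enough of its letters are."""
    return cyrillic_share(text) >= threshold

def cyrillic_fraction(messages: list[str]) -> float:
    """Share of messages classified as Cyrillic by is_cyrillic()."""
    if not messages:
        return 0.0
    return sum(is_cyrillic(m) for m in messages) / len(messages)

sample = ["Привет, это реклама", "Cheap watches here", "Скидки только сегодня!", "Hello"]
print(cyrillic_fraction(sample))  # 0.5
```

A letter-based threshold tolerates the Latin-script URLs and brand names that often appear inside Russian-language spam. A stricter or looser threshold would shift the measured fraction somewhat.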