A comment on comment spam

Someone out there is being paid to post comments on bit-player.org–and doubtless on tens of thousands of other blogs as well. The comments are mostly bland and inoffensive, sometimes effusive, always hastily composed. “Thanks for article..good work,” they say. “Amazing!!” “i like your article and i will be wating your net article….”

The payload attached to each of these comments is a link to a web site that someone wants to promote. Some of the sites are selling goods or services; others are billboards full of pay-per-view ads; a fair number are mysterious to me, being written in languages I don’t understand. I would not be astonished to learn that some of the sites are distributing malware.

Years ago, the first wave of comment spam was powered by scripts that flooded blogs and wikis and forums with hundreds of postings full of program-generated gibberish and long lists of links. That abuse was stopped by captchas and other simple filters, like the one I’ve been using here on bit-player. Another important defense is the “nofollow” tag (strictly speaking, a rel="nofollow" attribute on the link), which instructs search engines to ignore links in comments, thereby eliminating the incentive of gaining PageRank points.
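For readers curious about what the nofollow defense amounts to in practice, here is a minimal sketch of how blog software might stamp the attribute onto every link in a comment before publishing it. The function name add_nofollow is hypothetical, and the regular-expression approach is an assumption made for brevity; real software would more likely use a proper HTML parser.

```python
import re

# Matches an opening <a ...> tag; the attribute text is captured in group 1.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches a quoted rel="..." attribute (but not data-rel or similar).
_REL_ATTR = re.compile(r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(comment_html):
    """Return comment_html with "nofollow" in the rel attribute of every <a> tag.

    Hypothetical helper for illustration; it assumes reasonably tidy HTML.
    """
    def fix_tag(match):
        attrs = match.group(1)
        rel = _REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return "<a" + attrs + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" not in [v.lower() for v in values]:
            values.append("nofollow")
        new_rel = 'rel="' + " ".join(values) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return _A_TAG.sub(fix_tag, comment_html)
```

A link processed this way still works for human readers; it simply earns the spammer no credit from search engines that honor the attribute.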

The comment spam arriving now is not generated by a Perl script. Somewhere in the world a person is being paid to read these very sentences, then to prove his or her humanity to the Turing-test filter, and finally to write a few words in response and sneak in a paid link. I’m both fascinated and appalled to learn that the Internet economy can support this activity. What’s the going rate for writing comment spam? Is it worth a penny to get your link briefly exposed to the vast daily readership of bit-player.org? How about a tenth of a penny?

I have a sinking feeling that the people doing this work are themselves victims of a scam, and that they’ll never see even the tenth of a penny. They have probably succumbed to a 21st-century version of the ads I used to see on matchbook covers: “Work at home! Make $500 a week stuffing envelopes in your spare time!”

Of all the ways that poor and desperate people are exploited, this is not the worst. Presumably the work is safe and sanitary, and it even rewards literacy. Some of my comment spammers would surely have interesting ideas to contribute if only they had the luxury of time.

All the same, this kind of commercial graffiti is not something I want to encourage. The available countermeasures include prohibiting all links in comments, holding all comments until a moderator approves them, or requiring commenters to register with a verifiable email address. None of these options appeals to me, but I may have to consider them if the problem persists. For now, though, I’m going to continue the human approach–manually deleting spammy comments as quickly as I can get to them. I am also closing comments on all but the 10 most-recent items on bit-player; the spammers seem to favor older posts.
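The rule about closing comments on all but the 10 most-recent items is simple enough to sketch in a few lines. This is only an illustration, not how the blog software actually does it; the function name and the (slug, date) representation of a post are assumptions of mine.

```python
from datetime import date


def posts_open_for_comments(posts, keep=10):
    """Return the slugs of the `keep` most recent posts.

    `posts` is an iterable of (slug, published_date) pairs; both the
    name and the data layout are hypothetical, chosen for illustration.
    """
    newest_first = sorted(posts, key=lambda post: post[1], reverse=True)
    return {slug for slug, _published in newest_first[:keep]}
```

A comment form would then be shown only for posts whose slug is in the returned set.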

I have to add that spotting comment spam is not always as easy as you might think. Consider this comment, which came in response to a story about editorial changes at Scientific American magazine:

Many times, when i read your American Scientist columns, I have asked myself that is any other country’s scientist didn’t give anything to the world?

The text of the comment is pertinent to the topic; it raises a question that’s entirely appropriate in this context; and there’s clear evidence that the author has actually been reading bit-player (and even my American Scientist columns) rather than merely spewing comments at random. This is someone I would like to be able to welcome into the community. But the link associated with the comment was an ad for a web-hosting service, and another comment from the same IP address advertised a different service. Was I wrong to hit the delete button?

You’re welcome to comment below, but without spammy links, please.

This entry was posted in computing, modern life.

16 Responses to A comment on comment spam

  1. D. Eppstein says:

    If deleting inane comments with a spammy payload is wrong, then I don’t want to be right.

  2. Pingback: Michael Trick’s Operations Research Blog : Comment Spam

  3. Jon Snyder says:

    Why not analyze the link urls from comments and see if there is any pattern? You might be able to automatically filter the comments based on the url of the link.

  4. Craig says:

    Jon, I would go further — don’t analyze the URL, but download the page at that URL and analyze it. It’s easy for a spammer to disguise their payload in an innocuous-looking comment, but I imagine it would be more difficult to obfuscate the destination of that payload if they want it to generate any revenue when visited.

    Another approach would be to analyze the universe of legitimate links in the history of comments on this blog, and try to find a pattern in those. If such a pattern existed, you could have a whitelist-style filter that would only let through some comment links.

    A third option would be to make the spam screening question much harder. At least that way, your low-paid spammers would be furthering their mathematical education…

  5. Joao says:

    This is the first I hear about the issue of spam commenting (yes… I must be new in the intertubes…), but I wonder how certain one can be that all comments that bear an ad link are spam.

    Forgive my technological ignorance, but how hard would it be for the baddies to create malware that would add such links to all posts originating from the infected computer? Maybe simply by automatically filling this Homepage URL box that was offered to me.

  6. Doc Terror says:

    You guys should learn what it’s like on the dark side.

    People regularly pay $100-$300 for a link from an old site that’s established: those are typically ‘good’ links, but crummy links from blogs and forums are worth at least a buck.

    See

    http://slightlyshadyseo.com/ and http://bluehatseo.com/ to learn about cutting edge techniques.

  7. Pingback: bit-player » Blog Archive » A comment on comment spam Scripts Rss

  8. unekdoud says:

    I don’t think it hurts that much to disable links. People can still copy and paste the URL for just a few seconds more. In that case, going one step further and creating a whitelist of link domains would be better.

    I don’t think it makes sense to close comments on the older posts. Maybe just withholding those comments or allowing comments by email would work.

    Also, a heuristic approach to blocking links might be used, where the occurrence of certain terms is used to determine whether a link is likely to be spam/undesirable.

  9. unekdoud says:

    There is also a possibility of hiding dubiously spammy comments like YouTube does, or moving them elsewhere. Of course, all this would depend on the blog technology, but it isn’t impossible.

  10. Pingback: Unblogged Bits for Saturday, 07 November 2009 | ***Dave Does the Blog

  11. baoilleach says:

    On blogger, there is an option to moderate only comments on blog posts more than 2 weeks old. I find this is a good compromise.

  12. If you are using WordPress blogging software (which I am happy with) the ‘akismet’ spam filtering add-on works very well.

  13. SundaraRaman says:

    I’d agree with unekdoud - disabling direct links, and instead allowing links within the comment text would be a good idea, if your aim is to allow links to some relevant material alone.

    On the other hand, if you are concerned about allowing other people to ‘link back’ to themselves and form a network of sorts, URL filtering is probably the way to go (depending on how much of a pattern they have).

  14. Interesting topic. I just put to work an experimental technique to control the automated SPAM I was starting to defeat, but clearly I’m not ready to defeat human SPAMers. Dude, I can’t imagine anything more boring than working as SPAMer!

    Anyway, I don’t want to be a SPAMer myself, but opinions on my SPAM control technique are very welcome.

    http://www.isegura.es/blog/stop-spam-your-site-being-invisible-honeytrap-drupal-comments-form

    I just really want to find a SPAM control technique that works without causing much inconvenience to me or the good users. I find CAPTCHA a bit annoying, so I tried other solutions. Opinions welcome.

  15. Tony Smit says:

    How about this: before moderation, the links are unclickable, so a reader would have to use copy-and-paste (moderators, of course, have click-ability).

    After moderation, the links become clickable. The longer the lead time on moderation …

    about that spam screening, was the sequins in carrots ?

  16. Nilanka says:

    Hi,

    It was a nice writing you have done and there are a few factors that you have overlooked. You say “Some of my comment spammers would surely have interesting ideas to contribute if only they had the luxury of time.” but the reality is quite different than that. The correct one should be “Some of my comment spammers would surely have interesting ideas to contribute if only they had the luxury of MONEY.”

    Some years ago, I was a paid “spammer”: I had to post comments that were relevant to the article, with a back-link attached. The article had to be relevant to the website behind the URL, so that the site’s lead generation would get a boost. The people who hire others for this activity paid about $20 for 100+ comment posts. Do you know how much $20 is for a person who is living in a country where GDP (per capita) is below $400? This is the lower end of the spam comment machine. Just the execution front.

    I myself love mathematics and I believe forums like these should be free of spamming and it is an unethical thing to do. But this is how Internet works. And I really appreciate the attitude of the others who have commented, as they focus on getting rid of spamming, rather than accusing the person who physically posts it. SEO has injected a lot of bad things to web, and many good things too!

    Anyways, this is a nice forum for mathematics.
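
Several of the suggestions in the comments above (looking for patterns in link URLs, keeping a whitelist of link domains, and scoring links by the occurrence of certain terms) can be combined in a short screening routine. What follows is only a rough sketch under stated assumptions: the trusted domains, the list of suspicious terms, and the three verdicts are all invented for illustration, and nothing here downloads or inspects the linked page, as Craig proposed.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Hypothetical example data; a real list would come from the blog's own history.
TRUSTED_DOMAINS = {"bit-player.org", "americanscientist.org"}
SPAM_TERMS = {"hosting", "seo", "casino", "cheap"}


def link_domain(url):
    """Return the host name of url in lowercase, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def screen_comment(text, homepage_url=""):
    """Classify a comment as 'accept', 'hold', or 'flag'.

    accept: no links, or only links to trusted domains
    hold:   at least one link to an unknown domain (a human should look)
    flag:   a link whose URL contains a suspicious term
    """
    urls = URL_PATTERN.findall(text)
    if homepage_url:
        urls.append(homepage_url)
    verdict = "accept"
    for url in urls:
        if link_domain(url) in TRUSTED_DOMAINS:
            continue
        words = set(re.findall(r"[a-z]+", url.lower()))
        if words & SPAM_TERMS:
            return "flag"
        verdict = "hold"
    return verdict
```

Note that even a flagged comment is only marked for attention, not deleted automatically; as the Scientific American example in the post shows, the text of a comment can be perfectly sensible while the link is the problem.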