Last night I posted this note:
Since Sunday afternoon bit-player.org has been under some sort of mysterious DDoS attack, with a rotating suite of IP numbers repeatedly downloading the same PDF files several times a second. For the time being, I’ve taken most PDFs offline. If you urgently need something and you get a permission-denied error, please send me an email. Sorry for the inconvenience. Back soon, I hope.
Turns out it was not net pests. It was my 15 minutes of fame. A link to an old story of mine found its way to the front page of Hacker News, and I misinterpreted the resulting net traffic jam. A couple of hours later (and after a couple of messages from helpful HN readers), I realized there was nothing malicious going on, and I put the files back on line.
I’m an amateur in all things, but I’m especially inept as a sysadmin. Looking at the server logs this morning, I’m still unsure of exactly what I’m seeing, but I understand a little more than I did 12 hours ago.
What set me off in the first place was seeing long lists of requests like these, all from the same IP number:
16/Mar/2014:14:17:25 "GET /AmSci-2005-11-Hayes-NewOrleans.pdf HTTP/1.1" 200 79640 16/Mar/2014:14:17:26 "GET /AmSci-2005-11-Hayes-NewOrleans.pdf HTTP/1.1" 206 65886 16/Mar/2014:14:17:26 "GET /AmSci-2005-11-Hayes-NewOrleans.pdf HTTP/1.1" 206 8781 16/Mar/2014:14:17:27 "GET /AmSci-2005-11-Hayes-NewOrleans.pdf HTTP/1.1" 206 65891 16/Mar/2014:14:17:27 "GET /AmSci-2005-11-Hayes-NewOrleans.pdf HTTP/1.1" 206 65892 16/Mar/2014:14:17:27 "GET /AmSci-2005-11-Hayes-NewOrleans.pdf HTTP/1.1" 206 65893
Good grief, I thought: Somebody is downloading the same PDF file six times within three seconds. I failed to notice (or appreciate the significance of) the response codes near the end of each line. The “200” code on the first line is the normal HTTP “OK” signal, but the “206” on the next five lines signifies “partial content.” What’s going on here—if I now understand correctly—is not one person downloading the same file six times; it’s one person downloading a file in six pieces. (The size of the file in question is 270,570 bytes. The byte counts at the ends of the six lines above add up to 272,343. I can’t account for the discrepancy; I’m still an amateur.)
The file mentioned in the six requests above is not the one linked to by Hacker News. That’s another reason for my confusion: I was seeing wholesale downloading of hundreds of different files. Apparently, when some people find an item that interests them on a web site, they
wget -r the whole site. And I guess I understand why: If you don’t grab it immediately, the skittish site owner is likely to panic and take it offline. (But wouldn’t it be polite to throttle the request rate?)
The story that started all this fuss is a bit of whimsy I wrote almost 30 years ago for Computer Language, a magazine long defunct. In the past 18 hours the PDF has been successfully downloaded almost 12,000 times, which may be greater than the circulation of Computer Language.