F. Fortesque Fingerhut

July 16th, 2008

Riffling through some file folders last night, I happened upon an item that I evidently clipped out of Datamation years ago. It’s titled “Magic Moments in Software,” by Deborah Sojka and Philip H. Dorn. I can’t find a date on any of the pages, but internal evidence suggests it’s from the early 1980s.

The piece is organized as a timeline, listing various notable events in the history of software, such as the first successful run of a “user-written, meaningful Fortran program” (20 April 1957), the release of Lisp (1962) and BASIC (1964), and the publication of The Mythical Man-Month (1975). One of the early entries caught my eye:

1949—F. Fortesque Fingerhut, while trying to debug his first program on the ACE computer at the National Physical Laboratory, cannot find the problem. He cracks under the strain, disappears, and is not seen until 1981 when he reemerges as the net court judge at Wimbledon.

All the other events recorded in this chronology seem to be genuine, and there’s no April Fool disclaimer at the end. Is it possible that the story of F. Fortesque Fingerhut is not a joke?

arXival mysteries

July 2nd, 2008

Catching up on new submissions to the arXiv, I came across a paper by Robert Baillie, “Summing the Curious Series of Kempner and Irwin,” which is item 0806.4410 in the mathematics listings. Here’s the abstract, exactly as it appears at http://lanl.arxiv.org/abs/0806.4410v1:

In 1914, Kempner proved that the series 1/1 + 1/2 + … + 1/8 + 1/10 + 1/11 +… + 1/18 + 1/20 + 1/21 + …, where the denominators are the positive integers that do not contain the digit 9, converges to a sum less than 90. (The actual sum is about 22.92068.) In 1916, Irwin proved that the sum of 1/n where n has at most a finite number of 9’s is also a convergent series. We show how to compute sums of Irwins’ series to high precision. For example, the sum of the series 1/9 + 1/19 + 1/29 + 1/39+ 1/49 + … where the denominators have exactly one 9, is about 23.04428708074784831968. Note that this is larger than the sum of Kempner’s “no 9” series. We also show how to construct nontrivial subseries of the harmonic series that have arbitrarily large, but computable, sums. For example, the sum of 1/n where n has at most 434 occurrences of the digit 0 is about 10016.32364577640186109739.

Baillie’s article is full of really interesting mathematics and algorithmics, which ought to be reason enough to mention the paper here. But it was something stranger that caught my attention. Look closely at the large number in the last line of the abstract. The HTML source of the arXiv page looks like this:

  1<a href=\"abs/0016.3236\">0016.3236</a>4577640186109739.

It seems the sequence 0016.3236, embedded in a larger string of digits, has been interpreted as an arXiv identifier. I can only guess that some program at arXiv.org is scanning abstracts looking for strings of the form nnnn.nnnn, where n is any decimal digit. It’s a little like those rogue search-and-replace scripts that do amusing things like turning every “gay” into a “homosexual.”

As it happens, there is no arXiv paper with the identifier 0016.3236. There can’t be, because the identifier format is actually yymm.nnnn, encoding the year and month of submission in the first four digits. Obviously there is no month 16, and the identifier scheme had not yet been introduced in the year 00. Thus, as far as I know, the competition is still open for the first perfectly self-referential arXiv preprint—one that finds a legitimate reason to embed its own identifier in its abstract. (Part of the challenge is that you can’t know in advance—at least not with high precision—what number will be assigned to a paper when it is submitted.)
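My guess about the linker can be written down as a minimal Python sketch. The regular expressions and function names below are hypothetical reconstructions, not arXiv’s actual code. The naive rule accepts any four digits, a period and four more digits, even when they are buried inside a longer number. A stricter rule that refuses to match digits flanked by other digits leaves Baillie’s constant alone. The plausibility check encodes the yymm convention, with the assumption that the scheme dates from April 2007 (identifier prefix 0704).

```python
import re

# Hypothetical naive rule: 4 digits, a period, 4 digits, anywhere in the text.
NAIVE_ID = re.compile(r"\d{4}\.\d{4}")
# Hypothetical stricter rule: the match may not be glued to neighboring digits.
STRICT_ID = re.compile(r"(?<!\d)\d{4}\.\d{4}(?!\d)")


def naive_ids(text):
    """Substrings the naive rule would turn into arXiv links."""
    return NAIVE_ID.findall(text)


def strict_ids(text):
    """Substrings the stricter rule would turn into arXiv links."""
    return STRICT_ID.findall(text)


def is_plausible_new_style_id(ident):
    """Check yymm.nnnn: month 01-12 and no earlier than 0704 (April 2007)."""
    m = re.fullmatch(r"(\d{2})(\d{2})\.\d{4}", ident)
    if m is None:
        return False
    yy, mm = int(m.group(1)), int(m.group(2))
    return 1 <= mm <= 12 and (yy, mm) >= (7, 4)


abstract_tail = "is about 10016.32364577640186109739."
print(naive_ids(abstract_tail))                   # ['0016.3236']
print(is_plausible_new_style_id("0016.3236"))     # False: there is no month 16
print(strict_ids(abstract_tail))                  # []
```

Validating the month as well as the shape of the string would have caught this particular false link even with the naive pattern.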

[More on the Kempner series: Baillie and Thomas Schmelzer also have an article on related work in the latest American Mathematical Monthly. The article is available (pdf) from Schmelzer’s web site.]
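The reason Kempner’s series calls for cleverness shows up in a minimal Python sketch (the function names are my own, hypothetical ones). Summing the terms directly works, but the partial sum creeps toward 22.92068 so slowly that even all the terms below one million leave it well short of the limit. The second function evaluates a simple counting bound, which is the standard textbook argument and not necessarily the one in Kempner’s paper. There are 8 × 9^(k−1) k-digit numbers with no 9, each at least 10^(k−1), so the k-digit terms contribute at most 8 × 0.9^(k−1). Summed over all k, that approaches 80, which is consistent with the “less than 90” in the abstract.

```python
def kempner_partial_sum(limit):
    """Sum 1/n over 1 <= n < limit where n has no digit 9 (slow, direct)."""
    return sum(1.0 / n for n in range(1, limit) if "9" not in str(n))


def crude_upper_bound(max_digits):
    """Counting bound over numbers with up to max_digits digits.

    k-digit numbers without a 9: 8 choices for the leading digit, 9 for each
    of the others. Each is >= 10**(k-1), so block k adds at most 8 * 0.9**(k-1).
    """
    return sum(8 * 0.9 ** (k - 1) for k in range(1, max_digits + 1))


print(kempner_partial_sum(10**6))   # still well below 22.92068
print(crude_upper_bound(200))       # just under 80
```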

Unscrabbled

July 1st, 2008

I’ve been Scrabbling by email lately. In today’s game my partner started out by playing

                    H
                    E
                    X

and I responded with

                   A H
                   W E
                   E X

At this point my opponent might well have continued with another three-letter word to make a tidy square block such as:

                  Y A H
                  E W E
                  S E X

In actuality she did something quite different. She played a seven-letter “bingo,” using all her letters to earn a 50-point bonus; as a result I’m hopelessly far behind in the scoring. But let us say no more about the tawdry details of winning and losing; there’s a puzzle here. Looking at that three-by-three block of letters and words, it occurs to me there must surely be legally reachable configurations of a Scrabble board that have no legal continuation. Scrabble rules say that, except for the first move, letters can be added to the board only on squares adjacent to existing letters, and all sequences of two or more letters (both vertically and horizontally) must be dictionary words. The rules say nothing about the situation where continued play is impossible.

I’m sure there must be many stymied positions, where no words can be formed, regardless of what letters you have on your rack. Or so I assert; but, the fact is, I haven’t been able to find even one such configuration. A cursory examination of the list of all allowed two-letter words argues that no two-by-two block of letters can be stymied. What about two-by-three or three-by-three blocks? Somebody must have settled these questions, but my Googling has failed to find the answer. What is the smallest stymied position? (I don’t require that a solution be a rectangular block of letters, but having stray letters dangling off the edges of a block makes it easier rather than harder to form words.)
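One way to start hunting for stymied positions is a brute-force screen. The following is a minimal Python sketch under several stated assumptions. The word list is a hypothetical toy, not an official Scrabble dictionary. The board is treated as unbounded. Only single-tile plays are tried, with any letter available, in keeping with “regardless of what letters you have on your rack.” Finding a single-tile play proves a position is not stymied. Finding none is necessary but not sufficient, because a longer play could still fit.

```python
import string


def run_through(board, r, c, dr, dc):
    """Maximal line of letters through (r, c) in direction (dr, dc)."""
    while (r - dr, c - dc) in board:       # back up to the start of the run
        r, c = r - dr, c - dc
    letters = []
    while (r, c) in board:                 # read forward to the end of the run
        letters.append(board[(r, c)])
        r, c = r + dr, c + dc
    return "".join(letters)


def single_tile_plays(board, words):
    """Every (square, letter) where one new tile forms only dictionary words."""
    empty_neighbors = {
        (r + dr, c + dc)
        for (r, c) in board
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in board
    }
    plays = []
    for (r, c) in sorted(empty_neighbors):
        for letter in string.ascii_uppercase:
            trial = dict(board)
            trial[(r, c)] = letter
            across = run_through(trial, r, c, 0, 1)
            down = run_through(trial, r, c, 1, 0)
            # Single letters are not words; every run of 2+ must be listed.
            if all(len(w) < 2 or w in words for w in (across, down)):
                plays.append(((r, c), letter))
    return plays


rows = ["YAH", "EWE", "SEX"]
board = {(r, c): ch for r, row in enumerate(rows) for c, ch in enumerate(row)}
toy_words = {"YAH", "EWE", "SEX", "YES", "AWE", "HEX"}   # hypothetical word list
print(single_tile_plays(board, toy_words))               # []
print(single_tile_plays(board, toy_words | {"SEXY"}))    # [((2, 3), 'Y')]
```

With a real word list, a block that survives this screen would still have to be checked against multi-tile plays before it could be called stymied.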

Sleight of handle

June 23rd, 2008

As I mentioned, the American Scientist web site is undergoing an overhaul. One part of the changeover that’s still unfinished is the redirection of HTTP requests, so that old links and bookmarks will retrieve the correct document on the new site. I wish I could snap my fingers and fix this problem globally, but that seems to be beyond my power. Over the weekend, however, I decided I could at least try to reduce the entropy of my own little corner of the WWW by repairing all the links at bit-player.org that point to American Scientist articles. It wasn’t quite as much fun as I had expected.

The first problem is that the old links are completely opaque. They look like this:

     http://www.americanscientist.org/AssetDetail/assetid/48550

The identifier “assetid/48550” offers no clue to what it is identifying. It could be anything the magazine has published in the past 10 years. The second problem is that you can’t just follow the link to find out where it leads. None of these links are working anymore—that’s the whole point.

Does this situation suggest a certain lack of foresight on my part? Oh well. I’ll just sit here in the corner until the paint on the floor dries.

For links that point to my own columns, I’m generally able to infer from their context what the proper target is. And, if I get stuck, the Wayback Machine is there to rescue me (thank you Brewster Kahle). Still, reviving a batch of dead links is a dreary exercise. I find a link; I figure out which column it refers to; I run a search on the new web site to find the updated URL; I record the mapping from old link to new, so that this information won’t be lost; I correct the link in the bit-player posting. Wash, rinse, repeat.

After fixing only about a dozen links in this way, I came to a moment of self-knowledge: During my remaining years on this planet, however few or many they may be, I never want to go through this process again. Which raises the question: What’s the best way to create permanent pointers to objects that may not stay where you put them?

The answer to this question has been known for decades. What’s needed is double indirection: a pointer to a pointer. Also known as a handle.

  • Zeroth-order indirection: I send you the document itself.
  • First-order indirection: I send you an address where you can find the document.
  • Second-order indirection: I send you an address where you can find the address of the document.

At order zero—with no indirection at all—the document is beyond my control from the moment I send it. With a single level of indirection, I can update or correct the document and you’ll see the new version, but if I move it, you’ll lose access. With double indirection I can change either the content of the document or its location whenever I please, as long as I take care to update the intermediary pointer—the forwarding address.
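The difference between first- and second-order indirection fits in a minimal Python sketch. This is a toy model, and every name and address in it is hypothetical. One dictionary stands in for the web servers (address to document). A second dictionary is the forwarding table (handle to address). When the document moves, a link that stored the raw address dies. A link that stored the handle keeps working, because the move also updates the forwarding address.

```python
class ToyWeb:
    """Toy model: `store` maps addresses to documents (the servers);
    `handles` maps permanent names to addresses (the forwarding table)."""

    def __init__(self):
        self.store = {}
        self.handles = {}

    def publish(self, address, document, handle=None):
        self.store[address] = document
        if handle is not None:
            self.handles[handle] = address

    def fetch(self, address):
        """First-order lookup; None stands in for "404 Not Found"."""
        return self.store.get(address)

    def resolve(self, handle):
        """Second-order lookup: handle -> address -> document."""
        address = self.handles.get(handle)
        return None if address is None else self.fetch(address)

    def move(self, handle, new_address):
        """Relocate a document and update its forwarding address."""
        old_address = self.handles[handle]
        self.store[new_address] = self.store.pop(old_address)
        self.handles[handle] = new_address


web = ToyWeb()
web.publish("old.example/doc/1", "A column", handle="my-column")
direct_link, handle_link = "old.example/doc/1", "my-column"
web.move("my-column", "new.example/columns/a-column")
print(web.fetch(direct_link))    # None: the first-order link is dead
print(web.resolve(handle_link))  # A column: the handle still works
```

The catch, of course, is that the handle table itself must be maintained forever, which is exactly the job that DNS does for host names.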

Double indirection is already a well-established technology in Internet operations. It is the basis of the Domain Name System. When you follow a link to “bit-player.org,” a nameserver looks up that string of characters and returns an IP number, such as 69.89.21.70, which specifies the real whereabouts of the page you’re now reading. I can move bit-player.org to a new host with a new IP number and you’ll still be able to find me, provided I let the world’s nameservers know the new address.

The same kind of mechanism can be made to work at a finer level of detail. In particular, Digital Object Identifiers offer an infrastructure for attaching permanent names to documents or other online resources. For example,

    http://dx.doi.org/10.1511/1998.5.410

is the DOI irrevocably assigned to one of my old columns, titled “Bit Rot.” In that column, written 10 years ago, I discussed the sad tendency of digital information to go stale, to decay, to become inaccessible. Alas. As you have doubtless guessed, the DOI has lost touch with its target; following the link above will get you nothing but a “page not found” error. And fixing errant DOIs looks no easier than fixing raw, broken links. I’m afraid that “dx.doi.org/10.1511/1998.5.410” is almost as opaque as “assetid/15568.”

In practice, the world seems to have settled on a different mechanism of double indirection for keeping track of stuff on the web. We don’t try to remember or record URLs; we simply go to Google and run a search. The trouble is, though, Google itself relies on the whole distributed network of links to trace and rank documents. If everyone counts on Google to know where everything is, Google will have no way of finding anything.

According to legends of yore, the Internet was designed to survive a nuclear attack. And at the hardware level, the Net is indeed extraordinarily resilient. But the superstructure of linked documents we’ve built atop that foundation is not so unshakeable. If we were to wake up one morning and find that all the links on all web pages were broken, it wouldn’t be much consolation that the underlying documents still existed. Much of their value lies in the connections between them.

Tim Berners-Lee, the first weaver of this web, offers some excellent advice: URLs should never change. If only we’d thought of that sooner.

Back in 1994 I wrote a column titled simply “The World Wide Web.” It was all so new then. My elaborate definitions and explanations seem very quaint now, like someone describing in meticulous detail how to dial a telephone or flush a toilet:

What is the Web? Is it a place? A program? A protocol? One of the <A HREF="http://info.cern.ch/hypertext/WWW/TheProject"> documents</A> in which the Web describes itself offers this assessment: “The World Wide Web (W3) is the universe of network-accessible information, an embodiment of human knowledge.” That about covers it.

For something as vast as a universe, the Web is surprisingly easy to find your way around in. It works like this. On a computer connected to the Internet you start up a program called a browser; the browser goes out over the network and retrieves a document, which we can assume for the moment is simply a page of text. Within the text are some highlighted phrases, displayed in color or underlined. When you select one of the highlighted words by clicking on it with a mouse, a new document appears, with new highlighted “links.” Clicking on one of these links takes you to still another document. Each time you follow a link, you may be visiting another network site, perhaps quite distant from your original destination as well as from your own location.

Is it any surprise that the link embedded in this passage—which I have formatted so that the URL remains visible—summons a “404 Not Found” message?

Jottings on .js

June 22nd, 2008

Theorists and theologians of programming languages give a lot of thought to issues like referential transparency, lexical scope rules and idempotency. More often than not, though, programming languages live or die for reasons that have nothing to do with such syntactic and semantic virtues. In the early 1980s everyone wrote programs in Microsoft BASIC because that’s what shipped with the IBM PC. A little later we all switched to C because…. Well, I’m actually not sure why, but I know it had nothing to do with referential transparency. Turbo Pascal was enormously popular for a very simple reason: It cost 50 bucks at a time when a C compiler would set you back $500.

Now we have JavaScript. It’s not Microsoft BASIC or Turbo Pascal. As a matter of fact, it’s a language with some decidedly highbrow features. And yet I’m pretty sure the root of its popularity does not lie in its lexical closures or its first-class functions. JavaScript is thriving because it’s the way to make nifty stuff happen inside a web page. Those Google maps that glide around in their box when you push them with the little hand—that’s JavaScript magic, and I was thunderstruck the first time I saw it. And the drag-and-drop pictures on Flickr, the fill-in forms that won’t let you schedule a delivery on February 30th, the online surveys, the clickable seating chart on the airline-reservation page, the everybody’s-an-editor features of Wikipedia, virtually everything on Facebook—none of it would happen without JavaScript. (It’s only fair to mention that most of the obnoxious things on web pages, such as popup ads, also depend on JavaScript.)

So here are a few miscellaneous JavaScript developments that I’ve been noticing lately.

John Resig has ported the Processing graphics language to JavaScript. The original Processing implementation of Ben Fry and Casey Reas produces Java applets. Java programs can also be embedded in web pages (here’s an example ready at hand), so what’s the point of switching to JavaScript? Both languages have their pluses and minuses, but Java has just enough startup overhead that many folks in our impatient age are reluctant to run it. JavaScript avoids that delay.

(Note: This is where I’m supposed to explain that Java and JavaScript are not dialects of the same language, or even closely related. On the other hand, JavaScript is the same as LiveScript, JScript and ECMAScript, and it’s almost indistinguishable from ActionScript. I don’t know what we did to deserve this confusion of names. Maybe it had something to do with excess consumption of froufrou coffee drinks. For more on the sorry history of the two languages see Steve Champeon or Douglas Crockford.)

Cat is another graphics language with an interpreter that emits JavaScript.

Looking beyond all the web-gui-glitz, and more in the direction of referential transparency and the like, Sjoerd Visscher has released some interesting code that makes JavaScript look more like Haskell.

There’s a new JavaScript (excuse me, ECMAScript) standard on the way, someday. (It’s been coming for at least seven or eight years.)

A couple of things I should have known long ago but just discovered: The dashboard “widgets” introduced in Apple OS X 10.4 a few years ago are coded in JavaScript. And several big Adobe applications (InDesign, Photoshop, Illustrator) have JavaScript interpreters built in.

There’s more news from Apple: WebKit, the open-source core of the Safari browser, is getting a new JavaScript interpreter called SquirrelFish. Here’s what the announcement says:

SquirrelFish is a register-based, direct-threaded, high-level bytecode engine, with a sliding register window calling convention. It lazily generates bytecodes from a syntax tree, using a simple one-pass compiler with built-in copy propagation.

As for the deep mystery of the name “SquirrelFish,” if there is a connection to Holocentrus adscensionis, I don’t get the joke.

Still another technology connected with Apple—and still another goofy name—is SproutCore:

SproutCore is a framework for building applications in JavaScript with remarkably little amounts of code. It can help you build full “thick” client applications in the web browser that can create and modify data, often completely independent of your web server, communicating with your server via Ajax only when they need to save or load data.

The provenance and current status of this system are somewhat murky to me. SproutCore was created by Charles Jolley, an independent software developer, but apparently it was embraced and promoted at the recent Apple Worldwide Developer Conference as a technology for creating web applications that resemble the OS X Cocoa framework (see here and here). But so far Apple has said nothing openly to confirm its adoption of SproutCore. It’s worth noting that there are lots of other JavaScript libraries and frameworks (for example, the Yahoo User Interface Library YUI, qooxdoo, John Resig’s jQuery).

When I look upon all these somewhat baffling developments, my role is that of a not-quite-innocent bystander. I don’t have the expertise or experience to speak authoritatively on the best choice of programming language and development environment. On the other hand, I have an interest in the outcome of the language competition. I want to be able to write small, illustrative programs (like the Processing sketch mentioned above, or another done in ActionScript) and make them available to the world at large via web pages. JavaScript and its variants are likely to be the vehicle for that activity. Could we make it a smooth-running and reliable vehicle, please?

I close with an interesting comment from Hackerdashery:

As the correct metaphor for a web page moves farther from “document” and closer to “application”, maybe it makes sense for browsers to act more like operating systems.

Bloom-filtered Britney

June 9th, 2008

Imagine an unending stream of names:

Britney, Brad, Angelina, Britney, Pamela, Jessica, Jessica, Britney, Clay, Brad, Britney, Britney, Pamela, Clay, Brad….

Your job is to keep a running tally of the number of unique names. (The snippet above, with 15 names altogether, has 6 unique names.)

There’s a straightforward method of solution: Keep a record of all the unique names seen so far, then match each name received against all those already on record. If a new name matches one in your database, ignore it. If there is no match, add the new name to the database and increment the count of unique names.
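The straightforward method is easy to sketch in Python. This is a minimal version, using a built-in set as the “database.” It gets the right answer for the snippet above, but the set grows by one entry for every unique name, which is exactly the memory problem described next.

```python
def count_unique_exact(stream):
    """Exact distinct count; memory grows with the number of unique names."""
    seen = set()
    count = 0
    for name in stream:
        if name not in seen:   # no match on record: a new name
            seen.add(name)
            count += 1
    return count


names = ["Britney", "Brad", "Angelina", "Britney", "Pamela", "Jessica",
         "Jessica", "Britney", "Clay", "Brad", "Britney", "Britney",
         "Pamela", "Clay", "Brad"]
print(count_unique_exact(names))  # 6
```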

The trouble with this scheme is that the memory requirement is potentially unbounded. You’ll need to store one copy of each unique name. In the worst case (where no name is ever repeated), your database has to be capable of storing the entire stream. If that stream consists of, say, hundreds of millions of queries submitted to an Internet search engine—with more pouring in every second—you may have a hard time keeping up.

Tasks like this one are the subject of my new “Computing Science” column, titled “The Britney Spears Problem.” It’s now available online, and paper-and-ink magazines should be reaching subscribers and newsstands soon. (Note: American Scientist has just revamped its web site. A lot of effort has gone into this migration, but it’s still a work in progress, so please be patient. Links to the old site are not yet being redirected to the corresponding page of the new site. If you discover anything else that’s not working or not to your taste, I’d be grateful if you’d drop me a note or leave a comment below. I’ll pass suggestions along.)

But back to Britney and her friends.

The basic assumption in the study of stream algorithms is that you get only one look at the stream. You are standing on a bridge, watching the river flow by below. When something interesting floats along in the current, you can choose to grapple it out and store it on shore. But if you let it float past, you never get another chance at it. And you have only a fixed amount of storage space for things you choose to save. There’s also a time constraint: You have to be finished with one object before the next comes along. (In some cases the time constraint can be relaxed to an averaged or “amortized” bound.)

With a fixed and finite amount of storage and a stream of infinite variety, there’s no way to count the unique elements exactly. In my American Scientist column I describe some approximation methods. Here I want to briefly mention another approach that I didn’t have room to include in the column. It’s called a Bloom filter, after Burton H. Bloom, who described the idea in 1970. (See Bloom, Burton H. 1970. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 13(7):422–426. Link (subscription needed).)

The term “filter” is a little misleading in this context; what Bloom defined is really a specialized memory. The underlying data structure is an array of m bits, all initially set to zero. Also needed are k different “hash” functions, each of which can be applied to any item being stored in the memory. The hash functions map a data element to a number between 0 and m–1; this number can then be interpreted as an index into the array of bits. (An ideal hash function is both perfectly random and perfectly deterministic—a hard combination to achieve in practice. It’s random in the sense that any data item can hash to any of the m possible values with equal probability. It’s deterministic in the sense that the same data item will always hash to the same value.)

There are two operations defined on the Bloom filter:

  • To store a data element, set each of the bits specified by the k hash functions to 1. (If any of the bits are already set to 1, leave them as they are.)
  • To find out if a data element is already present, calculate the k hash functions for that element and examine the corresponding bits; if any of the bits is 0, then the element cannot be present in the memory.
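The two operations translate almost line for line into a minimal Python sketch. One assumption matters here. The k “different” hash functions are simulated by salting SHA-256 with the function’s index, which is a convenient stand-in and not the construction Bloom described. The bit array is stored one byte per bit, for readability rather than economy. The counter at the end applies the filter to the name stream. A repeated name is never counted twice, because there are no false negatives. A new name can be missed when it produces a false positive, so the count can come out low but never high.

```python
import hashlib


class BloomFilter:
    """Bloom's memory: an array of m bits plus k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _indexes(self, item):
        # Assumption: hash function i is SHA-256 of "i:item", reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Set each of the k bits to 1; bits already at 1 stay as they are.
        for j in self._indexes(item):
            self.bits[j] = 1

    def might_contain(self, item):
        # Any 0 bit proves absence; all 1 bits means "probably present".
        return all(self.bits[j] for j in self._indexes(item))


def count_unique_bloom(stream, m=1024, k=3):
    """Approximate distinct count in fixed space; may undercount."""
    bf = BloomFilter(m, k)
    count = 0
    for name in stream:
        if not bf.might_contain(name):
            bf.add(name)
            count += 1
    return count


names = ["Britney", "Brad", "Angelina", "Britney", "Pamela", "Jessica",
         "Jessica", "Britney", "Clay", "Brad", "Britney", "Britney",
         "Pamela", "Clay", "Brad"]
print(count_unique_bloom(names))  # at most 6
```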

For the name-list problem, the Bloom filter offers a way to check every name as it comes along to see whether or not it has appeared before. The checking can be done in a fixed amount of space and a fixed amount of time per name. Problem solved, no? Not quite. Although the Bloom filter can never produce a false negative (claiming that a name is not present when it really is) there is a risk of false positives (reporting that a name has been seen before when in fact it is novel). For example, with k=3 and m=10, suppose the name Britney hashes to {0,3,6} and Brad yields bits {3,4,9}. Now, with those two names already stored, Angelina comes along and hashes to the values {4,6,9}. All of those bits are set, and so the Bloom filter will falsely report that Angelina has been encountered previously.
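The Angelina example can be replayed directly in a minimal Python sketch. The hash outputs are copied by hand from the example above rather than computed by any real hash function. After Britney and Brad are stored, five of the ten bits are set, and all three of Angelina’s bits happen to be among them.

```python
m = 10
# Hash outputs taken from the example in the text (not computed).
example_hashes = {
    "Britney": {0, 3, 6},
    "Brad": {3, 4, 9},
    "Angelina": {4, 6, 9},
}

bits = [0] * m
for name in ("Britney", "Brad"):        # store the first two names
    for j in example_hashes[name]:
        bits[j] = 1


def reported_present(name):
    """The Bloom filter's answer: are all of this name's bits set?"""
    return all(bits[j] for j in example_hashes[name])


print(bits)                          # [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
print(reported_present("Angelina"))  # True, although Angelina is new
```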

The probability of a false positive obviously depends on the proportion of bits that are set to 1 in the memory array. You might want to try working out just how this probability grows as a function of m, k and n (the number of items stored). Or take the easy way out and read Andrei Broder and Michael Mitzenmacher’s survey. I also recommend the Wikipedia article on Bloom filters.
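For anyone who would rather check an answer than derive one, here is a minimal Python sketch of the standard textbook estimate. It treats the bits as independent, which is an approximation and not an exact result. After n items have been stored, a given bit is still 0 with probability (1 − 1/m)^(kn). A false positive requires all k probed bits to be 1. Replacing the power with an exponential gives the familiar form (1 − e^(−kn/m))^k, which is minimized near k = (m/n) ln 2.

```python
import math


def fp_rate(m, k, n):
    """Estimate: a bit is still 0 with probability (1 - 1/m)**(k*n);
    a false positive needs all k probed bits to be 1."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k


def fp_rate_exp(m, k, n):
    """The usual exponential approximation, (1 - e^(-kn/m))**k."""
    return (1 - math.exp(-k * n / m)) ** k


def best_k(m, n):
    """Number of hash functions minimizing the approximate rate."""
    return max(1, round(m / n * math.log(2)))


print(fp_rate(10, 3, 2))      # about 0.103 (the m=10, k=3 setting above)
print(fp_rate_exp(10, 3, 2))  # about 0.092
print(best_k(1000, 100))      # 7
```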

The Bloom filter strikes me as one of those ideas that seems thunderingly obvious after you’ve read about it. But it’s really very elegant and clever.

Does anyone know who Burton H. Bloom is or was? In 1963 he wrote an MIT AI memo (No. 47) on heuristics in chess. The ACM Guide to Computing Literature lists just one other publication, from 1969. By then he was working for the Computer Usage Company in Newton Upper Falls, Mass., which apparently closed up shop in 1986. I’m not the first (see here and here) to go looking for him.

Spam stats

June 5th, 2008

Hormel Foods, the Minnesota meatpacker, reports a surge in sales of Spam. News accounts attribute the rising popularity of the pink meat-in-a-can to higher prices for other commodities. Or maybe it’s the Spam musubi fad.

Meanwhile, the other kind of spam seems to be surging as well. I’ve been keeping track of my personal spam consumption for the past five years. (I first wrote about this in 2003, with a follow-up in 2007.) Here’s a record of the total number of messages landing in my spam bin each month since the start of 2007:

[Chart: monthly volume of spam received, January 2007 through May 2008]

The lull last spring gave me some hope that spam was finally in decline; the monthly intake even fell below 1,000 messages in March and April. But the respite didn’t last. There was steady growth through last summer and fall, and now another spike in volume has brought the rate to nearly 3,000 messages per month.

The message counts charted above lump together spam sent to several email addresses. Here’s a breakdown by address, covering the entire 17-month period:

[Pie chart: spam received at each email address, January 2007 through May 2008]

The two addresses that attract the most unwanted traffic—namely, my address here at bit-player.org and another at amsci.org—are both published openly on the web, without any form of obfuscation. So are the addresses identified in the pie chart as “il-perms” and “il-prints”; they appear on my industrial-landscape.org web site. I’m certainly not surprised that spammers have discovered these addresses; they are fair game to anyone who knows how to scrape a web site. But there are still some puzzles in the data. I have several more email addresses that are equally vulnerable—they are published in the same places—but they receive nary a spam. Why not? And my earthlink.net and acm.org addresses are not published (or even much used), yet they get a healthy share of junk mail.

The content of the spam remains much the same—replica watches, blue pills, pirate software, phishing expeditions. Numbingly repetitious. In one week I got 25 messages with the same subject line: “eBay New Unpaid Item Message from snorelax67.” Then there were the 34 messages with subject lines such as “Viadzgra - $1.20,” “Viabqgra - $1.75,” “Viafmgra - $1.09” and “Viategra - $1.38.” (Evidently someone has written a little program to insert random letter pairs in the middle of the word. My spam filter was not fooled. Nor did it fall for “Hihg - qualiyt repliacs of the ebst lcock of the wrold!!”) In “How Many Ways Can You Spell V1@gra?” I argued that most of the world’s spam is coming from a relatively small number of senders—tens or possibly hundreds, but not thousands—and I think the evidence continues to support that conjecture.
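Both sides of this little game can be written as a minimal Python sketch, assuming the spammer’s program simply drops two random lowercase letters at the midpoint of the word. That assumption is a guess based on the four subject lines. The hand-written pattern is purely illustrative and not how a statistical spam filter actually works. It does, however, catch every variant the guessed generator can produce.

```python
import random
import re
import string


def pad_middle(word, rng):
    """Guessed generator: insert two random lowercase letters mid-word."""
    mid = len(word) // 2
    filler = "".join(rng.choice(string.ascii_lowercase) for _ in range(2))
    return word[:mid] + filler + word[mid:]


# Illustrative rule: "Via", any two letters, then "gra".
PADDED = re.compile(r"\bvia[a-z]{2}gra\b", re.IGNORECASE)

subjects = ["Viadzgra - $1.20", "Viabqgra - $1.75",
            "Viafmgra - $1.09", "Viategra - $1.38"]
print(all(PADDED.search(s) for s in subjects))               # True
rng = random.Random(2008)
print(PADDED.search(pad_middle("Viagra", rng)) is not None)  # True
```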

One interesting trend in my spam is that it seems to be growing more cosmopolitan. Back in 2003, about 18 percent of the spam I received was written in languages other than English; the figure now is 34 percent. The distribution of languages is curious. Here are the data for May 2008, when I received a total of 933 non-English spams:

[Chart: non-English spam by language, May 2008]

Does everybody get gobs of spam in Russian, or is it just me? Is there something about my Internet activity that leads mailing-list compilers to believe I read Russian? Well, here’s the sad truth: My knowledge of Russian is so totally lacking that I’m not even sure all those messages are really Russian. They come with a Cyrillic character encoding, but for all I know some of them could be Bulgarian or Ukrainian. I’m equally in the dark about the 153 messages that appear to be written in various Asian languages (Chinese, Japanese, Korean). As for the German messages, they are something of a novelty. Until a few weeks ago, I almost never saw spam in German, and now there’s a sudden spate. It’s pretty clear that all of it comes from the same source. I’m seeing no spam in French, Portuguese, Hindi, Urdu, Arabic or Hebrew.

Linguistic diversity is laudable, and in general I’m pleased to see challenges to Anglophone hegemony. I’m always flattered when someone addresses me in another language—even if I can’t respond in kind. But in this case I’m afraid there’s no reason to be congratulating myself. The spammers are not sending me these multilingual documents because they take me for an accomplished and urbane polyglot. They’re sending them to me (and to millions of others) because selectivity just isn’t worth the bother. Addressees like you and me are too cheap to count. Spam is becoming something like the cosmic microwave background radiation. It’s everywhere, it’s meaningless, it can be mistaken for birdshit.

Update 2008-07-01. More pink meat. I’ve tallied up the receipts for June, and my personal spam volume has set a new record: 3,354 messages, an increase of 20 percent over the previous high of 2,794 in May. The updated graph now covers 18 months:

[Chart: monthly volume of spam received, January 2007 through June 2008]

It’s worrisome to see the quantity growing so fast, but let me try to put the matter in perspective. Alongside the 3,354 spams I received in June, I also received 1,245 nonspam messages. Thus the proportion of spam is about 73 percent—well under the figure of 90 percent that’s often bandied about by companies that sell anti-spam products and services. Moreover, the spam causes me very little actual bother; almost all of it goes directly into the junk folder without need for human intervention. The nonspam messages, on the other hand, demand to be read and responded to. Perhaps I’d get more accomplished if more of my mail were spam.
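The two percentages in this update are easy to check. Here is a minimal Python sketch using only the counts given above.

```python
may_spam = 2794       # previous monthly record (May 2008)
june_spam = 3354      # new record (June 2008)
june_nonspam = 1245   # legitimate messages in June

growth = june_spam / may_spam - 1
spam_share = june_spam / (june_spam + june_nonspam)
print(f"growth over May: {growth:.1%}")          # growth over May: 20.0%
print(f"spam share in June: {spam_share:.1%}")   # spam share in June: 72.9%
```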

I have not done a language analysis of the new batch, but I can tell at a glance that I’m still attracting a bizarre glut of Russian spam. A subject line that caught my eye reads:

[Image: the subject line, in Cyrillic script]

I can sound out just enough Russian to guess the transliteration “programme spam.” Inside the message is an image of an advertisement (also in Russian) for various warez. But the decoy text that’s meant to get the message through the spam filters is a sports story written in German. Thus even individual messages are now becoming multilingual.

Unnatural logarithms

June 1st, 2008

I have a longstanding friendly feud with my Editor-in-Chief over the use of logarithmic scales in graphs. I tend to go for a log plot if there’s the slightest hint of an exponential trend in the data; she argues that the human sense of numbers is inherently linear, and thus a nonlinear scale should be used only in exceptional circumstances. Linear is natural, she says; logs are for geeks.

[Illustration: a number line with labeled end points, as shown to the test subjects]

Vindication is always sweet, so I was delighted to see a report in the latest issue of Science (subscription needed for full-text access) suggesting that members of an Amazonian indigenous group seem to order numbers on a logarithmic scale. Stanislas Dehaene, Véronique Izard, Elizabeth Spelke and Pierre Pica worked with 33 adults and children of the Mundurucu group in the Brazilian state of Pará. The subjects were shown a line segment with labeled end points (illustration above) and asked to place various other numbers within this interval. The numbers were presented as patterns of dots, as sequences of tones or as spoken number words in the Mundurucu language or in Portuguese. The results of the experiment show, according to Dehaene et al.:

The Mundurucu seem to hold intuitions of numbers as a log scale where the middle of the interval 1 through 10 is 3 or 4, not 5 or 6.

In contrast, 16 American control subjects from the Boston area placed numbers on the scale pretty much in proportion to their magnitude.

[Graphs: response location versus stimulus number for Mundurucu and American subjects]

For the sake of my ongoing quibble with the E-in-C, it’s in my interest to support and endorse this finding. I wish I could do so.

[Graph: a logarithmic curve, in red, superimposed on the experimental data]

Exactly what does it mean to “hold intuitions of numbers as a log scale”? Simply taking base-10 logarithms of numbers isn’t quite what’s wanted. The base-10 logarithm maps the integers 1 through 10 onto a range from 0 to 1. To map numbers in the domain [1, 10] “logarithmically” back onto the same interval [1, 10], we need a transformation along the lines of R = 9 log10(S) + 1, where R is the “response location” and S is the “stimulus number.” But the resulting curve, superimposed in red on the graph above, doesn’t match the experimental data very well at all. And, looking at the data more closely, it’s hard to see how any plausible transformation is going to account for the positions of those dots and error bars. There’s trouble in the middle, where 5 was judged larger than 6. And there’s trouble at both ends: No matter where you place the numbers 2 through 9 on the scale, shouldn’t 1 map to 1 and 10 to 10? The anomalies at the ends are made more puzzling by the instructions given to the test subjects:

Only two training trials were presented, with sets of dots whose numerosity corresponded to the ends of the scale (1 and 10). The participants were told that these two stimuli belonged to their respective ends, but that other stimuli could be placed at any location.

In view of these instructions, it seems peculiar that stimulus number 1 has a response location near 1.5, and stimulus number 10 corresponds to a response location of roughly 8.
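For anyone who wants to check the arithmetic, here is a minimal Python sketch of the transformation R = 9 log10(S) + 1 discussed above. The function names log_placement and stimulus_at are my own, invented for illustration; the code simply tabulates where a log scale would put each stimulus from 1 to 10, then inverts the mapping to find which stimulus lands at the midpoint of the segment.

```python
import math

def log_placement(s: float) -> float:
    """Response location R for stimulus S under R = 9*log10(S) + 1.
    This sends S = 1 to R = 1 and S = 10 to R = 10."""
    return 9 * math.log10(s) + 1

def stimulus_at(r: float) -> float:
    """Inverse mapping: the stimulus that the log scale places at location r."""
    return 10 ** ((r - 1) / 9)

# Tabulate the predicted placement of each stimulus from 1 to 10.
for s in range(1, 11):
    print(s, round(log_placement(s), 2))

# The midpoint of the segment, R = 5.5, corresponds to S = sqrt(10).
print(round(stimulus_at(5.5), 2))
```

The midpoint works out to S = √10, about 3.16, so the log curve does put the middle of the scale at “3 or 4,” as the paper says. What it cannot produce is the 5-versus-6 reversal or the misplaced end points: the function is strictly increasing, and it pins 1 and 10 to the ends of the line.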

I wonder if the data aren’t better explained by a simple step function. Perhaps the subjects counted dots up to 4, but beyond that merely estimated the number of spots, lumping together any stimulus with 5, 6 or 7 dots in one class, and putting patterns of 8, 9 or 10 spots in a second class.
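The step-function guess is easy to state as code. In this hypothetical Python sketch, perceived_class and group_stimuli are illustrative names, and the cutoffs (count up to 4, then lump 5 to 7 and 8 to 10) are just the ones suggested in the paragraph above, not anything fitted to the published data. Each stimulus is mapped to the range of numerosities a subject supposedly could not tell it apart from. Stimuli in the same class would be expected to land at about the same place on the line, so 5 and 6 could trade places by chance.

```python
from collections import defaultdict

def perceived_class(dots: int) -> tuple[int, int]:
    """Return the (low, high) range of numerosities that a subject could not
    distinguish from `dots`, under the hypothetical step-function model."""
    if not 1 <= dots <= 10:
        raise ValueError("stimuli run from 1 to 10")
    if dots <= 4:
        return (dots, dots)  # small numbers are counted exactly: a class of one
    if dots <= 7:
        return (5, 7)        # estimated, not counted: "several"
    return (8, 10)           # estimated, not counted: "many"

def group_stimuli(stimuli=range(1, 11)) -> dict[tuple[int, int], list[int]]:
    """Group the stimuli that the model says would be indistinguishable."""
    groups = defaultdict(list)
    for s in stimuli:
        groups[perceived_class(s)].append(s)
    return dict(groups)

for cls, members in group_stimuli().items():
    print(cls, members)
```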

Or maybe the arrangement is logarithmic after all. The Mundurucu may have a lot more mathematical sophistication than Dehaene et al. give them credit for. When presented with a scale that runs from 1 to 10, rather than 0 to 10, they intuit that the interval belongs to the series …, 0.01, 0.1, 1, 10, 100,…, rather than the more prosaic series …, –10, 0, 10, 20, 30, …. Anyway, I think that’s what I’ll tell my Editor-in-Chief.

Promoting my promotions

May 24th, 2008

I’ll be on the radio Monday (May 26th) at 6 p.m., chatting with Dorian Devins about my recent book Group Theory in the Bedroom, and Other Mathematical Diversions. If you’re within shouting range of Jersey City, N.J., you can listen in on WFMU; otherwise, you can stream it live over the net or wait for the podcast. (Incidentally, even if you have no interest in listening to my self-promotional rants, I recommend the WFMU web site for excellent advice and instructions on doing Internet radio.)

In other GTiBaOMD news, I received a Google Alert the other day, pointing me to a new review of the book. When I followed the link, I was momentarily perplexed. Before I could see the review, I had to press a button bearing the legend: “I am over 18 and agree to the viewing of sexually explicit material.” As it happens, the page that awaited me beyond this warning had no sexually explicit material at all. Indeed, that lack was the essence of the reviewer’s complaint:

Sex and Math. You would think I would be in heaven at the mere thought that somebody had written a book combining these two things. And this book would be heaven if it had combined those two things. Instead, sadly, this is a case where the title is both a little too literal and yet not quite accurate.

Point taken. It’s true that there’s little titillation in my tale of mattress-flipping. Considering that disappointment, the reviewer’s assessment is remarkably even-handed. (“Would I recommend this book? Sure, especially if you have a fancy for computing, math and solving problems.”) Thanks, Naughty Bookworm.

On the spot

May 24th, 2008
[Image: Hubble Space Telescope view of Jupiter showing the Great Red Spot, Junior and the new third spot]

Wow. Jupiter has sprouted a third red spot. It was just two years ago that the Great Red Spot was joined by a smaller companion, which was quickly dubbed “Junior.” I guess the new red spot, discovered in the past few weeks, will have to be called “III.”

In the view above, from the Hubble Space Telescope, Junior is southwest of the Great Spot, and the new, smallest member of the family is due west of the big one and a little farther downwind. This is a false-color image, constructed by assigning colors to monochromatic images recorded at three wavelengths, but the intent is to correctly render colors as perceived by the human eye. Evidently none of the spots are really red at the moment. If they were all newly discovered right now, we would have the Great Peach of Jupiter and the Two Little Apricots.

When I get beyond merely admiring the glorious, painterly spectacle of this Jello-chiffon dessert in the sky, what fascinates me most is the time scale of the red spot phenomenon. The Great Spot has been there for at least a century or two, and probably much longer. It is a storm, with rapid counterclockwise circulation clearly visible in the time-lapse photos returned by the Voyager 1 spacecraft in 1979.

Storms are something we can relate to from our earthling experience; we have cyclones here too. But what kind of storm lasts for hundreds of years? Even allowing for the larger spatial scale of events on Jupiter, the Great Spot seems extraordinarily long-lived. The rotation period is roughly one earth-week, which means the spot has survived for something on the order of 10,000 revolutions. And it is geographically stable, too: Although the spot drifts in longitude, it seems to be pinned in latitude, hovering at a swirling boundary between easterly and westerly wind belts.
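The 10,000 figure is a back-of-the-envelope estimate, and it’s easy to reproduce. This Python snippet assumes a rotation period of exactly 7 days (the “roughly one earth-week” above) and a year of 365.25 days; the function name revolutions is purely illustrative.

```python
DAYS_PER_YEAR = 365.25

def revolutions(years: float, period_days: float = 7.0) -> float:
    """Number of turns a vortex makes in `years` if it rotates once
    every `period_days` days (assumed here to be one earth-week)."""
    return years * DAYS_PER_YEAR / period_days

# "At least a century or two" of spinning:
for years in (100, 200):
    print(years, "years:", round(revolutions(years)))
```

A century gives a little over 5,000 turns and two centuries a little over 10,000, hence “on the order of 10,000.”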

Very likely, the key to the Great Spot’s longevity is that Jupiter has no continents or other surface irregularities to disrupt the flow of the atmosphere. But that fact makes the uniqueness of the spot somewhat mysterious. If such features can arise spontaneously, purely from the dynamics of the atmospheric flow, like a pearl created without any need for a grain of sand, then why is there just one red spot? You’d think that such storms would develop from time to time wherever conditions were favorable.

And now we have our answer: There’s not just one red spot. But the question of time scales doesn’t entirely go away. It seems implausible that one storm would go on for centuries in lonely splendor, and then suddenly two more would evolve within a couple of years. Perhaps there have been others and we just didn’t notice? Not within the past 50 years, I think. Another possible explanation of this improbable coincidence is that the births of Junior and III are not independent events. All three storms are nearby (at least by Jovian standards) and are surely interacting. If that’s the case, we may not have seen the end of this sequence of events. Will there be more spots? Will they collide or coalesce? Stay tuned.

In the matter of time scales, I can’t help noting that Jupiter has a connection with another epochal event in the modern Internet era. In July 1994, Comet Shoemaker-Levy 9 crashed into Jupiter, and the world followed along via the web. The idea that anyone with a modem could download the images directly from JPL, with no waiting for the news media, made quite an impression. The Netscape icon was the apotheosis of this event.

Links:

More third-spot images and explanations of how they were made, from Imke de Pater, UC Berkeley.

Reporting from Science Blog.

Reporting from New Scientist.

A report from the Philippine Daily Inquirer with some background on who first spotted the new spot.

The Wikipedia article on the Great Red Spot (which already has a note on the new one).