Archive for May, 2006

ENIACiana

Wednesday, May 31st, 2006

The ENIAC may or may not have been the first of its kind—the first fully functional, all-electronic, general-purpose, digital computer—but there’s no doubt it was the last of its kind. They never built another one like it. Computers designed just a few years later seem reasonably familiar in their gross anatomy, but the ENIAC was a creature from another world. A program for the ENIAC was not a sequence of instructions to be executed one after another; it was a configuration of patchcords and plugboards, directing the flow of data from one calculating unit to another.

ENIAC plugboard programmers

Ruth Gorden and Ester Gerston wire an ENIAC plugboard circa 1946. U.S. Army photo from http://ftp.arl.mil/ftp/historic-computers/

For the first time in many years it’s now possible to watch the ENIAC in action; if you’re ambitious, you can even try wiring up the plugboard for a problem of your own choosing. Till Zoppke and Raúl Rojas of the Freie Universität Berlin have created an ENIAC simulator in the form of a Java applet, which they describe in the latest IEEE Annals of the History of Computing. (This year is the 60th anniversary of the ENIAC’s public debut, and so the Annals have a series of celebratory articles.)

Unfortunately, I haven’t been able to play with the simulator myself because of a mysterious bug or incompatibility. Deep in the bowels of my Java Virtual Machine there must be a patchcord in the wrong plugboard.

Room 641A

Wednesday, May 24th, 2006

These are the days of miracle and wonder
This is a long distance call

—Paul Simon

As a person who occasionally sends e-mail and talks on the telephone, I’ve been following with interest and curiosity all the recent press reports about alleged eavesdropping and data-mining by U.S. government agencies. Mathematically, the most intriguing part of this story has to do with the analysis of the call-detail graph, the big database of phone-company records that shows who calls whom (without revealing who said what). For now, though, I want to set that topic aside and focus on the rumors that someone might actually be listening in on my telephone conversations or reading over my shoulder when I’m online.

The current round of controversy began last December with a story by Eric Lichtblau and James Risen in The New York Times, claiming that the National Security Agency, with the cooperation of telephone companies, had “traced and analyzed large volumes of telephone and Internet communications flowing into and out of the United States.” Of course there has been speculation for many years that the NSA surreptitiously monitors international communications; in a fabled program known as Echelon, the agency supposedly intercepted satellite signals and deployed divers or submarines to secretly tap undersea cables. The new allegations describe a much less arduous procedure. According to Lichtblau and Risen, “senior government officials arranged with officials of some of the nation’s largest telecommunications companies to gain access to switches that act as gateways at the borders between the United States’ communications networks and international networks.”

The Lichtblau and Risen account was short on specifics, offering no hint of which switching centers had been tapped, or even which companies might be participating. But in April a more-concrete assertion surfaced. Mark Klein, a retired technician for AT&T, described the installation of equipment that he identified as NSA surveillance gear at an AT&T facility in San Francisco. He said the monitoring hardware was installed late in 2002 or early in 2003 in room 641A at 611 Folsom Street, a building then owned by SBC Communications, where three floors were occupied by AT&T. (SBC has since merged with AT&T.) Furthermore, Klein said, a coworker had told him of similar surveillance outposts in Seattle, San Jose, Los Angeles and San Diego. Since there’s no obvious reason to single out the West Coast, it’s a reasonable inference that if any of these reports are true, then similar eavesdropping facilities may have been built in other major cities as well.

Klein is a witness in a law suit brought against AT&T by the Electronic Frontier Foundation. Documents that Klein submitted as evidence have been put under seal by the court, but a few items have apparently leaked out, and on Monday Wired News published what purports to be the complete collection of documents—or at least that’s the suggestion conveyed by the headline, “Whistle-Blower’s Evidence, Uncut.” An accompanying story says the material came from “an anonymous source close to the litigation.” The documents include several memos, tables and diagrams labeled “AT&T Proprietary.”

Much about the case remains murky. Still, if we accept Klein’s interpretation of the documents, there’s enough information to begin asking some quantitative questions.

Drinking from the Firehose

From Klein’s account, it seems this particular installation monitors only the packet-switched network (roughly speaking, the Internet) and not the circuit-switched part of the communications infrastructure (including most telephone calls). We are told that fiber-optic cables carrying traffic to and from 16 other Internet providers and exchanges had “splitters” installed, so that a replica of the signals could be diverted into room 641A. The 16 fiber circuits operate at various data rates: four at OC-3 (roughly 150 megabits per second), eight at OC-12 (600 megabits per second), and four at OC-48 (2400 megabits per second). The total bandwidth is thus about 15,000 megabits per second. Converting this figure from the bits-per-second convention of the communications industry (where a million is 106) to the bytes-per-second world of computing (where a million is 220), we arrive at a data rate of a little under 2 gigabytes per second. That’s the maximum capacity of the fibers, but they surely do not run full all the time. If we suppose that the load factor is 50 percent (a number I’ve just plucked out of thin air), then room 641A is taking in 1 gigabyte per second, or 86 terabytes per day.

Coping with such a flood of data is a significant challenge, but it’s not totally beyond imagining. In high-energy physics and astronomy a few experiments will soon be generating data volumes of the same order of magnitude. If the NSA wanted to record the entire data stream for later perusal, it could probably be done, at least for brief bursts. One item of equipment in room 641A, according to the Klein documents, is a Sun StorEdge T3 disk array; the exact model is not indicated, but the largest version available in 2003 held 168 terabytes of storage space—enough to last a couple of days. Much more capacious storage arrays—with capacities beyond a petabyte (1,000 terabytes) are readily available now. (But the bandwidth of the connections being monitored has probably also grown in the past three years.)

It’s been said that if Google can index the entire World Wide Web, and even store many of the pages in its cache files, then surely the NSA can do the same. But the task is not, in fact, the same. For one thing, the Internet is not just the Web; there are many other streams of data passing over the network, including everything from e-mail and “instant messages” to FTP transfers. What’s more important, Google sees the Web mainly as a quasi-static structure, made up of pages that it can visit on its own schedule, and which it needs to re-index only after some change is noted. A sniffing device installed on an Internet backbone circuit has a very different view of the Web. If 100,000 people visit yahoo.com, then the monitor sees the Yahoo home page streaming by 100,000 times. This redundancy complicates the eavesdropper’s task; on the other hand, it also offers the potential of capturing additional information. You not only learn what’s on every Web page but also how many people are visiting that page, and maybe even who they are.

Frankly, trying to squirrel away a copy of everything is a brute-force strategy that seems unlikely, unnecessary and a bit dull-witted. It would merely defer the main problem: At some point you have to digest all that information. There’s no point in collecting things that you’ll never have time to read. Better to be more selective up front, scanning the bit stream as it rushes by, and only saving items that look like they might be worth closer scrutiny. The Klein documents support the hypothesis that room 641A is set up for such sifting. On the manifest of equipment to be installed in the room, the most distinctive item is a machine called a Narus STA 6400. Narus, Inc., is a company headquartered in Mountain View, Calif., whose name is said to derive from the Latin gnarus, meaning all-knowing. The STA in STA 6400 stands for Semantic Traffic Analysis, which is presumably meant to give the impression that the device can classify Internet transmissions by their meaning.

According to the Narus Web site, the company got its start helping Internet service providers monitor, measure (and in some cases police) traffic on their own networks. For example, some carriers prohibit voice-over-internet-protocl (VoIP) telephone calls. The STA 6400—like the new model that has supplanted it, called the NarusInsight—is said to distinguish VoIP packets from other kinds of traffic. Other network operators might want to detect or regulate peer-to-peer file sharing; again, the Narus software claims to recognize such activity. These tasks of discrimination are not easy, but in my opinion they don’t quite rise to the level of “semantic” analysis. Recognizing a packet as belonging to a Skype conversation or a Kazaa download doesn’t penetrate very deeply into the packet’s meaning. And merely classifying packets according to their type or protocol doesn’t look like a very promising approach for detecting terrorist activity. But we can’t know what the NSA is doing with this equipment—if indeed it exists and if they are operating it.

Grepping the Net

For the sake of argument, suppose it’s all true, and the NSA is equipped to intercept and evaluate every bit that passes through the global Internet. If you were an analyst expected to retrieve useful information from this data stream, what kinds of patterns would you look for? Here are some possibilities that occur to me. I’d be interested to hear other ideas.

  • Tracking specified individuals. If we knew the IP number of Osama bin Laden’s computer, we could slurp up every packet traveling to or from that machine, and reconstruct all the details of his Internet activity, just as if we were looking over his shoulder. (Then again, if we knew his IP number, we would also have a pretty good idea of his physical whereabouts, and we could go have an offline chat.) Domain names, e-mail addresses, account names and other kinds of identifiers could also provide a key to tracking known individuals. However, if surveillance of targeted individuals is the main aim of the program, wiretapping the entire Internet seems like a grotesquely wasteful way to go about it. There are more direct and efficient means of doing the same thing.
  • Monitoring specified organizations. Once upon a time, this was the raison d’être of intelligence services—planting bugs in embassies, breaking the codes of the KGB or the GRU. Comprehensive Internet surveillance is presumably useful for such purposes too, although Al Qaeda doesn’t have an embassy.
  • Getting inside the envelope. Some of the people charged in the train bombings in Madrid in 2004 are said to have communicated through a shared Yahoo e-mail account, from which they never actually sent any e-mail. According to the Spanish courts, one conspirator would log onto the account and write a draft of a message, leaving it unsent; then the rest would log on, using the same name and password, and read the draft. Perhaps they thought that an unsent message would never be exposed to eavesdroppers, but that is a misconception; in fact the text does flow over the network every time someone views it. (How else could it be displayed on the screen?) There’s no evidence that the NSA actually spotted any such messages, but in principle they could, if they knew what to look for, and if the packets passed through a node of the network where they had a listening post.
  • Staking out the watering hole. When we think of wiretapping the Internet, what comes to mind first is reading people’s e-mail, or maybe their instant-message traffic. These are person-to-person modes of communication that seem to offer some modicum of privacy—which may be why we suppose that an intelligence agency would want to pry them open. Other Internet forums, such as usenet news groups, are so public that there’s no need for skullduggery to read their contents. Nevertheless, it’s possible that e-mail would not be the main target of surveillance, and that public areas of the Internet would attract considerable attention. For example, the NSA might find it worthwhile to compile lists of visitors to certain web sites (maybe even this one).
  • All of the above. A government agency with legal clout and insider knowledge could gather all this information by more-conventional and less-intrusive means—but what a pain to have to pursue each case individually. If I were suspected of terrorist activity, I’m pretty sure my Internet service provider would give up the goods on me, but there’d be so much paperwork to do, and then the same rigmarole all over again with the hosting service that provides space for this Web site, and with Google for my gmail account, and so on. A tap on the Internet backbone offers one-stop shopping. No need to ask amazon.com what books I bought. No need to ask Google what terms I searched for. You just scoop it out of the river as it floats by.
  • Grepping the Net. Spying on people who are already known or suspected malefactors may be the routine business of the intelligence services, but the mythic promise of the all-seeing surveillance device is the possibility of uncovering a conspiracy de novo, breaking up the plot before it’s even hatched. Legend has it that the heart of the Echelon program was blind scanning of intercepted communications traffic for interesting words or phrases. And so now we can imagine the NSA programming its Narus STA 6400s to grep “bomb \(plot \| conspiracy)” against the entire Internet. Maybe this works. I’m skeptical.
  • Retrospective analysis. Here’s a reason for trying to save a copy of the entire Internet data stream, at least for a period of days. On September 10, 2001, an e-mail mentioning AA11, UA175, AA77 and UA93 would not have attracted the slightest attention, but a day later those flight numbers were notorious. I have no idea whether such an e-mail was ever sent, but I’m sure that investigators would have welcomed an opportunity to find out.
  • Social network analysis. The reported interest of the NSA in telephone call-detail graphs hints at an attempt to discover communities of people with shared interests. The most clear-cut case is to identify cliques in the graph: sets of people who have all called one another. In the case of the long-distance telephone network, the databases of call records already exist, compiled by the telephone companies for their own purposes. Similar kinds of social-network analysis—possibly even more interesting—could be done with Internet data, but as far as I know, no one keeps the necessary records. Compiling such a database strikes me as a plausible motivation for the kind of monitoring the NSA is said to be doing. The closest analogy to the telephone case would be looking for groups of people who have all exchanged e-mail (or other kinds of direct messages), but the technique is not limited to that. Someone might also take an interest in finding groups of people who have all visited the same set of Web sites within a certain period of a time, or downloaded the same files. And it’s notable that this kind of information is particularly easy to acquire; it mostly depends on addresses of data packets, not their content.
  • Jiggery-pokery. It would be unfair to conclude this list without mentioning some of the more cynical speculations about the likely uses of an Internet spinal tap. According to the Bush administration, the program of warrantless wiretapping was approved in the weeks after 9/11 as part of an effort to find the people responsible for the attacks and prevent any recurrence. But the search for the 9/11 conspirators led mainly to shadowy figures in Afghanistan. In 2001, Afghanistan was possibly the least-wired place on earth; the Taliban had banned the Internet, and there was only one computer in the country with a sanctioned connection. Thus the Internet would not seem to be the most obvious place to look for “chatter” among those hiding out in the caves of Tora Bora. On the other hand, the Internet is an excellent place for keeping track of political opponents or watching out for signs of disloyalty within your own ranks. (But that would be wrong.)

Occam’s Other Razor

Does any of this make sense? Set aside all questions of what’s legal or moral or even prudent, and simply consider what’s feasible. Can you listen in to the simultaneous clatter of several hundred million computers and extract anything of value from all the noise?

It’s easy to find people who aren’t hiding, but they’re probably not the ones you want to catch. And for those who do want to conceal their presence and their intentions, the Net offers many, many dark corners, despite room 641A. For example, it’s well known that 70 or 80 percent of all e-mail is spam, but the NSA dare not ignore it. If I were sending signals to the members of a covert cell, I would be tempted to embed my message in an offer for V1@gra or replica wristwatches. Indeed, much spam comes with an addendum of apparently random text, meant to fool spam-blocking filters. How do we know that’s all it fools? Or one could ride piggyback on the high-bandwidth BitTorrent, hiding out among the kids trading music tracks and warez. Then there’s cryptography; more on that below.

The story told by Mark Klein is highly plausible, and the published documents make a strong case, yet there are a few doubtful details. None of the documents that come from AT&T mention the NSA or any other government agency. The NSA connection comes solely from Klein’s own testimony. In statements to the press he has said that the NSA interviewed AT&T employees for a job that would involve working in room 641A, and eventually hired someone. Similarly, in a statement published by Wired News, Klein says, “only people with security clearance from the National Security Agency can enter this room.” Although I’ve personally never had any dealings with the NSA (as far as I know), this story sounds fishy. Would the NSA—long known as “No Such Agency”—identify itself so openly when interviewing telephone company employees? Do they hand out cards that say “NSA Security Clearance”?

But if not an NSA snooping roost, what is in room 641A? One possibility is a traffic-monitoring facility that AT&T might have built for its own use. Some years ago AT&T developed a system called PacketScope, not too different from the modern Narus devices; papers in the open literature [see update below] discuss the installation of PacketScope equipment at two AT&T Internet hubs. Other network operators do the same kind of monitoring. They want to know where their traffic comes from and goes to, and what their customers are doing with the bandwidth. But if the 16 splitters and the Narus STA 6400 and the Sun StorEdge T3 all have such an innocent explanation, why doesn’t AT&T say so? Not everyone would believe them, but the denial would certainly muddy the water. Instead AT&T has released a bland statement about their commitment to privacy and their obligation to assist law enforcement. And then the federal government intervened to dismiss the suit against AT&T, claiming that the issue cannot be adjudicated without revealing state secrets and endangering national security. It’s almost as if they want us to believe the worst.

Okay, I’ll go along. And I’ll keep in mind that this is the looking-glass world, ruled by Occam’s other razor, the one that favors whatever explanation is most convoluted and counterintuitive. Why would the government want the whole world to know that we’re listening? Here’s my best shot at an answer: Maybe to encourage people to encrypt their most-sensitive communications. How would that benefit the eavesdroppers? Packets carrying encyrpted text should be easy to recognize because of their “flat” statistics; they label themselves suspicious. But, you say, it’s no use recognizing an encrypted message if you can’t read what’s inside. Who can’t read what’s inside?

Update 2008-09-07: The “papers in the open literature” mentioned above are no longer quite so open. The link above has gone dead; the paper is still listed in the bibliography of Ramon Cáceres, but links to PDF and Postscript versions have been removed. As of today a PDF can still be downloaded through Citeseer.

The oddest numbers

Wednesday, May 10th, 2006

At the library the other day I was perusing the Collected Rantings and Ravings of Edsger W. Dijkstra [Note 1]. While leafing through the pages in search of something else [Note 2], I stumbled across “An Exercise for Dr. R. M. Burstall” [Note 3], a brief essay in the form of an open letter, written in 1976. The “exercise” involves a little recursive function of the natural numbers, which Dijkstra named fusc. I’m a collector of such trifles, so I immediately perked up. The fusc function, though 30 years old, was new to me. Here’s how Dijkstra defined it [Note 4]:

fusc(1) = 1
fusc(2n) = fusc(n)
fusc(2n+1) = fusc(n) + fusc(n+1)

And here’s the Lisp version [Note 5] I quickly knocked together:

(defun fusc (n)
  (cond ((= n 1) 1)
        ((evenp n) (fusc (/ n 2)))
        ((oddp n) (+ (fusc (/ (- n 1) 2))
                     (fusc (/ (+ n 1) 2))))))

In the letter to Burstall, Dijkstra had this to say about fusc:

(The function fusc is of mild interest on account of the following property: with f1 = fusc(n1) and f2 = fusc(n2) the following two statements hold for n1 ≠ n2: “if there exists an N such that n1 + n2 = 2N, then f1 and f2 are relatively prime” and “if f1 and f2 are relatively prime, then there exist an n1, an n2, and an N, such that n1 + n2 = 2N”. In the above recursive definition, this is no longer obvious, at least not to me; hence its name.)

It looked pretty fuscular to me too. Neither the definition nor Dijkstra’s explanation gave me an immediate sense of what the function was meant to calculate, so I ran the program and tabulated a few argument-value pairs:

n       =  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
fusc(n) =  1 1 2 1 3 2 3 1 4  3  5  2  5  3  4  1  5  4  7  3

The first pattern I was able to spot here was in the values of fusc(n) for n = 2, 4, 8, 16…. Whenever n is a power of 2, fusc(n) = 1. It’s easy to see why: When n = 2k, the function merely invokes itself on successive values of n/2 until eventually n = 1, at which point the recursion terminates. It’s also clear that only for n = 1 and n = 2k can fusc have the value 1. I noted a similar pattern in the intervals between values of n for which fusc(n) = 2, namely the values n = 3, 6, 12, 24…. Interesting, but not all that illuminating. And the distribution of 3s in the sequence looks less regular, with values of 3 at n = 5, 7, 10, 14, 20….

In another attempt to figure out what was going on, I sketched the tree of recursive function calls the program must make when it is started on a specific value of n:

call tree for fusc(21)

The general pattern seemed to make sense; in particular, each odd value of n gave rise to two recursive calls (on (n–1)/2 and (n+1)/2), whereas each even n had only one descendant in the call tree, on n/2. And I could see that the eight 1s forming the leafy fringe of this tree are summed up to produce the returned value fusc(21) = 8. But there were still plenty of questions. For example, I could not predict the next term in the fusc series.

If I had persisted, perhaps I would have worked all this out on my own, but persist I did not. What I did was succumb to temptation. I cheated. I looked up the series in Neil J. A. Sloane’s Encyclopedia of Integer Sequences. In 3.547 seconds flat I had a detailed report listing 91 terms of the series and discussing its properties, applications and history. The history, it turns out, extends back long before Dijkstra [Note 6].

Sloane’s EIS also revealed some remarkably conspicuous patterns in the series that I had somehow missed entirely. Consider the subsequence extracted by starting at fusc(2) and continuing with every second value: 1, 1, 2, 1, 3, 2…. It’s the original sequence all over again! This also works for every fourth element starting at fusc(4), for every eighth element starting at fusc(8), and so on. The sequence has a nested, self-similar structure:

1 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7
  1   1   2   1   3   2   3   1   4   3   5   2
      1       1       2       1       3       2
              1               1               2

Another distinctive pattern involved consecutive pairs of elements from the series. If you haven’t already discovered it for yourself, you might want to try again before reading on. (Assuming that you, too, haven’t already looked everything up in the EIS.)

Here’s the pattern: Take each pair of adjacent numbers and form a fraction: 1/1, 1/2, 2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, 3/5, 5/2, 2/5, 5/3, 3/4, 4/1…. If you keep going in this way, you will eventually generate every rational number exactly once, all of them reduced to lowest terms [Note 7].

The EIS identifies the output of the fusc function as “Stern’s diatomic sequence” and refers to an 1858 paper by M. A. Stern [Note 8]. Seeing that reference rang a bell for me. I recognized both the name of the author and the title of the paper. And several of the other references also looked familiar. And then, as I scanned farther down the list, I had the disconcerting experience of finding one of my own articles mentioned. Not that I’m complaining about this. It’s always a fine thing to have your work cited—to see your name up in lights—but it’s a little worrisome, all the same, when you’re struggling to understand a new idea and you’re referred back to yourself for the answer.

What I knew of Stern was the Stern-Brocot tree [Note 9], which also generates all the rationals, but by a somewhat different construction. Or at least it appears different. The basic idea is to take any two fractions and create a new intermediate fraction, the mediant, by adding the numerators and denominators. For example, 1/2 and 2/3 form a mediant of 3/5. Applying this one rule systematically produces an infinite tree of rational numbers:

Stern-Brocot tree of rationals

All the ratios from the fusc sequence appear here, row by row, although their order within each row is none too obvious. Even less obvious (to me) is the connection between the fusc recursion and the algorithm for building the Stern-Brocot tree. The fusc process seems to be focused on the distinction between even and odd, whereas the Stern-Brocot algorithm is all about a weird kind of averaging. Yet it can’t be coincidence that both operations yield the same set of numbers.

In an effort to better understand what’s happening in the fusc procedure, I created an “instrumented” version of the program. The fusc recursion can terminate only when n is reduced to 1, and the value returned by the function is a count of how many times this has happened. I wanted to see how many times the program triggers the other two clauses—those for cases where n is even and those where n is odd and greater than 1. Here are some results:

n      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
ones   1  1  2  1  3  2  3  1  4  3  5  2  5  3  4  1  5  4  7  3
evens  0  1  1  2  2  2  3  3  4  3  4  3  5  4  6  4  7  5  7  4
odds   0  0  1  0  2  1  2  0  3  2  4  1  4  2  3  0  4  3  6  2

The row labeled ones records the number of times the (= n 1) clause is executed; the evens row give the number of times the (evenp n) predicate is satisfied, and odds counts the number of times the (oddp n) predicate in the third clause fires [Note 10]. As expected, the numbers in the ones row correspond exactly to the values of fusc(n). Also note that whenever n is an exact power of 2, the entry in the evens row is that power, and the entry in the odds row is 0.

I want to call attention to still another pattern. Observe that odds(n) never exceeds evens(n)—at least within the range of n shown here—and that the two quantities are equal only in a few cases, namely n = 1, 3, 5, 11, 21. Is there something special about these numbers? Here are some more values of n for which odds(n) = evens(n)—in fact, all such values less than 1 million:

1 3 5 11 21 43 85 171 341 683 1365 2731 5461 10923 21845 43691
87381 174763 349525 699051

Before spoiling all our fun by turning to the magic of the EIS, consider what these numbers signify. In a certain sense, they are the oddest of all numbers: When decomposed by the fusc recursion, each of these numbers runs through the odd branch of the circuit more often than any other numbers of similar size. But why do these particular numbers, and no others, have that property? I’m not sure I know the answer to that question, but I have a way of thinking about it that I find amusing. Suppose I ask you to construct a series of natural numbers with the following properties: Every number is odd, and every number is double the preceding number in the series. This task is obviously impossible: No integer that is twice another integer can be odd. But the series of “oddest” numbers listed above represents a kind of best-effort attempt to comply with the rules. All the numbers in the series are certifiably odd, and the ratio of successive terms converges on 2 in the limit of large n.

There’s more to be said. Each of these “oddest” numbers lies between two successive powers of 2. Let us index the oddest numbers as Mk, with M0=1, M1=3, and so on. Then for any k, 2k < Mk < 2k+1. Furthermore, as k increases, the ratio of Mk to 2k+1 approaches 2/3, and the ratio of Mk to 2k approaches 4/3. Why those ratios? Well, where would you expect the “oddest” numbers to be found—at the halfway point between successive powers of 2? Of course not: That’s the line of even division.

The series I have been calling the oddest numbers is sequence A001045 in the EIS, where it is identified as the Jacobsthal sequence [Note 11]. It turns out the Jacobsthal numbers are close kin of the Fibonacci numbers; they are generated by the recursion Jn = Jn–1 + 2Jn–2, with J0=0 and J1=1. And they turn up in quite a variety of places. For example Jn gives “the number of ways to tie a necktie using n+2 turns.” Closer to home, a note from Amarnath Murthy points out that the sums of consecutive terms of the Jacobsthal series yield the powers of 2 in order. (I hadn’t noticed.)

In recent weeks I have been chastised (in the friendliest way!) for writing blog entries that leave no room for readers to contribute thoughts of their own. Well, there’s no shortage of loose ends in this story. Here are a few questions for which I would like to know answers.

1. As noted above, the series of n for which fusc(n)=1 begins 1, 2, 4, 8, 16, 32, 64, and the corresponding series for fusc(n)=2 is 3, 6, 12, 24, 48, 96, 192. But the distribution of other fusc values is not so easy to describe. The positions where fusc(n)=3 begin 5, 7, 10, 14, 20, 28, 40, 56, 80; this series is identified in the EIS as a set of numbers whose “binary expansion is 1x100…0 where x = 0 or 1.” Does that property explain why the numbers appear in this context? For fusc(n)=4, the series of n is 9, 15, 18, 30, 36, 60, 72, which are numbers whose “binary expansion is 1xx100…0 where xx = 00 or 11.” Huh?

2. In the tabulation of evens and odds above, I have a million reasons to believe that the number of odds never exceeds the number of evens, but I don’t have a proof. Am I missing something obvious?

3. The action of the fusc function depends critically on the distinction between odd and even, or in other words on residues modulo 2. We can write analogous functions that classify integers modulo 3, 4, 5, and so on. For example, here’s fus3 in Lisp:

(defun fus3 (n)
  (cond ((= n 1) 1)
        ((= n 2) 2)
        ((= (mod n 3) 0) (fus3 (/ n 3)))
        ((= (mod n 3) 1)
         (+ (fus3 (/ (- n 1) 3))
            (fus3 (/ (+ n 2) 3))))
        ((= (mod n 3) 2)
         (+ (fus3 (/ (- n 2) 3))
            (fus3 (/ (+ n 1) 3))))))

And here’s a generic function, fusk [Note 12], that performs the equivalent calculation modulo k. (Interestingly, it’s simpler than the mod-3 version.)

(defun fusk (n k)
  (cond ((< n k) n)
        ((= (mod n k) 0) (fusk (/ n k) k))
        (t (let ((r (mod n k)))
             (+ (fusk (/ (- n r) k) k)
                (fusk (/ (+ n (- k r)) k) k))))))

Here’s some output from a few of these variant functions (fus2 is of course the standard fusc):

n      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
fus2   1  1  2  1  3  2  3  1  4  3  5  2  5  3  4  1  5  4  7  3
fus3   1  2  1  3  3  2  3  3  1  4  4  3  6  6  3  5  5  2  5  5
fus4   1  2  3  1  3  3  3  2  5  5  5  3  4  4  4  1  4  4  4  3
fus5   1  2  3  4  1  3  3  3  3  2  5  5  5  5  3  7  7  7  7  4

You won’t find any of the series other than fus2 in the EIS. There seems to be a pattern of repetition emerging in the higher-order series, but what is it exactly, and what does it mean?

Notes:

Note 1: No, that’s not really the title of the book. It’s Selected Writings on Computing: A Personal Perspective, Springer-Verlag, 1982. But my title gives a clearer impression of the contents. Dijkstra was a major figure in the early development of computer science, and by the time of his death in 2002 he was a senior statesman of the field. Nevertheless, his preferred role was always that of the small boy who tells the truth about the emperor’s nakedness. All of the material from the book, and much else as well, is now available online at the E. W. Dijkstra Archive at the University of Texas at Austin.

Note 2: I was looking for “How Do We Tell Truths that Might Hurt?,” a 1975 memo that serves as a fine exemplar of Dijkstra at his most vituperative. I was looking it up because the version published a few years later in SIGPLAN Notices is slightly bowdlerized, and I wanted to see what the fig leaves covered. The answer was not very titillating; the unmentionable word in Dijkstra’s text turned out to be “IBM.” Dijkstra’s original version is here; the SIGPLAN Notices redaction (fee or subscription needed) is here.

Note 3: Burstall did not take up the challenge.

Note 4: Please allow me some ranting and raving of my own. Am I the only one who chafes at this inside-out style of function definition? The statement fusc(2n+1) = fusc(n) + fusc(n+1) describes a relation, but it’s not an algorithm for calculating anything. When we are given “2n+1″ as a parameter of a function, we still have to figure out what n must be before we can start computing. This layer of obfuscation (to borrow a term) is an invitation to error and misinterpretation. The present instance is a notably treacherous case, since the sum mentioned in the definition, fusc(n) + fusc(n+1), happens to be close to the right answer, but not quite on the mark.

Note 5: For the benefit of the illisperate, here’s the same algorithm in pseudo-Pascal:

function fusc(n): integer;
begin
    if n=1 then fusc := 1;
    elseif even(n) then fusc := fusc(n/2);
    elseif odd(n) then fusc := fusc((n-1)/2) + fusc((n+1)/2)
end;

The (odd n) Lisp predicate and odd(n) Pascal predicate are not strictly needed, since any natural number that’s not even had better be odd; but I’ve included the predicates to make the logic of the procedure a little clearer.

Note 6: A few pages further along in the Rantings and Ravings we come to “More About the Function ‘fusc’ (A Sequel to EWD570),” where we learn that Dijkstra too eventually looked it up in the Encyclopedia of Integer Sequences, which in those days was a book.

Note 7: Don’t take my word for it. The proof is given by Neil Calkin and Herbert S. Wilf in “Recounting the Rationals,” 2000, American Mathematical Monthly 107:360–363.

Note 8: Stern, M. A. 1858. Ueber eine zahlentheoretische Funktion. Journal für die Reine und angewandte Mathematik, 55:193–220. Moritz Abraham Stern was Gauss’s first doctoral student, and eventually succeeded him in the chair of mathematics at Göttingen. In the opening paragraphs of the paper, Stern gives credit for the basic idea to Gotthold Eisenstein (a fact pointed out to me by Donald E. Knuth).

Note 9: Brocot was Achille Brocot, a French clockmaker, whose interest in this number-theoretical function was highly pragmatic: He used it to help calculate gear ratios.

Note 10: I tried both the evens and the odds series at the Encyclopedia of Integer Sequences. No matches.

Note 11: Question: Who was (is) Jacobsthal? The EIS listing offers no clue, and the three or four references I followed also fail to identify Jacobsthal or cite any works by an author of that name. On the other hand, I did come across the following tidbit in Eric W. Weisstein’s MathWorld:

Amazingly, when interpreted in binary, the Jacobsthal numbers J(n+2) give the nth iteration of applying the rule 28 cellular automaton to initial conditions consisting of a single black cell (E. W. Weisstein, Apr. 12, 2006).

Note 12: I realize I am straying perilously close to the sort of thing that will get me blacklisted by parental-control filters.

Note n: You’d think that footnotes would be a breeze in HTML. What could be more hypertexty? Would it were so.

Refrigeration by filtering

Friday, May 5th, 2006

It’s no secret that the way to win fame and fortune in physics is to invent a better refrigerator. Michael Faraday and James Prescott Joule and J. J. Thomson (Lord Kelvin) were all thinkers or tinkerers in refrigeration; the Kelvinator brand alludes to the last of those pioneers. Einstein and Leo Szilard held dozens of patents on refrigerator designs. And workers in cryogenics have won at least 10 Nobel prizes, starting with Heike Kammerlingh Onnes in 1913.

Here’s the latest cool idea on how to chill out: Let the hot atoms in a fluid escape through nanopores that block the lower-energy atoms. William J. Mullin of the University of Massachusetts in Amherst and Neal Kalechofsky of Oxford Instruments America Inc. suggests three ways this might work. The abstract of their recent paper:

We consider the possibility of adding a stage to a dilution refrigerator to provide additional cooling by “filtering out” hot atoms. Three methods are considered: 1) Effusion, where holes having diameters larger than a mean-free path allow atoms to pass through easily; 2) Particle waveguide-like motion using very narrow channels that greatly restrict the quantum states of the atoms in a channel. 3) Wall-limited diffusion through channels, in which the wall scattering is disordered so that local density equilibrium is established in a channel. We assume that channel dimensions are smaller than the mean-free path for atom-atom interactions. The particle waveguide and the wall-limited diffusion methods using channels on order of the deBroglie wavelength give cooling. Recent advances in nano-filters give this method some hope of being practical.

The plan sounds a little like Maxwell’s demon without the demon. The nanopores don’t have to be opened and shut for individual atoms; they discriminate naturally between various energy states of the atoms. Or at least that’s what Mullin’s and Kalechofsky’s calculations suggest; the crucial experiment has yet to be performed.

arXiv link: Theory of cooling by flow through narrow pores.

Update 2006-05-16: Oops. Lord Kelvin = William Thompson. Lord Kelvin ≠ J. J. Thompson.