Archive for February, 2010

Herbert R. J. Grosch, 1918-2010

Friday, February 26th, 2010

Today I learned of the death of Herb Grosch, proud provocateur and mischief-maker of the computing industry. Anybody who ever knew Herb, however slightly or briefly, has a story to tell, so here’s mine.

Grosch held senior positions at major household-name companies (IBM, General Electric) as well as in the U.S. government; he also spent time at MIT and Columbia and was editor of Computerworld. But through it all he cultivated the role of the outsider, or even the outcast; he saw himself as a lone wolf; at IBM they labeled him a “wild duck.” He boasted that he was the second scientist ever hired at IBM (after Wallace Eckert) and, more important, was the first employee with facial hair. Even when he was an insider he was an outsider. Grosch was elected president of the Association for Computing Machinery–but as a dissident candidate, opposing the slate annointed by the nominating committee.

The free-spirited maverick is a celebrated figure in American culture, but not always a well-rewarded one. Wild ducks don’t get tenure, or a pension.

My brief encounter with Herb came ten years ago, when I was working on an article (PDF) about the uses of ternary notation in computing. I read somewhere that Grosch had proposed a ternary architecture for the Whirlwind, Jay Forrester’s enormous early electronic computer. The idea of a base-three machine probably seems weirder today than it did in the early 1950s, but all the same the proposal was not adopted, and most histories of the Whirlwind project say little or nothing about the ternary option. I had no luck finding a copy of Grosch’s memo on the subject, so I tried getting in touch with him directly. With the help of friends I tracked him down, though not in the first place I looked. He was in Riga, Latvia.

Grosch couldn’t supply a copy of the memo; most of his papers, he said, were buried in a landfill in Switzerland. The best he could do was point me to a passage in his autobiography, summarizing a conversation with Forrester:

I said what might be genuinely gainful would be to store a ternary digit in each core, and calculate in base-three rather than binary fashion. There were materials–some kinds of permalloy, as I remember–that had north, south and neutral stable magnetic states. I told him I had taught my Poughkeepsie evening classes at IBM about a special kind of base-three arithmetic I called “signed ternary,” in which zero was in the middle of the number range. In this curious system there was no need for algebraic signs, no problem about the sign of zero, and you rounded perfectly by dropping digits.

Jay being a stiff type, I refrained from calling the ternary digits “tits,” a name which had been the source of much boyish amusement in the Poughkeepsie classes.

By the way, I’m still looking for a copy of that memo. Here’s the reference: Grosch, H. J. R. 1952. Signed ternary arithmetic. Memorandum M-1496, Digital Computer Laboratory, MIT.

Herb Grosch in Barcelona, circa 2001

Of course I had to ask Herb how and why he’d found his way to Latvia. It was a long story, involving a small NSF grant, archives of Datamation magazine, a collection of Soviet-era computer hardware in an attic at Latvia University, and a romance that Herb hoped would develop into his fifth marriage. But the grant ran out, the marriage plans faltered, and Herb was heading back to U.S. in dire need of employment. His hopes, he said, were “not much above fast-foodery: editing, or tech writing, or even clerking at Barnes and Noble.” He was 82 at the time.

In the end, he did not have to stoop quite so far as editing or tech writing, or Barnes and Noble; he became Adjunct Distinguished Professor of Computer Science at the University of Nevada Las Vegas. But that posting didn’t last long. A few years later he turned up in Toronto, teaching the history of computing, but by then I’d lost touch with him.

In a “Dear Everybody” message some years ago, Grosch wrote:

I begin this letter on the fourth of six recent Binary Days, 01.11.01…. The next Binary Day will be nine years from now, 01.01.10, and I will write again then if I survive. Shorter recipient list, I assume!

Herb did make it until 01.01.10. He died January 25th. For more on his life and works, see the ACM obituary and a web site at Columbia maintained by Frank da Cruz.

The teetotaler’s walk

Monday, February 22nd, 2010

Writing about Pólya’s recurrence theorem led me to pick up Gerry Alexanderson’s book The Random Walks of George Pólya. He tells a sweet anecdote about the origin of the theorem.

The random walk is sometimes called the drunkard’s walk, but Pólya’s musings on the subject were not inspired by a disorienting pub crawl. On the contrary, the idea came to him while he was living at the Kurhaus Zürichberg, which Alexanderson notes was founded as a temperance hotel. (It seems to be a somewhat different establishment now, and probably beyond the means of a Privatdozent.)

Alexanderson describes the incident that led Pólya to the recurrence theorem:

There was a particular wooded area near the hotel where he enjoyed taking extended walks while thinking about mathematics. He would carry pencil and paper so he could jot down ideas as they came to him. Some students also lived at the Kurhaus and Pólya got to know some of them. One day while out on his walk he encountered one of these students strolling with his fiancée. Somewhat later their paths crossed again and even later he encountered them once again. He was embarrassed and worried that the couple would conclude that he was somehow arranging these encounters. This caused him to wonder how likely it was that walking randomly through paths in the woods, one would encounter others similarly engaged. This led to one of his most famous discoveries, his 1921 paper on random walk, a phrase used for the first time by Pólya.

It’s interesting that the question raised by those embarrassing woodland encounters differs from the usual statement of the recurrence theorem, which talks about a single walker’s return to the point of origin. The answer is the same, however. In an appendix to the Alexanderson book, K. L. Chung mentions an easy proof of this equivalence: Suppose two walkers begin together at A and run into each other again at B. Nothing about the statistics of a random walk changes if you reverse the direction of all the steps. Thus the probability of the rencontre at B is the same as that of one walker going from A to B and then back to A again.

The illustration below shows the paths of two random walkers that meet three times (green circles) after leaving the origin, supporting the plausibility of Pólya’s claim that he was not stalking those young lovers in the woods.

rencontres-1.png

(What’s not so plausible is that any non-inebriated walker would follow a truly random path.)

Gruenberger’s prime path

Tuesday, February 16th, 2010

Fred Gruenberger may well have been the first blogger on computational topics. When he was writing, back in the 1970s, there was no RSS, and so he distributed his musings in a monthly newsletter called Popular Computing. A typical issue was 16 or 20 typewritten pages–stapled, folded, stamped and delivered by mail. It was always worth reading.

Gruenberger had been working and playing with computers since the 1940s. For a long stretch he was at the RAND Corporation, the famous think tank in Santa Monica. Later he taught at Cal State Northridge. In addition to Popular Computing he was involved in the startup of Datamation magazine and published at least a dozen books. I haven’t been able to learn much about his later years; he died in 1998.

A slogan that appeared in some issues of Popular Computing proclaimed: “The way to learn computing is to compute.” I took this advice to heart, although I was hampered by a total lack of hardware. Later on I acquired a programmable calculator, which helped on some of the problems and exercises.

Problem 149, from Popular Computing Vol. 4 No. 12, December 1976

The problem reproduced above appeared in the December 1976 issue of Popular Computing (Vol. 4, No. 12). At the time, I made no attempt to work this one out, but evidently the problem seemed interesting enough to be worth filing away. When I came upon the old clipping recently, I gave it a closer look and realized I have no idea how to answer Gruenberger’s question, though the impediment now is not lack of hardware.

Gruenberger asks us to trace a planar path whose steps are indexed by the odd integers starting at 3. For each number N we turn right 90 degrees before taking a step if N is a prime congruent to 1 mod 6; we turn left 90 degrees before moving one unit if N is a prime congruent to −1 mod 6; otherwise we continue straight ahead in whatever direction we happen to be facing.

In his typewriter graphics, Gruenberger plotted the trajectory from N=3 through 97. Below I continue the path through N=199.

trail201b.png

But something’s amiss here. Gruenberger wrote:

Eventually the path will cross itself, so that the cell containing 111 will also contain 147. Similarly, one cell will contain both 91 and 179.

Those two self-intersections are nowhere to be found in the diagram. When I first noticed this discrepancy, I assumed I must have made a mistake somewhere. (This eagerness to blame myself is not mere knee-jerk humility; I have years of experience to back it up.) Eventually, though, I concluded that it was Gruenberger who had made the wrong turn. I believe he mistakenly went left at 127, as shown in the brown trail below:

trail201b-error.png

The brown continuation of the red path includes the two coincidences mentioned in Gruenberger’s problem statement. But the left turn at N=127 is incorrect, because 127 is a prime equal to (6×21)+1, and thus it should specify a right turn. The error is of no great consequence, but it does reveal something interesting: Gruenberger must have been plotting these paths by hand. Most likely he wrote a program to compute the series of residue classes, then traced out the trajectory on squared paper.

Setting aside this anomaly, Gruenberger was quite right that the path does intersect itself. Here’s the trail continued through N=1,001:

trail1001b.png

And if that’s not tangled enough, here’s what it looks like at N=10,001:

trail10001c.png

Gruenberger asks for “a list of the contents of those cells containing more than one number, arranged in the order of the smallest number in the cell.” It’s not hard to identify some cells that belong on such a list. The table below includes all multiply-occupied cells discovered when tracing the path up to N=1,001, sorted as Gruenberger requests:

                   x    y    values of N
                 -11   28    (137 337)
                 -15   27    (147 683)
                 -16   27    (149 349 685)
                 -18   26    (155 355)
                 -19   27    (159 691)
                 -19   28    (161 693)
                 -19   29    (163 695)
                 -17   31    (171 319)
                 -18   32    (175 315)
                 -19   32    (177 701)
                 -20   32    (179 703)
                 -22   31    (185 769)
                 -23   31    (187 771)
                 -24   31    (189 773)
                 -30   41    (245 269)
                 -30   42    (247 271)
                 -27   40    (281 733)
                 -26   40    (283 735)
                 -26   37    (289 725)
                 -23   35    (299 715)
                 -22   35    (301 761)
                 -21   35    (303 759)
                 -20   35    (305 757)
                 -17   27    (351 687)
                 -18   27    (353 689)
                 -17   24    (361 673)
                 -16   24    (363 675)
                 -15   24    (365 677)
                 -17   21    (379 667)
                 -17   22    (381 669)
                 -17   23    (383 671)
                 -20   22    (391 631)
                 -20   21    (393 633)
                 -20   20    (395 635)
                 -20   19    (397 637)
                 -22   19    (401 593)
                 -22   18    (403 591)
                 -22   17    (405 589)
                 -22   16    (407 587)
                 -27   15    (419 575)
                 -27   14    (421 573)
                 -28   14    (423 819)
                 -29   14    (425 549)
                 -32   14    (431 539)
                 -32   13    (433 537)
                 -26   10    (563 831)
                 -26   11    (565 829)
                 -27   13    (571 823)
                 -28   18    (607 811)
                 -22   32    (707 767)
                   4   -6    (923 971)
                   4   -7    (925 969)
                   4   -8    (927 967)
                   4   -9    (929 989)
                   5   -9    (931 991)

Is this list the answer to Gruenberger’s question? No, it’s not, because there’s no reason to stop at an arbitrary limit such as N=1,001. Indeed, the list above is not even a prefix of the complete answer. The smallest value of N appearing in the list is 137, but the trail will eventually revisit cells occupied by smaller values of N. For example, continuing the experiment to N=10,001 reveals a bunch of intersections quite close to the beginning of the path, including a site that’s visited five times:

                   x    y    values of N
                   1    0    (5 1621)
                   1    1    (7 1623)
                   2    1    (9 4725)
                   3    1    (11 1263)
                   3    2    (13 1265)
                   5    3    (19 1635)
                   6    3    (21 1637)
                   7    4    (25 7537)
                   7    5    (27 7319 7539)
                   7    6    (29 7505 7541)
                   6    6    (31 1643 7323 7503 7543)
                   6    7    (33 1645 7325)
                   6    8    (35 1647 7327)
                   6    9    (37 1649 7329)

One point still missing from this list is the origin–the site at x=0, y=0, N=3. Does the path ever revisit its starting point? If so, at what value (or values) of N does it come back home? Since I don’t know the answer to this question, I guess I’ll have to leave it as an exercise for the reader.

I suspect that the problem Gruenberger meant to pose (or thought he was posing) was to generate a list of self-intersection sites arranged in their natural order of occurrence–that is, the order in which the crossings are created when you construct the path starting from the origin. This natural-order list is not at all the same as a list “arranged in the order of the smallest number in the cell.” The natural-order list is easy to generate step by step. All you need to do is obey the left/right/straight rules, plot the resulting sequence of positions on the xy lattice, and leave behind a trail of breadcrumbs so you can check at each step to see if the site has been visited before. This task is a matter of straightforward computation–just the kind of assignment that Gruenberger favored. The natural-order list begins:

                   x    y    values of N
                 -30   41    (269 245)
                 -30   42    (271 247)
                 -18   32    (315 175)
                 -17   31    (319 171)
                 -11   28    (337 137)
                 -16   27    (349 149)
                 -18   26    (355 155)
                 -32   13    (537 433)
                 -32   14    (539 431)
                 -29   14    (549 425)
                 -27   14    (573 421)

Thus the prime path first crosses itself when N=269, a value that shares the same coordinates as N=245, namely x=−30, y=41. There are 56 such crossings up to N=1,001, and 112,988 self-intersections up to N=106.

*     *     *

There is a wilder, conjectural answer to Gruenberger’s challenge–which I’m pretty sure he did not have in mind. It goes like this: Maybe the complete list of revisited values of N is simply the list of all N. In other words, maybe the Gruenberger prime path fills up the entire lattice of integers, crossing over itself everywhere many times.

In 1921 George Pólya published a celebrated proof that a random walk on the lattice of integers is recurrent in one or two dimensions, though not in higher dimensions. Recurrent means that the walk returns to each point along its length with probability 1, and indeed visits every point in its domain infinitely often. Is it possible that the prime path is also recurrent?

Pólya’s theorem is one of those mind-expanding results that seem impossible on first acquaintance, and then inevitable, and finally just so amazing that you want to go kiss a mathematician. I have to confess that I’ve never gotten all the way through Pólya’s original paper (it’s not long, but it’s in German). On the other hand, I can highly recommend a little book by Peter Doyle and Laurie Snell, Random Walks and Electric Networks, which gives several alternative proofs of the theorem; it was published in the MAA’s Carus Monograph series, and there’s a postprint available on the arXiv.

The key insight underlying Pólya’s result, as I understand it, is this: If you never revisit a former home, then you must be spending eternity somewhere else, and you can do that only if your universe has enough somewhere elses that you’ll never run out of new territories to visit. Suppose that, some eons after starting your journey, you find yourself at distance r from the origin. If you’re living in a one-dimensional universe, then there are just two places you could be at that moment, namely at +r or −r. It doesn’t matter how far you run; there are still just two points at any given distance from the origin. In two dimensions, a fugitive at distance r has a little more room to maneuver; the number of available points grows in proportion to r, forming a circle of radius r. But this is still not enough room to get lost in. Only in three dimensions or more is there a nonzero probability of escape. In three dimensions, the space available at radius r is proportional to r2. In this three-dimensional world, the volume of empty space grows faster than a random walker’s expected distance from home.

What does all this have to do with Gruenberger’s prime path? Well, it’s no secret that the distribution of prime numbers looks convincingly random–if you look at it in just the right way. And in particular the distribution of primes in various residue classes, such as 6K+1 and 6K−1, seems to behave at least approximately like a random variable. All this suggests we might consider viewing the Gruenberger prime trail as if it were a random walk through the two-dimensional lattice of integers. Because the space is two-dimensional, it’s a good guess that the walk should be recurrent.

The original recurrence results of Pólya refer to a simple random walk, where at each step the walker chooses randomly among the available directions and then moves one unit in that direction. For example, in the two-dimensional lattice of integers there are four possible directions: north, south, east, west. The simple random walk is not the best model of the Gruenberger process, which is more like a nonreversing random walk–a path where on each step the walker can turn left or turn right or go straight ahead but can never make a 180-degree about-face. We can further refine the random-walk model of the Gruenberger process by biasing the choice made at each step to reflect the changing abundance of prime numbers. Primes grow scarcer as their magnitude increases; in the vicinity of a given value of N, the probability that a randomly chosen number is prime is approximately 1/log N. Since the Gruenberger path goes straight for all composite numbers and turns only when N is prime, the trail will have longer and longer straight segments, and rarer turns, as N increases. A random walk can mimic this behavior by choosing an action at each step according to this logic:

    if random(1.0) > 2/log(N)
       then go straight
       elseif randomboolean()
           then turn left
           else turn right

(The proportion of primes is given as 2/log(N) rather than 1/log(N) because the Gruenberger process is defined on odd numbers only, which immediately eliminates half of the composites.)

One way to compare the various kinds of random walks is to measure the root-mean-square displacement–the distance from the origin to the final position of the walker, averaged over many realizations of the random process. For a simple random walk, the RMS displacement for an N-step walk converges to \(\sqrt{N}\); for the nonreversing random walk the average displacement is \(\sqrt{2N}\). The biased random walk based on the distribution of primes also appears to yield an RMS distance proportional to the square root of the number of steps; numerically the curve looks something like \(\sqrt{10N}\). I’m not entirely sure that’s the true form of the curve, but the geometric details don’t really make much difference. If I understand correctly, all three of these random processes should be recurrent in the sense of Pólya.

rms-graph5.png

Does the same reasoning apply to the Gruenberger prime path? There are two sides to this question.

The naysayer points out that Pólya’s theorem applies to random walks, but there’s nothing truly random about the sequence of primes. After all, we have a straightforward, deterministic algorithm for generating primes, as well as an efficient algorithm for testing whether any given integer is prime or composite. The essence of a random process is that every time you run it you get a different result, but there’s only one sequence of prime numbers, and so the Gruenberger prime path will come out exactly the same every time. According to this view of things, the kind of probabilistic reasoning that goes into the proof of Pólya’s recurrence theorem is out of bounds here. For randomness to make any sense, you need to average over some ensemble of independent instances. For example, you could average over the 50 salmon-pink paths in the graph below, which represent 50 independent realizations of a biased random walk; you can’t average over the prime path itself (green), because there’s only that one path.

random-prime-paths-51.png

The yeasayer retorts that a single path is all you need–if the path is infinitely long. Indeed, the salmon-colored trails above could be interpreted not as 50 distinct runs of a random process but as 50 segments of a single long path, which repeatedly loops around through the point at x=0, y=0, wanders off in various directions, and then comes back home yet again. In essence, everything that could possibly happen in an infinite set of random paths happens somewhere within a single infinite path; all possible variety is already present there.

I’m not sure how to settle this dispute between Dr. Yea and Professor Nay. When an argument hinges on the nature of randomness, the meaning of infinity and patterns in the distribution of the primes, I known I’m in over my head.

So I’ll leave that deep question unresolved and say a final word about a lesser curiosity. In the Gruenberger process, we’re using the congruence classes of prime numbers mod 6 as a kind of coin flip to decide which way to turn. Is it a fair coin flip? For small values of N, it certainly doesn’t look fair:

                               6x+1      6x−1
       primes <     100         11        12
       primes <    1000         80        86
       primes <   10000        611       616
       primes <  100000       4784      4806
       primes < 1000000      39231     39265

There’s a persistent excess of −1 primes, and the imbalance seems to be getting steadily larger. As a result, the prime path has a “winding number” that reaches 8.5 at N=106; that is, the path makes eight and a half net counterclockwise revolutions. Does the windup continue with still larger N? I gather that the definitive answer is “Yes and No.” For more see the masterful paper by Andrew Granville and Greg Martin cited above.

[Correction 2010-02-19: reflected the accent on Pólya.]

Yet another spam update

Sunday, February 7th, 2010

spam-2008-02-to-2010-01.png

It seems the great spam tide of 2009 is ebbing. The graph records numbers of spam messages per month received by my email accounts; obviously this is just a personal tally and your milage may vary. The percentage of Russian-language spam in my inbox has fallen off somewhat, from more than half last fall to less than 40 percent in January. (For context see my earlier bit-player spam reports: Aug 2009, May 2009, Mar 2009, Nov 2008, Oct 2008, Jun 2008. Also two American Scientist columns: Jul-Aug 2007 and May-Jun 2003.)

I would also like to draw attention to a recent article on the economics of spam: “Spamalytics: An Empirical Analysis of Spam Marketing Conversion,” by Chris Kanich, Christian Kreibich, Kirill Levchenko, Brandon Enright, Geoffrey M. Voelker, Vern Paxson and Stefan Savage, Communications of the ACM 52(9):99-107 (available online if you or your library is a subscriber). (Preprint link, thanks to Arvind Narayanan in the comments.) Kanich et al. adopt a clever and mildly controversial strategy: They parasitize an existing spam botnet and alter outgoing emails so that embedded links point back to the investigators’ own web servers, rather than those of the spammers. They write:

Using this methodology, we have documented three spam campaigns comprising over 469 million emails. We identified how much of this spam is successfully delivered, how much is filtered by popular antispam solutions, and, most importantly, how many users “click-through” to the site being advertised (response rate) and how many of those progress to a “sale” or “infection” (conversion rate).

Of the 469 million messages, 347 million were part of a campaign advertising pharmaceuticals. The bottom line: 28 “sales” (no money actually changed hands, and no products were delivered) with an average purchase price of about $100. This is a conversion rate of less than 1 in 10 million, which leaves some doubt about the profitability of the operation. Kanich et al. note that spam-for-hire services would charge roughly $25,000 to send 350 million emails, an order of magnitude more than the revenue generated in this campaign. Yet the mere continued existence of spam argues that the actual costs must be much lower. Kanich et al. speculate that the spammers are a vertically-integrated enterprise: They own both the botnet and the pharmacy.

Finally, I want to update my comment on comment spam. Three months ago I installed the Akismet filter to intercept spam comments submitted to bit-player. The filter has been performing fairly well, blocking about 200 spam comments so far, letting a dozen or so slip through, and falsely imprisoning a couple of legitimate comments. (Advice to commenters: Avoid ALL CAPS.)

Looking through the archive of impounded messages, I am more perplexed than ever about the social and economic basis of this phenomenon. The URLs that constitute the payload of the spam point to a weird miscellany of topics. Someone out there thinks that the readers of this blog are interested in hypnotism, in bathroom faucets and shower enclosures, in garage doors and currency conversion. Most of all we are a musical bunch: Fully a third of the spam comments promote merchandise related to pianos.

Unlike email spam, the comment spams are not generated by an automaton; there are living human beings at the other end of this communication channel. And apparently the spam writers are not hired merely for high-speed, wholesale captcha-solving. In many cases there is ample evidence that the commenter has read the article and understood it, and may even have something interesting to say about it.

It occurs to me that this circumstance gives me an opportunity to open a dialogue. So I now address myself directly to the comment spammers: If you are reading this post because you’ve been hired to plant comments and URLs here, or if you’re doing it for your own commercial gain, I have a proposition for you. Please leave a comment attached to this post, explaining what you’re doing and why–and, if possible, for whom. Tell us what a successfully planted link is worth, and how much you get out of it. If your comment is sufficiently interesting and informative, and if it seems genuine, I’ll let it through the Akismet filter, and you’ll have your chance to sell my readers a piano or a garage door. (This offer applies only to comments attached to this article, and I’ll be the sole judge of what’s interesting, informative and genuine.)

If you’d rather communicate privately, send me an email at brian@bit-player.org.

A molecular millisecond

Saturday, February 6th, 2010

It was not quite a century ago that we got our first glimpse of molecules. William Lawrence Bragg, with a little help from his dad, figured out how to get molecules to sit still long enough for a portrait. First you had to crystallize the substance, then shoot x-rays through the carefully mounted crystal, then record the lace-doily pattern of diffracted rays on photographic film.

After all that lab work came the really hard part: analyzing the pattern of bright dots in order to reconstruct the positions of atoms in three-dimensional space. This was a difficult inverse problem, something like deducing the shape of a musical instrument from the sounds it emits. (The EDSAC, the first working stored-program computer, was put to work deciphering x-ray diffraction patterns circa 1950.)

Finally it came time to build a model of the molecule–and in those days a model occupied physical rather than virtual reality. In the 1970s I visited Max Perutz, who had by then spent more than 30 years working out the structure of the hemoglobin molecule. His offices were cluttered with modeling artifacts: stacked sheets of transparent plastic, marked with hand-drawn contour maps of electron density, lumpy clay and plaster extrusions showing the overall form of the protein at low resolution, and the now-familiar tinker-toy assemblies of balls and sticks.

It was hard-won knowledge, and I thought it heroic science. I still do. And yet everyone knew all along–Perutz more emphatically than anyone else–that those rigid, static models of proteins were highly misleading. In the living cell, biological macromolecules do not sit immobile like bronze statues. They are machines with moving parts; they continually flex and wiggle, mesh and then disengage, spin, flap, bend, stretch; all day long they do a hyperkinetic hokey-pokey.

I have now seen a remarkable performance of that molecular dance. In a talk at Harvard earlier this week David E. Shaw showed two videos, each portraying about a millisecond in the life of a single protein molecule. A millisecond may not sound like much, but the video was created by computing atomic motions at roughly one step per femtosecond. That’s 1012 steps in all. (If you included all the steps in the video, and displayed them at 60 frames per second, the show would go on for 500 years.)

Shaw was once a computer scientist at Columbia, then he went off to make some billions on Wall Street. (He was introduced to the Harvard audience as “King Quant.”) He has now turned to computational molecular biology, setting up his own lab and building a series of special-purpose computers designed for molecular-dynamics simulations. The machines are called Anton, in honor of Leeuwenhoek. Shaw’s group has built eight of them so far, each with 512 processors. A kiloprocessor model is expected to come on line in a few weeks.

The basic idea behind the computations is simple. Start with the initial positions and velocities of all the atoms. Calculate the force that each atom exerts on every other atom, and the resulting acceleration. Wash, rinse, repeat. For a system of N atoms, the naive version of this algorithm has performance proportional to N2; this quadratic growth is a bit of a problem, because the model includes not only several hundred atoms in the protein itself but also up to 50,000 atoms in the surrounding solvent. So Anton takes some shortcuts. The big one is to do a full accounting of pairwise interactions only for atoms within a limited radius; the distribution of more-distant atoms is remapped to a mesh of discrete points. But even after this winnowing of the problem, the calculation of pairwise forces remains the principal bottleneck. It is solved by throwing hardware at it: 32 × 512 parallel pipelines implemented on custom silicon. There’s more on Anton’s architecture and algorithms here; the Shaw Research web site lists lots of other publications as well, but most of them are not accessible without payment.

As far as I can tell, the videos of proteins in motion are not yet available anywhere, and that’s really too bad. They might well be the next dance sensation on YouTube. Watching them in the lecture hall, I was so bedazzled that I neglected to note the identity of the molecules. One was an ion channel, a protein that spans the width of a membrane and controls the passage of some specific ion (potassium, I think, in this case). We watched the six polypeptide strands twisting closed like the blades of a camera iris, shutting off the channel. Another simulation showed an even more dramatic reconfiguration. For many microseconds of biological time, and perhaps half a minute of wall-clock time, the protein sat nervously quivering and fidgeting, hunched up in a compact globule, with occasional minor adjustments to various loops and corners. And then suddenly the whole molecule opened up like a flower blooming; a moment later it closed again. If I understand correctly what Shaw was telling us, the existence of this alternative state had been known from experimental evidence, but the transformation had never been seen before. And, as he remarked early in the talk, “seeing what it looks like” brings a level of understanding that would be hard to achieve by more analytic methods.

Which brings me to my one gripe. The truth is, we still don’t know what a protein really looks like, and we never will, because “looking” is not a well-defined notion for objects smaller than the wavelength of light. Color, for example, is just not meaningful in this realm, and surface texture is also problematic. Thus schemes for depicting molecules are necessarily a matter of convention. It’s worth giving some careful thought to those conventions, choosing graphic forms that convey as much as possible about what we do know without inviting spurious inferences about what we don’t know (such as color and texture).

Some of Shaw’s illustrations use a ribbon-and-sheet scheme invented 30 years ago by Jane Richardson, which still seems to work well for showing the overall architecture of a protein. But other diagrams and videos use a ball-and-stick model to represent atomic detail, and this strikes me as a less-happy choice. Watching that jiggling assembly of balls and sticks (black for carbon, red for oxygen, etc.), I kept seeing a shiny, brittle, plastic model of a protein rather than the protein itself. Surely there are better graphic devices.

Update: Thanks to Ron Dror of D. E. Shaw Research for pointing out an error in my description of the Anton algorithm: Distant charges are not mapped to a continuum distribution but to a mesh of discrete points. (I’ve made a correction above.) Ron also notes that an article on Anton in Communications of the ACM is available in the CACM digital edition.