A molecular millisecond

6 February 2010

It was not quite a century ago that we got our first glimpse of molecules. William Lawrence Bragg, with a little help from his dad, figured out how to get molecules to sit still long enough for a portrait. First you had to crystallize the substance, then shoot x-rays through the carefully mounted crystal, then record the lace-doily pattern of diffracted rays on photographic film.

After all that lab work came the really hard part: analyzing the pattern of bright dots in order to reconstruct the positions of atoms in three-dimensional space. This was a difficult inverse problem, something like deducing the shape of a musical instrument from the sounds it emits. (The EDSAC, the first working stored-program computer, was put to work deciphering x-ray diffraction patterns circa 1950.)

Finally it came time to build a model of the molecule–and in those days a model occupied physical rather than virtual reality. In the 1970s I visited Max Perutz, who had by then spent more than 30 years working out the structure of the hemoglobin molecule. His offices were cluttered with modeling artifacts: stacked sheets of transparent plastic, marked with hand-drawn contour maps of electron density, lumpy clay and plaster extrusions showing the overall form of the protein at low resolution, and the now-familiar tinker-toy assemblies of balls and sticks.

It was hard-won knowledge, and I thought it heroic science. I still do. And yet everyone knew all along–Perutz more emphatically than anyone else–that those rigid, static models of proteins were highly misleading. In the living cell, biological macromolecules do not sit immobile like bronze statues. They are machines with moving parts; they continually flex and wiggle, mesh and then disengage, spin, flap, bend, stretch; all day long they do a hyperkinetic hokey-pokey.

I have now seen a remarkable performance of that molecular dance. In a talk at Harvard earlier this week David E. Shaw showed two videos, each portraying about a millisecond in the life of a single protein molecule. A millisecond may not sound like much, but the video was created by computing atomic motions at roughly one step per femtosecond. That’s \(10^{12}\) steps in all. (If you included all the steps in the video, and displayed them at 60 frames per second, the show would go on for 500 years.)
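The 500-year figure is easy to check. The snippet below is only a back-of-the-envelope sketch; the function name and the 365.25-day year are my own assumptions, not anything from the talk.

```python
# One simulation step per femtosecond, one millisecond simulated:
# 1e-3 s / 1e-15 s = 1e12 steps.
FEMTOSECONDS_PER_MILLISECOND = 10**12

def playback_years(steps, frames_per_second=60,
                   seconds_per_year=365.25 * 24 * 3600):
    """Years needed to show every step as one video frame."""
    return steps / frames_per_second / seconds_per_year

years = playback_years(FEMTOSECONDS_PER_MILLISECOND)
print(f"about {years:.0f} years of continuous playback")
```

The result lands a little above 500 years, consistent with the round number quoted above.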

Shaw was once a computer scientist at Columbia, then he went off to make some billions on Wall Street. (He was introduced to the Harvard audience as “King Quant.”) He has now turned to computational molecular biology, setting up his own lab and building a series of special-purpose computers designed for molecular-dynamics simulations. The machines are called Anton, in honor of Leeuwenhoek. Shaw’s group has built eight of them so far, each with 512 processors. A kiloprocessor model is expected to come on line in a few weeks.

The basic idea behind the computations is simple. Start with the initial positions and velocities of all the atoms. Calculate the force that each atom exerts on every other atom, and the resulting acceleration. Wash, rinse, repeat. For a system of N atoms, the naive version of this algorithm has performance proportional to \(N^2\); this quadratic growth is a bit of a problem, because the model includes not only several hundred atoms in the protein itself but also up to 50,000 atoms in the surrounding solvent. So Anton takes some shortcuts. The big one is to do a full accounting of pairwise interactions only for atoms within a limited radius; the distribution of more-distant atoms is remapped to a mesh of discrete points. But even after this winnowing of the problem, the calculation of pairwise forces remains the principal bottleneck. It is solved by throwing hardware at it: 32 × 512 parallel pipelines implemented on custom silicon. There’s more on Anton’s architecture and algorithms here; the Shaw Research web site lists lots of other publications as well, but most of them are not accessible without payment.
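To make the "wash, rinse, repeat" loop concrete, here is a toy sketch in Python. It is emphatically not Anton's algorithm: the inverse-square repulsion, the unit masses, the simple Euler-style update and all the names are assumptions chosen for brevity. The optional cutoff only hints at the limited-radius shortcut; pairs beyond it are simply dropped here, where a real code would pass them to the mesh calculation.

```python
import itertools
import math

def pair_forces(positions, cutoff=None):
    """Return (forces, pairs_evaluated) for a toy inverse-square repulsion."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    pairs = 0
    # The naive double loop: N(N-1)/2 pairs, hence the quadratic cost.
    for i, j in itertools.combinations(range(n), 2):
        delta = [positions[i][k] - positions[j][k] for k in range(3)]
        dist = math.sqrt(sum(d * d for d in delta))
        if cutoff is not None and dist > cutoff:
            continue  # distant pair: skipped in this toy version
        pairs += 1
        scale = 1.0 / dist**3  # magnitude 1/r^2 along the unit vector delta/r
        for k in range(3):
            forces[i][k] += scale * delta[k]  # equal and opposite forces
            forces[j][k] -= scale * delta[k]
    return forces, pairs

def step(positions, velocities, dt, mass=1.0, cutoff=None):
    """Advance the system in place by one time step."""
    forces, _ = pair_forces(positions, cutoff)
    for p, v, f in zip(positions, velocities, forces):
        for k in range(3):
            v[k] += dt * f[k] / mass  # acceleration updates velocity
            p[k] += dt * v[k]         # velocity updates position
```

Even with the cutoff, this version still measures every pairwise distance; production codes avoid that too, typically by sorting atoms into spatial cells.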

As far as I can tell, the videos of proteins in motion are not yet available anywhere, and that’s really too bad. They might well be the next dance sensation on YouTube. Watching them in the lecture hall, I was so bedazzled that I neglected to note the identity of the molecules. One was an ion channel, a protein that spans the width of a membrane and controls the passage of some specific ion (potassium, I think, in this case). We watched the six polypeptide strands twisting closed like the blades of a camera iris, shutting off the channel. Another simulation showed an even more dramatic reconfiguration. For many microseconds of biological time, and perhaps half a minute of wall-clock time, the protein sat nervously quivering and fidgeting, hunched up in a compact globule, with occasional minor adjustments to various loops and corners. And then suddenly the whole molecule opened up like a flower blooming; a moment later it closed again. If I understand correctly what Shaw was telling us, the existence of this alternative state had been known from experimental evidence, but the transformation had never been seen before. And, as he remarked early in the talk, “seeing what it looks like” brings a level of understanding that would be hard to achieve by more analytic methods.

Which brings me to my one gripe. The truth is, we still don’t know what a protein really looks like, and we never will, because “looking” is not a well-defined notion for objects smaller than the wavelength of light. Color, for example, is just not meaningful in this realm, and surface texture is also problematic. Thus schemes for depicting molecules are necessarily a matter of convention. It’s worth giving some careful thought to those conventions, choosing graphic forms that convey as much as possible about what we do know without inviting spurious inferences about what we don’t know (such as color and texture).

Some of Shaw’s illustrations use a ribbon-and-sheet scheme invented 30 years ago by Jane Richardson, which still seems to work well for showing the overall architecture of a protein. But other diagrams and videos use a ball-and-stick model to represent atomic detail, and this strikes me as a less-happy choice. Watching that jiggling assembly of balls and sticks (black for carbon, red for oxygen, etc.), I kept seeing a shiny, brittle, plastic model of a protein rather than the protein itself. Surely there are better graphic devices.

Update: Thanks to Ron Dror of D. E. Shaw Research for pointing out an error in my description of the Anton algorithm: Distant charges are not mapped to a continuum distribution but to a mesh of discrete points. (I’ve made a correction above.) Ron also notes that an article on Anton in Communications of the ACM is available in the CACM digital edition.

17 x 17: A nonprogress report

22 December 2009

The question again: Is there a four-coloring of the 17-by-17 grid in which none of the 18,496 rectangles have the same color at all four corners? As I said last time, Bill Gasarch would not have put a bounty on this problem if it had an easy solution. Over the past couple of weeks I’ve invested some \(10^{14}\) CPU cycles in the search, and a few neural cycles too. I have nothing to show for the effort, except maybe a slightly clearer intuition about the nature of the problem.

If you generate a bunch of four-colored n-by-n grids at random, the average number of monochromatic rectangles per grid increases quite smoothly with n:

random-grid-stats.png
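For anyone who wants to reproduce averages of this kind, here is a minimal sketch. The function names are mine, and the closed-form expectation assumes every site is colored independently and uniformly: each rectangle is then monochrome with probability 4 × (1/4)^4 = 1/64, so the expected count is \({n \choose 2}^2/64\).

```python
import random
from itertools import combinations

def count_mono_rectangles(grid):
    """Count axis-parallel rectangles whose four corners share a color."""
    count = 0
    for r1, r2 in combinations(range(len(grid)), 2):
        for c1, c2 in combinations(range(len(grid[0])), 2):
            if grid[r1][c1] == grid[r1][c2] == grid[r2][c1] == grid[r2][c2]:
                count += 1
    return count

def random_grid(n, colors=4, rng=random):
    return [[rng.randrange(colors) for _ in range(n)] for _ in range(n)]

def mean_rectangles(n, trials=200, seed=0):
    """Monte Carlo estimate of the average count over random n-by-n grids."""
    rng = random.Random(seed)
    total = sum(count_mono_rectangles(random_grid(n, rng=rng))
                for _ in range(trials))
    return total / trials

def expected_rectangles(n, colors=4):
    """Exact expectation under independent, uniform coloring."""
    pairs = n * (n - 1) // 2
    return pairs * pairs / colors**3
```

Under that assumption the expected count for a random 17-by-17 grid is 18,496/64 = 289, which happens to equal the number of sites.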

This gradual progression might lead you to suspect that the difficulty of finding or producing an n-by-n grid that is totally devoid of monochrome rectangles would also be a smooth function of n. The truth is quite different.

success-graph.png

Finding a solution is easy for square grids of any size up to 15 by 15. The task suddenly becomes very hard at size 16 by 16. As for 17 by 17, it’s much harder still–and indeed is not yet known not to be impossible. (Details on the data behind this graph: For each size class from n=8 to n=17 I started with 1,000 randomly four-colored n-by-n grids. Then I applied a simple heuristic search (the first of the algorithms listed below) to each grid, running the program for 1,000 × \(n^2\) steps. The graph records the number of times this procedure succeeded–i.e., produced a grid with no monochrome rectangles–at each grid size. Up to n=14, the search never failed; at n=16 and beyond, it never succeeded.)

This kind of sudden transition from easy to hard is a familiar feature in the realm of constraint satisfaction. Well-known intractable problems such as graph coloring and boolean satisfiability have the same structure. That doesn’t bode well for any of the simple-minded computational methods I’ve tried. Here’s a brief catalog of my failures. These are algorithms that I’m pretty sure are never going to pay off.

  • Biased random walk. Start with a randomly colored grid. Repeatedly choose a site at random, then try changing its color; accept the move if it reduces the overall number of monochrome rectangles. This is the simplest of all the algorithms. None of the more-elaborate schemes is decisively better.
  • Whack-a-mole. Find all the monochrome rectangles in the grid; choose one of them and alter the color of one corner, thereby eliminating the rectangle. In the simplest version of this algorithm, you choose the rectangle and the corner and the new color at random; in more sophisticated versions, you might evaluate the alternatives and take the one that offers the greatest benefit.
  • Steepest descent. Examine all possible moves (for the 17-by-17 grid there are 867 of them) and choose one that minimizes the rectangle count.
  • Lookahead steepest descent. Examine all possible moves, and then all possible sequels to each such move (for the 17-by-17 grid there are 751,689 two-move sequences); choose a sequence that minimizes the rectangle count. In principle this method could be extended to chains of three or more moves, but the cost soon gets out of hand. (The lookahead technique is the mirror image of backtrack search; it explores the tree of possible moves breadth-first instead of depth-first.)
  • Color-balanced search. Allow only moves that maintain the overall balance of colors in the grid. For example, in a 16-by-16 four-colored grid, color balance implies 64 sites in each color. One way to maintain balance is to make moves that swap the colors of two sites. (There is no reason to think that a rectangle-free grid will have exact color balance; on the other hand, a solution for a large grid cannot depart too far from perfect balance. Thus a color-balanced search might be an effective trick for finding a neighborhood where solutions are more common.)
  • Row-and-column-balanced search. Allow only moves that maintain the balance of colors within each row and each column of the grid. In a 16-by-16 grid, each row and column should have four sites in each of four colors. A simple way to maintain this detailed color balance is to search for “harlequin rectangles” with the color pattern \(\begin{array}{cc}a & b\\b & a\end{array}\) and permute them to \(\begin{array}{cc}b & a\\a & b\end{array}\).
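As an illustration of the first entry in the list, the biased random walk, here is a minimal Python sketch. It is not the program I actually ran; the names, the step limit and the incremental bookkeeping are all assumptions made for the example. The one trick worth noting is that recoloring a site only affects rectangles that have that site as a corner, so there is no need to recount the whole grid after each move.

```python
import random

def rectangles_through(grid, r, c, color):
    """Monochrome rectangles with a corner at (r, c) if that site had `color`."""
    n = len(grid)
    total = 0
    for r2 in range(n):
        if r2 == r or grid[r2][c] != color:
            continue
        for c2 in range(n):
            if c2 != c and grid[r][c2] == color and grid[r2][c2] == color:
                total += 1
    return total

def biased_random_walk(n, colors=4, max_steps=100_000, seed=None):
    rng = random.Random(seed)
    grid = [[rng.randrange(colors) for _ in range(n)] for _ in range(n)]
    # Summing over sites counts each rectangle once per corner, hence // 4.
    total = sum(rectangles_through(grid, r, c, grid[r][c])
                for r in range(n) for c in range(n)) // 4
    for _ in range(max_steps):
        if total == 0:
            break
        r, c = rng.randrange(n), rng.randrange(n)
        old = grid[r][c]
        new = rng.choice([k for k in range(colors) if k != old])
        delta = (rectangles_through(grid, r, c, new)
                 - rectangles_through(grid, r, c, old))
        if delta < 0:  # accept only moves that reduce the rectangle count
            grid[r][c] = new
            total += delta
    return grid, total
```

Calling `biased_random_walk(10, seed=1)` returns the final grid and its rectangle count; a count of zero means the walk found a rectangle-free coloring within the step limit.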

Most of these techniques are greedy: At each stage the algorithm chooses an action that maximizes some measure of progress. On hard instances, a pure greedy strategy almost always fails; the search gets stuck in some local optimum. Thus it’s usually best to temper the greediness to some extent, occasionally choosing a move other than the one that yields the best immediate return. (The methods known as simulated annealing are more elaborate variations on this idea, based on insights from thermal physics.)

greediness-traces.png

Here we see traces of three runs of the algorithm identified above as steepest descent, with differing values of a greediness parameter m. (The grid size is 15 × 15.) At m=0 (no greediness at all), all moves are equally likely to be chosen, and the algorithm executes a random walk on the space of grid colorings. At m=1 (maximum greediness), the program always chooses the highest-ranked move, which works well until the system stumbles into a state where no move can reduce the rectangle count. A value of m=0.3 seems to be a good compromise. (I’ll say a little more below on the greediness parameter; indeed, I have a question about how best to define and implement it.)

After all this fussing with a dozen variations on local-search algorithms, I’m afraid the outlook for success is not promising. With a little patience and some tuning of parameters, any one of these algorithms can solve grids up to 15 × 15. With a lot more patience and tuning, they’ll eventually yield answers for 16 × 16. But none of the algorithms come even close to cracking the 17 × 17 barrier. Solving that one is going to require a fundamentally new idea. Perhaps someone will find an analytic approach to constructing a solution, rather than blindly searching for one. Or perhaps someone will prove that no solution exists.

On the computational front, I suspect the best hope is a family of algorithms known in various contexts as belief propagation, survey propagation and the cavity method. I’ve been hoping that friends who are expert in these techniques might swoop in and solve the problem for me, but if not I may have to give it a try myself.

In the meantime, here’s the thing about greediness (an apt subject for this time of year?). We want to define a function greedy whose arguments are a vector of alternative moves ranked from best to worst and a number m such that 0 ≤ m ≤ 1. If the greediness parameter m is 0, the function returns a random element of the vector. If m = 1, the returned value is always the first (highest-ranked) move. Otherwise, we must somehow interpolate between these behaviors. One attractive notion is to return the first element of the vector with probability m, the second choice with probability m(1 − m), and so on. Thus for m = 1/2 the series of probabilities would begin 1/2, 1/4, 1/8…. For m = 1/3 the first few values are 1/3, 2/9, 4/27….

This scheme works just fine for a vector of infinite length, but there’s a problem with shorter vectors. Consider what happens with the procedure call greedy(v=[1, 2, 3, 4], m=0.5). We have the following table of probabilities:

     1 --> 1/2
     2 --> 1/4
     3 --> 1/8
     4 --> 1/16

But on adding up those values, we come up 1/16th short of 1. What happens to the missing probability? I took an easy way out, distributing the “extra” probability equally over all the elements of the vector. The code looks something like this:

function greedy(v, m)
  for i=0 to length(v)
     if (i==length(v))
        return v[random(length(v))]
     elseif (random(1.0) < m)
        return v[i]

This procedure seems to give sensible results, but I wonder if there might be a better or more natural definition of greedy probabilities. Also, the running time for my code is linear in the length of the vector in the worst case (assuming m < 1). Is there a constant-time algorithm that gives the same results? (We don’t know the length of the vector in advance, so merely precomputing the table of probabilities is not an option.)
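Here is a runnable Python rendering of the procedure, together with one possible answer to the constant-time question. The second function is a sketch of my own, not an established recipe: it draws a single geometric variate by inverting the cumulative distribution, so that rank k comes up with probability m(1 − m)^k, and falls back to a uniform choice whenever the draw runs off the end of the vector. It assumes that asking for the length of the vector at call time is cheap, as it is for a Python list.

```python
import math
import random

def greedy(v, m, rng=random):
    """The loop from the pseudocode: stop at each rank with probability m."""
    for item in v:
        if rng.random() < m:
            return item
    return rng.choice(v)  # leftover probability, spread uniformly

def greedy_one_draw(v, m, rng=random):
    """Same distribution from one geometric draw instead of a loop."""
    if m >= 1.0:
        return v[0]
    if m > 0.0:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        # P(k >= j) = P(u <= (1 - m)**j) = (1 - m)**j, a geometric law.
        k = math.log(u) / math.log1p(-m)
        if k < len(v):
            return v[int(k)]
    return rng.choice(v)  # m == 0, or the draw fell past the last element
```

For `v=[1, 2, 3, 4]` and `m=0.5`, both versions give the first element probability 1/2 + 1/64 and the last element 1/16 + 1/64, since the leftover 1/16 is shared equally among the four entries.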

The 17×17 challenge

5 December 2009

William Gasarch is not the Clay Mathematics Institute. He isn’t paying a million bucks for proofs of famous conjectures. But Gasarch is putting up \(17^2\) of his own dollars for the solution to an intriguing little stumper. And the prize problem appears to be somewhat easier than the Riemann hypothesis or the P=NP question. (Unless it’s impossible!)

Gasarch sets forth his prize challenge in a blog post, with further background in a paper and in the slides from a talk. All of those works are well worth reading, but for those who don’t want to chase down the references, here’s the gist. Our mission, should we choose to accept it, is to color the nodes of an n-by-m grid, using only a specified number of colors, and observing a particular constraint: Nowhere in the grid may the four corners of a rectangle all have the same color. (Only rectangles with sides parallel to the x and y axes are considered.) For example, here is a four-colored 15-by-15 grid that satisfies the no-monochromatic-rectangles constraint:

grid15r0a.png

In this array of dots there are \( {15 \choose 2}^2=11025\) distinct rectangles. If you care to check through all of them one by one, you’ll find that in no case do all four corners have the same color. In contrast, here is a 16-by-16 grid that is almost but not quite rectangle-free:

grid16r2a.png

Gasarch offers his $289 bounty for any four-colored 17-by-17 grid with no monochromatic rectangles. Why is that grid of particular interest? It’s a border case. Among square grids, all those up through 16-by-16 have been shown to have rectangle-free four-colorings. For squares larger than 18-by-18, four-coloring has been proved impossible. The status of the 17-by-17 and 18-by-18 grids remains unsettled, but Gasarch believes that both are four-colorable.

Gasarch has much more to say about the mathematics behind this problem. Here I would like to muse on some computational aspects of searching for a 17-by-17 four-coloring.

To state the obvious first, this is not a problem we can expect to solve by exhaustive enumeration. There are \(4^{289}\) possible colorings of the grid. Casting out symmetries brings that down only to \(2 \times 4^{287}\). There’s not world enough or time for checking them all.

Testing grids at random is also hopeless in the 17-by-17 case. This nonstrategy actually works quite well for small grids. For example, you can readily find a four-coloring of an 8-by-8 grid just by generating a few hundred thousand random colorings. But the method fails for larger grids because the proportion of all colorings that are rectangle-free falls steeply with grid size. (Consider the 2-by-2 grid: There are 256 four-colorings, and all but four of them are rectangle-free.)
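The small cases can be checked by brute force, which also shows why brute force is hopeless for larger ones. The sketch below (function names are my own) enumerates every coloring of a tiny grid and reports how many are rectangle-free; a second helper just computes the size of the search space.

```python
from itertools import combinations, product

def is_rectangle_free(grid):
    """True if no axis-parallel rectangle has four corners of one color."""
    for r1, r2 in combinations(range(len(grid)), 2):
        for c1, c2 in combinations(range(len(grid[0])), 2):
            if grid[r1][c1] == grid[r1][c2] == grid[r2][c1] == grid[r2][c2]:
                return False
    return True

def count_rectangle_free(n, colors=4):
    """Enumerate all colorings of an n-by-n grid; feasible only for tiny n."""
    total = free = 0
    for cells in product(range(colors), repeat=n * n):
        grid = [cells[i * n:(i + 1) * n] for i in range(n)]
        total += 1
        free += is_rectangle_free(grid)
    return total, free

def search_space(n, colors=4):
    """Number of colorings of an n-by-n grid, ignoring symmetries."""
    return colors ** (n * n)

print(count_rectangle_free(2))      # (256, 252)
print(len(str(search_space(17))))   # digits in the 17-by-17 search space
```

The 3-by-3 grid already has 262,144 colorings, and every extra row and column multiplies the total by a large power of four, so enumeration gives out almost immediately.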

To make any progress toward the 17-by-17 case, we’ll have to do at least a little thinking, rather than expecting the computer to do all the work. Here’s one idea that’s very easy to implement: Find a monochrome rectangle somewhere within the grid, change the color of one of its corners, and repeat until you can’t find any more rectangles. This algorithm works reasonably well for grids up to about 12-by-12, but then it runs out of steam. On larger grids, changing the color of a node to eliminate one rectangle is likely to create another rectangle elsewhere (or several more of them). As a result, the system merely takes a random walk, with trendless fluctuations in the number of rectangles at any given moment. You discover a rectangle-free coloring only if the walk happens to stumble on the zero point.

I found the 15-by-15 four-coloring shown above with an algorithm that’s a little more effective even though it’s no more sophisticated than the corner-twiddling method. The program repeatedly chooses a node at random and tries assigning it all four possible colors, tallying up the number of rectangles for each color choice. Some color or set of colors must minimize the rectangle count; from among these optimal colors the program chooses one at random and sets the node to that color before repeating the loop. This is a “greedy” method: At each step the number of rectangles can decrease or remain constant but can never increase. Greedy methods are notorious for getting stuck in local optima that are inferior to the global optimum. Maybe that’s what happens to my program when I try it on 16-by-16 and 17-by-17 grids. Or maybe the search space is just too large. In any case, when I woke up this morning and checked the results of an overnight run, I did not find a rectangle-free four-colored 17-by-17 grid awaiting me.
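A compact sketch of that greedy loop follows. It is a reconstruction from the description above rather than the original program, and the names and the step limit are assumptions. Because the node's current color is always among the candidates, the best score can never be worse than the current one, which is why the rectangle count never rises.

```python
import random

def corner_count(grid, r, c, color):
    """Monochrome rectangles with a corner at (r, c) if that node had `color`."""
    n = len(grid)
    return sum(
        1
        for r2 in range(n) if r2 != r and grid[r2][c] == color
        for c2 in range(n)
        if c2 != c and grid[r][c2] == color and grid[r2][c2] == color
    )

def greedy_recolor(n, colors=4, max_steps=20_000, seed=None):
    rng = random.Random(seed)
    grid = [[rng.randrange(colors) for _ in range(n)] for _ in range(n)]
    # Each rectangle is seen from all four corners, so divide by 4.
    total = sum(corner_count(grid, r, c, grid[r][c])
                for r in range(n) for c in range(n)) // 4
    for _ in range(max_steps):
        if total == 0:
            break
        r, c = rng.randrange(n), rng.randrange(n)
        scores = [corner_count(grid, r, c, k) for k in range(colors)]
        best = min(scores)
        # Break ties among the optimal colors at random.
        choice = rng.choice([k for k in range(colors) if scores[k] == best])
        total += best - scores[grid[r][c]]
        grid[r][c] = choice
    return grid, total
```

A call such as `greedy_recolor(12, seed=7)` returns the final grid and its remaining rectangle count.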

Of course I really didn’t have to do any algorithm analysis at all to know that I wasn’t going to win $289 and eternal fame with a day or two of idle hacking. If the problem were that easy, Gasarch and his students would have solved it for themselves long ago.

In spite of these various failures and frustrations, the grid-coloring problem still looks tantalizingly solvable. If a four-coloring of the Gasarch grid exists, it seems like we should be able to find it by some practical computation.

There are certainly lots of approaches more powerful than the blind dart game I’ve been playing. For example, if local optima are the major impediment, some variation on simulated annealing might help.

A more radical possibility is to try to construct an instance rather than merely search for it. If we assume that the four colors are represented as evenly as possible, then the 17-by-17 grid must have 72 nodes in each of three colors and 73 nodes in the fourth color. Starting from a blank grid, it’s easy to mark off 73 nodes in a single color without creating a forbidden rectangle. Adding 72 nodes of a second color is only a little harder. But then the job gets tricky. When you try to fill in a third color, you also by default choose nodes for the fourth color at the same time, and conflicts pile up in a hurry. Some kind of backtracking approach is probably needed here. Gasarch links to a paper by Elizabeth Kupin of Rutgers that explores these ideas in more detail. (If you want to prove the nonexistence of a four-coloring, this is presumably the way to go.)

Gasarch mentions two other promising avenues: integer programming (the discrete variant of linear programming) and SAT solvers–algorithms for the satisfiability problem. Having spent some time hanging out with a few master SAT solvers, I’m intrigued by the latter possibility. You can almost encode the grid-coloring problem as an instance of NAE-SAT, or not-all-equal SAT. Each node of the grid is represented by a variable that can take on any of four values. We group subsets of variables four at a time into clauses, where each clause includes the variables representing the four corners of a rectangle somewhere in the grid. For the 17-by-17 grid there are \({17 \choose 2}^2=18496\) clauses of this kind. The entire formula is satisfied if we can assign values to the 289 variables in such a way that none of the 18496 clauses has all four of its variables with the same value. After 40 years of work on SAT, there’s a highly developed technology for solving such problems. However, there’s a hitch. SAT problems are formulated in terms of boolean variables, with just two values each, but the grid-color variables have four values. Thus a further layer of encoding is likely to be needed, bringing a further explosion in the size of the problem instance.
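To get a feel for the size of that further layer of encoding, here is one possible translation into ordinary boolean clauses. It is an assumption-laden sketch, not necessarily the encoding a SAT expert would choose: each node gets two boolean variables that spell out its color in binary, and for every rectangle and every color there is one eight-literal clause that is false only when all four corners carry that color. Clauses are lists of signed integers in the style of the DIMACS CNF format.

```python
from itertools import combinations

def grid_cnf(n, m=None):
    """Return (number_of_variables, clauses) for a four-colored n-by-m grid."""
    m = n if m is None else m

    def var(r, c, bit):
        # Two variables per node, numbered from 1.
        return 2 * (r * m + c) + bit + 1

    clauses = []
    for r1, r2 in combinations(range(n), 2):
        for c1, c2 in combinations(range(m), 2):
            corners = [(r1, c1), (r1, c2), (r2, c1), (r2, c2)]
            for color in range(4):
                clause = []
                for r, c in corners:
                    for bit in (0, 1):
                        bit_is_set = (color >> bit) & 1
                        # Each literal is false exactly when this bit of the
                        # node's color matches the forbidden color.
                        clause.append(-var(r, c, bit) if bit_is_set
                                      else var(r, c, bit))
                clauses.append(clause)
    return 2 * n * m, clauses

num_vars, clauses = grid_cnf(17)
print(num_vars, len(clauses))  # 578 variables, 4 * 18496 = 73984 clauses
```

With four colors and two bits per node, every bit pattern is a legal color, so no extra clauses are needed to rule out unused codes. That convenience would disappear for a number of colors that is not a power of two.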

One final hackerish note: What’s the best way to detect the presence of a monochromatic rectangle in a grid? My candidate goes like this. We encode the rows of the grid in a set of bit vectors–four vectors for each row, representing the four possible colors. For example, the red vector for a row has a 1 at each position where the corresponding node is red, and zeroes elsewhere. The blue vector has 1s for blue nodes, and so forth. Now we can detect a rectangle merely by taking the logical AND of the same-color vectors from two rows (an operation that could be a single machine instruction). A rectangle exists if and only if at least two bits are set in the resulting vector.
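In Python the bit vectors can be plain integers. The sketch below uses hypothetical function names, and colors are assumed to be the integers 0 through 3. The test for "at least two bits set" clears the lowest set bit of the AND and checks whether anything is left.

```python
def color_masks(row, colors=4):
    """One bit vector per color: bit p is set where row[p] has that color."""
    masks = [0] * colors
    for pos, color in enumerate(row):
        masks[color] |= 1 << pos
    return masks

def has_mono_rectangle(grid, colors=4):
    masks = [color_masks(row, colors) for row in grid]
    for i in range(len(grid)):
        for j in range(i + 1, len(grid)):
            for k in range(colors):
                shared = masks[i][k] & masks[j][k]  # columns where both rows have color k
                # shared & (shared - 1) clears the lowest set bit; a nonzero
                # remainder means at least two columns are shared.
                if shared & (shared - 1):
                    return True
    return False
```

A single shared column is not enough for a rectangle, which is why a merely nonzero AND is the wrong test.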

[Thanks to all the commenters for corrections and elaborations.]

El Farol Highway

27 November 2009

traffic-jam-9146.jpg

I got caught Wednesday night in the national pre-Thanksgiving traffic jam. As I was approaching Baltimore, an electronic signboard announced:

Delays on I-95 and I-895

Suggest I-695

Of course I immediately thought: But everyone will read the sign and take I-695, so that road will be jammed too. Then I thought: But everyone who reads the sign will realize that everyone else will also read the sign, and so they’ll not choose 695. Then I thought….

Hmm. There’s something familiar about this problem.

I’m not telling which road I took, but I can report that it was the only congestion-free segment of my trip.

The birth of the giant component

20 November 2009

The waste product of my document scanning project is a slag heap of extracted staples:

staples-in-bowl-2667.jpg

The other day I made a discovery: If you grab one of the discarded staples and lift it, the whole ball of tangled, mangled metal comes along, leaving behind only a few stragglers in the bottom of the bowl.

ball-of-old-staples-closer-2691.jpg

When I noticed this, my first thought was “Hmm, that’s funny.” My second thought was “Oh, of course: Erdős-Rényi.” And my third thought–well, I’m still working on my third thought, as well as thoughts four, five and six.

Erdős and Rényi are Paul Erdős and Alfréd Rényi, who wrote a big paper on “The Evolution of Random Graphs” 50 years ago (Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 1960, 5:17–61). Their paper wasn’t quite the debut appearance of random graphs in the mathematical literature, but it’s usually cited as the theory’s point of origin.

In one version of the Erdős-Rényi process, you start with a set of n isolated vertices and then add random edges one at a time; specifically, at each stage you choose two vertices at random from among all pairs that are not already connected, then you draw an edge between them. It turns out there’s a dramatic change in the nature of the graph when the number of edges reaches n/2. Below this threshold, the graph consists of many small, isolated components; above n/2, the fragments coalesce into one giant component that includes almost all the vertices. “The Birth of the Giant Component” was later described in greater detail in an even bigger paper–it filled an entire issue of Random Structures and Algorithms (1993, 4:233–358)–by Svante Janson, Donald E. Knuth, Tomasz Luczak and Boris Pittel.
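The threshold is easy to see in a small simulation. The following sketch (names and parameters are my own) adds random edges between distinct, not-yet-joined pairs of vertices and tracks components with a union-find structure, then reports what fraction of the vertices lie in the largest component.

```python
import random

def largest_component_fraction(n, edges, seed=None):
    """Add `edges` random edges to n isolated vertices; return the share of
    vertices in the largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = min(edges, n * (n - 1) // 2)  # cannot exceed the complete graph
    seen = set()
    while len(seen) < edges:
        a, b = rng.sample(range(n), 2)
        pair = (min(a, b), max(a, b))
        if pair in seen:
            continue  # this pair already has an edge; draw again
        seen.add(pair)
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra  # merge the smaller tree into the larger
            size[ra] += size[rb]
    return max(size[find(x)] for x in range(n)) / n

for ratio in (0.25, 0.5, 1.0):
    frac = largest_component_fraction(2000, int(ratio * 2000), seed=1)
    print(f"edges = {ratio} n: largest component holds {frac:.1%} of vertices")
```

Well below n/2 edges the largest component is a tiny sliver of the graph; well above, it holds a large share of all the vertices.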

What made me think of a connection between Erdős-Rényi graphs and my hairball of staples? What I had in mind was something like this: The long, straight, middle part of a staple corresponds to an edge of a graph, and the bent end pieces, which can grab on to each other, are the vertices. Thus a single staple is a graph consisting of two vertices connected by one edge. When two staples hook up, two of their vertices merge and we’re left with a connected graph of three vertices and two edges. Since each staple contributes one edge and at most two vertices to the graph, the number of edges must be at least half the number of vertices. Thus the graph is always over the threshold for forming a giant component, according to Erdős and Rényi.

The counting part of this analysis seems okay, but I’m afraid the rest of it doesn’t hold up very well. Whatever is going on in the staple ball, the evolution of the system is not well modeled by the Erdős-Rényi process of adding edges to a fixed set of vertices. Instead, each staple brings both an edge and two vertices. The crucial event that makes the cluster hang together is the merging of vertices when staples link hands; this merging has no counterpart in the Erdős-Rényi process.

The underlying problem here is that Erdős-Rényi graphs are purely topological–there’s no concept of distance, and any two vertices are equally likely to have an edge joining them. But the staple graph has important geometric constraints. Two vertices can be joined by an edge only if the distance between them is approximately equal to the length of a staple.

The geometric structure suggests trying a different kind of model–perhaps the kind that describes the molecular structure of liquids and solids. Water molecules, for example, are linked together by a network of hydrogen bonds; each hydrogen atom in one molecule can bond to the oxygen atom in another molecule. But the bonds cannot extend over arbitrary distances; they reach only between neighboring molecules. In the resulting three-dimensional structure the basic motif is a tetrahedron with an oxygen atom at its center and hydrogen atoms at the four corners. (There’s also a schematic two-dimensional model known as square ice.) We might imagine something similar going on with the staples, where the two bent ends can form hydrogen-bond-like links to other nearby staples.

But there’s a problem with such chemical models as well. Atoms have a fixed valence (more or less–let’s not quibble); in water, for example, each hydrogen atom can form a hydrogen bond with only one oxygen atom. But we have no reason to suppose that the hooked ends of a staple can attach to only one other staple. As a matter of fact, if such a restriction were enforced, then the staples could form only chains and rings, not dense clusters. In a close look at the actual clusters, it’s easy to find places where three or more staples are all hooked together at the same point. By carefully teasing apart the cluster, I have spotted vertices that appear to have a degree of at least six.

old-staples-detail-2697.jpg

Yet another physical process that might provide a model for the staple graph is diffusion-limited aggregation. This is the mechanism responsible for the filigree pattern in the banner at the top of the bit-player web page. It is generated by sticky particles that drift at random until they wander onto the substrate or touch another particle that is already in contact (directly or indirectly) with the substrate. For staples, I suppose the drifters would be tumbling dumbbells with sticky ends–somewhat harder to simulate.

Another factor to keep in mind is that spatial dimensions are surely important here. For one thing, there’s just more room to maneuver in three dimensions, with more opportunities to glom onto a neighbor. But in the specific case of staples, there’s another reason: Confined to a plane, they have a hard time linking up:

staples-on-plate-detail-2681.jpg

Dispersed on a flat plate, they refuse to coagulate even when swirled vigorously. The reason, presumably, is that secure links form only when the staples can turn 90 degrees and interlock.

In this connection it would seem significant that these are used staples, somewhat varied in shape, with hooked ends that had been bent approximately 180 degrees in the process of stapling and that mostly retained an angle greater than 90 degrees after being pried out of the papers. I wondered how shiny new staples would behave, and so I tried the experiment. (Materials and methods: 630 Stanley Bostitch chisel-point staples, model SBS191/4CP, freshly dispensed from an open-jaw Swingline stapler.)

ball-of-new-staples-crop-2700.jpg

I was mildly surprised at the result. Although the aggregation was somewhat looser and more delicate, it really wasn’t that much different. Again we witness the birth of a giant component.

Information is physical

11 November 2009

I’m still busy digitizing a lifetime’s accumulation of clippings from magazines and journals, along with heaps of old tech reports, memos, and miscellaneous other cruft. There’s something slightly eerie about the process. So far I’ve emptied out a dozen file drawers, run several hundred pounds of paper through the scanner, and created thousands of PDFs. Yet my laptop is not a gram heavier. The glib explanation is that I’m just scraping pure information off the pages, leaving behind the ink and cellulose; I’m saving the bits and recycling the atoms. But is information so readily dematerialized? One of the manila folders I have just dredged up out of a filing cabinet is bulging with publications by the late Rolf Landauer, including several papers on the theme “Information is physical!”

I first met Rolf circa 1980. I had written a brief Scientific American article about some recent developments in optical computing technologies, and Rolf called to tell me I should never do anything so reckless and foolish and tasteless again. He took a dim view of photonics. This initial encounter was not a promising start to a friendship, but we got over it. He put me on his mailing list, which meant that I got a fat envelope once or twice a year, with reprints or preprints of his own latest work and often copies of other papers he thought I should be paying attention to.

Four of the articles in my Landauer folder have very similar titles:

  • Information is Physical (Physics Today, 1991).
  • The Physical Nature of Information (Physics Letters A, 1996).
  • Information is a Physical Entity (Physica A, 1999).
  • Information is Inevitably Physical (In Feynman and Computation, 1999).

If Landauer had lived longer (he died in 1999), I like to think that the next installment in the series would have been titled even more emphatically: Information is Physical, Damn It!

In all of these essays, Landauer’s thesis is straightforward:

Information is inevitably tied to a physical representation. It can be engraved on stone tablets, denoted by a spin up or down, a charge present or absent, a hole punched in a card, or many other alternative physical phenomena. It is not just an abstract entity; it does not exist except through a physical embodiment. It is, therefore, tied to the laws of physics and the parts available to us in our real physical universe.

This notion is obvious and totally uncontroversial–except to those who think it’s totally wrong. Doubters tend to focus on mathematical entities. Surely the integers exist as abstractions, independent of stone tablets and punchcards, no? And triangles would have three sides even if all the matter in the universe were annihilated–right? When it comes to numbers like π and e, one might well argue that they can exist only as abstractions; they can never be given a complete physical representation.

Landauer did not argue strenuously for his constructivist position within mathematics itself, but he did take a hard line about mathematical methods in the physical sciences:

There is a tendency to think of mathematics as a tool which somehow existed before and outside of our physical world. Mathematics, in turn, allowed the formulation of physical laws which then run the world, much as a process control computer runs a chemical plant. Here, instead, we emphasize that information handling has to be done in the real physical world, and the laws of physics exist as instructions for information handling in that real world. It, therefore, makes no sense to invoke operations, in the laws of physics, which are not executable, at least in principle, in our real physical world.

Our accepted laws of physics invoke continuum mathematics, which is, in turn, based on the notion that any required degree of precision can be obtained by invoking enough successive operations. But our real universe is unlikely to allow an unlimited sequence of totally reliable operations. The memory size is likely to be limited, perhaps, because the universe is limited. Even in an unlimited universe it is a strong presumption to invoke the possibility of assembling an arbitrarily large organized memory structure. Furthermore, in a world full of deleterious processes including noise, corrosion, electromigration, incident alpha particles and cosmic rays, earthquakes and spilled cups of coffee, it would be unreasonable to assume that each step in an unlimited sequence of operations can be carried out infallibly.

Those alpha particles and spilled cups of coffee bring me back to my little document-scanning project–my kitchen-table version of Google Books. I am well aware that my digitized archives are not disembodied abstractions, that the information I’ve scanned from Rolf’s preprints is still physical even if it’s less tangible, and that the bits remain vulnerable to all the perils of a material world. Indeed, the thought of losing all the files I’ve scanned–now that the paper originals are beyond recall–makes me itchy to plug in the back-up drive.

But the process of replicating the bits–which is even easier than capturing them in the first place–sends my mind off on another tangent. As Rolf said, we can represent information in many physical forms: as marks on paper, as magnetized domains on a metal-coated disk, as packets of electric charge, as base pairs in a DNA molecule, as beads on an abacus. When we build machinery to process this information, we can choose among many different computing technologies: transistors, brass gears, neurons, rubber bands and tinker toys, quantum dots, even photons in optical waveguides (though Rolf despised that last possibility, and he was skeptical about the quantum dots).

Somehow, this proliferation of physical embodiments for information does not strengthen the conviction that information is subordinate to its physical representation. When we can write the same message in so many forms–everything from lines in the sand to holograms–the message itself begins to seem just as substantial as the physical medium, and perhaps more enduring. I have digital documents that began life on eight-inch floppy disks 20 years ago. The files have migrated a dozen times or more to other media: five-and-a-quarter-inch floppies, three-and-a-half-inch floppies, Zip drives, digital audio tapes, CD-ROMs, a succession of hard disks. Most of the physical objects making up that long chain of transmission have long since succumbed to coffee spills, corrosion and other hazards, or else they have simply gotten lost. Yet the data persists, a sort of standing wave in the river of hardware rushing toward obsolescence and oblivion. Under the circumstances, it can be hard to keep in mind that the information depends for its very existence on those delicate shards of matter. It goes against the grain of the whole apparatus of computer science, where automata theory, the Church-Turing thesis, and the Turing equivalence of programming languages all encourage us to think that abstractions come first, and implementation is secondary.

Euclid1703.png

These musings are not meant as an attempt to refute Landauer’s assertion. I still have to concede that I cannot record or express a pattern of bits without resorting to some physical medium, if only the gray matter in my own head. But the notion sits uncomfortably; it’s a conundrum. I wish I had a chance to chat with Rolf about it. But, sadly, Rolf Landauer is no longer physical.

Note: As far as I can tell, none of the four Landauer papers I mention above are available online without payment. I am therefore taking the liberty of posting my scan of Rolf’s preprint of the information-is-a-physical-entity paper. I would also like to call attention to two recent articles about Landauer: a discussion of his contributions to solid-state physics by Bertrand I. Halperin and David J. Bergman, and a biographical memoir by Charles H. Bennett and Alan B. Fowler.

A comment on comment spam

6 November 2009

Someone out there is being paid to post comments on bit-player.org–and doubtless on tens of thousands of other blogs as well. The comments are mostly bland and inoffensive, sometimes effusive, always hastily composed. “Thanks for article..good work,” they say. “Amazing!!” “i like your article and i will be wating your net article….”

The payload attached to each of these comments is a link to a web site that someone wants to promote. Some of the sites are selling goods or services; others are billboards full of pay-per-view ads; a fair number are mysterious to me, being written in languages I don’t understand. I would not be astonished to learn that some of the sites are distributing malware.

Years ago, the first wave of comment spam was powered by scripts that flooded blogs and wikis and forums with hundreds of postings full of program-generated gibberish and long lists of links. That abuse was stopped by captchas and other simple filters, like the one I’ve been using here on bit-player. Another important defense is the “nofollow” tag, which instructs search engines to ignore links in comments, thereby eliminating the incentive of gaining PageRank points.

The comment spam arriving now is not generated by a Perl script. Somewhere in the world a person is being paid to read these very sentences, then to prove his or her humanity to the Turing-test filter, and finally to write a few words in response and sneak in a paid link. I’m both fascinated and appalled to learn that the Internet economy can support this activity. What’s the going rate for writing comment spam? Is it worth a penny to get your link briefly exposed to the vast daily readership of bit-player.org? How about a tenth of a penny?

I have a sinking feeling that the people doing this work are themselves victims of a scam, and that they’ll never see even the tenth of a penny. They have probably succumbed to a 21st-century version of the ads I used to see on matchbook covers: “Work at home! Make $500 a week stuffing envelopes in your spare time!”

Of all the ways that poor and desperate people are exploited, this is not the worst. Presumably the work is safe and sanitary, and it even rewards literacy. Some of my comment spammers would surely have interesting ideas to contribute if only they had the luxury of time.

All the same, this kind of commercial graffiti is not something I want to encourage. The available countermeasures include prohibiting all links in comments, holding all comments until a moderator approves them, or requiring commenters to register with a verifiable email address. None of these options appeals to me, but I may have to consider them if the problem persists. For now, though, I’m going to continue the human approach–manually deleting spammy comments as quickly as I can get to them. I am also closing comments on all but the 10 most-recent items on bit-player; the spammers seem to favor older posts.

I have to add that spotting comment spam is not always as easy as you might think. Consider this comment, which came in response to a story about editorial changes at Scientific American magazine:

Many times, when i read your American Scientist columns, I have asked myself that is any other country’s scientist didn’t give anything to the world?

The text of the comment is pertinent to the topic; it raises a question that’s entirely appropriate in this context; and there’s clear evidence that the author has actually been reading bit-player (and even my American Scientist columns) rather than merely spewing comments at random. This is someone I would like to be able to welcome into the community. But the link associated with the comment was an ad for a web-hosting service, and another comment from the same IP number advertised a different service. Was I wrong to hit the delete button?

You’re welcome to comment below, but without spammy links, please.

Flights of fancy

27 October 2009

starlings-closeup-2058.JPG

As I have mentioned in the past, I’m fascinated by the acrobatics of bird flocks, especially the big congregations of European starlings that gather in the evening at this time of year. Evidently I’m not the only one with such an interest. In the past few years the subject has attracted the attention of quite a large flock of scientists, including not only biologists but also various luminaries in physics, mathematics and computer science.

Below are some notes on a few of the recent papers, but first I have to mention a classic from 20 years ago:

Reynolds, Craig W. 1987. Flocks, herds, and schools: a distributed behavioral model. Computer Graphics 21(4):25–33. Author archive.

This is the paper that began the modern era of flocking studies by proposing that animals could coordinate and synchronize their movements without any need for a leader or external cues. Others were thinking along the same lines at about the same time, but it was Reynolds who attracted wide notice with his enchanting computer animations of “boids” soaring through an imaginary three-dimensional space. Each individual in the flock acts according to simple, local, fixed rules, and the synchronized maneuvers emerge spontaneously.

Reynolds suggested three particular rules that might guide the behavior of each bird:

  • Avoid collisions.
  • Try to match the speed and heading of nearby birds.
  • Move toward the center of the group in which you are flying.
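The three rules translate almost directly into code. What follows is a minimal two-dimensional sketch in Python, not Reynolds’s own implementation: the function name `boids_step`, the neighborhood radius, the minimum separation and the three rule weights are all assumptions chosen for illustration.

```python
import math

def boids_step(positions, velocities, radius=5.0, min_dist=1.0,
               w_avoid=0.05, w_align=0.05, w_cohere=0.01, dt=1.0):
    """Advance every bird one time step; return (new_positions, new_velocities).

    positions and velocities are lists of (x, y) tuples. Every numeric
    parameter is an illustrative guess, not a value from Reynolds's paper.
    """
    new_velocities = []
    for i, (px, py) in enumerate(positions):
        vx, vy = velocities[i]
        # A bird reacts only to flockmates within a fixed radius.
        neighbors = [j for j, q in enumerate(positions)
                     if j != i and math.dist(q, (px, py)) < radius]
        if neighbors:
            n = len(neighbors)
            # Rule 1: avoid collisions by steering away from birds that are too close.
            for j in neighbors:
                qx, qy = positions[j]
                if math.dist((qx, qy), (px, py)) < min_dist:
                    vx += w_avoid * (px - qx)
                    vy += w_avoid * (py - qy)
            # Rule 2: nudge velocity toward the neighbors' average velocity.
            avg_vx = sum(velocities[j][0] for j in neighbors) / n
            avg_vy = sum(velocities[j][1] for j in neighbors) / n
            vx += w_align * (avg_vx - velocities[i][0])
            vy += w_align * (avg_vy - velocities[i][1])
            # Rule 3: drift toward the center of the neighboring birds.
            cx = sum(positions[j][0] for j in neighbors) / n
            cy = sum(positions[j][1] for j in neighbors) / n
            vx += w_cohere * (cx - px)
            vy += w_cohere * (cy - py)
        new_velocities.append((vx, vy))
    # Move each bird along its updated velocity.
    new_positions = [(px + vx * dt, py + vy * dt)
                     for (px, py), (vx, vy) in zip(positions, new_velocities)]
    return new_positions, new_velocities
```

Calling `boids_step` repeatedly on a list of random positions and velocities is enough to see loose groups form. Note that no bird is designated as a leader anywhere in the loop; each one consults only its own neighbors.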

Reynolds was working in computer graphics, and his ideas were soon taken up by movie studios and by the makers of video games. In a sense, his simulations only had to look right; they didn’t have to reflect what actually goes on in a starling’s head. But whether or not the birds were paying attention, students of animal behavior certainly were.

starlings-wide-2064.jpg

Much of the recent activity arises out of new field studies, conducted mainly by physicists.

Cavagna, Andrea, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. The STARFLAG handbook on collective animal behaviour. 1: Empirical methods. Animal Behaviour 76:217–236. Preprint.

Cavagna, Andrea, Irene Giardina, Alberto Orlandi, Giorgio Parisi and Andrea Procaccini. 2008. The STARFLAG handbook on collective animal behaviour. 2: Three-dimensional analysis. Animal Behaviour 76:237–248. Preprint.

This group, coordinated by Andrea Cavagna and Irene Giardina of the University of Rome La Sapienza, has been photographing starling flocks near the city’s main railroad station (the Termini), which is just a few blocks from the university. Using pairs of synchronized cameras, the observers have captured stereoscopic images and then applied special image-analysis software to reconstruct the three-dimensional trajectory of each bird. Similar techniques have been tried in the past, but only with small flocks (a few dozen birds). The Italian group has traced the motions of individual birds in groups of up to 2,600. The two papers cited above give technical details on how the data were gathered and analyzed.

Ballerini, Michele, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. Empirical investigation of starling flocks: a benchmark study in collective animal behaviour. Animal Behaviour 76:201–215. Preprint.

Ballerini, Michele, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Vivien Lecomte, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proceedings of the National Academy of Sciences of the USA 105:1232–1237. Open access.

And here the same authors (with a few additions) report their results and conclusions. They base their interpretation on a computational model that is recognizably a descendant of the Reynolds scheme, but with one crucial modification. Reynolds and others assumed that each bird is influenced by all other birds within some fixed distance (a “metric neighborhood”); Ballerini et al. get a closer match to the data by assuming that a bird attends to the motions of a fixed number of near neighbors, regardless of distance (a “topological neighborhood”). In other words, the graph of interacting birds has nearly constant vertex degree; the typical degree is probably six or seven. The main significance of this algorithmic change is that it helps maintain the cohesion of the flock in spite of large variations in density.
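The difference between the two kinds of neighborhood is easy to see in a few lines of Python. This is only a sketch of the neighbor-selection step, not the model of Ballerini et al.; the function names are hypothetical, and the default of seven neighbors is simply taken from the “six or seven” estimate above.

```python
import math

def metric_neighbors(positions, i, radius):
    """Indices of all birds closer than `radius` to bird i."""
    return [j for j, p in enumerate(positions)
            if j != i and math.dist(p, positions[i]) < radius]

def topological_neighbors(positions, i, k=7):
    """Indices of the k birds nearest to bird i, however far away they are."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[j], positions[i]))
    return others[:k]

# The same line of ten birds at two densities: tightly packed, then spread out tenfold.
dense = [(float(x), 0.0) for x in range(10)]
sparse = [(10.0 * x, 0.0) for x in range(10)]

print(len(metric_neighbors(dense, 0, 3.5)), len(metric_neighbors(sparse, 0, 3.5)))
print(len(topological_neighbors(dense, 0, 3)), len(topological_neighbors(sparse, 0, 3)))
```

When the flock thins out, the metric rule leaves the bird at the end of the line with no neighbors at all, whereas the topological rule still hands it the same three companions. That is the sense in which a fixed neighbor count preserves cohesion across changes in density.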

Hildenbrandt, Hanno, Claudio Carere and Charlotte K. Hemelrijk. 2009. Self-organised complex aerial displays of thousands of starlings: a model. arXiv:0908.2677v1

Those same flocks at Termini have a role in this study as well; the model presented here draws on data from Ballerini et al. as well as videotapes made at Termini by Carere. (Carere is another physicist at Sapienza; Hildenbrandt and Hemelrijk are biologists at the University of Groningen.)

The model works on the same essential principles, but it differs in intellectual style and emphasis. Hildenbrandt et al. want to account for specific details of a flock’s behavior—not just the general tendency to fly in close formation but also the particular shapes of starling flocks, the maneuvers they perform, the altitudes they prefer, and so on. Reaching for this verisimilitude leads to a rather complicated model with many parameters in need of fine tuning, such as aerodynamic properties of the bird’s wing and body and banking angles in turns. Hildenbrandt et al. report some success in explaining the geometry of flocks (they tend to be horizontally flattened rather than spherical). They do less well in an attempt to account for an extra-dense layer of birds observed at the periphery of a flock.

starlings-landing-2072.jpg

Cucker, Felipe, and Steve Smale. 2007. Emergent behavior in flocks. IEEE Transactions on Automatic Control 52:852–862.

Chazelle, Bernard. 2009. Natural algorithms. Proceedings of the 20th Symposium on Discrete Algorithms, pp. 422-431. Preprint.

Chazelle, Bernard. 2009. The convergence of bird flocking. arXiv:0905.4241v1

Leaving behind the breathy wing-beats of living starlings, we enter a world of mathematical abstractions.

Cucker and Smale, peripatetic mathematicians currently at the City University of Hong Kong, take a stripped-down model of flocking and ask this question: Is it guaranteed that all the birds in the flock will eventually settle on the same velocity, and thus fly together forever? Chazelle, a theoretical computer scientist at Princeton, asks a follow-on question: If the birds do converge on the same speed and heading, how long might it take for them to do so, in the worst case?

The answer to the Cucker-Smale question turns out to be yes: Given certain preconditions and parameter values, convergence is certain. But Chazelle shows that it can take quite a while for the flock to reach consensus. For n birds adjusting their velocities in discrete steps, the upper bound is 2 ↑↑ (4 log n) steps. As I was saying just the other day, this up-arrow notation denotes an exponential tower of 2s with, in this case, 4 log₂ n levels. In other words, in a flock of a thousand birds, the convergence time is roughly

\[2^{2^{2^{\cdot^{\cdot^{\cdot^2}}}}}\]

with 40 levels of exponentiation. This is a ridiculous number, far exceeding the lifetime of a starling (or of a universe, for that matter). As Chazelle notes: “Our bounds obviously say nothing about physical birds in the real world. They merely highlight the exotic behavior of the mathematical models.”
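The figure of 40 levels, and the speed at which such towers outgrow any physical quantity, can be checked with a short Python sketch. The helper names `tower` and `tower_levels` are mine, and rounding the level count to the nearest integer is an assumption made for illustration.

```python
import math

def tower(height, base=2):
    """Exponential tower of `height` copies of `base` (base ^^ height)."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

def tower_levels(n):
    """Levels in the bound for n birds: 4 * log2(n), rounded to the nearest integer."""
    return round(4 * math.log2(n))

print(tower_levels(1000))           # number of levels for a thousand birds
print([tower(h) for h in range(5)]) # towers of height 0 through 4
print(tower(5).bit_length())        # size in bits of the tower of height 5
```

A tower of height 4 is a modest 65,536, but height 5 already needs 65,537 bits to write down, and height 6 could not be stored in any computer. Do not try `tower(40)`.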

It is rather wonderful to reflect—as you stand in a field of corn stubble admiring the flocks of birds wheeling overhead in the evening sky—that these avian entertainments should be the starting point for a line of reasoning that ventures so far into the wild blue yonder of inexpressible numbers.

Lebar Bajec, Iztok, and Frank H. Heppner. 2009. Organized flight in birds. Animal Behaviour 78:777–789. Preprint.

I mention this piece last, but it would actually be a good place to start if you want a primer on flocking. Frank Heppner, a biologist at the University of Rhode Island, is one of the pioneers of flocking-and-swarming studies; here, with a mathematical colleague from the University of Ljubljana, he reviews many of the recent contributions and puts them in historical context. The review includes a discussion of the more crystalline flying formations of large birds such as geese as well as the amorphous flocks of starlings.

A Wiki proof

21 October 2009

This morning’s list of new submissions to the mathematics section of the arXiv brings a paper signed by “D. H. J. Polymath.” The name is too good to be true, of course. The paper is the first fruit of a project instigated by Timothy Gowers of Cambridge. In a blog post last January, Gowers asked “Is massively collaborative mathematics possible?” and proposed a problem that might serve as a test case. There were more than 100 responses, and soon the game was on. Discussion began in the comments section of Gowers’s blog and was later supplemented and summarized in a Wiki maintained by Michael Nielsen.

The problem under attack is known as the density Hales-Jewett theorem (hence Dr. Polymath’s initials). The ordinary version of the Hales-Jewett theorem states that if you play tic-tac-toe on a board of high enough dimension, the game can never end in a draw. When you have filled in all the boxes, some row or column or diagonal in the multidimensional grid must consist entirely of x’s or o’s—even if you’ve been trying to reach a stalemate position. The theorem also holds with more than two players and more than two symbols, provided the dimension of space is high enough. The “density” version of the theorem says that sometimes you can’t avoid a winning play even if you stop before filling in all the boxes: A grid-spanning line of solid x’s or o’s is certain to appear as soon as the density, or fraction of filled boxes, reaches a threshold level.

The density version of the Hales-Jewett theorem was proved almost 20 years ago by Hillel Furstenberg and Yitzhak Katznelson, so this is not an open problem. But the Furstenberg-Katznelson proof drew on some results from ergodic theory, a branch of mathematics that seems slightly exotic for a problem stated in such simple, finite terms. Gowers asked if there might be a purely combinatorial proof. And he focused attention specifically on the case where each row and column of the tic-tac-toe board has three positions and there are three players. In other words, the grid is a 3 × 3 × 3 ×  . . .  × 3 hypercube where each vertex is to be marked with one of three symbols.

The combinatorial proof is given in the Polymath paper, which also puts a bound on how high a dimension is needed. The bound is 2 ↑↑ O(1/δ³), where δ is the density and the double-arrow notation indicates an “exponential tower” of 2s, namely O(1/δ³) of them.

A collective pseudonym such as Polymath immediately brings to mind the famous Nicolas Bourbaki. As in the writings of that French group, the Polymath paper includes no listing of contributors. But there’s a difference: The Bourbakists were a secretive bunch, a sort of sleeper cell within mathematics, and historians have pieced together who did what only in retrospect. The Polymath group describes their style of work as “open source” mathematics. It’s all there on the web.

In the long run we’re all dead

19 October 2009

So said John Maynard Keynes, but what did he know about the long run? He was a swell who spent his mornings in bed, trading international currencies over tea and crumpets.

Yesterday was my day for the long run: I ran a marathon for the first time in my life. The race was the Baystate Marathon in Lowell, Massachusetts. I finished in 5:11:23, good enough for 1,494th place. (Complete results here.) Except for the leaden skies, the blustery wind, the temperatures falling from the 40s into the 30s, and the steady rain that later turned to “wintry mix” and then a bit of snow, it was a perfectly lovely day for a run along the Merrimack River. I had a splendid time. Really. And yet it would have been even more fun if it hadn’t gone on quite so long.

Somewhere around mile 20, I began reflecting on the fact that the leaders of the race had already finished more than an hour earlier, and by now they were probably showered and dressed and having something hot to eat. I had to ask myself: Why hadn’t I been running faster, so that I too could now be sitting somewhere comfortable and dry and warm? A mile later, as I slogged on, it dawned on me that surviving a marathon is basically an optimization problem. The essential task is to balance the pain of running faster against the suffering of staying on your feet longer. There’s a mathematical function to be minimized here. But what’s the form of that function?

Splashing on through the puddles, I didn’t make a lot of progress on that question, but several hours later, wrapped in a blanket and enjoying a grilled-cheese sandwich, I realized that a simple candidate function is just 1/t + t, where t is the total time of the run. This function clearly has the right boundary behavior. It diverges at t = 0, reflecting the common-sensical notion that running infinitely fast is infinitely painful. The function is also unbounded as t goes to infinity, since taking forever to complete the course would also be very unpleasant. In between these extremes, there must be at least one t of minimal misery.

To get quantitative results from this function, we need to plug in some constants and coefficients. Marathon times near zero are unrealistic, so we ought to replace t with t − a, where a is the best plausible finishing time. Then we need a coefficient b that sets the scaling between the two kinds of discomfort. The full expression becomes:

\[f(t)=\frac{b}{(t-a)} + (t-a)\]

The value of a in this expression is somewhat arbitrary, but a good guideline might be the current world record marathon time of about 2:04. If we set a = 120 minutes, there’s no need to worry that I’ll ever do better than that (and thereby flip over onto the negative branch of the hyperbola, where everyone runs backwards). With this parameter fixed, the location of the least-arduous marathon time depends only on the value of b, which defines the cost of speed vs. the cost of endurance. If my run in Lowell was a correct solution of the optimization problem, then my personal value of b is 36,481.
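For anyone who wants to check the arithmetic: setting the derivative f′(t) = 1 − b/(t − a)² to zero gives a minimum at t = a + √b on the branch where t > a, so a given finishing time implies b = (t − a)². Here is a small Python sketch of that calculation. The function names are hypothetical, all times are in minutes, and my 5:11:23 is rounded to 311.

```python
import math

def misery(t, a, b):
    """f(t) = b/(t - a) + (t - a), with t and a in minutes."""
    return b / (t - a) + (t - a)

def optimal_time(a, b):
    """Least-miserable finishing time for t > a, from f'(t) = 0: t = a + sqrt(b)."""
    return a + math.sqrt(b)

def implied_b(t_finish, a):
    """Value of b for which t_finish is the optimum: b = (t_finish - a)**2."""
    return (t_finish - a) ** 2

a = 120                      # best plausible finishing time, in minutes
b = implied_b(311, a)        # my run, 5:11:23, rounded to 311 minutes
print(b)                     # 36481
print(optimal_time(a, b))    # 311.0
print(misery(310, a, b) > misery(311, a, b) < misery(312, a, b))  # True
```

The last line is a sanity check that 311 minutes really is a local minimum: finishing a minute sooner or a minute later both score slightly worse.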

marathongraph.png

This line of reasoning has a curious corollary. If I want to improve my marathon time, I don’t have to learn to run faster; all I have to do is become less tolerant of prolonged standing or walking. This impatience will shift the point of minimum pain leftward, toward higher speeds and shorter elapsed times. For example, if I can just get my value of b down to 14,400, I’ll be running four-hour marathons. I don’t think I’ve ever heard of a training program based on this principle.

Update 2009-10-20: This morning it occurred to me that the same graph, relabeled, could also serve to predict the future course of my geriatric athletic career. I’ve been running for only a few years, and I never attempted distances greater than 10K until this summer. Thus it seems reasonable to suppose that with further effort I might improve somewhat. On the other hand, I’m about 30 years beyond the age at which distance runners tend to reach their peak, which argues that I’m running against the wind, so to speak. (In the long run, Keynes was right after all.)

Here’s a fancifully relabeled version of the graph that I find rather cheering.

marathonagegraph.png

Unfortunately, there’s no good reason to believe that either of these phenomena is truly described by a law of the specific form 1/t + t. The second term could be t², or even e^t. (The marathon “ranking standards” published by USA Track and Field appear to increase quadratically with age.)

Update 2009-10-23: Perhaps someone at The New York Times reads bit-player. In today’s paper there’s a story under the headline “Plodders Have a Place, but Is It in a Marathon?” Juliet Macur writes:

Purists believe that running a marathon should be just that — running the entire course at a relatively fast clip. They point out that a six-hour marathoner is simply participating in the event, not racing in it. Slow runners have disrespected the distance, they say, and have ruined the marathon’s mystique.

Thick-skinned as I am, this stings. My apologies to the purists, and to the mystique. I’ll just note that the hundreds of volunteers who put on the race in which I “participated” were marvelously supportive of us plodders. The high school students at the water stations did not abandon their posts, or lose their enthusiasm, after the fleet-footed runners passed by. And when I arrived at the finish line, hours after the winners, there were still fans applauding in the grandstand, and volunteers to wrap me in a blanket, bring me water, offer congratulations. They had been out in the rain as long as I had, and they weren’t done yet. Three cheers for them.