Tetrahedra with a twist

A year ago, when I was playing with Geomag models of sphere clusters, I stumbled upon this curious object, a helix assembled from tetrahedra linked face to face:

Geomags model of a tetrahelix

I certainly wasn’t the first to discover it. A. H. Boerdijk described the structure in 1952; notable later contributions and commentaries came from J. D. Bernal and H. S. M. Coxeter. (See references below.) Buckminster Fuller also had something to say on the subject, and he suggested a name: the tetrahelix.

As far as I can tell, the tetrahelix was not known to the ancients, although there is nothing about its geometry they could not have understood. Perhaps they overlooked it because Aristotle believed that regular tetrahedra could be packed together to fill all of three-dimensional space—a feat that would make this mere linear packing rather unimpressive by comparison. But Ari was wrong about tetrahedral tesselation; it can’t be done. The closest you can come is a packing of alternating tetrahedra and octahedra, as in the Geomag-like roof truss in the photograph below.

Roof truss in the atrium of the Washington Wardman Hotel, photographed during the 2009 Joint Mathematics Meetings.

Open metal frame of a roof truss at the Marriott Wardman Hotel in Washington.

It turns out there’s a direct connection between the roof-truss structure and the tetrahelix. One way to construct a tetrahelix is to start with a line of alternating tetrahedra and half-octahedra (the latter having exposed square faces). Then you squeeze and twist each square face until it becomes a rhombus that can be spanned by a new strut (red in the lower panel below), creating two new triangular faces. Presto! The tetrahelix appears.

A truss made of alternating tetrahedra and half-octahedra twists to become a tetrahelix.

Tetrahelix in cylinder 156x620

Remarkably enough, after all this twisting and smushing, the helix still extends along a straight line. All the vertices of all the tetrahedra lie on the surface of a right circular cylinder. The center line of that cylinder does not pass through the centers of the individual tetrahedra, but it does act as an axis of symmetry for the helix as a whole.

Tetrahelix seen end onThe symmetry in question is a screwy one: a translation combined with a rotation. Suppose you have an infinitely long tetrahelix assembled from tetrahedra of side length 1. Then if you slide the entire structure along its axis by a distance of \(1/\sqrt{10}\) (\(\approx 0.316\)), and also rotate it around the same axis through an angle of \(\arccos -2/3\) (\(\approx 131.8\) degrees), the new state is indistinguishable from the starting configuration. Note that the rotation angle is irrational. As a consequence, the helix is aperiodic. When you look along the bore of the helix, you will never see two vertices lined up one behind the other.

The tetrahelix is a chiral structure: It comes in left-handed and right-handed variants, as shown in the photograph below.

left-handed and right-handed tetrahelices

Playing with these two Geomag models, I began to wonder whether they could be joined end-to-end—or in other words whether a tetrahelix can change its handedness in midstream. The answer, as shown below, is yes. A helix peacefully curling to the left can abruptly shift to a right-handed way of life—but the price of becoming ambidextrous is a big kink in the chain.

An ambidextrous tetrahelix 2992

You can even create a tetrahelix that changes direction every third tetrahedron (the smallest possible interval). The result (below) is no longer a helix at all but a weird sort of bridge with an arched spine and two zigzag rails. If you extend the arc further, will the two ends meet to form a closed loop? I don’t have enough Geomags to answer that question.

Zigzag helix bridge

Down through the ages, quite a lot of mystificatory cruft has attached itself to the Platonic solids. Plato himself argued that the universe is built of regular polyhedra, which he identified with the classical elements. (The tetrahedron was fire, because of the sharp points.) Then there was Kepler’s Mysterium Cosmographicum, where he tried to explain the orbital diameters of the planets in terms of nested polyhedral shells.

the squeeze-purse animation: an octahedron becomes a cluster of three tetrahedra

Mouseover to animate. Failing that, click to open in a new tab or window.

Bucky Fuller contributed his own share of overwrought polyhedrolatry. In Synergetics he describes the transformation of an octahedron into a chain of three tetrahedra, as in the animation at right. His account of the geometry is straighforward, explicit and accurate. But he doesn’t stop with geometry. He attributes to this rearrangement of edges and vertices a much larger significance:

Symmetric matter has been entropically transformed into asymmetrical and directionally focused radiation: one quantum of energy has seemingly disappeared. When the radiation impinges interferingly with any other energy event in Universe, precession recurs and the three-quantum electromagnetic wave retransforms syntropically into the four-quantum octahedron of energy-as-matter. And vice versa. Q.E.D.

A few paragraphs later, Fuller introduces the tetrahelix with another inscrutable rhapsody on geometry and physics:

With this fourth, invisible tetrahedral addition the overall triple-bonded tetrahedral array becomes either rightwardly or leftwardly spiraled to produce a four-tetrahedron tetrahelix, which is a potential, event embryo, electromagnetic-circuitry gap closer. Transmission may thereafter be activated as a connected chain of the inherently four-membered, individual-link continuity. This may explain the dilemma of the wave vs the particle.

I think I understand the temptation to see deep meaning or even magic in the way the most symmetrical solids arrange themselves in space. The building blocks are simple, the patterns they form wonderfully intricate and precise. It looks as if there’s something very special about all the dimensions and angles in these structures, so that the slightest change would ruin everything—would tear the very fabric of the universe.

Well, maybe. But in the case of the tetrahelix I’d like to argue that this object—however handsome and intriguing it might be—is not a freak of nature or a miracle. Its existence does not depend on some exquisitely refined and fragile property of the tetrahedron. In support of this notion I offer the exhibit below.

helix formed from pentagonal bipyramids

Let’s call this structure a pentahelix. If you focus on the three strands of red links, they appear to follow the same paths as those in a left-handed tetrahelix. Thus you might guess that what we have here is just a tetrahelix “decorated” with some extra bumps along the perimeter. But that’s not what you’re looking at; there are no tetrahedra at all in this construction. Instead, the component parts are pentagonal dipyramids. This polyhedron looks at first glance as if it might consist of five regular tetrahedra arranged around a central axis—and in Aristotle’s world it might indeed be built that way. In our universe, however, the lengths and angles aren’t quite right. To fill 360 degrees with five equal wedges takes 72 degrees per wedge; the angle between faces in a regular tetrahedron falls short by about a degree and a half. Thus the pentagonal dipyramid is a structure fundamentally different from anything built of tetrahedra. The vertex coordinates are full of \(\sqrt{5}\), whereas the tetrahedron is all about \(\sqrt{3}\). Yet, in spite of these differences, the pentagonal shapes can be knitted together to make a fine-looking helix—slightly less symmetrical than the true tetrahelix, but a recognizable member of the same family.

Drawing of the collagen triple helix by David Goodsell

The three-stranded helix of the collagen protein. Drawing courtesy David S. Goodsell and the RCSB Protein Data Bank.

The fact is, helices seem to be very robust and easy to produce. They’re everywhere in nature. For example, the collagen protein (illustration at right), is a three-stranded helix much like both the tetrahelix and the pentahelix. Collagen genes have accumulated thousands of mutations while still producing a functional molecule; thus an immense number of different amino acid sequences (and slightly different geometries) all give rise to very similar triple helices.

Funny world we live in. If you want to design an object that will tile all of three-dimensional space, the requirements are stringent. But helices aren’t so finicky. When you stack symmetrical objects along one axis, it seems hard not to make a helix.

Update 2013-11-25: I am late to the party with this note, but for the benefit of those who have not been reading the comments, let me point out that the question I raised about the “arched bridge”—Can it be extended to form a closed ring?—was asked and answered long ago, but that fact hasn’t stopped ingenious folks from having fun with it recently.

As explained in a comment below by Stan Wagon, the question was posed in 1957 by Hugo Steinhaus in the problem section of the Polish journal Colloquium Mathematicum. (A notation there suggests the matter may have been discussed a year earlier in The New Scottish Book.) A proof that no such closed ring of tetrahedra can exist was published two years later by Stanisław Świerczkowski, also in Colloquium Mathematicum.

Not to be deterred by a mere proof, Michael Elgersma and Wagon have produced a lovely tetrahedral torus, printed in genuine 3D by Shapeways:

3d-printed tetrahedral torus by Michael Elgersma and Stan Wagon

For the secret behind this magic trick, see the not-yet-published paper Closing a Platonic Gap by Elgersma and Wagon. Comments below by Nicholas Mecholsky and Robert Mathieson include links to other constructions.

Read more about it:

Bernal, J. D. 1964. The structure of liquids. (The Bakerian Lecture, 1962.) Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences 280(1382):299-322.

Boerdijk, A. H. 1952. Some remarks concerning close-packing of equal spheres. Philips Research Reports 7(4):303-313.

Coxeter, H. S. M. 1985. The simplicial helix and the equation tan n theta = n tan theta. Canadian Mathematical Bulletin 28 (4):385–393. PDF

Gray, R. W. Undated web page; consulted 2013-10-19. Tetrahelix data. http://www.rwgrayprojects.com/rbfnotes/helix/helix01.html

Lagarias, Jeffrey C., and Chuanming Zong. 2012. Mysteries in packing regular tetrahedra. Notices of the American Mathematical Society 59(11):1540-1549. PDF

Lord, E. A., and S. Ranganathan. 2004. The γ-brass structure and the Boerdijk–Coxeter helix. Journal of Noncrystalline Solids 334 and 335:121-125.

Sadler, Garrett, Fang Fang, Julio Kovacs, and Klee Irwin. 2013 (preprint). Periodic modification of the Boerdijk-Coxeter helix (tetrahelix). arXiv 1302.1174.

Steinhaus, H. 1957. Problem 175. Colloquium Mathematicum 4:243.

Świerczkowski, S. 1959. On chains of regular tetrahedra. Colloquium Mathematicum 6:9–10.

Zheng, Chong, Roald Hoffmann, and David R. Nelson. 1990. A helical face-sharing tetrahedron chain with irrational twist, stella quadrangula, and related matters. Journal of the American Chemical Society 112:3784-3791.

Posted in mathematics | 7 Comments

Advances in Obfuscation

Richard M. Stallman, the gnuru of the free software movement, has launched a campaign to liberate JavaScript. His call to action begins: “You may be running nonfree programs on your computer every day without realizing it—through your web browser.”

My first reaction to this announcement was bafflement. Why pick on JavaScript? I had been thinking that one of JavaScript’s notable virtues is openness. If you include a JavaScript program in a web page, then anyone who can run that program can also inspect the source code. Just use the browser’s “View Source” command, and all will be revealed.

Reading on in Stallman’s complaint, I began to see more clearly what irks him:

Google Docs downloads into your machine a JavaScript program which measures half a megabyte, in a compacted form that we could call Obfuscript because it has no comments and hardly any whitespace, and the method names are one letter long. The . . . compacted code is not source code, and the real source code of this program is not available to the user.

I have to agree that obfuscated JavaScript is annoying; indeed, I have griped about it myself. Although there’s an innocent reason for distributing the code in this “minified” form—it loads faster—some software authors doubtless don’t mind that the compression also inhibits reverse engineering.

The very idea of software obfuscation turns out to have a fascinating theoretical background, as I learned last week at a talk by Craig Gentry of IBM Research. Gentry is also the inventor of a scheme of homomorphic encryption (computing with enciphered data) that I described in a column last year. The two topics are closely related.

Suppose you have a nifty new way to compute some interesting function—factoring large integers, solving differential equations, pricing financial derivatives. You want to let the world try it out, but you’re not ready to reveal how it works. One solution is to put the program on a web server, so that users can submit inputs and get back the program’s output, but they learn nothing about how the computation is carried out. This is called oracle access, since it is rather like interacting with the oracles of ancient Greek tradition, who gave answers but no explanations.

A drawback of running the program on your own server is that you must supply the computing capacity for all visitors to the site. If you could wrap your function up as a downloadable JavaScript program, it would run on the visitor’s own computer. But then the source code would be open to inspection by all. What you want is to let people have a copy of the program but in an incomprehensible form.

The kind of obfuscation provided by a JavaScript minifier is of no help in this context. The result of the minification process is garbled enough to peeve Richard Stallman, but it would not deter any serious attempt to understand a program. The debugging facilities in browsers as well as widely available tools such as jsbeautifier offer help in reformatting the minified program and tracing through its logic. The reconstruction may be tedious and difficult, but it can be done; there is no cryptographic security in minification.

Secure obfuscation is clearly a difficult task. In ordinary cryptography the goal is to make a text totally unintelligible, so that not even a single bit of the original message can be recovered without knowledge of the secret key. Software obfuscation has the same security goal, but with a further severe constraint: The obfuscated text must still be a valid, executable program, and indeed it must compute exactly the same function as the original.

In a way, program obfuscation is the mirror image of homomorphic encryption. In homomorphic encryption you give an untrusted adversary \(A\) an encrypted data file; \(A\) applies some computational function to the data and returns the encrypted result, without ever learning anything about the content of the file. In the case of obfuscation you give \(A\) a program, which \(A\) can then apply to any data of his or her choosing, but \(A\) never learns anything about how the program works.

For many years fully homomorphic encryption was widely considered impossible, but Gentry showed otherwise in 2009. His scheme was not of practical use because the ability to compute with encrypted data came at an enormous cost in efficiency, but he and others have been refining the algorithms. During his talk at the Harvard School of Engineering and Applied Sciences last week Gentry mentioned that the time penalty has been reduced to 107. Thus a one-minute computation on plaintext takes about 200 years with encrypted data.

Will the problem of secure obfuscation also turn out to be feasible? There is a bigger hurdle in this case: not just a widespread feeling that the task is beyond our reach but an actual impossibility proof. The proof appears in a 2012 paper by seven authors: Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan, and Ke Yang (BGIRSVY). Ref: On the (im)possibility of obfuscating programs. Journal of the ACM 59(2):Article 6. Most of the work described in the 2012 article was actually done a dozen years earlier and was reported at CRYPTO 2001. Preprint. They give the following informal definition of program obfuscation:

An obfuscator \(\mathcal{O}\) is an (efficient, probabilistic) “compiler” that takes as input a program (or circuit) \(P\) and produces a new program \(\mathcal{O}(P)\) satisfying the following two conditions.

— Functionality: \(\mathcal{O}(P)\) computes the same function as \(P\).

— “Virtual black box” property: Anything that can be efficiently computed from \(\mathcal{O}(P)\) can be efficiently computed given oracle access to \(P\).

In other words, having direct access to the obfuscated source text doesn’t tell you anything you couldn’t have learned just by submitting inputs to the function and examining the resulting outputs.

BGIRSVY then prove that the black-box obfuscator \(\mathcal{O}\) cannot exist. They do not prove that no program \(P\) can be obfuscated according to the terms of their definition. What they prove is that no obfuscator \(\mathcal{O}\) can obfuscate all possible programs in a certain class. The proof succeeds if it can point to a single unobfuscatable program—which BGIRSVY then proceed to construct. I won’t attempt to give the details, but the construction involves the kind of self-referential loopiness that has been a hallmark of theoretical computer science since before there were computers, harking back to Cantor and Russell and Turing. Think of programs whose output is their own source code (although it’s actually more complicated than that).

The BGIRSVY result looks like bad news for obfuscation, but Gentry has outlined a possible path forward. Ref: 2013 preprint. Candidate indis­tinguishability obfuscation and func­tional encryption for all circuits. Preprint. His work is a collaboration with five co-authors: Sanjam Garg, Shai Halevi, Mariana Raykova, Amit Sahai, and Brent Waters (GGHRSW). They adopt a different definition of obfuscation, called indistinguishability obfuscation. The criterion for success is that an adversary who is given obfuscated versions of two distinct but equivalent programs—they both compute the same function—can’t tell which is which. This is a much weaker notion of obfuscation, but it seems to be the best that’s attainable given the impossibility of the black-box definition. An obfuscator of this kind conceals as much information about the source text as any other feasible method would.

GGHRSW introduce a “candidate” construction for an indistinguishability obfuscator, which works for all programs in an important complexity class called \(NC^{1}\). The algorithm relies on a transformation introduced almost 30 years ago by David Barrington. Ref: Barrington, David A. 1986. Bounded-width polynomial-size branching programs recognize exactly those languages in \(NC^{1}\). In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing, STOC ’86, pp. 1–5. Preprint. The class \(NC^{1}\) is defined in terms of networks of Boolean gates, but Barrington showed that every such network is equivalent to a “branching program” where the successive bits of the input determine which of two branches is chosen. The entire computation can then be expressed as the product of \(5 \times 5\) permutation matrices; the essence of the obfuscation step is to replace those matrices with random matrices that have the same product and therefore encode the same computation. There are exponentially many choices for the random matrices, and so any given program has many possible obfuscations.

Gentry calls the scheme a “candidate” because the security guarantee depends on some rather elaborate assumptions that have not yet been thoroughly checked. The team is working to put the method on a firmer foundation.

GGHRSW write that “it is not immediately clear how useful indistinguishability obfuscators would be,” but they offer some possibilities. Software vendors might use the technique for “secure patching”: fixing a flaw without revealing what’s being fixed. Or obfuscation could help in producing “crippleware”—try-out versions of a program that have certain functions disabled. In these applications indistinguishability obfuscation would make it impossible to identify the differences between two versions of a program. Watermarking, which tags each copy of a program with a unique identifier, is another likely use.

In his Harvard talk Gentry also looked a little further into the future to suggest that obfuscation might be valuable when it comes time to upload your brain to the Internet. The idea, presumably, is that you could allow others to borrow (or rent?) your cognitive abilities without disclosing all your private knowledge and personal memories. Since I already have only erratic access to my own memory, I’m not sure any further obfuscation is needed.

Two further comments.

“Use less big word.” Grafito, Russell Field, Cambridge, MA. Photographed 2013-09-20. (Mouseover to magnify.)graphito reading "use less Big word"

First, I find both homomorphic encryption and indistinguishability obfuscation to be fascinating topics, but linguistically they are somewhat awkward. The word sesquipedalian comes to mind. In the case of indis­tinguish­ability ob­fus­cation we are just two syllables short of super­cali­frag­ilistic­expiali­docious. I’m hoping that when Gentry makes his next breakthrough, he’ll chose a term that’s a little pithier. (The GGHRSW paper sometimes refers to garbled rather than obfuscated programs, which seems like a step in the right direction.)

Second, returning to the everyday world of JavaScript programs in web pages, I have to note that Richard Stallman will probably not applaud advances in obfuscation, particularly when they are applied to crippleware or watermarking. I can’t match Stallman’s idealogical zeal for open software, but I do prefer to make my own code available without artificial impediments. For example, the bit of JavaScript that creates a magnifying loupe for the photograph above is available without obfuscation. (I have not included a copyleft license, but there’s a comment reading “Please feel free to use or improve.”)

Posted in computing | 5 Comments

Who was Aleph Null?

Thirty years ago this month Scientific American launched a new column called “Computer Recreations.” I wrote the first six installments, then passed the baton to A. K. Dewdney.

Soon after the first column appeared, we received a letter pointing out that the title “Computer Recreations” had already been used by the journal Software—Practice and Experience. I was embarrassed by this revelation, not because of any worries about legal conflicts but because I should have known what other publications were doing in the same genre. I went off to the library to look at some back issues of SP&E. Here’s what I found in Vol. 1, No. 1, published a dozen years earlier, in 1971:

heading of the first

Through the rest of 1971 and into 1972 there were half a dozen more “Computer Recreations” columns attributed to \(\aleph_0\!\), followed by a couple of guest columns (the first by Paul Bratley and Jean Millo of the University of Montreal, the second by Andrew Tanenbaum of the Vrije Universiteit in Amsterdam). After that, “Computer Recreations” disappeared from the pages of Software—Practice and Experience, never to be mentioned again.

Recently I had occasion to take a fresh look at the SP&E columns, and I realized that I have never learned the true identity of the mysterious \(\aleph_0\!\). Maybe one of my readers can help unmask the author. Or maybe the real Aleph Null will step forward now to take credit for his or her work. I’m also curious about a figure called Archimedes, who appears in several of the columns as a mentor or sidekick of \(\aleph_0\!\). In one column Archimedes is said to be “a group of mathematicians.”

For a while I thought that Aleph Null and Archimedes might have been members of the Bell Labs gang that was so lively and prolific in the late 60s and early 70s—Dennis Ritchie, Ken Thompson, Brian Kernighan, Doug McIlroy, Michael Lesk and others—the crew who gave us C and Unix and so much else. But that can’t be right. One of the SP&E columns discusses Darwin, a game in which programs compete for space and survival in a computer’s memory. (Dewdney later wrote about a similar system called Core War.) Darwin was invented at Bell Labs, but the SP&E column describes the program from an outsider’s perspective: \(\aleph_0\!\) and Archimedes talk to the Bell Labs group and then build their own implementation of the system.

Jacket of A. G. Bell's 1972 book In addition to seven “Computer Recreations” columns, \(\aleph_0\!\) is also listed as the author of a 1973 book review in SP&E. The book under review is Games Playing with Computers, by A. G. Bell. For a few days this summer I entertained the notion that \(\aleph_0\!\) = A. G. Bell. The review of Bell’s book is rather harsh, but that didn’t dissuade me; after all, every author is his own worst critic. A glance at the book would surely settle the matter, I thought, but none of the libraries within easy reach of Boston could supply a copy. Eventually I bought one online (for $3.44, plus shipping). I’m afraid I found no support for my cute self-reviewing hypothesis. Stylistically, Aleph Null and A. G. Bell have nothing in common.

Software—Practice and Experience is a publication with British roots. The first editors were C. W. Barron of the University of Southampton and C. A. Lang of Cambridge. The journal’s Cambridge connections were particularly strong in those early days, with a board of editors that included Maurice V. Wilkes, who had headed the Cambridge Computer Laboratory since 1945. I now incline to the view that Aleph Null and Archimedes were part of this milieu—almost surely from Britain and probably from Cambridge. But I have made no further progress in identifying them.

If you would like to join in the guessing game, here are a few clues to consider, culled from the seven published \(\aleph_0\!\) columns:

  • In the first column Aleph Null writes of himself or herself: “I confess to a personal interest in artificial intelligence.”
  • In another column the author mentions “visiting the northern parts of the British Isles,” from which I infer that Aleph Null is not native to those northern parts.
  • Archimedes is described in one column as “myopic” and in another as “cryptomnesiac.” Who knows what that’s about?
  • There’s a PDP-7 in the picture. Archimedes seems to be adept at hacking on it. (This was a factor in my initial guess about Bell Labs, since the first version of Unix was created there on a PDP-7. But the Cambridge Computer Lab had one as well.)
  • Most of the code in the columns is Fortran. There’s also some talk of BCPL (developed at Cambridge) and POP-2 (developed at Edinburgh). Plus lots of assembler. No hint of Algol.

Regrettably, the columns themselves are impounded behind a John Wiley paywall. Here are the topics:

1971 (1). The Military Game, or Napoleon.

1971 (2). MOO (also known as Bulls and Cows; similar to Master Mind).

1971 (3). The African board game kalah.

1971 (4). Space-filling curves with a plotter.

1972 (1). Darwin.

1972 (2). “April Foolery.” Puzzles on topics such as algorithms for card shuffling.

1972 (3). More curves with a plotter.

1972 (4). [By Bratley and Millo]. Self-reproducing programs.

1973 (4). [By Tanenbaum]. The game of JOTTO.

Update 2013-09-06: Commenter Shecky R has apparently solved the mystery. Using a nifty new technology called Google, he tracked down a bibliographic entry presented as follows:

Aleph Null (C. A. Lang). “Computer Recreations: Darwin.” Software: Practice and Experience v2, Jan-Mar 1972, p93-96.

Using another nifty technology called a library, I’ve looked at the book in which this entry appears (Analyzing Computer Security: A Threat/Vulnerability/Countermeasure Approach, by Charles P. Pfleeger and Shari Lawrence Pfleeger). There’s no hint in the book of how the Pfleegers know that Aleph Null = C. A. Lang, but I have no reason to doubt the identification.

With a little more sleuthing—it’s quite a marvel, this Google thing—I have learned that C. A. Lang is Charles Lang, who went on to a distinguished career in computer-aided design and three-dimensional modeling. From David E. Weisberg:

Charles Lang graduated from Cambridge University in 1960 with a degree in engineering. After several years at Mullard Research in England, Lang enrolled as a graduate student at MIT in 1963 and worked at Project MAC for more than 18 months with Doug Ross and Steven Coons. Lang was then recruited back to Cambridge University’s Computer laboratory by Professor Maurice Wilkes.

The British Government arranged funds and resources that underwrote the activities of the Cambridge CAD Group in 1965 — the group founded by Wilkes and headed by Lang until 1975. In 1969, the group started developing solid modeling software on a Digital PDP-7 system using assembly language programming. “Unfortunately no one told us it was impossible,” says Lang. “But then Ian Braid turned up.”

Now, what about this Archimedes bloke?

Update 2013-09-08: Shecky R solved the mystery and now Barry Cipra has unsolved it, with a 1971 document that includes the statement: “It should be noted that Aleph-Null and C. A. Lang are not the same person.”

Posted in computing, meta | 9 Comments

Black and white in one dimension

At the 50th anniversary of the March on Washington I find myself writing again about racial segregation, and specifically about Thomas C. Schelling’s mathematical model of how people wind up living in monochrome neighborhoods. I have mentioned Schelling’s model several times before (e.g., 2004, 2006, 2010). Now it’s the subject of my new “Computing Science” column in the September-October American Scientist.

My reason for returning to the subject yet again is not just that the issue never goes away. It turns out there’s a new theoretical development, which to some extent alters the interpretation of the model.

In Schelling’s model individuals have a fixed intolerance threshold, τ, between 0 and 1. If the fraction of like-colored neighbors is less than τ, the resident will seek a new home elsewhere. Most often, τ is set equal to 1/2, and so each person’s preference is simply not to be in a minority. Thus a totally integrated housing pattern, in which everyone has equal numbers of black and white neighbors, would be acceptable to all. But that’s not the outcome the model predicts.

In one of my earlier articles, I described Schelling’s findings as follows:

The model suggests that extreme segregation can arise spontaneously even when no one in the population desires that outcome…. Neither race seeks to escape or exclude the other; the only racial preference is of the mildest sort—a desire not to be entirely cut off from people of one’s own group. But that desire is enough to cause the two populations to separate like oil and water.

There may be versions of the model for which those statements are true, but it’s now clear that my description is too strongly worded when applied to the specific models that Schelling studied in his first publications on the subject, circa 1970.

Schelling began with one-dimensional models, where individuals are arranged along a line. In a 1971 paper he presented his results in “typewriter diagrams,” representing the races by ‘+’ and ‘o’ symbols, with dots to mark those individuals who are unhappy with the racial composition of their neighborhood. After repeatedly applying his movement procedure (which I’ll explain below), everyone is happy but the races have separated into homogeneous blocks.

Schelling typewriter model 1971

Schelling also mentioned experiments in which the model becomes a game played with coins. Again, a random initial state evolves into a stable, somewhat segregated final state:

Schelling coins initial 900x184
Schelling coins solution900x191

The new theoretical discoveries concern one-dimensional models much like Schelling’s original. The first results were reported last year at the Symposium on the Theory of Computing by a group I shall identify as BIKK: Christina Brandt, Nicole Immorlica, Gautam Kamath and Robert Kleinberg (arXiv preprint). The findings are based on analytical methods rather than computer simulations.

The BIKK model is one-dimensional, but with the ends of the line joined to form a ring. Three parameters define an instance of the model: The population size N, the intolerance threshold τ and the neighborhood radius w, which specifies how far to the left and right each agent looks when assessing neighborhood composition. Each agent’s neighborhood includes 2w + 1 sites, including the agent itself. The BIKK analysis was confined to models with τ = 1/2, with N large and w \(\ll\) N. The system evolves through an exchange process: At each step, two unhappy agents of opposite color are chosen at random and made to switch places; with τ = 1/2, the exchange is guaranteed to make them both happy. The moves continue until no more qualifying pairs can be found.

The BIKK group was able to prove that this process always terminates; it can’t get stuck in an endless cycle where the same agents keep shuffling from place to place. Furthermore, they were able to characterize the degree of segregation in the final state: The mean length of the monochromatic runs is a polynomial function of w, the neighborhood size; it is not a function of N, the population size. This is an important distinction. It means the model with τ = 1/2 cannot account for the kind of metropolitan-scale segregation seen in many large American cities. Instead it predicts smaller, neighborhood-size clusters of each race.

The key idea behind this result is the observation that whenever a monochromatic cluster reaches a size of w + 1, it can never be invaded or broken up by the opposite color. The BIKK workers prove that many such “firewall” clusters will almost surely evolve in any large-N system, and so they cannot grow very large before they bump into one another. As Kleinberg put it in an email: “What prevents segregation from happening on a massive scale is that smaller-scale segregation happens first.” For a little more on how the proof works, see my column; for a lot more, see the BIKK paper.

As noted above, the BIKK result considers only the case of τ = 1/2. In a subsequent paper, George Barmpalias, Richard Elwes and Andy Lewis-Pye analyzed the same family of models across the entire spectrum of τ values, from 0 to 1. They discovered an intriguing phase diagram, which includes regions of more severe segregation (i.e., longer monochromatic runs) both for higher values of τ and also for certain ranges of lower values. The latter result seems paradoxical: A greater tolerance for racial diversity leads to a more segregated society. But the explanation is not so hard to fathom: Agents willing to accept minority status are less likely to assemble into monochromatic “firewalls”; with a lower concentration of firewalls, the few that do develop have room to grow larger before they collide.

What does all this tell us about race relations among real people in real cities? The mismatch between the predicted and the observed scale of segregated neighborhoods calls for some explanation. Maybe the prevalence of huge, city-scale racial ghettos means we are much more intolerant than we choose to believe—that the true value of τ in the American population is significantly greater than 1/2. Or, following the result of Barmpalias et al., maybe τ is somewhat less than 1/2, putting us into the “paradoxical” regime. Another possible explanation lies in the one-dimensionality of the solved models; two-dimensional versions of the Schelling model may have significantly different behavior. (Extending the analytic methods to the two-dimensional case looks difficult.)

I can pile on one more hypothesis: The specific models analyzed in these two papers are “zero temperature” models, without any sort of thermodynamic noise. Two agents can exchange places only if the move is strictly beneficial to both of them; unfavorable or even neutral moves never take place. The zero temperature assumption may not be the best approximation to the real world, where people have imperfect knowledge and sometimes act outside the bounds of strict rationality. Perturbed models that admit some small probability of unfavorable actions seem to predict quite different global behavior. But, again, including nonzero temperature effects makes analysis harder.

Finally, one might also entertain the possibility that an emotionally frought and historically complex social institution is unlikely to yield all its secrets to a very simple, abtract, mathematical model.

There’s one further point to make about the model itself. The prediction of self-limiting segregation in the τ = 1/2 case came as a surprise to me, but it should not have. A glance at Schelling’s own diagrams (including the one reproduced above) shows that the small scale of the racially divided neighborhoods could have been guessed as early as 1971—but Schelling did not call attention to this fact. It was apparently first noted about five years ago by Dejan Vinkovic and Alan Kerman and then by Dietrich Stauffer and S. Solomon.

The BIKK model is very similar to Schelling’s original scheme, but not quite identical. The main difference is that instead of exchanging pairs of unhappy agents, Schelling insert diagramSchelling moved each unhappy agent to the nearest position where it would be happy. The displaced agent intruded itself between two other sites, with all the intervening agents shifting left or right to make room. Sometimes such a move would alter the status of other agents in the old or the new neighborhood, so that a previously happy individual would become discontented, or an unhappy one would become satisfied without having to move. As a model of residential segregation, this mechanism is rather peculiar; households don’t elbow their way in between neighboring houses. Nevertheless, the model has one important property that distinguishes it from all the later variants: It is wholly deterministic. Nothing about the model is random except for the initial distribution of the agents. Given a specific initial configuration, the evolution of the system can be predicted with certainty.

As far as I know, Schelling’s deterministic model has not been studied much since he worked on it more than 40 years ago. (One reason might be that although it’s easy to implement the model with pencil and paper or with coins, it becomes quite a mess when you try to encode the rules in a computer program.) I was curious about whether this model also predicted self-limiting segregation—in other words, whether the mean run length is a function of w or of N. Some experiments provide pretty clear evidence:

For each value of w from 1 through 30 the dots give the mean length of monochromatic runs averaged over 1,000 repetitions. In all cases the population size N is 1,000 × w.

mean length of monochromatic runs as a function of w

The mean run length appears to have a simple linear relation to w. This is consistent with the BIKK group’s findings for their random-exchange version of the model. They have proved that the mean run length of monochromatic blocks grows no faster than \(w^2\), and they believe the true rate of growth is probably linear in \(w\).

Posted in mathematics, social science | 3 Comments

The Mother Ditch

I was in Santa Fe for a couple of weeks. When I went out for my morning run, I followed a paved pathway along the “river” (a parched, sandy channel, even in the monsoon season). I found my way back to town on another path, the Acequia Madre Trail, which follows the course of what I took to be a small creek (also dry). Later I learned that the Acequia Madre is not a creek but an irrigation ditch. Indeed, the name means “mother ditch”: It is the primary artery of an irrigation network built 400 years ago when the Spanish first came to this valley.

The Acequia Madre de Santa Fe. Pronounced “ah-SAY-keyah.” As my wife Ros explains: The stress is always on the next-to-last syllable, but in this case the last syllable has two syllables.

The Acequia Madre de Santa Fe, near the rail yards and the School for the Deaf

Cris Moore, one of my hosts at the Santa Fe Institute, told me that water is run through the ditch at least once in a year in a legal ritual that maintains the right of landowners to claim a share of the flow. (Cris learned this while “serving two consecutive four-year sentences” on the Santa Fe City Council.)

onions at the Crawford stand at the Santa Fe Farmer's Market, 2013-08-17

I learned more about acequias from Stanley Crawford, who grows various kinds of alliums and writes various kinds of books on a small farm northeast of Santa Fe, halfway to Taos. I met Stanley and his wife Rose Mary at the Santa Fe farmer’s market when I stopped to photograph a heap of their immaculate onions—as hefty, round and polished as bocce balls. I left their booth carrying away a couple of Stanley’s books. One is a novel, which I remember seeing but not reading in its original 1972 edition: Log of the S.S. The Mrs. Unguentine. The other book, which I began reading on the flight home to Boston, is Mayordomo: Chronicle of an Acequia in Northern New Mexico.

The book recounts one year of Crawford’s tenure as mayordomo of the Acequia de la Jara, a ditch serving two dozen small farms in the foothills of the Sangre de Cristo mountains. (This is a work of nonfiction, but names of people and places are altered.) The year begins with the spring cleaning, when all the parciantes—those entitled to a share of the water—must either show up to help clear the ditch of silt, sand and vegetation or else pay a worker to take their place. With a crew of 17, the process takes two days, plus part of a third day to “bring the water down” from a diversion dam upriver. The mayordomo awaits the stream’s arrival on his own land:

Pitchfork in hand, I walk up the winding channel to meet the descending water. By the time it gets here the flow will be pushing a large roll of debris, and I will walk it back down through my place and into the next if need be, to fork out as much as I can…. All is quiet. This is a strange wait, as if for a train on tracks that curve out of sight behind a hill…. I am about to walk a little further up when a brown and grey tongue slips into view around a bend and rolls toward me, its dry leaves hissing softly, twigs snapping, approaching like some creature, a giant snake that has assumed a somewhat disorderly disguise.

For the rest of the season, gravity does most of the work, but there are floods and droughts to cope with, plus muskrats and beavers, leaks and plugged channels. Most of all there are the landowners to cope with, for the acequia is not only a technological artifact; it is a social and political and cultural institution as well. The affairs of the ditch are governed by a three-member commission, elected by all the parciantes; the commissioners in turn appoint the mayordomo. The mayordomo is responsible not just for keeping the water flowing but also for equitably distributing it among the parciantes. And what is equitable? At the Acequia de la Jara a fair share is not to be defined by statute or contract but rather by history, by unwritten customs and habits, and by ongoing face-to-face negotiations. The process is not friction-free, but it seems to work.

The water in an irrigation ditch is a shared resource, like the unfenced common pastures that so famously became so tragic in early-modern England. In fact, the irrigation ditch is even more vulnerable to selfish exploitation than a pasture that’s equally available to all: Those on the upper reaches of the ditch have it within their power to divert all the flow into their own fields, depriving those below. Yet they choose not to do so. What explains their forbearance?

At a zanja near Heber, CA, headgates regulate flow into farmers' fields.

At a zanja near Heber, CA, on the Mexican border, headgates regulate flow into farmers’ fields.

Some years ago I spent a few days exploring the irrigation works of the Imperial Valley in southern California, another place where water is distributed through public ditches—though out there they call the ditches zanjas rather than acequias, and the ditch boss is not a mayordomo but a zanjero. The Imperial Valley is a site of intensive, industrialized, large-scale agriculture; the growers have big bucks at stake. And water is the limiting resource: Every drop must be allocated and accounted for. The headgate controlling flow into each field is opened and closed on a precise schedule. Padlocks on the gates discourage cheating. In this social setting, the zanjero becomes a referee and an enforcer—an agent of a higher power. The rules of the game are rigidly legalistic.

Back on the Acequia de la Jara, attitudes are more casual—or at least they were in the era Crawford is describing, circa 1985. There are no corporate farms here; indeed, some of the growers are really gardeners rather than farmers. Most years, there’s more than enough water to go around. And even when some form of rationing is needed, the arrangement is informal. Neighbors meet, discuss, compromise. They “divide the ditch,” giving all the water to the upper farms during half the week, then shifting the flow to the lower farms. Even in the driest periods, everybody gets “a little water for the chickens.”

The continuing success of this small-scale self-organizing system—which has apparently been flourishing in the back country of New Mexico for hundreds of years—gives me hope for the future of democracy. The community’s problem-solving ability is especially impressive given the recent ethnic diversification, as long-tenured Hispanics have had to absorb an influx of undocumented Anglos, with concomitant differences along dimensions of language, religion, wealth, and education.

A whole school of history associates irrigated agriculture with totalitarian governments. The underlying notion is that organizing the labor to tame a river requires a large-scale bureaucracy and a top-down power structure, often imposing slavery or peonage on part of the population. The historian Karl Wittfogel wrote of “oriental despotism” in the irrigation-based civilizations of Mesopotamia, Egypt and the Indus valley; according to his “hydraulic theory of the state,” irrigation and repression are causally linked. The early history of the acequias in northern New Mexico may well have followed the same pattern; it would be surprising if the Spanish conquistadors and missionaries did not rely on the forced labor of the indigenous Pueblo communities to dig the first ditches. Nevertheless, the little band of farmers (and writers) along the Acequia de la Jara has made something more cheering from this dark legacy: a social collaboration that appears to meet their own needs, along with some beautiful onions and garlic.

Posted in off-topic, social science, technology | 2 Comments

The wisterium of memory

huge droopy pinkish purplish blooms at the Davis Square T station in Somerville MA, 2011-05-21

Every spring it’s the same story. Those pendulous clusters of pinkish purplish blossoms come droopling off a pergola in a little park around the corner, and I spend the next half hour desperately searching my memory for the word. I know it’s in there, if only because I’ve been through this before, last year and the year before and the year before that. It’s in there but it’s stuck in one of the crevices of my cerebral cortex, and I can’t pry it loose. What is the name of that flower?

This year I took notes—looking over my own shoulder as I ransacked my mind for the missing word. The first term that came to me was hydrangea: obviously wrong but at least hitting the right general category of flowering plants. Verbena is a little closer—the flowers notes recording the struggle to find the missing word of verbena are approximately the right color—but the strange thing is, I didn’t know that (or I didn’t know that I knew that) until I googled it just now. I do know what forsythia looks like, and I know it has almost nothing in common with the plant I’m trying to name, but my brain offers it up anyway.

The next entry in the list, purple flower, reveals the futility of trying to force the issue. That’s not my brain talking to me; that’s me talking to my brain, ordering it to give me the word I want. Vine may be in the same category, although the initial v could be significant (see below).

There’s a whole campfire story behind the next entry. Three boy scouts, their troop leader and two weeks’ worth of camping gear are crammed into a Volkswagen Beetle, circa 1962. As we putter down a back-country road, the scoutmaster leans his head out the window, inhales with exaggerated gusto, and exclaims, “Smell that honeysuckle!” All through the summer camp session we mocked him mercilessly for this excess of enthusiasm, but he has had his revenge: The phrase is tattooed on my left temporal lobe. It shows up every year when I’m entangled in this crazy quest to name the flower that’s not a hydrangea, not a verbena, not a forsythia, and not a honeysuckle either.

Southern: Me getting bossy with my brain again. When I lived in the Carolinas, I had a tree out front that was overgrown by the purple-flowered vine I can’t name.

v, v, v… Suddenly I’m sure the word starts with v, and this is the first moment when I feel like I’m making progress, that I have the scent, I’m on the trail. A v on the tip of my tongue, but not verbena, not vine

Not vertiginous either.

And hydrangea—what are you doing here again? I already told you to go home; you’re not wanted.

Wissahickon Creek runs along the northwest boundary of the city of Philadelphia, toward Germantown and Chestnut Hill. There was a period in my youth when the road that winds along the creek was the route I took to visit a girlfriend. Late one rainy night I drove my 59 Chevy through deep water covering the road and soon had a white-knuckle moment when I discovered I had no brakes. It’s not surprising that I would remember all this, but what could it have to do with my unnamed flower?

And then, all of a sudden, the neuron finally fires. Wissahickon is not about the creek or the road or the girlfriend or the wet brakes; it’s all about the word itself. Not v but wwis, wis, wis… I’ve got it now: wisteria! What a relief. (Until next spring.)

What is it about this genus Wisteria that so troubles my mind? There are scads of other plants I can’t identify, but that’s because I truly don’t know their names; no amount of rummaging in the attic of memory will ever succeed in retrieving what I never possessed. On the other hand, there’s a whole garden full of flowers—hydrangea, verbena, forsythia—that come effortlessly to mind, even when they are not wanted. Wisteria is different: something I own but can’t command.

Freud would doubtless opine that I must have suppressed the memory for some dark reason. I can play at that game. Wisteria is supposedly named for Caspar Wistar, notable Philadelphia physician circa 1800. He was inspired to study medicine after tending the wounded at the Battle of Germantown in 1777—an event that took place within cannon range of Wissahickon Creek. (Surely that can’t be a coincidence!) Dr. Wistar is also commemorated in the Wistar Institute of Anatomy and Biology in Philadelphia, a place I used to visit when I was a boy of 12 or 13. In that benighted age, impressionable children were allowed to wander freely through the Wistar’s collection of anatomical specimens, where various pickled body parts were sealed in gallon jars. What formaldehyde-soaked vision of horror am I trying so hard not to remember?

The trouble with this story is that until a few days ago I knew nothing of the connection between wisteria and Wistar. Not that Dr. Freud would let me off the hook for such a flimsy reason as that.

Of course wisteria is not the only word that my mind seems to be censoring. There’s that actress whose name I can never remember, although I have no trouble recalling that she’s the sister of Warren Beatty. And, along the same lines, the actor I can’t name, even though I can tell you he is the brother of James Arness. Can I find a childhood trauma to account for this distinctive histrionic sibling suppression complex?

Francis Crick wrote of our attempts to understand the brain: “We are deceived at every level by our introspection.” I believe it. I don’t think that looking over my own shoulder and taking notes as I free associate on the theme of pendulous purple flowers is going to resolve the mystery of memory. I’m fascinated by the process all the same.

In my latest American Scientist column I write on some other approaches to the question of what’s in Brian’s brain.

Posted in science | 6 Comments

A little theorem

Often I wish that I knew more mathematics and understood it more deeply. But then I wouldn’t have the pleasure of discovering afresh things that other people have known for years. (In some cases hundreds of years.)

Last week, in a discussion of Fermat primes and the Hilbert curve, ShreevatsaR remarked:

BTW, you only need to try primes that are factors of \((4^k – 1)\) for some \(k\)…. Considering powers of 4 up to 32 and prime factors greater than 65537, this means just the primes 87211, 131071, 174763, 178481, 262657, 524287, 2796203, 3033169, 6700417, 15790321, 715827883, 2147483647.

In response I asked (innocently, if skeptically):

Are there primes other than 2 that are known not to be factors of \((4^k – 1)\) for some \(k\)?

ShreevatsaR immediately replied: No, every odd prime must divide some \((4^k – 1)\). And he gave a proof based on the pigeonhole principle. The primes he had listed in his earlier comment are just those that happen to divide \((4^k – 1)\) for an unusually small value of \(k\).

In the middle of the night I had a little epiphany: This is just Fermat’s Little Theorem in disguise. One version of the Little Theorem says: If \(p\) is a prime and \(a\) is a natural number, then either \(p\) divides \(a\) or \(p\) divides \(a^{(p-1)} – 1\). To get back to ShreevatsaR’s statement, just observe that for any prime \(p\) other than 2, \(p-1\) is even, and so we can introduce an integer \(k = \frac{(p-1)}{2}\), making \(4^{k} – 1\) equivalent to \(2^{(p-1)} – 1\).

My previous encounters with Fermat’s Little Theorem have been in the context of primality testing. If you have some natural number \(n\) and you want to find out if it’s a prime, calculate \(2^{(n-1)} – 1 \bmod n\); if the result is anything other than zero, \(n\) is composite. (Unfortunately, the converse statement is not to be relied upon: If \(2^{(n-1)} – 1 \bmod n = 0\), it does not always follow that \(n\) is prime—but that’s a story for another time.)

ShreevatsaR’s comment brought to light something I had never thought about: We know that a prime \(p\) divides \(2^{(p-1)} – 1\), but \(p\) might also divide some smaller number \(2^{(m-1)} – 1\) with \(m < p\). I went searching for the smallest such \(m\) for all primes less than 10,000. Here are the results:

Mouseover to magnify.graph of the least m such that p divides 2^(m-1) - 1 for all primes p less than 10000

Each dot in the graph represents a prime \(p\). The horizontal coordinate of the dot is the magnitude of \(p\); the vertical coordinate is the least \(m\) such that \(p\) | \(2^{(m-1)} – 1\). Of the 1228 primes shown, 470 lie along the diagonal, indicating that the least \(m\) is in fact equal to \(p\). Another 348 dots lie along a line of slope \(\frac{1}{2}\): For each of these primes, \(p\) divides \(2^{\frac{(p-1)}{2}} – 1\) as well as \(2^{(p-1)} – 1\). The smallest such \(p\) is 7, which divides \(2^3 – 1 = 7\) as well as \(2^6 – 1 = 63\). It’s easy to pick out other sets of points on lines of slope \(\frac{1}{3}\), \(\frac{1}{4}\) and so on. Toward the bottom of the graph, where the least \(m\) gets smaller, the points are sparse and patterns are more difficult to perceive, but the alignments are still present.

The procedure effectively divides the primes into classes distinguished by the value of \(r=\frac{p-1}{m-1}\):

r=1:  3, 5, 11, 13, 19, 29, 37, 53, 59, 61, 67, 83, 101, 107...
r=2:  7, 17, 23, 41, 47, 71, 79, 97, 103, 137, 167, 191, 193...
r=3:  43, 109, 157, 229, 277, 283, 307, 499, 643, 691, 733...
r=4:  113, 281, 353, 577, 593, 617, 1033, 1049, 1097, 1153...
r=5:  251, 571, 971, 1181, 1811, 2011, 2381, 2411, 3221...
r=6:  31, 223, 433, 439, 457, 727, 919, 1327, 1399, 1423...

There is nothing new or original about all this. Gauss wrote about it in Disquisitiones Arithmeticae in 1801. The primes in the first list are those for which the multiplicative order of 2 mod p is \(p-1\); in other words, the set of residues 2 mod p repeats with period \(p-1\), the largest possible. In Gauss’s terminology, 2 is a primitive root of these primes. In 1927 Emil Artin conjectured that there are infinitely many primes in this set. For the second series the multiplicative order of 2 mod p is \(\frac{p-1}{2}\), for the third group it is \(\frac{p-1}{3}\), and so on. The OEIS has more on each of these series.

Nothing new, but I found the graph a useful aid to understanding. (I would not be surprised to learn that the graph isn’t new either.)

Trivia: What is the largest value of \(r\) encountered in this set of primes? Well, 8191 divides \(2^{(14 – 1)} – 1\). As a matter of fact, 8191 is equal to \(2^{(14 – 1)} – 1\). Thus we have:

\[r = \frac{p-1}{m-1} = \frac{8190}{13} = 630 .\]

Three more of the primes listed by ShreevatsaR are also of the form \(2^{n} – 1\). On the assumption that we have a boundless supply of such primes, it would appear there is no limit to the value of \(r\). [Update: Please see the comments, with illuminating contributions by ShreevatsaR, Gerry Myerson and (via Stack Exchange) David Speyer.]

Posted in mathematics | 3 Comments

Mapping the Hilbert curve

In 1877 the German mathematician Georg Cantor made a shocking discovery. He found that a two-dimensional surface contains no more points than a one-dimensional line.

Thus begins my latest column in American Scientist, now available in HTML or PDF. The leading actor in this story is the Hilbert curve, which illustrates Cantor’s shocking discovery by leaping out of the one-dimensional universe and filling up a two-dimensional area. David Hilbert discovered this trick in 1891, building on earlier work by Giuseppe Peano. The curve is not smooth, but it’s continuous—you can draw it without ever lifting the pencil from the paper—and when elaborated to all its infinite glory it touches every point in a square.

The first through fourth generations of the Hilbert curve

Above: The first through fourth stages in the construction of the Hilbert curve. To generate the complete space-filling curve, just keep going this way.

To supplement the new column I’ve built two interactive illustrations. The first one animates the geometric construction of the Hilbert curve, showing how four copies of the generation-n curve can be shrunken, twirled, flipped and reconnected to produce generation n+1. The second illustration is a sort of graphic calculator for exploring the mapping between one-dimensional and two-dimensional spaces. For any point t on the unit line segment [0, 1], the calculator identifies the corresponding point x, y in the unit square [0, 1]2. The inverse mapping is also implemented, although it’s not one-to-one. A point x, y in the square can correspond to multiple points t on the segment; the calculator shows only one of the possibilities.

The Hilbert-curve illustrations are posted in the Extras department here at bit-player.org. I encourage you to go play with them, but please come back and read on. I have some unsolved problems.

I built the Hilbert mapping calculator in part to satisfy my own curiosity. It’s fine to know that every point t can be matched with a point x, y, but which points go together, exactly? For example, what point in the square corresponds to the midpoint of the segment? The answer in this case is not hard to work out: The middle of the segment maps to the middle of the square: (t = 1/2) → (x = 1/2, y = 1/2). Shown below are a few more salient points, as plotted by the calculator:

mapping of n/12 onto the Hilbert curve, for n=0 through n=12

Rational points of the form t = n/12, for integer 0 ≤ n ≤ 12, are mapped into the unit square according to their positions along the Hilbert curve. For example, t = 3/12 = 1/4 is mapped to the point x = 0, y = 1/2, at the midpoint of the square’s left edge. Note that the Hilbert curve drawn in the background of the illustration is a fifth-stage approximation to the true curve, but the positions of the colored dots are calculated to much higher precision.

Looking at this tableau led me to wonder what other values of t correspond to “interesting” x, y positions. In particular, which fractions along the t segment project to points on the perimeter of the square? Which t values go to points along one of the midlines, either vertical or horizontal? To narrow these questions a little further, let’s just consider fractions whose denominator is a prime. (In all that follows I ignore fractions with denominator 1, or in other words the end points of the t segment.)

The diagram above already reveals that fractions with denominator 2 and 3 are members of the “interesting” class. The value t = 1/2 maps to the intersection of the two midlines. And t = 1/4, 1/3, 2/3 and 3/4 all generate x, y points that lie on the perimeter. Can we find interesting fractions with other prime factors in the denominator? Yes indeed! Here is the Hilbert mapping for fractions of the form n/5:

mapping of n/5 onto the Hilbert curve, for n=0 through n=5

Hilbert mapping for fifths. In the calculator you can generate this output by entering */5 into the input area labeled “t =”. It turns out that (t = 2/5) → (x = 1/3, y = 1), and (t = 3/5) → (x = 2/3, y = 1).

Note that 2/5 and 3/5 map to points on the upper boundary of the square. By a simple sequential search I soon stumbled upon two more “interesting” examples, namely n/13, and then n/17.

mapping of n/13 onto the Hilbert curve, for n=0 through n=13

Hilbert mapping for t = 0/13, 1/13, 2/13 … 13/13. The fractions t = 3/13, 4/13, 9/13 and 10/13 yield x, y coordinates on the perimeter of the square.

mapping of n/17 onto the Hilbert curve, for n=0 through n=17

Hilbert mapping for 17ths. The fractions t = 1/17, 4/17, 6/17 and 7/17 correspond to points on the perimeter, as well as the mirror images of these points when reflected through the vertical midline.

At this point I began to think that prime-denominator fractions yielding perimeter points must be lying around everywhere just waiting for me to gather them up. And I was not surprised by their seeming abundance. After all, there are infinitely many points on the perimeter of the square, and so there must be infinitely many corresponding t values—even if we confine our attention to the rational numbers.

When I continued the search, however, my luck ran out. In the “t=” box of the calculator I typed in all fractions of the form */p for primes p < 100; I found no points on the perimeter other than the ones I already knew about, namely those for p = 3, 5, 13 and 17.

So I automated the search. It turns out the next prime denominator yielding perimeter points is 257. Typing */257 into the calculator produces this garish display:

mapping of n/257 onto the Hilbert curve, for n=0 through n=257

Points of the form t = n/257. There are 32 points that lie along the perimeter of the square.

Beyond 257 comes an even longer barren stretch. Surveying all prime denominators less than 10,000 failed to reveal any more that produce Hilbert-curve perimeter points.

Why not search the other way? We can plug in x, y values on the perimeter and then use the inverse Hilbert transformation to find a corresponding t value. I tried that. It was easy to get a sequence of approximations to t but not so easy to deduce the limiting value.To summarize, if t = n/p is a fraction that maps to a point on the perimeter of the Hilbert curve, and if p is a prime less than 10,000, then p must be one of the set {3, 5, 13, 17, 257}. Let’s pluck 13 out of this group and set it aside for a moment. The remaining elements of the sequence look familiar. They are all numbers of the form \(2^{2^n} + 1\). In other words, they are Fermat numbers. In 1640 Pierre de Fermat examined the first five numbers in this sequence: 3, 5, 17, 257, 65537. He confirmed that each of them is a prime, and he conjectured that all further numbers formed on the same model would also be prime. So we come to the burning question of the moment: Does the fifth Fermat prime 65537 generate perimeter points on the Hilbert curve? It sure does: 512 of them.

The obvious next step is to look at larger Fermat numbers, but there’s a complication. Fermat’s claim that all such numbers are prime turned out to be overreaching a little bit; beyond the first five, every Fermat number checked so far has turned out to be composite. Still, we can see if they give us Hilbert perimeter points. The sixth Fermat number is 4294967297, and sure enough there are fractions with this number in the denominator that land on the perimeter of the Hilbert curve. For example, t = 4/4294967297 converges to x = 0, y = 1/32768. We can also check the factors of this number (which have been known since 17291732, when Euler identified them as 641 and 6700417). With a bit of effort I pinned down a few fractions with 6700417 in the denominator, such as 2181/6700417, that are on the perimeter.

It’s the same story with the seventh Fermat number, 18446744073709551617. Both this number and its largest prime factor, 67280421310721, yield perimeter points. I have not looked beyond the seventh F number. Would I be as reckless as Fermat if I were to conjecture that all such numbers generate Hilbert-curve perimeter points?

Thus the known prime denominators that give rise to perimeter points are the five Fermat primes, 3, 5, 17, 257 and 65537, as well as two prime factors of larger Fermat numbers, 6700417 and 67280421310721. Oh, and 13. I almost forgot 13. Who let him in?

Of course there are many (infinitely many) perimeter points whose associated t values do not have a prime denominator. If we run through all fractions whose denominator is less than 1,000 (when reduced to lowest terms), this is the set of denominators that maps to perimeter points:

{3, 4, 5, 12, 13, 16, 17, 20, 48, 51, 52, 63, 64, 65, 68, 80, 192, 204, 205, 208, 252, 255, 256, 257, 260, 272, 273, 320, 341, 768, 771, 816, 819, 820, 832}

It’s another curious bunch of numbers, with patterns that are hard to fathom. (The OEIS is no help.) I thought at first that the numbers might be composed exclusively of the primes I had identified earlier (as well as 2). This notion held together until I got to 63, which of course has a factor of 7. Then I thought a slightly weaker condition might hold: Other factors are allowed, but at least one factor must be drawn from the favored set. That lasted until I reached 341, which factors as 11 × 31. There must be some logic behind the selection of these numbers, but the rule escapes me. One more peculiarity: Every even power of 2 appears, but none of the odd powers.

Here is the analogous set of denominators for t values that project to the two midlines of the Hilbert curve:

{2, 4, 6, 10, 12, 16, 20, 26, 34, 48, 52, 64, 68, 80, 102, 126, 130, 192, 204, 208, 252, 256, 260, 272, 320, 410, 510, 514, 546, 682, 768, 816, 820, 832}

One difference jumps out immediately: All of these numbers are even. Yet the sets have more than half their elements in common. Every even member of the perimeter set is also a member of the midline set. Also note that again only even powers 2 are included (apart from 2 itself).

I doubt that any deep mathematical truth hinges on the question of which numbers correspond to perimeter points on the Hilbert curve. Nevertheless, having allowed myself to get sucked into this murky business, I would really like to come out of it with some glimmer of understanding. So far that eludes me.

I can explain part of what’s happening here by looking more closely at the mechanism of the mapping from t to x, y. The process is easiest to follow if t is expressed as a quaternary fraction (i.e., base 4). Thus we view t as a list of “quadits,” digits drawn from the set {0, 1, 2, 3}. Each stage of the mapping process takes a single quadit and uses it to choose one of four affine transformations to apply to the current x, y coordinates, from H0 to H3.

\[ \mathbf{H}_0 =
0 & 1/2\\
1/2 & 0
\mathbf{H}_1 =
1/2 & 0\\
0 & 1/2

\[ \mathbf{H}_2 =
1/2 & 0\\
0 & 1/2
\mathbf{H}_3 =
0 & -1/2\\
-1/2 & 0

The sequence of quadits determines a sequence of transformations, which in turn determines the mapping from t to x, y. Here is the Javascript that implements the mapping function in the online Hilbert-curve calculator:
Most of the results reported here come not from the Javascript version but from a Lisp implementation, which can take advantage of Common Lisp’s arbitrary-precision rational numbers. (Javascript has only IEEE floats.)

function hilbertMap(quadits) {
  var pt, t, x, y;
  if ( quadits.length === 0 ) {
    return new Point(1/2, 1/2);   // center of the square
  else {
    t = quadits.shift();          // get first quadit
    pt = hilbertMap(quadits);     // recursive call
    x = pt.x;
    y = pt.y;
    switch(t) {
      case 0:     // southwest, with a twist
        return new Point(y * 1/2 + 0, x * 1/2 + 0);
      case 1:     // northwest
        return new Point(x * 1/2 + 0, y * 1/2 + 1/2);
      case 2:     // northeast
        return new Point(x * 1/2 + 1/2, y * 1/2 + 1/2);
      case 3:     // southeast, with twist & flip
        return new Point(y * -1/2 + 1, x * -1/2 + 1/2);

If you’re not in a mood to pick your way through either the linear algebra or the source code, I can give a rough idea of what’s going on. With each 0 quadit, the x, y point drifts toward the southwest corner of the square. A 1 quadit steers the point toward the northwest, and likewise a 2 quadit nudges the point to the northeast and a 3 quadit to the southeast. From these facts alone you can predict the outcome of simple cases. An uninterrupted string of 0 quadits has to converge on the southwest corner of the square, which is just what’s observed for t = 0. In the same way, a quadit sequence of all 1s has to end up in the northwest corner. What t value corresponds to (0.1111111…)4? This is the quaternary expansion of 1/3, which explains why we see (t = 1/3) → (x = 0, y = 1).

The quaternary expansion of 2/5 is (0.121212…)4. Let’s trace what the Javascript code above does when it is given a finite prefix of this number. In the illustration below, the prefix is just four quadits, (0.1212)4. sequence of x, y positions in mapping t = 2/5 = quaternary 0.1212 The recursive procedure effectively works back-to-front through the sequence of quadits. The x and y coordinates are given initial values of 1/2 (in the middle of the square). The first transformation is H2, which amounts to x → (y/2 + 1/2), y → (x/2 + 1/2); after this operation, the position of the x, y point is (3/4, 3/4). The next quadit is 1, and so the H1 transformation is applied to the new x, y coordinates. H1 specifies x → (x/2), y → (y/2 + 1/2); as a result, the point moves to (3/8, 7/8). Another round of H2 followed by H1 brings the position to (11/32, 31/32). Generalizing to a more precise calculation with a longer string of quadits, it’s easy to see that the nth location in this progression will have a y coordinate equal to \(\frac{2^{n}-1}{2^{n}}\); as n goes to infinity, this expression converges to 1.

Here the quaternary expansion of 2/5 is compared with the expansions of a few other fractions that correspond to y = 1 perimeter points:

t = 2/5 = (0.12121212…)4 → (x = 1/3, y = 1)

t = 6/17 = (0.11221122…)4 → (x = 1/5, y = 1)

t = 22/65 = (0.11122211…)4 → (x = 1/9, y = 1)

t = 86/257 = (0.11112222…)4 → (x = 1/17, y = 1)

Every number t whose quaternary expansion consists exclusively of 1s and 2s projects to some point along the northern boundary of the square. It would be satisfyingly tidy if we could make an analogous statement about the other three boundary edges. For example, if all 1s and 2s head north, then quadits made up of 0s and 3s ought to wind up on the southern edge. But it’s not so simple. Consider these cases:

t = 1/17 = (0.00330033…)4 → (x = 1/5, y = 0)

t = 4/17 = (0.03300330…)4 → (x = 0, y = 2/5)

t = 13/17 = (0.30033003…)4 → (x = 1, y = 2/5)

t = 16/17 = (0.33003300…)4 → (x = 4/5, y = 0)

The behavior of 1/17 and 16/17 is what you might expect: They go south. But 4/17 and 13/17 migrate not to the bottom of the square but to the two sides. If strings of 1s and 2s are so well-behaved, what’s the matter with strings of 0s and 3s? Well, the geometric transformations associated with quadrants 1 and 2 are simple scalings and translations. The matrices for quadrants 0 and 3 are more complicated: They introduce rotations and reflections. As a result it’s not so easy to predict the trajectory of an x, y point from the sequence of quadits in the t value. In any case I have not yet found a simple and obvious general rule.

The case of t = n/13 is no less perplexing:

t = 1/13 = (0.010323010323…)4 → (x = 1/3, y = 0)

t = 4/13 = (0.103230103230…)4 → (x = 0, y = 2/3)

t = 9/13 = (0.230103230103…)4 → (x = 1, y = 2/3)

t = 12/13 = (0.323010323010…)4 → (x = 2/3, y = 0)

When I trace through these transformations, as I did above for t = 2/5, I get the right answer in the end, but I haven’t a clue to the logic that drives the trajectory. I certainly can’t look at such a sequence of quadits and predict where the point will wind up.

I am left with more questions than answers. Furthermore, I don’t know whether the questions are genuinely difficult or if I am just missing something obvious. (It wouldn’t be the first time.) Anyway, here are the two big ones:

  • Can we prove that all Fermat numbers give rise to Hilbert-curve perimeter points? And, for composite Fermat numbers, will we always find that at least one of the factors shares this property?
  • Apart from the Fermat primes and factors of composite Fermat numbers, is 13 the only “sporadic” case? If so, what’s so special about 13?

I lack the mental muscle to answer these questions. Maybe someone else will make progress.

Update 2013-04-27: After publishing this story yesterday, I extended my search for “interesting” prime denominators, covering the territory between 257 and 65537. I found one new specimen: 61681. Twenty t values with this denominator yield perimeter points on the Hilbert curve. The smallest of them is t = 907/61681 → (x = 0, y = 101/1025). The repeating unit of the quaternary expansion of this fraction is (0.00033003223330033011)4.

Update 2013-04-29: Please see the comments! Several readers have provided illuminating insights, including a finite-state machine by ShreevatsaR that recognizes the class of quaternary-digit sequences that map to the perimeter of the square. And, as a further result of ShreevatsaR’s analysis, we have yet another “interesting” prime denominator: 15790321.

Posted in computing, mathematics | 16 Comments

Flying Nonmetric Airways

It’s the nature of triangles that no one side can be longer than the sum of the other two sides: For triangle ABC, ACAB + BC. This is the triangle inequality. Euclid proved it (Book I, Proposition 20). With appropriate definitions it holds for triangles on the surface of a sphere as well as on a plane. And the everyday human experience of moving around in the universe amply confirms it. Unless you are booking a flight from Boston to Philadelphia.

map of northeastern us showing direct route from Boston to Philadelphia and a detour via Detroit

Yesterday I had to arrange such a trip on short notice. I was offered a choice of several direct flights, with the cheapest round-trip ticket priced at $562. But I could get there and back for just $276 if I went via Detroit.

I am fascinated and appalled by the economics of this situation. From one point of view it makes perfect sense: A direct flight is faster and more convenient, so of course it should command a premium price. But when you look at the cost side of the equation, it seems crazy: The Detroit detour quadruples the mileage and adds an extra takeoff and landing. Shouldn’t those flyers who long for a view of Lake Erie and a stopover in the Motor City be asked to pay extra for the privilege?

The usual explanation of this paradox is the marginal-cost argument. If the BOS-PHL route is saturated, but there are empty seats on the BOS-DTW and DTW-PHL legs, the airline is better off filling the vacant seats at any price above the marginal cost of adding a single passenger. If the price is too attractive, however, customers will defect from the premium flight. No doubt the observed price differential was determined by some optimization program running in the airline’s back office, nudging the system toward an equilibrium point.

Sound sensible? Suppose I had wanted to go to Detroit instead of Philadelphia. When I looked up the fares this morning, I found that the cheapest nonstop BOS-DTW ticket is $1,224. But I could pay as little as $578 if I were willing to make a detour through—guess where—Philadelphia. Now it appears the load factors are reversed: It’s the Beantown–Motown corridor where demand is strong, while the City of Brotherly Love takes up the slack.

If that program in the back office were trying to optimize overall system efficiency—perhaps aiming to minimize aggregate passenger travel time as well as fuel consumption, pollution, labor inputs and other resources—I have a hard time believing it would come up with a solution anything like the itinerary I’m following today.

By the way, I’m not going to Detroit. I’m making a stop in Queens instead. At least the triangle is skinnier.

Update: Totally triangular. On the first leg of the trip, we pushed away from the gate at Logan airport and the pilot immediate announced that we’d be waiting an hour or more before takeoff because of air traffic congestion at JFK. What could cause such congestion? You don’t suppose the problem might possibly be caused or exacerbated by pricing policies that encourage people flying from Boston to Philadelphia to stop in New York?

As it turned out, the delay was only 10 or 15 minutes rather than an hour, and I made my connection to the Philadelphia flight.

On the return trip, I was getting emails and phone messages from the airline as I waited barefoot in the security queue at PHL. My departure time had been pushed back by 30 minutes, again because of JFK congestion. When I got to the gate, a weary but helpful agent told me: “If you want to get home tonight, go to Cincinnati.” So I took the fat triangle, flying 1,244 miles to go just 268 (a worse ratio than the Detroit route).

Is all this just the whining of an unhappy traveler? Maybe, but I there’s an economic puzzle that I’m wanting to solve. What rational business strategy rewards a customer or a company for wasting 1,000 miles worth of fuel?

Posted in mathematics, modern life | 14 Comments

Sphere packings and Hamiltonian paths

In an American Scientist column published last November, I discussed efforts by groups at Harvard and Yale to identify arrangements of n equal-size spheres that maximize the number of pairwise contacts between spheres. The Harvard collaboration (Natalie Arkus, Vinothan N. Manoharan and Michael P. Brenner) solved this problem for clusters of up to n = 10 spheres. The Yale group (Robert S. Hoy, Jared Harwayne-Gidansky and Corey S. O’Hern) extended the result to n = 11.

In describing some algorithmic refinements by the Yale researchers I wrote the following sentence:

The paper alluded to in this passage is: T. Biedl, E. Demaine, M. Demaine, S. Lazard, A. Lubiw, J. O’Rourke, M. Overmars, S. Robbins, I. Streinu, G. Toussaint and S. Whitesides. 2001. Locked and unlocked polygonal chains in three dimensions. Discrete and Computational Geometry 26:269–281.

For example, they took advantage of a curious fact proved by Therese Biedl, Erik Demaine and others: Any valid packing of spheres has a continuous, unbranched path that threads from one sphere to the next throughout the structure, like a long polymer chain.

A few weeks ago Robert Connelly of Cornell wrote me to point out that this statement is erroneous in two ways. First, Biedl and Demaine and their colleagues did not prove (or even attempt to prove) the “curious fact” I mentioned. Second, the fact is not a fact. It’s simply not true that all “valid packings” have a continuous unbranched path from sphere to sphere—known as a Hamiltonian path, after William Rowan Hamilton.

What’s a “valid packing”? The Harvard and Yale groups focused on arrangements of spheres that meet two criteria: A cluster must have at least 3n – 6 sphere-to-sphere contacts overall, and every sphere must touch at least three other spheres. Clusters that satisfy these conditions are termed “minimally rigid.” Connelly showed that not all such clusters have a Hamiltonian path. The demonstration is direct. Working with Erik Demaine and Martin Demaine, he produced a counterexample—a 16-sphere minimally rigid cluster that can be proved to have no Hamiltonian path. It later emerged there is a smaller example, with just 14 spheres.

The concept of a Hamiltonian path comes from graph theory rather than geometry, but it’s easy to translate between the two realms. The spheres of a geometric cluster correspond to the vertices of a graph; two vertices in the graph are connected by an edge if and only if the corresponding spheres are in contact. A Hamiltonian path through the graph is a route along some subset of the edges that visits each vertex exactly once. A graph that has such a path is said to be traceable, since you can follow the route through a diagram without ever lifting the pencil. The Hamiltonian path is not required to traverse all the edges. Nor does it have to return to its starting point. (A path that does form a closed loop is a called a Hamiltonian circuit.)

Connelly’s 16-sphere counterexample is shown in the photographs below as a skeleton built with the Geomags construction kit. At right is the corresponding graph, with edge colors that match those in the photos.

The 16-vertex graph.

two more views of the same 16-vertex cluster, from an equatorial position (left) and a polar point of view (right)

A 16-sphere cluster that has no Hamiltonian path is shown in equatorial and polar views. In this Geomags model, the sphere centers are represented by the shiny steel balls; spheres in contact are connected by a colored magnetic strut. All the struts are the same length. The graph at right preserves the pattern of connectivity of the cluster but not the geometry.

The core of this structure (yellow) is a pentagonal dipyramid with 10 triangular faces. Nine of the faces are decorated with tetrahedral caps (red); the tenth face is left unadorned. Counting the balls and struts in the model or the vertices and edges in the graph shows that the structure satisfies the criteria for minimal rigidity: The 16 spheres are linked by 3 × 16 – 6 = 42 bonds, and every sphere touches at least three others.

How do we know that the graph has no Hamiltonian path? In general, answering this question is a difficult task (indeed, it’s NP-complete). But this particular graph was designed specifically to make the problem easy. Connelly explains why there can be no unbranched path that threads through all the vertices:

The reason is a simple counting argument. Suppose there is a Hamiltonian path. Since there are 16 vertices in all, the Hamiltonian path must have 15 edges. Each of the 9 additional vertices is adjacent to 2 edges in the Hamiltonian path, except possibly for the 2 end points of the path, which each correspond to 1 edge of the path. This is a disjoint set of at least 2 × 9 – 2 = 16 edges, one more than there are in the path, a contradiction.

The smaller, 14-sphere counterexample is constructed in the same way, but instead of a pentagonal dipyramid the core of the structure is a square dipyramid—an object better known as an octahedron. All eight faces are capped with tetrahedra, making a stellate octahedron.

the 14-vertex non-Hamiltonian graph

Stellate octahedron in equatorial and polar views

The argument for the impossibility of a Hamiltonian path takes exactly the same form as in Connelly’s example. Any path that reaches all eight of the stellate vertices must include at least 2 × 8 – 2 = 14 edges, but a Hamiltonian path in this graph has only 13 edges.

sphere cluster: 14 spheres in a stellated octahedral configuration

The illustration at left is another representation of the stellate octahedron, this time as a packing of equal-size spheres. (If you hover your mouse over the image, it might spin.)

The procedure for creating these non-Hamiltonian structures was devised by the geometer Victor Klee; the resulting figures are now called Kleetopes. The 14-node octahedral example, which Klee mentions in Branko Grünbaum’s book Convex Polytopes, is the smallest of an infinite series. You might think you could produce an even smaller example by starting with a triangular dipyramid, which has five vertices and six triangular faces. Adding six tetrahedral caps yields a cluster with 11 vertices in all. However, this structure does have Hamiltonian paths. (On the other hand, it has no Hamiltonian circuit.)

When we give up the erroneous assumption that all minimally rigid arrangements have Hamiltonian paths, what is the status of the two searches for high-contact-number sphere packings? The following assessment is based on conversations and correspondence with several of the participants, but the wording and the conclusions are mine; others might well see it differently.

First of all, the Harvard group did not rely on any assumptions about Hamiltonian paths, so their enumeration of clusters up to n = 10 is entirely unaffected.

The Yale group adopted the Hamiltonian-path assumption as a way of containing the combinatorial explosion when they extended the search to n = 11. The computational burden can be measured in terms of the number of graph adjacency matrices, \(\bar{A}\), that need to be examined. Hoy, Harwayne-Gidansky and O’Hern wrote:

Robert S. Hoy, Jared Harwayne-Gidansky and Corey S. O’Hern. 2012. Structure of finite sphere packings via exact enumeration: Implications for colloidal crystal nucleation. Physical Review E 85:051403. Link to author preprint.

All macrostates can be found with greatly reduced computational effort through appropriate selection of “topological” constraints on the elements of \(\bar{A}\). Biedl et al. . . . proved that all connected sphere packings admit linear polymeric paths, that is, for any valid packing, one can always permute particle indices so that the packing is fully traversed by a “polymeric” \(\bar{A}\) with \(A_{i,i+1} = 1\) for all \(i\).

This assumption that all entries on the superdiagonal of the matrix can be set equal to 1 brings a huge reduction in computational cost. For n = 11, the number of matrices to be considered is slashed by more than three orders of magnitude. But of course that means 99.9 percent of all the candidate matrices are passed over without examination. If any of those ignored matrices were valid minimally rigid packings of 11 spheres, the Yale survey would have missed them.

How much should we worry about this potential omission? It depends on your point of view. If you look upon the result as a mathematical proof classifying all minimally rigid sphere clusters, then the proof has a big gap. But, as I noted in my American Scientist column, mathematical rigor was not the highest priority in this work and was already jeopardized by the use of a numerical algorithm (Newton’s method) that is not guaranteed to converge in all cases. The original motivation for both the Harvard and Yale projects came from chemistry and physics, not from geometry and graph theory. The idea was to identify the kinds of clusters that might form nuclei of condensing crystals or aggregations of colloidal particles. The physical significance of the findings is not much compromised by doubts about mathematical rigor.

In any case, the Yale results will require revision only if there exists at least one minimally rigid cluster of 11 or fewer spheres without a Hamiltonian path—and that now seems unlikely. Any such cluster with n ≤ 10 would have been found by the Harvard survey (which, again, did not exclude untraceable graphs). Meanwhile, recent work by Miranda Holmes-Cerfon of NYU has produced a new enumeration of sphere packings up to n = 12. The full results are still unpublished but she reports no sign of clusters that were missed because of the incorrect Hamiltonian-path assumption.

The Holmes-Cerfon survey is based on a different methodology. Instead of generating vast numbers of adjacency matrices and testing them to see which ones correspond to valid packings, she begins with a known cluster and tries to transform it into all other valid packings with the same number of spheres. The transformations consist of all possible movements of a single bond that maintain the conditions for minimal rigidity. This process is computationally less arduous, although it does rely on the assumption that all valid configurations can be reached by some series of single-bond moves.

When I first heard about the search for clusters that maximize contact number, I was surprised to learn that such simple questions remain unanswered. How could it be that we know so little about clusters of just a dozen spheres? I’m still intrigued, and now I have another question to noodle over: How come we know so little about Hamiltonian paths in small graphs? Is Klee’s 14-vertex example the smallest for minimally rigid structures? There can’t be any efficient way of answering this question (unless P = NP), but is it beyond our ability to close the gap between n = 11 and n = 14?

Acknowledgments: I’m grateful to Bob Connelly both for bring this matter to my attention and for helping me understand the issue. I have also benefitted from very helpful conversations and emails with Natalie Arkus, Michael P. Brenner, Karoly Bezdek, Miranda Holmes-Cerfon, Robert Hoy, David Wales and Zorana Zeravcic. (But if I still don’t have the story straight, it’s not their fault!)

Posted in computing, mathematics, physics | 1 Comment