A little theorem

Often I wish that I knew more mathematics and understood it more deeply. But then I wouldn’t have the pleasure of discovering afresh things that other people have known for years. (In some cases hundreds of years.)

Last week, in a discussion of Fermat primes and the Hilbert curve, ShreevatsaR remarked:

BTW, you only need to try primes that are factors of \((4^k - 1)\) for some \(k\)…. Considering powers of 4 up to 32 and prime factors greater than 65537, this means just the primes 87211, 131071, 174763, 178481, 262657, 524287, 2796203, 3033169, 6700417, 15790321, 715827883, 2147483647.

In response I asked (innocently, if skeptically):

Are there primes other than 2 that are known not to be factors of \((4^k - 1)\) for some \(k\)?

ShreevatsaR immediately replied: No, every odd prime must divide some \((4^k - 1)\). And he gave a proof based on the pigeonhole principle. The primes he had listed in his earlier comment are just those that happen to divide \((4^k - 1)\) for an unusually small value of \(k\).

In the middle of the night I had a little epiphany: This is just Fermat’s Little Theorem in disguise. One version of the Little Theorem says: If \(p\) is a prime and \(a\) is a natural number, then either \(p\) divides \(a\) or \(p\) divides \(a^{(p-1)} - 1\). To get back to ShreevatsaR’s statement, just observe that for any prime \(p\) other than 2, \(p-1\) is even, and so we can introduce an integer \(k = \frac{(p-1)}{2}\), making \(4^{k} - 1\) equivalent to \(2^{(p-1)} - 1\).
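Spelled out, the substitution is a one-line calculation. The numerical case \(p = 7\) is added here purely as an illustration:

\[ 4^{k} - 1 = \left(2^{2}\right)^{\frac{p-1}{2}} - 1 = 2^{p-1} - 1 \equiv 0 \pmod{p} \qquad \text{for any odd prime } p . \]

\[ p = 7: \quad k = 3, \quad 4^{3} - 1 = 63 = 7 \times 9 . \]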

My previous encounters with Fermat’s Little Theorem have been in the context of primality testing. If you have some natural number \(n\) and you want to find out if it’s a prime, calculate \(2^{(n-1)} - 1 \bmod n\); if the result is anything other than zero, \(n\) is composite. (Unfortunately, the converse statement is not to be relied upon: If \(2^{(n-1)} - 1 \bmod n = 0\), it does not always follow that \(n\) is prime, but that’s a story for another time.)
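The test just described can be sketched in a few lines of Javascript. This is a minimal illustration, not code from the original post: the function names `modPow` and `passesFermatBase2` are hypothetical, and BigInt arithmetic is assumed so that large \(n\) do not overflow. The input is assumed to be an odd number greater than 2.

```javascript
// Modular exponentiation by repeated squaring, on BigInt values.
// (Hypothetical helper; not part of the original post.)
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;  // use this bit of the exponent
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Base-2 Fermat test for odd n > 2.
// false: n is certainly composite.
// true:  n passed the test, but it may still be a pseudoprime.
function passesFermatBase2(n) {
  return modPow(2n, n - 1n, n) === 1n;
}

console.log(passesFermatBase2(7n));    // a prime passes
console.log(passesFermatBase2(15n));   // a composite that is caught
console.log(passesFermatBase2(341n));  // 341 = 11 × 31, yet it passes
```

The last call shows why the converse cannot be relied upon: 341 is composite, but \(2^{10} = 1024 = 3 \times 341 + 1\), so \(2^{340} \equiv 1 \pmod{341}\).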

ShreevatsaR’s comment brought to light something I had never thought about: We know that a prime \(p\) divides \(2^{(p-1)} - 1\), but \(p\) might also divide some smaller number \(2^{(m-1)} - 1\) with \(m < p\). I went searching for the smallest such \(m\) for all primes less than 10,000. Here are the results:

graph of the least m such that p divides 2^(m-1) - 1 for all primes p less than 10000

Each dot in the graph represents a prime \(p\). The horizontal coordinate of the dot is the magnitude of \(p\); the vertical coordinate is the least \(m\) such that \(p \mid 2^{(m-1)} - 1\). Of the 1228 primes shown, 470 lie along the diagonal, indicating that the least \(m\) is in fact equal to \(p\). Another 348 dots lie along a line of slope \(\frac{1}{2}\): For each of these primes, \(p\) divides \(2^{\frac{(p-1)}{2}} - 1\) as well as \(2^{(p-1)} - 1\). The smallest such \(p\) is 7, which divides \(2^3 - 1 = 7\) as well as \(2^6 - 1 = 63\). It’s easy to pick out other sets of points on lines of slope \(\frac{1}{3}\), \(\frac{1}{4}\) and so on. Toward the bottom of the graph, where the least \(m\) gets smaller, the points are sparse and patterns are more difficult to perceive, but the alignments are still present.

The procedure effectively divides the primes into classes distinguished by the value of \(r=\frac{p-1}{m-1}\):

r=1:  3, 5, 11, 13, 19, 29, 37, 53, 59, 61, 67, 83, 101, 107...
r=2:  7, 17, 23, 41, 47, 71, 79, 97, 103, 137, 167, 191, 193...
r=3:  43, 109, 157, 229, 277, 283, 307, 499, 643, 691, 733...
r=4:  113, 281, 353, 577, 593, 617, 1033, 1049, 1097, 1153...
r=5:  251, 571, 971, 1181, 1811, 2011, 2381, 2411, 3221...
r=6:  31, 223, 433, 439, 457, 727, 919, 1327, 1399, 1423...
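
A classification like the one above can be reproduced with a short search. The sketch below is an assumption about how one might do it, not the program behind the graph; the names `orderOfTwo`, `oddPrimesBelow` and `classifyByR` are hypothetical. It relies on the fact that the least \(m\) satisfies \(m - 1 = \) the multiplicative order of 2 mod \(p\).

```javascript
// Least k >= 1 with 2^k ≡ 1 (mod p), for an odd prime p.
// In the notation of the text, k = m - 1.
function orderOfTwo(p) {
  let k = 1;
  let power = 2 % p;
  while (power !== 1) {
    power = (power * 2) % p;
    k++;
  }
  return k;
}

// Sieve of Eratosthenes restricted to odd numbers below `limit`.
function oddPrimesBelow(limit) {
  const composite = new Uint8Array(limit);
  const primes = [];
  for (let n = 3; n < limit; n += 2) {
    if (composite[n]) continue;
    primes.push(n);
    for (let k = n * n; k < limit; k += 2 * n) composite[k] = 1;
  }
  return primes;
}

// Group the odd primes below `limit` by r = (p - 1) / (m - 1).
function classifyByR(limit) {
  const classes = new Map();
  for (const p of oddPrimesBelow(limit)) {
    const r = (p - 1) / orderOfTwo(p);  // an integer: the order divides p - 1
    if (!classes.has(r)) classes.set(r, []);
    classes.get(r).push(p);
  }
  return classes;
}

console.log(classifyByR(110).get(1));  // primes below 110 with r = 1
```

Calling `classifyByR(10000)` and reading off the sizes of the classes for r = 1 and r = 2 gives a way to check the counts quoted above.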

There is nothing new or original about all this. Gauss wrote about it in Disquisitiones Arithmeticae in 1801. The primes in the first list are those for which the multiplicative order of 2 mod p is \(p-1\); in other words, the sequence of powers of 2 mod p repeats with period \(p-1\), the largest possible. In Gauss’s terminology, 2 is a primitive root of these primes. In 1927 Emil Artin conjectured that there are infinitely many primes in this set. For the second series the multiplicative order of 2 mod p is \(\frac{p-1}{2}\), for the third group it is \(\frac{p-1}{3}\), and so on. The OEIS has more on each of these series.

Nothing new, but I found the graph a useful aid to understanding. (I would not be surprised to learn that the graph isn’t new either.)

Trivia: What is the largest value of \(r\) encountered in this set of primes? Well, 8191 divides \(2^{(14 - 1)} - 1\). As a matter of fact, 8191 is equal to \(2^{(14 - 1)} - 1\). Thus we have:

\[r = \frac{p-1}{m-1} = \frac{8190}{13} = 630 .\]

Three more of the primes listed by ShreevatsaR are also of the form \(2^{n} - 1\). On the assumption that we have a boundless supply of such primes, it would appear there is no limit to the value of \(r\). [Update: Please see the comments, with illuminating contributions by ShreevatsaR, Gerry Myerson and (via Stack Exchange) David Speyer.]
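
As a worked illustration (added here, not taken from the comments): if \(p = 2^{n} - 1\) is prime, then no smaller number \(2^{j} - 1\) with \(0 < j < n\) can be divisible by \(p\), so the least \(m\) satisfies \(m - 1 = n\) and

\[ r = \frac{p-1}{m-1} = \frac{2^{n} - 2}{n} . \]

For the three other primes of this form in ShreevatsaR’s list:

\[ \frac{131070}{17} = 7710, \qquad \frac{524286}{19} = 27594, \qquad \frac{2147483646}{31} = 69273666 . \]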

Posted in mathematics | 3 Comments

Mapping the Hilbert curve

In 1877 the German mathematician Georg Cantor made a shocking discovery. He found that a two-dimensional surface contains no more points than a one-dimensional line.

Thus begins my latest column in American Scientist, now available in HTML or PDF. The leading actor in this story is the Hilbert curve, which illustrates Cantor’s shocking discovery by leaping out of the one-dimensional universe and filling up a two-dimensional area. David Hilbert discovered this trick in 1891, building on earlier work by Giuseppe Peano. The curve is not smooth, but it’s continuous—you can draw it without ever lifting the pencil from the paper—and when elaborated to all its infinite glory it touches every point in a square.

The first through fourth generations of the Hilbert curve

Above: The first through fourth stages in the construction of the Hilbert curve. To generate the complete space-filling curve, just keep going this way.

To supplement the new column I’ve built two interactive illustrations. The first one animates the geometric construction of the Hilbert curve, showing how four copies of the generation-n curve can be shrunken, twirled, flipped and reconnected to produce generation n+1. The second illustration is a sort of graphic calculator for exploring the mapping between one-dimensional and two-dimensional spaces. For any point t on the unit line segment [0, 1], the calculator identifies the corresponding point x, y in the unit square [0, 1]². The inverse mapping is also implemented, although it’s not one-to-one. A point x, y in the square can correspond to multiple points t on the segment; the calculator shows only one of the possibilities.

The Hilbert-curve illustrations are posted in the Extras department here at bit-player.org. I encourage you to go play with them, but please come back and read on. I have some unsolved problems.


I built the Hilbert mapping calculator in part to satisfy my own curiosity. It’s fine to know that every point t can be matched with a point x, y, but which points go together, exactly? For example, what point in the square corresponds to the midpoint of the segment? The answer in this case is not hard to work out: The middle of the segment maps to the middle of the square: (t = 1/2) → (x = 1/2, y = 1/2). Shown below are a few more salient points, as plotted by the calculator:

mapping of n/12 onto the Hilbert curve, for n=0 through n=12

Rational points of the form t = n/12, for integer 0 ≤ n ≤ 12, are mapped into the unit square according to their positions along the Hilbert curve. For example, t = 3/12 = 1/4 is mapped to the point x = 0, y = 1/2, at the midpoint of the square’s left edge. Note that the Hilbert curve drawn in the background of the illustration is a fifth-stage approximation to the true curve, but the positions of the colored dots are calculated to much higher precision.

Looking at this tableau led me to wonder what other values of t correspond to “interesting” x, y positions. In particular, which fractions along the t segment project to points on the perimeter of the square? Which t values go to points along one of the midlines, either vertical or horizontal? To narrow these questions a little further, let’s just consider fractions whose denominator is a prime. (In all that follows I ignore fractions with denominator 1, or in other words the end points of the t segment.)

The diagram above already reveals that fractions with denominator 2 and 3 are members of the “interesting” class. The value t = 1/2 maps to the intersection of the two midlines. And t = 1/4, 1/3, 2/3 and 3/4 all generate x, y points that lie on the perimeter. Can we find interesting fractions with other prime factors in the denominator? Yes indeed! Here is the Hilbert mapping for fractions of the form n/5:

mapping of n/5 onto the Hilbert curve, for n=0 through n=5

Hilbert mapping for fifths. In the calculator you can generate this output by entering */5 into the input area labeled “t =”. It turns out that (t = 2/5) → (x = 1/3, y = 1), and (t = 3/5) → (x = 2/3, y = 1).

Note that 2/5 and 3/5 map to points on the upper boundary of the square. By a simple sequential search I soon stumbled upon two more “interesting” examples, namely n/13, and then n/17.

mapping of n/13 onto the Hilbert curve, for n=0 through n=13

Hilbert mapping for t = 0/13, 1/13, 2/13 … 13/13. The fractions t = 3/13, 4/13, 9/13 and 10/13 yield x, y coordinates on the perimeter of the square.

mapping of n/17 onto the Hilbert curve, for n=0 through n=17

Hilbert mapping for 17ths. The fractions t = 1/17, 4/17, 6/17 and 7/17 correspond to points on the perimeter, as well as the mirror images of these points when reflected through the vertical midline.

At this point I began to think that prime-denominator fractions yielding perimeter points must be lying around everywhere just waiting for me to gather them up. And I was not surprised by their seeming abundance. After all, there are infinitely many points on the perimeter of the square, and so there must be infinitely many corresponding t values—even if we confine our attention to the rational numbers.

When I continued the search, however, my luck ran out. In the “t=” box of the calculator I typed in all fractions of the form */p for primes p < 100; I found no points on the perimeter other than the ones I already knew about, namely those for p = 3, 5, 13 and 17.

So I automated the search. It turns out the next prime denominator yielding perimeter points is 257. Typing */257 into the calculator produces this garish display:

mapping of n/257 onto the Hilbert curve, for n=0 through n=257

Points of the form t = n/257. There are 32 points that lie along the perimeter of the square.

Beyond 257 comes an even longer barren stretch. Surveying all prime denominators less than 10,000 failed to reveal any more that produce Hilbert-curve perimeter points.

Why not search the other way? We can plug in x, y values on the perimeter and then use the inverse Hilbert transformation to find a corresponding t value. I tried that. It was easy to get a sequence of approximations to t but not so easy to deduce the limiting value.

To summarize, if t = n/p is a fraction that maps to a point on the perimeter of the Hilbert curve, and if p is a prime less than 10,000, then p must be one of the set {3, 5, 13, 17, 257}. Let’s pluck 13 out of this group and set it aside for a moment. The remaining elements of the sequence look familiar. They are all numbers of the form \(2^{2^n} + 1\). In other words, they are Fermat numbers. In 1640 Pierre de Fermat examined the first five numbers in this sequence: 3, 5, 17, 257, 65537. He confirmed that each of them is a prime, and he conjectured that all further numbers formed on the same model would also be prime. So we come to the burning question of the moment: Does the fifth Fermat prime 65537 generate perimeter points on the Hilbert curve? It sure does: 512 of them.

The obvious next step is to look at larger Fermat numbers, but there’s a complication. Fermat’s claim that all such numbers are prime turned out to be overreaching a little bit; beyond the first five, every Fermat number checked so far has turned out to be composite. Still, we can see if they give us Hilbert perimeter points. The sixth Fermat number is 4294967297, and sure enough there are fractions with this number in the denominator that land on the perimeter of the Hilbert curve. For example, t = 4/4294967297 converges to x = 0, y = 1/32768. We can also check the factors of this number (which have been known since 1732, when Euler identified them as 641 and 6700417). With a bit of effort I pinned down a few fractions with 6700417 in the denominator, such as 2181/6700417, that are on the perimeter.
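
The arithmetic behind these Fermat numbers is easy to confirm with BigInt. The snippet is only an illustrative check; `fermatNumber` is a hypothetical helper name, with the sequence indexed from n = 0 so that the "sixth" number is `fermatNumber(5)`.

```javascript
// n-th Fermat number 2^(2^n) + 1 as a BigInt; n is a small ordinary Number.
// (Hypothetical helper, indexed from n = 0.)
function fermatNumber(n) {
  return (1n << (1n << BigInt(n))) + 1n;
}

// The five Fermat primes examined by Fermat.
console.log([0, 1, 2, 3, 4].map(n => fermatNumber(n).toString()));

// Euler's factorization of the sixth Fermat number.
console.log(641n * 6700417n === fermatNumber(5));

// The seventh Fermat number and its two prime factors.
console.log(274177n * 67280421310721n === fermatNumber(6));
```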

It’s the same story with the seventh Fermat number, 18446744073709551617. Both this number and its largest prime factor, 67280421310721, yield perimeter points. I have not looked beyond the seventh F number. Would I be as reckless as Fermat if I were to conjecture that all such numbers generate Hilbert-curve perimeter points?

Thus the known prime denominators that give rise to perimeter points are the five Fermat primes, 3, 5, 17, 257 and 65537, as well as two prime factors of larger Fermat numbers, 6700417 and 67280421310721. Oh, and 13. I almost forgot 13. Who let him in?

Of course there are many (infinitely many) perimeter points whose associated t values do not have a prime denominator. If we run through all fractions whose denominator is less than 1,000 (when reduced to lowest terms), this is the set of denominators that maps to perimeter points:

{3, 4, 5, 12, 13, 16, 17, 20, 48, 51, 52, 63, 64, 65, 68, 80, 192, 204, 205, 208, 252, 255, 256, 257, 260, 272, 273, 320, 341, 768, 771, 816, 819, 820, 832}

It’s another curious bunch of numbers, with patterns that are hard to fathom. (The OEIS is no help.) I thought at first that the numbers might be composed exclusively of the primes I had identified earlier (as well as 2). This notion held together until I got to 63, which of course has a factor of 7. Then I thought a slightly weaker condition might hold: Other factors are allowed, but at least one factor must be drawn from the favored set. That lasted until I reached 341, which factors as 11 × 31. There must be some logic behind the selection of these numbers, but the rule escapes me. One more peculiarity: Every even power of 2 appears, but none of the odd powers.

Here is the analogous set of denominators for t values that project to the two midlines of the Hilbert curve:

{2, 4, 6, 10, 12, 16, 20, 26, 34, 48, 52, 64, 68, 80, 102, 126, 130, 192, 204, 208, 252, 256, 260, 272, 320, 410, 510, 514, 546, 682, 768, 816, 820, 832}

One difference jumps out immediately: All of these numbers are even. Yet the sets have more than half their elements in common. Every even member of the perimeter set is also a member of the midline set. Also note that again only even powers of 2 are included (apart from 2 itself).


I doubt that any deep mathematical truth hinges on the question of which numbers correspond to perimeter points on the Hilbert curve. Nevertheless, having allowed myself to get sucked into this murky business, I would really like to come out of it with some glimmer of understanding. So far that eludes me.

I can explain part of what’s happening here by looking more closely at the mechanism of the mapping from t to x, y. The process is easiest to follow if t is expressed as a quaternary fraction (i.e., base 4). Thus we view t as a list of “quadits,” digits drawn from the set {0, 1, 2, 3}. Each stage of the mapping process takes a single quadit and uses it to choose one of four affine transformations to apply to the current x, y coordinates, from H0 to H3.

\[ \mathbf{H}_0 =
\left(
\begin{matrix}
0 & 1/2\\
1/2 & 0
\end{matrix}
\right)
\left(
\begin{matrix}
x\\
y
\end{matrix}
\right)
+
\left(
\begin{matrix}
0\\
0
\end{matrix}
\right)
\qquad
\mathbf{H}_1 =
\left(
\begin{matrix}
1/2 & 0\\
0 & 1/2
\end{matrix}
\right)
\left(
\begin{matrix}
x\\
y
\end{matrix}
\right)
+
\left(
\begin{matrix}
0\\
1/2
\end{matrix}
\right)
\]

\[ \mathbf{H}_2 =
\left(
\begin{matrix}
1/2 & 0\\
0 & 1/2
\end{matrix}
\right)
\left(
\begin{matrix}
x\\
y
\end{matrix}
\right)
+
\left(
\begin{matrix}
1/2\\[0.3em]
1/2
\end{matrix}
\right)
\qquad
\mathbf{H}_3 =
\left(
\begin{matrix}
0 & -1/2\\
-1/2 & 0
\end{matrix}
\right)
\left(
\begin{matrix}
x\\
y
\end{matrix}
\right)
+
\left(
\begin{matrix}
1\\
1/2
\end{matrix}
\right)
\]

The sequence of quadits determines a sequence of transformations, which in turn determines the mapping from t to x, y. (Most of the results reported here come not from Javascript but from a Lisp implementation, which can take advantage of Common Lisp’s arbitrary-precision rational numbers; Javascript has only IEEE floats.) Here is the Javascript that implements the mapping function in the online Hilbert-curve calculator:

function hilbertMap(quadits) {
  var pt, t, x, y;
  if ( quadits.length === 0 ) {
    return new Point(1/2, 1/2);   // center of the square
  }
  else {
    t = quadits.shift();          // get first quadit
    pt = hilbertMap(quadits);     // recursive call
    x = pt.x;
    y = pt.y;
    switch(t) {
      case 0:     // southwest, with a twist
        return new Point(y * 1/2 + 0, x * 1/2 + 0);
      case 1:     // northwest
        return new Point(x * 1/2 + 0, y * 1/2 + 1/2);
      case 2:     // northeast
        return new Point(x * 1/2 + 1/2, y * 1/2 + 1/2);
      case 3:     // southeast, with twist & flip
        return new Point(y * -1/2 + 1, x * -1/2 + 1/2);
    }
  }
}

If you’re not in a mood to pick your way through either the linear algebra or the source code, I can give a rough idea of what’s going on. With each 0 quadit, the x, y point drifts toward the southwest corner of the square. A 1 quadit steers the point toward the northwest, and likewise a 2 quadit nudges the point to the northeast and a 3 quadit to the southeast. From these facts alone you can predict the outcome of simple cases. An uninterrupted string of 0 quadits has to converge on the southwest corner of the square, which is just what’s observed for t = 0. In the same way, a quadit sequence of all 1s has to end up in the northwest corner. What t value corresponds to (0.1111111…)₄? This is the quaternary expansion of 1/3, which explains why we see (t = 1/3) → (x = 0, y = 1).
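
For readers who want to produce the quadit lists themselves, here is a minimal sketch of base-4 long division. The function name `quaditsOf` is hypothetical, and the code assumes 0 ≤ n < d with numbers small enough for ordinary Javascript integers.

```javascript
// First `count` base-4 digits (quadits) of the fraction n/d, for 0 <= n < d.
// (Hypothetical helper; plain long division in base 4.)
function quaditsOf(n, d, count) {
  const digits = [];
  let r = n;                          // running remainder
  for (let i = 0; i < count; i++) {
    r *= 4;
    digits.push(Math.floor(r / d));   // next quadit
    r %= d;
  }
  return digits;
}

console.log(quaditsOf(1, 3, 6));   // [1, 1, 1, 1, 1, 1]
console.log(quaditsOf(2, 5, 4));   // [1, 2, 1, 2]
```

The resulting array has the form that `hilbertMap` expects as its argument.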

The quaternary expansion of 2/5 is (0.121212…)₄. Let’s trace what the Javascript code above does when it is given a finite prefix of this number. In the illustration below, the prefix is just four quadits, (0.1212)₄.

sequence of x, y positions in mapping t = 2/5 = quaternary 0.1212

The recursive procedure effectively works back-to-front through the sequence of quadits. The x and y coordinates are given initial values of 1/2 (in the middle of the square). The first transformation is H2, which amounts to x → (x/2 + 1/2), y → (y/2 + 1/2); after this operation, the position of the x, y point is (3/4, 3/4). The next quadit is 1, and so the H1 transformation is applied to the new x, y coordinates. H1 specifies x → (x/2), y → (y/2 + 1/2); as a result, the point moves to (3/8, 7/8). Another round of H2 followed by H1 brings the position to (11/32, 31/32). Generalizing to a more precise calculation with a longer string of quadits, it’s easy to see that the nth location in this progression will have a y coordinate equal to \(\frac{2^{n}-1}{2^{n}}\); as n goes to infinity, this expression converges to 1.
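
The same trace can be generated mechanically. The sketch below is an iterative restatement of the four maps, written for illustration only; `H` and `traceHilbert` are hypothetical names. It does not modify its input array, and every value in this particular trace is a dyadic fraction, so the floating-point results are exact.

```javascript
// The four affine maps, written out from the matrices H0..H3.
const H = [
  ([x, y]) => [y / 2, x / 2],                   // H0: southwest, with a twist
  ([x, y]) => [x / 2, y / 2 + 1 / 2],           // H1: northwest
  ([x, y]) => [x / 2 + 1 / 2, y / 2 + 1 / 2],   // H2: northeast
  ([x, y]) => [1 - y / 2, 1 / 2 - x / 2],       // H3: southeast, twist & flip
];

// Apply the maps back-to-front, starting from the center of the square,
// and record every intermediate position.
function traceHilbert(quadits) {
  const steps = [];
  let pt = [1 / 2, 1 / 2];
  for (let i = quadits.length - 1; i >= 0; i--) {
    pt = H[quadits[i]](pt);
    steps.push(pt);
  }
  return steps;
}

console.log(traceHilbert([1, 2, 1, 2]));
// [[0.75, 0.75], [0.375, 0.875], [0.6875, 0.9375], [0.34375, 0.96875]]
```

The second and fourth entries are the points (3/8, 7/8) and (11/32, 31/32) mentioned in the text.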

Here the quaternary expansion of 2/5 is compared with the expansions of a few other fractions that correspond to y = 1 perimeter points:

t = 2/5 = (0.12121212…)₄ → (x = 1/3, y = 1)

t = 6/17 = (0.11221122…)₄ → (x = 1/5, y = 1)

t = 22/65 = (0.11122211…)₄ → (x = 1/9, y = 1)

t = 86/257 = (0.11112222…)₄ → (x = 1/17, y = 1)

Every number t whose quaternary expansion consists exclusively of 1s and 2s projects to some point along the northern boundary of the square. It would be satisfyingly tidy if we could make an analogous statement about the other three boundary edges. For example, if all 1s and 2s head north, then quadits made up of 0s and 3s ought to wind up on the southern edge. But it’s not so simple. Consider these cases:

t = 1/17 = (0.00330033…)₄ → (x = 1/5, y = 0)

t = 4/17 = (0.03300330…)₄ → (x = 0, y = 2/5)

t = 13/17 = (0.30033003…)₄ → (x = 1, y = 2/5)

t = 16/17 = (0.33003300…)₄ → (x = 4/5, y = 0)

The behavior of 1/17 and 16/17 is what you might expect: They go south. But 4/17 and 13/17 migrate not to the bottom of the square but to the two sides. If strings of 1s and 2s are so well-behaved, what’s the matter with strings of 0s and 3s? Well, the geometric transformations associated with quadrants 1 and 2 are simple scalings and translations. The matrices for quadrants 0 and 3 are more complicated: They introduce rotations and reflections. As a result it’s not so easy to predict the trajectory of an x, y point from the sequence of quadits in the t value. In any case I have not yet found a simple and obvious general rule.

The case of t = n/13 is no less perplexing:

t = 1/13 = (0.010323010323…)₄ → (x = 1/3, y = 0)

t = 4/13 = (0.103230103230…)₄ → (x = 0, y = 2/3)

t = 9/13 = (0.230103230103…)₄ → (x = 1, y = 2/3)

t = 12/13 = (0.323010323010…)₄ → (x = 2/3, y = 0)

When I trace through these transformations, as I did above for t = 2/5, I get the right answer in the end, but I haven’t a clue to the logic that drives the trajectory. I certainly can’t look at such a sequence of quadits and predict where the point will wind up.
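
One way to get exact answers for such cases without taking a limit is sketched below. This is an assumption-laden aside, not the Lisp method used for the results above. It relies on two observations: a fraction n/d with odd d has a purely periodic quaternary expansion, and the image point must then be the fixed point of the composition of the maps over one period. Each map has the form p → (S p + c)/2 with integer S and c, so the composition over L quadits is p → (S p + c)/2^L, and the fixed point solves (2^L I − S) p = c. All names (`MAPS`, `periodOf`, `hilbertExact`, and so on) are hypothetical.

```javascript
// Integer parts of the four maps: H_d(p) = (S p + c) / 2.
const MAPS = [
  { S: [[0n, 1n], [1n, 0n]],   c: [0n, 0n] },  // H0
  { S: [[1n, 0n], [0n, 1n]],   c: [0n, 1n] },  // H1
  { S: [[1n, 0n], [0n, 1n]],   c: [1n, 1n] },  // H2
  { S: [[0n, -1n], [-1n, 0n]], c: [2n, 1n] },  // H3
];

function gcd(a, b) {
  a = a < 0n ? -a : a;
  b = b < 0n ? -b : b;
  while (b !== 0n) [a, b] = [b, a % b];
  return a;
}

// Reduced fraction as a string such as "1/3", "0" or "1"; den must be nonzero.
function fracString(num, den) {
  if (den < 0n) { num = -num; den = -den; }
  const g = gcd(num, den);
  num /= g; den /= g;
  return den === 1n ? String(num) : `${num}/${den}`;
}

// Repeating block of quadits of n/d (BigInt, d odd, 0 <= n < d).
function periodOf(n, d) {
  const digits = [];
  let r = n;
  do {
    r *= 4n;
    digits.push(Number(r / d));
    r %= d;
  } while (r !== n);
  return digits;
}

// Exact image of t = n/d, assuming d is odd so the expansion is purely periodic.
function hilbertExact(n, d) {
  const period = periodOf(n, d);
  let S = [[1n, 0n], [0n, 1n]], c = [0n, 0n], N = 1n;  // identity map, scale 2^0
  for (let i = period.length - 1; i >= 0; i--) {       // compose back-to-front
    const { S: A, c: a } = MAPS[period[i]];
    const S2 = [
      [A[0][0] * S[0][0] + A[0][1] * S[1][0], A[0][0] * S[0][1] + A[0][1] * S[1][1]],
      [A[1][0] * S[0][0] + A[1][1] * S[1][0], A[1][0] * S[0][1] + A[1][1] * S[1][1]],
    ];
    const c2 = [
      A[0][0] * c[0] + A[0][1] * c[1] + N * a[0],
      A[1][0] * c[0] + A[1][1] * c[1] + N * a[1],
    ];
    S = S2; c = c2; N *= 2n;
  }
  // Solve (N I - S) p = c by Cramer's rule.
  const m11 = N - S[0][0], m12 = -S[0][1], m21 = -S[1][0], m22 = N - S[1][1];
  const det = m11 * m22 - m12 * m21;
  return {
    x: fracString(c[0] * m22 - m12 * c[1], det),
    y: fracString(m11 * c[1] - m21 * c[0], det),
  };
}

console.log(hilbertExact(1n, 13n));  // { x: '1/3', y: '0' }
console.log(hilbertExact(4n, 17n));  // { x: '0', y: '2/5' }
```

This gives the coordinates, but it does not explain them: the question of how to predict the outcome from the quadits alone remains open.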


I am left with more questions than answers. Furthermore, I don’t know whether the questions are genuinely difficult or if I am just missing something obvious. (It wouldn’t be the first time.) Anyway, here are the two big ones:

  • Can we prove that all Fermat numbers give rise to Hilbert-curve perimeter points? And, for composite Fermat numbers, will we always find that at least one of the factors shares this property?
  • Apart from the Fermat primes and factors of composite Fermat numbers, is 13 the only “sporadic” case? If so, what’s so special about 13?

I lack the mental muscle to answer these questions. Maybe someone else will make progress.


Update 2013-04-27: After publishing this story yesterday, I extended my search for “interesting” prime denominators, covering the territory between 257 and 65537. I found one new specimen: 61681. Twenty t values with this denominator yield perimeter points on the Hilbert curve. The smallest of them is t = 907/61681 → (x = 0, y = 101/1025). The repeating unit of the quaternary expansion of this fraction is (0.00033003223330033011)₄.


Update 2013-04-29: Please see the comments! Several readers have provided illuminating insights, including a finite-state machine by ShreevatsaR that recognizes the class of quaternary-digit sequences that map to the perimeter of the square. And, as a further result of ShreevatsaR’s analysis, we have yet another “interesting” prime denominator: 15790321.

Posted in computing, mathematics | 15 Comments

Flying Nonmetric Airways

It’s the nature of triangles that no one side can be longer than the sum of the other two sides: For triangle ABC, AC ≤ AB + BC. This is the triangle inequality. Euclid proved it (Book I, Proposition 20). With appropriate definitions it holds for triangles on the surface of a sphere as well as on a plane. And the everyday human experience of moving around in the universe amply confirms it. Unless you are booking a flight from Boston to Philadelphia.

map of northeastern us showing direct route from Boston to Philadelphia and a detour via Detroit

Yesterday I had to arrange such a trip on short notice. I was offered a choice of several direct flights, with the cheapest round-trip ticket priced at $562. But I could get there and back for just $276 if I went via Detroit.

I am fascinated and appalled by the economics of this situation. From one point of view it makes perfect sense: A direct flight is faster and more convenient, so of course it should command a premium price. But when you look at the cost side of the equation, it seems crazy: The Detroit detour quadruples the mileage and adds an extra takeoff and landing. Shouldn’t those flyers who long for a view of Lake Erie and a stopover in the Motor City be asked to pay extra for the privilege?

The usual explanation of this paradox is the marginal-cost argument. If the BOS-PHL route is saturated, but there are empty seats on the BOS-DTW and DTW-PHL legs, the airline is better off filling the vacant seats at any price above the marginal cost of adding a single passenger. If the price is too attractive, however, customers will defect from the premium flight. No doubt the observed price differential was determined by some optimization program running in the airline’s back office, nudging the system toward an equilibrium point.

Sound sensible? Suppose I had wanted to go to Detroit instead of Philadelphia. When I looked up the fares this morning, I found that the cheapest nonstop BOS-DTW ticket is $1,224. But I could pay as little as $578 if I were willing to make a detour through—guess where—Philadelphia. Now it appears the load factors are reversed: It’s the Beantown–Motown corridor where demand is strong, while the City of Brotherly Love takes up the slack.

If that program in the back office were trying to optimize overall system efficiency—perhaps aiming to minimize aggregate passenger travel time as well as fuel consumption, pollution, labor inputs and other resources—I have a hard time believing it would come up with a solution anything like the itinerary I’m following today.

By the way, I’m not going to Detroit. I’m making a stop in Queens instead. At least the triangle is skinnier.


Update: Totally triangular. On the first leg of the trip, we pushed away from the gate at Logan airport and the pilot immediately announced that we’d be waiting an hour or more before takeoff because of air traffic congestion at JFK. What could cause such congestion? You don’t suppose the problem might possibly be caused or exacerbated by pricing policies that encourage people flying from Boston to Philadelphia to stop in New York?

As it turned out, the delay was only 10 or 15 minutes rather than an hour, and I made my connection to the Philadelphia flight.

On the return trip, I was getting emails and phone messages from the airline as I waited barefoot in the security queue at PHL. My departure time had been pushed back by 30 minutes, again because of JFK congestion. When I got to the gate, a weary but helpful agent told me: “If you want to get home tonight, go to Cincinnati.” So I took the fat triangle, flying 1,244 miles to go just 268 (a worse ratio than the Detroit route).

Is all this just the whining of an unhappy traveler? Maybe, but there’s an economic puzzle that I’m wanting to solve. What rational business strategy rewards a customer or a company for wasting 1,000 miles’ worth of fuel?

Posted in mathematics, modern life | 14 Comments

Sphere packings and Hamiltonian paths

In an American Scientist column published last November, I discussed efforts by groups at Harvard and Yale to identify arrangements of n equal-size spheres that maximize the number of pairwise contacts between spheres. The Harvard collaboration (Natalie Arkus, Vinothan N. Manoharan and Michael P. Brenner) solved this problem for clusters of up to n = 10 spheres. The Yale group (Robert S. Hoy, Jared Harwayne-Gidansky and Corey S. O’Hern) extended the result to n = 11.

In describing some algorithmic refinements by the Yale researchers I wrote the following sentence:

For example, they took advantage of a curious fact proved by Therese Biedl, Erik Demaine and others: Any valid packing of spheres has a continuous, unbranched path that threads from one sphere to the next throughout the structure, like a long polymer chain.

The paper alluded to in this passage is: T. Biedl, E. Demaine, M. Demaine, S. Lazard, A. Lubiw, J. O’Rourke, M. Overmars, S. Robbins, I. Streinu, G. Toussaint and S. Whitesides. 2001. Locked and unlocked polygonal chains in three dimensions. Discrete and Computational Geometry 26:269–281.

A few weeks ago Robert Connelly of Cornell wrote me to point out that this statement is erroneous in two ways. First, Biedl and Demaine and their colleagues did not prove (or even attempt to prove) the “curious fact” I mentioned. Second, the fact is not a fact. It’s simply not true that all “valid packings” have a continuous unbranched path from sphere to sphere—known as a Hamiltonian path, after William Rowan Hamilton.

What’s a “valid packing”? The Harvard and Yale groups focused on arrangements of spheres that meet two criteria: A cluster must have at least 3n – 6 sphere-to-sphere contacts overall, and every sphere must touch at least three other spheres. Clusters that satisfy these conditions are termed “minimally rigid.” Connelly showed that not all such clusters have a Hamiltonian path. The demonstration is direct. Working with Erik Demaine and Martin Demaine, he produced a counterexample—a 16-sphere minimally rigid cluster that can be proved to have no Hamiltonian path. It later emerged there is a smaller example, with just 14 spheres.

The concept of a Hamiltonian path comes from graph theory rather than geometry, but it’s easy to translate between the two realms. The spheres of a geometric cluster correspond to the vertices of a graph; two vertices in the graph are connected by an edge if and only if the corresponding spheres are in contact. A Hamiltonian path through the graph is a route along some subset of the edges that visits each vertex exactly once. A graph that has such a path is said to be traceable, since you can follow the route through a diagram without ever lifting the pencil. The Hamiltonian path is not required to traverse all the edges. Nor does it have to return to its starting point. (A path that does form a closed loop is called a Hamiltonian circuit.)

Connelly’s 16-sphere counterexample is shown in the photographs below as a skeleton built with the Geomags construction kit. At right is the corresponding graph, with edge colors that match those in the photos.

The 16-vertex graph.

two more views of the same 16-vertex cluster, from an equatorial position (left) and a polar point of view (right)

A 16-sphere cluster that has no Hamiltonian path is shown in equatorial and polar views. In this Geomags model, the sphere centers are represented by the shiny steel balls; spheres in contact are connected by a colored magnetic strut. All the struts are the same length. The graph at right preserves the pattern of connectivity of the cluster but not the geometry.

The core of this structure (yellow) is a pentagonal dipyramid with 10 triangular faces. Nine of the faces are decorated with tetrahedral caps (red); the tenth face is left unadorned. Counting the balls and struts in the model or the vertices and edges in the graph shows that the structure satisfies the criteria for minimal rigidity: The 16 spheres are linked by 3 × 16 – 6 = 42 bonds, and every sphere touches at least three others.

How do we know that the graph has no Hamiltonian path? In general, answering this question is a difficult task (indeed, it’s NP-complete). But this particular graph was designed specifically to make the problem easy. Connelly explains why there can be no unbranched path that threads through all the vertices:

The reason is a simple counting argument. Suppose there is a Hamiltonian path. Since there are 16 vertices in all, the Hamiltonian path must have 15 edges. Each of the 9 additional vertices is adjacent to 2 edges in the Hamiltonian path, except possibly for the 2 end points of the path, which each correspond to 1 edge of the path. This is a disjoint set of at least 2 × 9 – 2 = 16 edges, one more than there are in the path, a contradiction.

The smaller, 14-sphere counterexample is constructed in the same way, but instead of a pentagonal dipyramid the core of the structure is a square dipyramid—an object better known as an octahedron. All eight faces are capped with tetrahedra, making a stellate octahedron.

the 14-vertex non-Hamiltonian graph

Stellate octahedron in equatorial and polar views

The argument for the impossibility of a Hamiltonian path takes exactly the same form as in Connelly’s example. Any path that reaches all eight of the stellate vertices must include at least 2 × 8 – 2 = 14 edges, but a Hamiltonian path in this graph has only 13 edges.

sphere cluster: 14 spheres in a stellated octahedral configuration

The illustration at left is another representation of the stellate octahedron, this time as a packing of equal-size spheres. (If you hover your mouse over the image, it might spin.)

The procedure for creating these non-Hamiltonian structures was devised by the geometer Victor Klee; the resulting figures are now called Kleetopes. The 14-node octahedral example, which Klee mentions in Branko Grünbaum’s book Convex Polytopes, is the smallest of an infinite series. You might think you could produce an even smaller example by starting with a triangular dipyramid, which has five vertices and six triangular faces. Adding six tetrahedral caps yields a cluster with 11 vertices in all. However, this structure does have Hamiltonian paths. (On the other hand, it has no Hamiltonian circuit.)
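For graphs this small, the counting argument can also be checked by brute force. What follows is a minimal Python sketch, not code from any of the groups mentioned here; the names `capped_dipyramid`, `edge_count` and `has_hamiltonian_path` are invented for the illustration. It builds an n-gonal dipyramid, caps a chosen number of faces, and then looks for a Hamiltonian path by dynamic programming over subsets of vertices.

```python
def capped_dipyramid(n, caps):
    """Adjacency bitmasks for an n-gonal dipyramid with `caps` capped faces.

    Vertices 0..n-1 form the equator, n and n+1 are the apexes, and each
    cap adds one new vertex joined to the three corners of a face.
    """
    faces = [(i, (i + 1) % n, apex) for apex in (n, n + 1) for i in range(n)]
    adj = [0] * (n + 2 + caps)

    def link(u, v):
        adj[u] |= 1 << v
        adj[v] |= 1 << u

    for a, b, apex in faces:
        link(a, b)
        link(a, apex)
        link(b, apex)
    for k, face in enumerate(faces[:caps]):
        for corner in face:
            link(corner, n + 2 + k)
    return adj


def edge_count(adj):
    # Every edge sets one bit in each of two adjacency masks.
    return sum(bin(mask).count("1") for mask in adj) // 2


def has_hamiltonian_path(adj):
    """ends[mask] holds the vertices at which some path covering exactly
    the vertex set `mask` can end."""
    n = len(adj)
    ends = [0] * (1 << n)
    for mask in range(1, 1 << n):
        if mask & (mask - 1) == 0:      # a single vertex is a trivial path
            ends[mask] = mask
            continue
        reachable = 0
        for v in range(n):
            bit = 1 << v
            # v can end the path if the rest of the set ends at a neighbor of v
            if mask & bit and ends[mask ^ bit] & adj[v]:
                reachable |= bit
        ends[mask] = reachable
    return ends[-1] != 0


results = {}
for n, caps in [(3, 6), (4, 8), (5, 9)]:
    graph = capped_dipyramid(n, caps)
    results[len(graph)] = (edge_count(graph), has_hamiltonian_path(graph))
print(results)
```

Run as written, the sketch should report 3n – 6 edges in each case (27, 36 and 42), a Hamiltonian path in the 11-vertex structure, and none in the 14- and 16-vertex ones, in agreement with the arguments above. The subset table has 2^n entries, so this approach is only practical for graphs of a couple of dozen vertices at most.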


When we give up the erroneous assumption that all minimally rigid arrangements have Hamiltonian paths, what is the status of the two searches for high-contact-number sphere packings? The following assessment is based on conversations and correspondence with several of the participants, but the wording and the conclusions are mine; others might well see it differently.

First of all, the Harvard group did not rely on any assumptions about Hamiltonian paths, so their enumeration of clusters up to n = 10 is entirely unaffected.

The Yale group adopted the Hamiltonian-path assumption as a way of containing the combinatorial explosion when they extended the search to n = 11. The computational burden can be measured in terms of the number of graph adjacency matrices, \(\bar{A}\), that need to be examined. Hoy, Harwayne-Gidansky and O’Hern wrote:

All macrostates can be found with greatly reduced computational effort through appropriate selection of “topological” constraints on the elements of \(\bar{A}\). Biedl et al. . . . proved that all connected sphere packings admit linear polymeric paths, that is, for any valid packing, one can always permute particle indices so that the packing is fully traversed by a “polymeric” \(\bar{A}\) with \(A_{i,i+1} = 1\) for all \(i\).

Robert S. Hoy, Jared Harwayne-Gidansky and Corey S. O’Hern. 2012. Structure of finite sphere packings via exact enumeration: Implications for colloidal crystal nucleation. Physical Review E 85:051403. Link to author preprint.

This assumption that all entries on the superdiagonal of the matrix can be set equal to 1 brings a huge reduction in computational cost. For n = 11, the number of matrices to be considered is slashed by more than three orders of magnitude. But of course that means more than 99.9 percent of all the candidate matrices are passed over without examination. If any of those ignored matrices were valid minimally rigid packings of 11 spheres, the Yale survey would have missed them.
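A rough count suggests where a saving of that size could come from. The sketch below rests on a simplifying assumption made only for this illustration (it is not a description of the Yale code): a candidate is any symmetric 0/1 matrix with exactly 3n – 6 contacts above the diagonal, and the polymer constraint pins the n – 1 superdiagonal entries at 1. The function name `candidate_counts` is hypothetical.

```python
from math import comb


def candidate_counts(n):
    pairs = n * (n - 1) // 2     # entries above the diagonal
    contacts = 3 * n - 6         # bonds in a minimally rigid cluster
    unconstrained = comb(pairs, contacts)
    # With the n-1 superdiagonal entries forced to 1, only the remaining
    # contacts are free, and only the remaining pairs can receive them.
    polymeric = comb(pairs - (n - 1), contacts - (n - 1))
    return unconstrained, polymeric


unconstrained, polymeric = candidate_counts(11)
ratio = unconstrained / polymeric
print(f"{unconstrained:.3e} vs {polymeric:.3e}: ratio {ratio:.0f}")
```

Under these assumptions the ratio for n = 11 comes out at a few thousand, which is at least consistent with "more than three orders of magnitude." The real enumeration applied other pruning rules as well, so the actual matrix counts may well differ.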

How much should we worry about this potential omission? It depends on your point of view. If you look upon the result as a mathematical proof classifying all minimally rigid sphere clusters, then the proof has a big gap. But, as I noted in my American Scientist column, mathematical rigor was not the highest priority in this work and was already jeopardized by the use of a numerical algorithm (Newton’s method) that is not guaranteed to converge in all cases. The original motivation for both the Harvard and Yale projects came from chemistry and physics, not from geometry and graph theory. The idea was to identify the kinds of clusters that might form nuclei of condensing crystals or aggregations of colloidal particles. The physical significance of the findings is not much compromised by doubts about mathematical rigor.

In any case, the Yale results will require revision only if there exists at least one minimally rigid cluster of 11 or fewer spheres without a Hamiltonian path—and that now seems unlikely. Any such cluster with n ≤ 10 would have been found by the Harvard survey (which, again, did not exclude untraceable graphs). Meanwhile, recent work by Miranda Holmes-Cerfon of NYU has produced a new enumeration of sphere packings up to n = 12. The full results are still unpublished but she reports no sign of clusters that were missed because of the incorrect Hamiltonian-path assumption.

The Holmes-Cerfon survey is based on a different methodology. Instead of generating vast numbers of adjacency matrices and testing them to see which ones correspond to valid packings, she begins with a known cluster and tries to transform it into all other valid packings with the same number of spheres. The transformations consist of all possible movements of a single bond that maintain the conditions for minimal rigidity. This process is computationally less arduous, although it does rely on the assumption that all valid configurations can be reached by some series of single-bond moves.


When I first heard about the search for clusters that maximize contact number, I was surprised to learn that such simple questions remain unanswered. How could it be that we know so little about clusters of just a dozen spheres? I’m still intrigued, and now I have another question to noodle over: How come we know so little about Hamiltonian paths in small graphs? Is Klee’s 14-vertex example the smallest for minimally rigid structures? There can’t be any efficient way of answering this question (unless P = NP), but is it beyond our ability to close the gap between n = 11 and n = 14?

Acknowledgments: I’m grateful to Bob Connelly both for bringing this matter to my attention and for helping me understand the issue. I have also benefitted from very helpful conversations and emails with Natalie Arkus, Michael P. Brenner, Karoly Bezdek, Miranda Holmes-Cerfon, Robert Hoy, David Wales and Zorana Zeravcic. (But if I still don’t have the story straight, it’s not their fault!)

Posted in computing, mathematics, physics | 1 Comment

Recursive driveling

If there’s anything inaner than turning literature into drivel, it’s turning drivel into drivel. I’ve added a “recurse” button to the drivel generator. It feeds the output back to the input, like xeroxing a xerox.

What happens when this process is repeated many times? Before I tried the experiment, two quite different outcomes both seemed plausible. On the one hand, driveling is a recombinatory process that stirs up the text and thereby introduces novelty. Nth-order drivel can’t create any n-grams that don’t also appear in the source text, but those n-grams are jammed together in new ways. Hodgepodge words and phrases can be expected to proliferate as the recursion continues, increasing the entropy of the drivel.

The counterargument says that driveling is also a sampling process. With each round of recursion, some elements of the source text are left behind—and once lost they can never be recovered. In the long run, then, we should expect the recursive drivel to grow more monotonous, with an ever-smaller vocabulary. It’s like a biological population that steadily loses diversity for lack of new germ plasm.
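The sampling argument can be made concrete with a small sketch. This is a hypothetical stand-in for the online generator, not its actual code; the names `drivel`, `ngrams` and `recurse` are invented here, and the driveler simply stops if it reaches a seed that has no successor. Because every window of order + 1 characters in the output is copied from somewhere in the input, the stock of distinct windows can only shrink or hold steady from one round to the next.

```python
import random


def drivel(source, order, length, rng):
    """Each new character is drawn from the characters that follow the
    current `order`-character seed somewhere in `source`."""
    start = rng.randrange(len(source) - order)
    out = source[start:start + order]
    while len(out) < length:
        seed = out[len(out) - order:]
        followers = [source[i + order]
                     for i in range(len(source) - order)
                     if source[i:i + order] == seed]
        if not followers:
            break                     # dead end: stop rather than jump
        out += rng.choice(followers)
    return out


def ngrams(text, k):
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def recurse(source, order, length, rounds, seed=0):
    """Feed the output back in as input; record the number of distinct
    (order + 1)-grams that survive each round."""
    rng = random.Random(seed)
    sizes = [len(ngrams(source, order + 1))]
    text = source
    for _ in range(rounds):
        text = drivel(text, order, length, rng)
        sizes.append(len(ngrams(text, order + 1)))
    return sizes


sample = ("If there's anything inaner than turning literature into drivel, "
          "it's turning drivel into drivel.")
sizes = recurse(sample, order=2, length=200, rounds=6)
print(sizes)
```

The printed list never increases. Whether it collapses quickly or slowly depends on the source text, the order and the random seed, so the sketch says nothing about which of the two intuitions wins in practice.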

By all means try the experiment for yourself. Here are the results I got when I repeatedly recycled the text of Matthew Arnold’s poem “Dover Beach.” In each round of recursion, I generated a thousand characters of drivel, which became the source text for the next round. In the snippets below, I show only the first line of the output for each round. The integer in front of each line is the level of recursion.

second-order drivel
0   Hating To trand is of Engles shor low, to Hat gone Fine shichocland,
1   p ful, Gles drawither lonce, Gles thdrand lon so of sh Only he again;
2   an, th Only heith re by th re Fret ebbland long Thering and is drawits
3   nly he And, thelas of dar turn, Wits of drawith turn, up fliffs ong
4   e and by herin; And, the and by th turn, we and brin; And by heit up
5   Wit up flike and, to-night up flike Fing pebblas onch’s of Fin; And, to-
6   tretretretretretretretretretretretretretretretretretretretretretretret

fourth-order drivel
0   d flight Gleams, So various, so beautiful, so beautiful, so beautiful, so
1   beautiful, so beautiful, so beautiful, so beautiful, so beautiful, so

sixth-order drivel
0   his mind then again begin, and round earth’s shore Lay like the sound a
1   rue To one another! for the world. Ah, love, let us be true To one
2   , let us be true To one another! for the world. Ah, love, let us be true
3   rue To one another! for the world. Ah, love, let us be true To one

eighth-order drivel
0   g ago Heard it on the Aegean, and it brought Into his mind the turbid
1   r the world. Ah, love, let us be true To one another! for the world. Ah,
2   ve, let us be true To one another! for the world. Ah, love, let us be true
3   world. Ah, love, let us be true To one another! for the world. Ah, love,

In every case, the process quickly reaches a fixed point—and a rather boring one at that. The banana phenomenon is doubtless a major factor in what we’re seeing here; it would be interesting to rerun the experiment with an algorithm immune to that flaw. Also important are finite-size effects. I would like to believe that the outcome would be different if we could generate infinite streams of drivel from an infinitely long source text. The trouble is, I can’t really imagine what an infinitely long source text would look like. If an endless lyric by Matthew Arnold is not trivially repetitious (“so beautiful, so beautiful, so beautiful”) then it has to be some sort of enumeration of all possible combinations of n-grams. In either case, it seems rather drivelish, even without algorithmic help.

Posted in computing, linguistics | 9 Comments

Driveling

“The fine art of turning literature into drivel” is a specialty of mine. I’ve been doing it for 30 years. Here is a specimen of drivel that I extracted from Walter Benjamin’s 1936 essay “The Work of Art in the Age of Mechanical Reproduction”:

Artistic receptivity which, by means of gas masks, terrifying megaphones, flame throwers, and beaches which face the public present conditions. The camera, with a theology of art. We do not deny that individual reactionary manner by the unarmed eye. A bird’s-eye view best captured by a clock.

If the rhetoric of megaphones and flame throwers is too strident for your taste, try some Edgar Allan Poe, made even more breathless than usual by the algorithmic mixmaster:

And the faintly rappiness of the accompare, name a tappiness: his holier odorous and unwonted heart a spirits hallowed fast all! While even in the Naiad from its purple curtain rustle throbbing bride In her brow What wild weird clime that her head, Repenting from the headlong– Her world of moan

Maybe you can guess whose n-grams these are:

These frequencies of memory. The programmer for banana phenomenon. Am I the original text, but writing familiarity. It is nonsense, but nevertheless it shows only those days, so I had to the rhetoric of megaphones, flame them down one another corresponding on typewriters. In second-order drivel.

Yes: That’s a self-referential pastiche of the very document you are reading at this moment. And don’t bother telling me that the scrambled version is more fun than the original; I already know that.


I’m going to say a bit more about the algorithms behind this silliness, but first I invite you to go make your own drivel. The program that generates this mish-mash runs as a web app. You can mangle a few cultural treasures that I’ve looted from Project Gutenberg and elsewhere. And you may also be able to drivelize texts of your own choosing. (For this you’ll need an up-to-date web browser.) Have fun, but come back when you’re done.


So how do we transform literature into drivel? The simplest strategy is just to choose letters randomly and independently from the source text and write them down one after the other. Call this zeroth-order drivel:

,ca adigaigjr nre hs n eveel’adnwbtfs s!notm samhhd cdseghs xhi annm no,eghkg ne ttidpatlgtrirTefgsuw g ehilehn:tosiceerlI”u loaotiiuom aou

The source text in this case is Alexander Pushkin’s novel-in-verse Eugene Onegin (in an English translation by Charles H. Johnson). The drivel mimics the letter frequencies of the original—lots of e’s and t’s, fewer f’s and v’s—but captures no information at all about the sequence in which the symbols appear in the original.
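In code, zeroth-order driveling is almost a one-liner. Here is a minimal Python sketch; the function name `zeroth_order` and the sample sentence are placeholders for illustration, and the real generator is a web app whose internals are not shown here.

```python
import random


def zeroth_order(source, length, rng=random):
    # Choosing positions uniformly at random reproduces the letter
    # frequencies of the source but ignores the order of the letters.
    return "".join(rng.choice(source) for _ in range(length))


source = "The simplest strategy is just to choose letters randomly and independently"
sample = zeroth_order(source, 60, random.Random(1))
print(sample)
```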

After zeroth-order drivel comes first-order drivel, which takes each character of the source text and tallies the frequencies of all the symbols that might possibly follow it. These frequencies yield a table of probabilities for the next drivel character. Suppose at some point in the driveling process we have just generated a v. Then we must find out what characters follow v in the source text, and how often each of them appears. It turns out there are 1,415 v’s in the Onegin text, and they are followed by 15 different characters:

   pair     count   proportion
    ve       974      0.6883
    vi       193      0.1364
    va       102      0.0721
    vo        72      0.0509
    vg        24      0.0170
    v         11      0.0078
    v,        10      0.0071
    vy         8      0.0057
    vs         5      0.0035
    vu         4      0.0028
    v'         3      0.0021
    vn         3      0.0021
    vé         3      0.0021
    v;         2      0.0014
    vá         1      0.0007

Thus the next character of the drivel stream will be an e with probability 0.69, an i with probability 0.14, and so on. Whichever symbol is chosen from this list will become the seed for the next pass through the first-order driveling process. If it’s an a, for example, then in the next round the algorithm will look at all the characters that can follow a (there are 35 of them), and make a choice according to the relative frequencies.
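The tallying step can be sketched in a few lines of Python. The names `successor_table` and `next_char` are made up for this example, and the toy source text merely stands in for Onegin, so the counts it produces are not the ones in the table above.

```python
import random
from collections import Counter, defaultdict


def successor_table(source):
    """Map each character to a tally of the characters that follow it."""
    table = defaultdict(Counter)
    for current, following in zip(source, source[1:]):
        table[current][following] += 1
    return table


def next_char(table, current, rng=random):
    """Weighted random choice among the observed successors of `current`.
    Assumes `current` has at least one successor in the table."""
    counts = table[current]
    symbols = list(counts)
    return rng.choices(symbols, weights=[counts[s] for s in symbols])[0]


toy = "vivid verve over every vivid valve"
table = successor_table(toy)
total = sum(table["v"].values())
for symbol, count in table["v"].most_common():
    print(f"v{symbol}  {count:3d}  {count / total:.4f}")
print(next_char(table, "v", random.Random(0)))
```

In the toy text the letter v is followed by e five times, by i four times and by a once, so an e should come next about half the time.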

First-order drivel is a huge improvement over the zeroth-order stuff, but nevertheless it shows only the vaguest glimmers of linguistic structure:

andilielderonlyo oouadion t, honin Itheatishend aricivee d porg ad eandd t this. ad k: withiofou hene tone Gontrs ted)— ct), he touthet t h, te g owsin d.

Common digraphs such as th have begun to show up, and every now and then we get a fully formed word, such as this. But for the most part it’s still monkeys pounding on typewriters.

In second-order drivel, symbols are taken two at a time, and probabilities are calculated for all possible successors to each pair. The result is something you could almost read aloud (perhaps with a Scots accent):

The of hise fing flink waime ing, My rectearks the gold thaught worns haverne’s peave st ands he bre drink alloveremid, ane why ned, heady; ifer grens, evill

With third-order drivel, probabilities are calculated from triples of letters:

We meetness pressons, and of his longuisencess he giddles, bence wing; Tatyana ladiesteps, the pain in to the servings, or sere nevent I catch oth

Here we enjoy brief moments of seeming lucidity, as letters condense into words, and sometimes the words organize themselves into phrases—but then the protodiscourse dissolves back into longuisencess again.

algorithm for third-order drivel

How to turn literature into drivel: The four panels show the third-order drivel algorithm in action with letter frequencies derived from Pushkin’s Eugene Onegin. The initial seed (far left) is the three-letter sequence dri. Out of the seven characters that are observed to follow dri in the Pushkin text—shown with their frequencies in parentheses—the algorithm makes a weighted random choice, which in this case is the letter v. In the next round, the seed sequence is riv, and the chosen letter is e. Then the pattern ive leads to selection of an l, and finally vel is followed by a word space (denoted #). It’s worth noting that the complete word drivel does not appear in the text of Onegin.

The drivel examples given at the beginning of this article come from higher-order drivelers, with probabilities calculated from strings of six, seven or eight characters. At this level there’s no doubt we’re in the realm of language rather than mere alphabet soup. Even though there’s no sign of grammatical structure or meaning, what comes out of the program is immediately recognizable as English. If you know what to look for, the writerly tics of individual authors begin to show through. This is what intrigued me when I first began driveling back in 1983. In a “Computer Recreations” column I wrote:

What is remarkable is that the product of this simple exercise sometimes has a haunting familiarity. It is nonsense, but not undifferentiated nonsense; rather it is Chaucerian or Shakespearian or Jamesian nonsense. Indeed, with all semantic content eliminated, stylistic mannerisms become the more conspicuous. It makes one wonder: Just how close to the surface are the qualities that define an author’s style?


The program that generated my 1983 drivel was written in Microsoft BASIC and ran on a PC with two floppy drives but no hard disk. There was no Project Gutenberg in those days, so I had to type in the texts myself. I spent a long, bleary night transcribing the “Penelope” chapter from James Joyce’s Ulysses. (The chapter, also known as Molly Bloom’s soliloquy, is a marathon run-on sentence.) Once the keyboarding was done, the program would grind out drivel at a rate of one or two characters per minute.

The first version of the program was built around matrices of precomputed probabilities. Given an alphabet of s symbols, a first-order drivel program needs an s × s matrix, like this one for a 28-symbol alphabet:

The 1983 article used a different numbering scheme for drivel orders, calling this a second-order table.

Hayes drivel matrix 640x418

The chart is based on the third act of Hamlet. Find the row corresponding to the current symbol in the drivel process, then scan across the columns to read off the probabilities for all possible successor symbols. (The last two symbols are the apostrophe and the word space, represented as ‘#’.) Note that this is a stochastic matrix: The entries in each row sum to exactly 1.

Moving on to second-order drivel, the matrix expands to \(s^2\) rows, which accommodate all the two-symbol sequences from aa through ##, each row having s columns. In the case of a 28-symbol alphabet, that comes to about 22,000 matrix elements, which seemed like Big Data back then. The next stage, a third-order matrix, would have more than 600,000 elements, which was more than I could squeeze into 256 kilobytes of memory. The array wouldn’t even fit on disk (maximum capacity 320 kilobytes).
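The arithmetic behind those figures is simple enough to spell out. In this small sketch (the function name `table_size` is just a label for the example), a table of a given order has one row for every possible seed and one column for every symbol.

```python
def table_size(symbols, order):
    # symbols**order possible seeds (rows), each with `symbols` columns
    return symbols ** order * symbols


for order in (1, 2, 3):
    print(order, table_size(28, order))
```

For a 28-symbol alphabet this gives 784, 21,952 and 614,656 entries. Even at a single byte per entry, the third-order table would overflow a 320-kilobyte disk.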

I gave some thought to sparse-matrix methods, but then I had a better idea.

The fact is, all the information that could be incorporated into any frequency table, however large, is present in the original text, and there it takes its most compact form…. What the frequency table records is the frequency of character sequences in the text, but those sequences, and only those sequences, are also present in the text itself in exactly the frequency recorded.

The algorithm suggested by this observation searches through the full source text to create a new, one-dimensional frequency table for each seed pattern. Instead of precomputing and storing the probabilities associated with every possible seed, we regenerate them on the fly, as needed. The saving of space comes at the expense of wasting time, since the same search is repeated for every occurrence of an n-gram pattern. Various methods of hashing, caching or memoization could have fine-tuned the tradeoff between space and time, but I didn’t know that then. A few correspondents pointed it out after the article appeared.
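A modern rendering of the search-on-the-fly idea might look like the following Python sketch. It is a reconstruction under assumptions, not a translation of the 1983 BASIC program; the names `successors` and `drivel` are invented for the example, and the driveler simply stops if the current seed has no successor.

```python
import random
from collections import Counter


def successors(source, seed):
    """Scan the whole text and tally what follows each occurrence of seed."""
    k = len(seed)
    counts = Counter()
    i = source.find(seed)
    while i != -1 and i + k < len(source):
        counts[source[i + k]] += 1
        i = source.find(seed, i + 1)   # step by 1 so overlaps are counted
    return counts


def drivel(source, seed, length, rng=random):
    out = seed
    k = len(seed)
    while len(out) < length:
        counts = successors(source, out[len(out) - k:])
        if not counts:
            break                      # the seed occurs only at the very end
        symbols = list(counts)
        out += rng.choices(symbols, weights=[counts[s] for s in symbols])[0]
    return out


print(successors("banana bandana", "ana"))
print(drivel("the theme of the thesis is the other theory", "the", 40,
             random.Random(2)))
```

No table is ever stored; the price is one full pass over the source for every character of output, which is the time-for-space trade described above.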

Several other correspondents mentioned another shortcut, which I’m going to call the Shannon algorithm. In a followup column I wrote:

Bobby Bryant, James W. Butler, Ronald E. Diel, William P. Dunlap and Jim Schirmer pointed out still another algorithm that is not only faster than the one I gave but also appreciably simpler. It eliminates frequency tables entirely. When a letter is to be selected to follow a given sequence of characters, a random position in the text is chosen as the starting point for a serial search. Instead of tabulating all instances of the sequence, however, the search stops when the first instance is found [wrapping around to the beginning if necessary], and the next character is the selected one. If the distribution of letter sequences throughout the text is reasonably uniform, the results should closely approximate those given by a frequency table.
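The procedure described in that passage can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code of the online generator: the names `shannon_next` and `shannon_drivel` are hypothetical, and the text is treated as circular when reading the successor, so that a match at the very end of the text is followed by the first character.

```python
import random


def shannon_next(source, seed, rng=random):
    """Jump to a random position, search forward for the first occurrence
    of seed (wrapping to the start if necessary), return what follows it."""
    start = rng.randrange(len(source))
    i = source.find(seed, start)
    if i == -1:
        i = source.find(seed)          # wrap around to the beginning
    if i == -1:
        return None                    # seed does not occur at all
    # Assumption made here: the character after the last one is the first.
    return source[(i + len(seed)) % len(source)]


def shannon_drivel(source, seed, length, rng=random):
    out = seed
    k = len(seed)
    while len(out) < length:
        nxt = shannon_next(source, out[len(out) - k:], rng)
        if nxt is None:
            break
        out += nxt
    return out


print(shannon_drivel("abcabcabc", "ab", 8, random.Random(0)))
```

There is no counting and no table. Each character costs one partial scan of the text, and the choice among successors is only as fair as the spacing of the matches allows.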

The caveat at the end of that paragraph merits a further comment. When the distribution of sequences is not uniform, the failure can be spectacular. In HAKMEM item 176 Bill Gosper named it the banana phenomenon. Suppose you are running the Shannon algorithm on a 100,000-character text that includes a single instance of the word banana and no other instances of the trigrams ana and nan. Let the seed pattern be ana. When you start a sequential search from a random point in the string, you have 99,998 chances to come upon the first ana and only two chances to reach the second instance.
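The lopsidedness is easy to confirm by trying every possible starting point instead of picking one at random. In this sketch the text is a made-up 1,000-character string of hyphens with a single banana in the middle, and `first_hit_tally` is a name invented for the example.

```python
from collections import Counter


def first_hit_tally(source, seed):
    """For each starting position, record where a forward search (with
    wraparound) first finds the seed."""
    tally = Counter()
    for start in range(len(source)):
        i = source.find(seed, start)
        if i == -1:
            i = source.find(seed)      # wrap around to the beginning
        tally[i] += 1
    return tally


text = "-" * 497 + "banana" + "-" * 497
tally = first_hit_tally(text, "ana")
print(tally)
```

The first ana, at position 498, is found from 998 of the 1,000 starting points; the second, at position 500, from only two. A frequency table would give the two instances equal weight.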

In spite of this pitfall, I have adopted the Shannon algorithm for the online drivel generator; it’s just too easy and elegant to pass up. But it can indeed wander into strange blind alleys. For a live demonstration of what can go wrong, try some third-order driveling with the file named “Hayes-banana.txt”.

Why do I call this the Shannon algorithm? Because Claude Shannon described it in 1948, in “A Mathematical Theory of Communication.” His implementation involved opening a book at a random page. He gave several examples of convincing drivel he produced by this pencil-and-paper method.

Shannon wasn’t the only predecessor. Even earlier—a full century ago—A. A. Markov wrote his paper on “An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains.” It was a recent encounter with this paper that brought me back to driveling 30 years after my first adventure. As noted here a few weeks ago, I have recently given a talk on the early history of Markov chains. The video (FLV, MP4) and audio (MP3) are now online, and so are the slides (HTML). Finally, my latest American Scientist column is on the same theme.


There remains a personal question: Why does this kind of goofiness amuse me so? Am I the only one susceptible to its charms? Perhaps it is a vice that I should keep to myself, like an inordinate fondness for puns or limericks. But, the fact is, I glimpse something both comic and poetic in some of this drivel. The fourth-order drivel below is based on the banana file. In some sense I wrote this, and yet I can take very little credit for its imagery, its cadences, its sheer cleverness for clues, theorems and eyeglasses.

Figuring for banana or Mississississississississing for me, I have made it right, new particles, the University of sciency, but writing for its sheer clues, the University of the algorithm is widely admired for clues, theorems. As for banana or Missing link, the source of the back page of the source of the University of the University of the Holy Grail, the University of various permutations, and F’s of the Holy Grail, the University of the University of my chosen pattern word algorithm has not easy. Indeed, I spend of the University of the University of sciency, but writing for newspaper, M’s and F’s of theorems. As for newspaper, now of theorems. As for newspaper, M’s and eyeglasses. The detective search of the Northwest Passage, the Holy Grail, theorems. As for its sheer cleverness as a lot of one another, now of sciency, but writing a computer programmer for banana or Missississississississing link, the Holy Grail, theorems. As for its sheer clues, theorems. As for banana or Missississississing for bargains, the University of the University of the University of the Holy Grail, the source of the plumber for clues, the Northwest Passage, the University of the Holy Grail, theorems. As for me, I have made it right, the algorithm was invented by Robert S. Boyer and eyeglasses.

Posted in computing, linguistics | 8 Comments

Joshua Trees and Toothpicks


After the Joint Mathematics Meetings in San Diego last month, I took a day off for some botanical and mathematical tourism. I drove up to Joshua Tree National Park, in the high desert beyond the San Bernardino Mountains.

Joshua tree silhouette, Hidden Valley, Joshua Tree National Park

The park’s namesake is the cheerfully odd Yucca brevifolia. According to a park brochure and web site, the Joshua tree used to be a lily, but now it’s an agave. The same brochure explains a little about the growth and development of the plant. Initially, a single stalk grows straight upward and eventually produces a flower at the apex. If that flower is fertilized, the fruiting process destroys the meristem—the actively growing tissue at the tip of the stem. The stalk then bifurcates, producing two limbs of roughly equal size, each with a new apical meristem. The branches grow in unpredictable directions. When the two new growth tips ultimately flower and fruit, they too bifurcate, creating four apices.

Joshua trees with 1 2 4 8 branches

Joshua trees with \(2^0\), \(2^1\), \(2^2\) and \(2^3\) terminal tufts of green, bristly leaves.

What could be geekier than that? It’s a symmetrical binary tree, straight out of a computer science textbook. (Except that the textbook would turn the tree upside down.) Abstracting away all the shaggy biological details, a topological model of an idealized Joshua tree would look like this:

Duke window with binary tree pattern 0987

The same motif adorns the windows of a building at Duke University.

Joshua tree 10

In this diagram I have taken the liberty of filling in the unseen, underground portions of the plant by assuming symmetry: I give the root system the same branching structure as the above-ground parts. When this figure is turned 90 degrees, it is known as the H curve; in this orientation I guess it must be the I curve.

Before going any further with this story, I have to admit that the graph-theoretical structure of Yucca brevifolia is not quite as precise and regular as I first thought it might be. Not all the trees are strictly binary. As the sun came up in the park I soon spotted some trifurcations. Later in the day I found even odder branching patterns.

Trifurcated Joshua trees 1315 and 1414

So once again nature tries to implement an algorithm, and before long her mind begins to wander, producing doodles that are nowhere to be found in the specification. But that’s all right. My mind was wandering, too. As I trekked through the groves of Joshua trees, I kept reverting to thoughts of “toothpick trees.” These are structures I had learned about the day before from Neil J. A. Sloane, the sequencemeister of the On-Line Encyclopedia of Integer Sequences. I had run into Neil on the exhibit floor at the San Diego meeting, where the OEIS Foundation had a booth.

I sensed a connection between Joshua trees and toothpick trees. In the idealized H curve shown above, the branches added in each generation are a little smaller than their parents. The shrinkage factor is \(1/\sqrt{2}\), which I applied to both the length and the thickness of the branches. What happens if the branches don’t shrink? They begin colliding with one another after just three generations. That’s the situation in a toothpick tree. It is formed on essentially the same pattern as the H curve, but without the shrinkage factor; instead we adopt a rule that whenever a branch touches another branch, the colliding tip is “sterilized” and no new branches grow there.
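The idealized H curve is simple to generate. Here is a minimal Python sketch, assuming that each branch sprouts a perpendicular child centered on each of its two ends and that every generation is scaled by the same factor; the function name `h_curve` and the tuple layout are choices made for this example. It handles geometry only, with no collision rule.

```python
import math


def h_curve(generations, shrink=1 / math.sqrt(2)):
    """Return the segments (x1, y1, x2, y2) of an idealized H curve."""
    segments = []
    # Each frontier entry: center x, center y, length, is_vertical
    frontier = [(0.0, 0.0, 1.0, True)]
    for _ in range(generations):
        next_frontier = []
        for cx, cy, length, vertical in frontier:
            dx, dy = (0.0, length / 2) if vertical else (length / 2, 0.0)
            ends = [(cx - dx, cy - dy), (cx + dx, cy + dy)]
            segments.append((*ends[0], *ends[1]))
            # A perpendicular, shorter child is centered on each end.
            for ex, ey in ends:
                next_frontier.append((ex, ey, length * shrink, not vertical))
        frontier = next_frontier
    return segments


segments = h_curve(6)
print(len(segments))
```

With n generations the curve has 2^n – 1 branches. Calling `h_curve(n, shrink=1.0)` gives the unshrunken version, in which branches start to run into one another; the toothpick rule described next is one way of resolving those collisions.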

Here is a more precise description from the paper by David Applegate, Omar E. Pol and Sloane that introduced the idea of toothpick trees:

We start with an infinite sheet of graph paper and an infinite supply of line segments of length 1, called “toothpicks.” At stage 1, we place a toothpick on the y-axis and centered at the origin. Each toothpick we place has two ends, and an end is said to be “exposed” if this point on the plane is neither the end nor the midpoint of any other toothpick.

At each subsequent stage, for every exposed toothpick end, we place a toothpick centered at that end and perpendicular to that toothpick. The toothpicks placed at odd-numbered stages are therefore all parallel to the y-axis, while those placed at even-numbered stages are parallel to the x-axis.

I have cobbled together a JavaScript-and-SVG program for assembling and disassembling toothpick trees up to stage n = 128. At this stage the number of toothpicks is 10,923, which is a lot of toothpicks if you buy them in boxes of 250. On the other hand, it’s a whole lot smaller than the \(2^{128} - 1\) branches of the full binary tree.
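The placement rule quoted above translates almost directly into code. The following Python sketch is not the JavaScript program behind the illustration; it is a minimal reconstruction that works in units of half a toothpick so that every end and midpoint falls on an integer lattice point. The function name `toothpick_totals` is invented for the example.

```python
from collections import Counter


def toothpick_totals(stages):
    """Running total of toothpicks after each stage."""
    # touched[p] counts the toothpicks that have an end or a midpoint at p.
    touched = Counter()
    centers = [(0, 0)]               # stage 1: one toothpick at the origin
    totals = []
    total = 0
    for stage in range(1, stages + 1):
        # Odd stages lie parallel to the y-axis, even stages to the x-axis.
        dx, dy = (0, 1) if stage % 2 else (1, 0)
        new_ends = []
        for x, y in centers:
            touched[(x, y)] += 1
            for sign in (1, -1):
                end = (x + sign * dx, y + sign * dy)
                touched[end] += 1
                new_ends.append(end)
        total += len(centers)
        totals.append(total)
        # An end is exposed if no other toothpick has an end or midpoint there.
        centers = [p for p in new_ends if touched[p] == 1]
    return totals


totals = toothpick_totals(128)
print(totals[:9], totals[127])
print(2 ** 128 - 1)
```

The totals begin 1, 3, 7, 11, 15, 23, 35, 43, 47 and reach 10,923 at stage 128, compared with the 39-digit count for a full binary tree of the same depth. Only the ends placed in the most recent stage need checking, because an exposed end always receives a toothpick in the very next stage.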

The interactive version of this illustration relies on “inline SVG,” that is, Scalable Vector Graphics included directly within an HTML document. If you’re seeing a static illustration, without any buttons to click, I’m afraid your browser doesn’t support this feature. In my tests the program works in recent versions of Chrome, Safari, Firefox and Opera; it’s very unlikely to work in RSS readers. There is a stand-alone version of the interactive illustration here. And David Applegate has another “movie version” based on different programming technology.

The successive toothpick totals form sequence A139250 in Sloane’s OEIS. The notes accompanying that listing point out lots of interesting facts about the pattern and the process that generates it. If you run the animation, you can’t help noticing the distinctive behavior as n approaches each integer power of 2, or the repeating pattern in which a square block has a smaller square “ear” at each corner. And no toothpick after the initial three crosses either the x or the y axis. (When the x or y coordinate is a power of 2, toothpicks from opposite sides meet at points along the axis, but they do not cross it.)

The notes also mention a conjecture, which I assume remains open:

Conjecture: Consider the rectangles in the sieve (including the squares). The area of each rectangle (A=b*c) and the edges (b and c) are powers of 2, but at least one of the edges (b or c) is <= 2.

The rectangles at issue in this conjecture are “open” rectangles, with no toothpicks or parts of toothpicks inside of them. I’ve become curious about more general squares and rectangles, defined as any axis-aligned quadrilateral whose perimeter is traced by an unbroken chain of toothpicks or half-toothpicks, regardless of what’s in the interior. Here are a few squares discovered in a small sample of the toothpick pattern:

Squares in toothpick graph

Matrix of rectangles

The unit of distance in this compilation is half of a toothpick, since that’s the smallest square that can possibly appear. I have highlighted squares with side lengths of 1, 2, 3, 4, 6, 7 and 8 units. Is there a square of side length 5? How about 9 and 17? Looking for rectangles more generally rather than just squares, I enlisted the help of Ros. The black dots in the matrix at right represent all the rectangles we were able to find by hand and eye. (I have not yet written a program to search more systematically.) The red circles at 5-by-5 and 9-by-9 are vacancies that seem particularly intriguing. Do those squares exist anywhere in the toothpick tree? If not, is there some simple argument to explain why?


Into the sun 1337

Just between us pedants, I should mention that neither Joshua trees nor toothpick trees are actually trees. The Joshua tree isn’t woody; the toothpick tree has cycles.

Joshua trees and toothpick trees, natural trees and mathematical trees: They are very different, but I’m fond of them both.

A day spent in the desert sun admiring the idiosyncrasies of Yucca brevifolia feels quite unlike a day spent coding H curves or toothpick trees in JavaScript, or poring over printouts searching for 5-by-5 squares. But I wouldn’t want to have to choose between those activities. And I’m particularly pleased when I can make a connection between them, however tenuous and fanciful.

Posted in biology, mathematics, photography | 6 Comments

100 Years of Markov Chains

On January 23, 1913, the Russian mathematician Andrei Andreyevich Markov addressed the Imperial Academy of Sciences in St. Petersburg, reading a paper titled “An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains.” The idea he introduced that day is the mathematical and computational device we now know as a Markov chain.

Markov chain banner

On January 23, 2013, the Institute for Applied Computational Science of the Harvard School of Engineering and Applied Sciences will celebrate the centenary of this event. If you are in the Boston area and would like to attend, consider this your invitation. See the announcement for details of when and where. There will be three talks:

First Links in the Markov Chain: Poetry and Probability.
Brian Hayes, American Scientist magazine

From Markov to Pearl: Conditional Independence as a Driving Principle for Probabilistic Modeling.
Ryan P. Adams, SEAS Computer Science

Applications of Markov Chains in Science.
Pavlos Protopapas, Harvard-Smithsonian Center for Astrophysics and SEAS

Markov’s 1913 paper was not his first publication on “samples in chains”; he had written on the same theme as early as 1906. So why celebrate now? Well, for one thing, it’s too late to do it in 2006. But there is another reason: It was the 1913 paper that was widely noticed, both in Russia and abroad, and that inspired further work in the decades to come. The earlier discussions were abstract and technical, giving no hint of what the new probabilistic method might be good for; in 1913 Markov demonstrated his technique with a novel and intriguing application—analyzing the lexical structure of Alexander Pushkin’s poem Eugene Onegin. Direct extensions of that technique now help to identify genes in DNA and generate gobbledygook text for spammers.
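For a rough sense of what such an analysis involves, here is a minimal Python sketch. Markov's study is generally described as a hand tally of vowels and consonants in the poem's Russian text; the sketch below uses an English vowel set instead, and the function name transition_probabilities is made up for the illustration. It estimates how likely a vowel or a consonant is to follow each kind of letter.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels, for illustration only


def transition_probabilities(text):
    """Estimate P(next kind | current kind) for vowels (V) and consonants (C)."""
    kinds = ["V" if ch in VOWELS else "C"
             for ch in text.lower() if ch.isalpha()]
    pairs = Counter(zip(kinds, kinds[1:]))  # counts of adjacent letter kinds
    probs = {}
    for current in "VC":
        total = pairs[(current, "V")] + pairs[(current, "C")]
        for following in "VC":
            # A kind that never has a successor gets no row at all.
            if total:
                probs[(current, following)] = pairs[(current, following)] / total
    return probs


print(transition_probabilities("My uncle, man of firm convictions"))
```

If letters were independent of their neighbors, the probability of a vowel would be the same after a vowel as after a consonant. A difference between those two rows is the dependence that a chain model captures.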

Sticklers for calendrical accuracy might raise another question about the timing of this event. On January 23 it will not yet be 100 years since Markov spoke in St. Petersburg. In 1913 Russia had not yet adopted the Gregorian calendar; when the nation did so in 1918, it skipped ahead by 13 days. If you are troubled by this calendrical lacuna, you may want to organize your own symposium on February 5.
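The date arithmetic is easy to check. The short Python sketch below simply adds the 13-day gap to the calendar label; the function name gregorian_equivalent is an invention for this example, and the fixed 13-day offset is assumed to apply only to dates between 1900 and 2100.

```python
from datetime import date, timedelta

# Gap between the Julian and Gregorian calendars (valid for 1900 to 2100).
JULIAN_GREGORIAN_GAP = timedelta(days=13)


def gregorian_equivalent(old_style):
    """Shift a Julian (Old Style) calendar label to its Gregorian date."""
    return old_style + JULIAN_GREGORIAN_GAP


print(gregorian_equivalent(date(1913, 1, 23)))
```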

Update: Coverage of Markov Day in the Harvard Gazette.

Update 2013-02-24: Recordings of the talk are now available. Audio: MP3. Video: FLV, MP4. The slides are also on the web: HTML. Finally, my latest American Scientist column covers much of the same material: HTML, PDF.

Posted in computing, mathematics | 3 Comments

The Flyover States

contour plowing 01047

On a flight from Boston to San Diego yesterday I had a window seat on the shady side. The weather was clear, and over much of the continent a dusting of snow combined with low winter sun angles to highlight subtle features of the landscape. It was like flying over a painting—maybe a Mondrian, maybe something by Chuck Close.

The photo above comes from central Kansas, just west of Manhattan. I don’t entirely understand what I’m seeing here. The conspicuous double wavy lines in the fields near the center of the image look like contour plowing, but the scale is all wrong; they are wider than the roads. I think they are either ditches to promote drainage or berms to prevent it. The structures are visible on Google Maps, but there they lack the chiaroscuro effect of snow and sun that gives them such drama here. What accounts for the color differences between fields?

somewhere over southwest Illinois

The image above is from 40 minutes earlier in the flight, over southwestern Illinois. Here the rectilinear grid of midwestern agriculture is a mere overlay on dendritic natural drainage patterns, again with bright highlighting. Apparently the snow has been swept off of flat land and gathered on slopes. The overall effect reminds me of military camouflage.

center-pivot irrigators

Western Kansas or southeastern Colorado: The landscape has gone all dotty with center-pivot irrigators. Always a cheering sight. If we’re going to turn the country into a checkerboard of quarter-section fields, we might as well put checkers on them. That’s probably the Cimarron River running through the middle of the frame, but I haven’t been able to identify the exact spot.

northern New Mexico

Above, more painterly drainage patterns, but this time the paintbrush is entirely in the hands of nature. I believe this is northern New Mexico, not far from Farmington.

housing developments in Scottsdale, AZ

At right, we’re back on the grid again—sort of. Those tiny chiclets packed together so tightly are residential neighborhoods in Scottsdale, Arizona. They can’t escape the rectilinear pattern defined by the major streets, but inside those squares they do their best to imitate the swirly lanes and cul de sacs of a suburban housing development.

Finally, below, orthogonality reasserts itself with a vengeance in the Imperial Valley in southern California, the desert basin where most of the Colorado River winds up. At the top is the Great Oops called the Salton Sea—or what’s left of it.

Imperial Valley and Salton Sea

Posted in photography | 7 Comments

Dante’s Infernet

In a few days bit-player will celebrate its seventh birthday. (The first published post was dated January 9, 2006.) The original design for the web site was thrown together in haste, and I’ve long been meaning to give it a makeover. I’ve also had a hankering to get away from a managed hosting service and set up my own server. Over the past six weeks I’ve finally done both of those things. Here’s a brief account of the blog’s rebirth. (If you’re reading this via the RSS feed, you might want to take a glance at the web site.)

Lasciate ogne speranza, voi ch’intrate

Dore Inferno the wood of suicides

Gustave Doré’s illustration for Canto 13 of Dante’s Inferno, where the narrator and his guide Virgil enter the wood of the suicides and spendthrifts. Source: Wikipaintings.

the circles of the infernet

After a few weeks of playing sysadmin, I’ve concluded that the modern world of internet computing is organized just like Dante’s Inferno, with concentric circles of torment that get progressively deeper and darker as you travel toward the center.

The outer suburbs are not such a bad place to live. The people there speak HTML and CSS and JavaScript, and they amuse themselves by writing cute little programs that display words and pictures on other people’s screens. The worst punishment these souls suffer is being made to write <!--[if IE 6]> over and over again all day long.

As you descend into the inner circles of the computational underworld, however, the light fades; you find yourself in a maze of twisty little passages, all alike; in the darkness around you confused voices cry out in strange tongues; dæmons roam the woods.

In the case of bit-player, what lies immediately below the sunny stratum of HTML-CSS-JavaScript is the WordPress blogging platform. Beneath WordPress is the programming language PHP, which builds complete web pages from fragments of HTML. Next is a MySQL database, which has a programming language of its own. Deeper still, we come to the Apache web server. (I’ve only recently learned the story behind the name Apache. According to Brian Behlendorf, it’s a play on “a patchy server”; although the Apache Foundation FAQ pooh-poohs this idea, it has the ring of truth.) Under Apache lies the Linux operating system.

The Infernet has still deeper levels, although my Dantean tour didn’t spend much time at the bottom of the pit. I had to make a brief visit to the land of DNS—the domain name system—where the denizens speak knowingly of A records and CNAMEs and such. I never had a need to skate on the frozen lake of TCP/IP.

Google Maps image of building at Bank and Halsey Streets in Newark NJ, with lots of generators and chillers on the roof

Maybe my server is here? (This is 165 Halsey Street in Newark, where the rooftop is chockablock with chillers and generators.)

If you’re reading this, then in some sense my crazy project succeeded. I have a shiny new virtual server (from Linode) running Linux and Apache and MySQL and PHP and WordPress. It spits out pages that web browsers seem to recognize as valid HTML and CSS and JavaScript. And when you ask the Internet for one of those pages, the series of tubes apparently knows how to find my little server and retrieve what you want. (That’s more than I can do. The machine is said to be in Newark, NJ, but I have no more specific coordinates.)


Now that (most of) the work is done, I’m happy enough with the outcome. I’m having fun designing new ways to waste CPU cycles (yours as well as mine—the program animating the front-page banner runs in your browser, not on my server). As for the deeper levels—well, I can’t really say I’m getting a lot of personal fulfillment out of tweaking settings in /etc/apache2/httpd.conf and running sudo chmod -R 755 on directories, but I suppose these are things that every educated person in the 21st century is supposed to know about.

Next time, though, I may do it differently. Rather than move WordPress to a new server (a body transplant), I’ll keep the same server and replace WordPress with something else (a head transplant).

I hasten to add that I don’t blame WordPress for my troubles. It’s a marvel of our age: Fill in a few blanks, push a few buttons, and Presto! you’re a publisher. But there’s a price to be paid for the push-button interface. Simplicity on the surface leads to gnarly complexity inside. In the course of my redesign, I wanted to do a number of things for which WordPress has no built-in push button. I was soon hacking my way through dense thickets of other people’s code.

I’m well aware that such slash-and-burn programming is frowned upon in the WordPress community. (“Every time you hack core, God kills a kitten.”) I could make excuses. I could claim that none of the 1,656 themes offered on the WordPress web site matched my exact needs, and neither did any of the 22,956 plugins. I could argue that my way of doing things is intrinsically better than anything the hundreds (thousands?) of WordPress developers have come up with over the past 10 years.

No, I don’t believe those things either.

The simple truth is, I’m an incorrigible do-it-yourselfer. Call it a character flaw, or a way of life. But I suppose that if I really want to go it alone, I should go all the way, and build the next bit-player from scratch.

Posted in computing, meta | 4 Comments