Archive for the ‘mathematics’ Category

The teetotaler’s walk

Monday, February 22nd, 2010

Writing about Pólya’s recurrence theorem led me to pick up Gerry Alexanderson’s book The Random Walks of George Pólya. He tells a sweet anecdote about the origin of the theorem.

The random walk is sometimes called the drunkard’s walk, but Pólya’s musings on the subject were not inspired by a disorienting pub crawl. On the contrary, the idea came to him while he was living at the Kurhaus Zürichberg, which Alexanderson notes was founded as a temperance hotel. (It seems to be a somewhat different establishment now, and probably beyond the means of a Privatdozent.)

Alexanderson describes the incident that led Pólya to the recurrence theorem:

There was a particular wooded area near the hotel where he enjoyed taking extended walks while thinking about mathematics. He would carry pencil and paper so he could jot down ideas as they came to him. Some students also lived at the Kurhaus and Pólya got to know some of them. One day while out on his walk he encountered one of these students strolling with his fiancée. Somewhat later their paths crossed again and even later he encountered them once again. He was embarrassed and worried that the couple would conclude that he was somehow arranging these encounters. This caused him to wonder how likely it was that walking randomly through paths in the woods, one would encounter others similarly engaged. This led to one of his most famous discoveries, his 1921 paper on random walk, a phrase used for the first time by Pólya.

It’s interesting that the question raised by those embarrassing woodland encounters differs from the usual statement of the recurrence theorem, which talks about a single walker’s return to the point of origin. The answer is the same, however. In an appendix to the Alexanderson book, K. L. Chung mentions an easy proof of this equivalence: Suppose two walkers begin together at A and run into each other again at B. Nothing about the statistics of a random walk changes if you reverse the direction of all the steps. Thus the probability of the rencontre at B is the same as that of one walker going from A to B and then back to A again.

The illustration below shows the paths of two random walkers that meet three times (green circles) after leaving the origin, supporting the plausibility of Pólya’s claim that he was not stalking those young lovers in the woods.

rencontres-1.png

(What’s not so plausible is that any non-inebriated walker would follow a truly random path.)

Gruenberger’s prime path

Tuesday, February 16th, 2010

Fred Gruenberger may well have been the first blogger on computational topics. When he was writing, back in the 1970s, there was no RSS, and so he distributed his musings in a monthly newsletter called Popular Computing. A typical issue was 16 or 20 typewritten pages–stapled, folded, stamped and delivered by mail. It was always worth reading.

Gruenberger had been working and playing with computers since the 1940s. For a long stretch he was at the RAND Corporation, the famous think tank in Santa Monica. Later he taught at Cal State Northridge. In addition to Popular Computing he was involved in the startup of Datamation magazine and published at least a dozen books. I haven’t been able to learn much about his later years; he died in 1998.

A slogan that appeared in some issues of Popular Computing proclaimed: “The way to learn computing is to compute.” I took this advice to heart, although I was hampered by a total lack of hardware. Later on I acquired a programmable calculator, which helped on some of the problems and exercises.

Problem 149, from Popular Computing Vol. 4 No. 12, December 1976

The problem reproduced above appeared in the December 1976 issue of Popular Computing (Vol. 4, No. 12). At the time, I made no attempt to work this one out, but evidently the problem seemed interesting enough to be worth filing away. When I came upon the old clipping recently, I gave it a closer look and realized I have no idea how to answer Gruenberger’s question, though the impediment now is not lack of hardware.

Gruenberger asks us to trace a planar path whose steps are indexed by the odd integers starting at 3. For each number N we turn right 90 degrees before taking a step if N is a prime congruent to 1 mod 6; we turn left 90 degrees before moving one unit if N is a prime congruent to −1 mod 6; otherwise we continue straight ahead in whatever direction we happen to be facing.

In his typewriter graphics, Gruenberger plotted the trajectory from N=3 through 97. Below I continue the path through N=199.

trail201b.png

But something’s amiss here. Gruenberger wrote:

Eventually the path will cross itself, so that the cell containing 111 will also contain 147. Similarly, one cell will contain both 91 and 179.

Those two self-intersections are nowhere to be found in the diagram. When I first noticed this discrepancy, I assumed I must have made a mistake somewhere. (This eagerness to blame myself is not mere knee-jerk humility; I have years of experience to back it up.) Eventually, though, I concluded that it was Gruenberger who had made the wrong turn. I believe he mistakenly went left at 127, as shown in the brown trail below:

trail201b-error.png

The brown continuation of the red path includes the two coincidences mentioned in Gruenberger’s problem statement. But the left turn at N=127 is incorrect, because 127 is a prime equal to (6×21)+1, and thus it should specify a right turn. The error is of no great consequence, but it does reveal something interesting: Gruenberger must have been plotting these paths by hand. Most likely he wrote a program to compute the series of residue classes, then traced out the trajectory on squared paper.

Setting aside this anomaly, Gruenberger was quite right that the path does intersect itself. Here’s the trail continued through N=1,001:

trail1001b.png

And if that’s not tangled enough, here’s what it looks like at N=10,001:

trail10001c.png

Gruenberger asks for “a list of the contents of those cells containing more than one number, arranged in the order of the smallest number in the cell.” It’s not hard to identify some cells that belong on such a list. The table below includes all multiply-occupied cells discovered when tracing the path up to N=1,001, sorted as Gruenberger requests:

                   x    y    values of N
                 -11   28    (137 337)
                 -15   27    (147 683)
                 -16   27    (149 349 685)
                 -18   26    (155 355)
                 -19   27    (159 691)
                 -19   28    (161 693)
                 -19   29    (163 695)
                 -17   31    (171 319)
                 -18   32    (175 315)
                 -19   32    (177 701)
                 -20   32    (179 703)
                 -22   31    (185 769)
                 -23   31    (187 771)
                 -24   31    (189 773)
                 -30   41    (245 269)
                 -30   42    (247 271)
                 -27   40    (281 733)
                 -26   40    (283 735)
                 -26   37    (289 725)
                 -23   35    (299 715)
                 -22   35    (301 761)
                 -21   35    (303 759)
                 -20   35    (305 757)
                 -17   27    (351 687)
                 -18   27    (353 689)
                 -17   24    (361 673)
                 -16   24    (363 675)
                 -15   24    (365 677)
                 -17   21    (379 667)
                 -17   22    (381 669)
                 -17   23    (383 671)
                 -20   22    (391 631)
                 -20   21    (393 633)
                 -20   20    (395 635)
                 -20   19    (397 637)
                 -22   19    (401 593)
                 -22   18    (403 591)
                 -22   17    (405 589)
                 -22   16    (407 587)
                 -27   15    (419 575)
                 -27   14    (421 573)
                 -28   14    (423 819)
                 -29   14    (425 549)
                 -32   14    (431 539)
                 -32   13    (433 537)
                 -26   10    (563 831)
                 -26   11    (565 829)
                 -27   13    (571 823)
                 -28   18    (607 811)
                 -22   32    (707 767)
                   4   -6    (923 971)
                   4   -7    (925 969)
                   4   -8    (927 967)
                   4   -9    (929 989)
                   5   -9    (931 991)

Is this list the answer to Gruenberger’s question? No, it’s not, because there’s no reason to stop at an arbitrary limit such as N=1,001. Indeed, the list above is not even a prefix of the complete answer. The smallest value of N appearing in the list is 137, but the trail will eventually revisit cells occupied by smaller values of N. For example, continuing the experiment to N=10,001 reveals a bunch of intersections quite close to the beginning of the path, including a site that’s visited five times:

                   x    y    values of N
                   1    0    (5 1621)
                   1    1    (7 1623)
                   2    1    (9 4725)
                   3    1    (11 1263)
                   3    2    (13 1265)
                   5    3    (19 1635)
                   6    3    (21 1637)
                   7    4    (25 7537)
                   7    5    (27 7319 7539)
                   7    6    (29 7505 7541)
                   6    6    (31 1643 7323 7503 7543)
                   6    7    (33 1645 7325)
                   6    8    (35 1647 7327)
                   6    9    (37 1649 7329)

One point still missing from this list is the origin–the site at x=0, y=0, N=3. Does the path ever revisit its starting point? If so, at what value (or values) of N does it come back home? Since I don’t know the answer to this question, I guess I’ll have to leave it as an exercise for the reader.

I suspect that the problem Gruenberger meant to pose (or thought he was posing) was to generate a list of self-intersection sites arranged in their natural order of occurrence–that is, the order in which the crossings are created when you construct the path starting from the origin. This natural-order list is not at all the same as a list “arranged in the order of the smallest number in the cell.” The natural-order list is easy to generate step by step. All you need to do is obey the left/right/straight rules, plot the resulting sequence of positions on the xy lattice, and leave behind a trail of breadcrumbs so you can check at each step to see if the site has been visited before. This task is a matter of straightforward computation–just the kind of assignment that Gruenberger favored. The natural-order list begins:

                   x    y    values of N
                 -30   41    (269 245)
                 -30   42    (271 247)
                 -18   32    (315 175)
                 -17   31    (319 171)
                 -11   28    (337 137)
                 -16   27    (349 149)
                 -18   26    (355 155)
                 -32   13    (537 433)
                 -32   14    (539 431)
                 -29   14    (549 425)
                 -27   14    (573 421)

Thus the prime path first crosses itself when N=269, a value that shares the same coordinates as N=245, namely x=−30, y=41. There are 56 such crossings up to N=1,001, and 112,988 self-intersections up to N=106.

*     *     *

There is a wilder, conjectural answer to Gruenberger’s challenge–which I’m pretty sure he did not have in mind. It goes like this: Maybe the complete list of revisited values of N is simply the list of all N. In other words, maybe the Gruenberger prime path fills up the entire lattice of integers, crossing over itself everywhere many times.

In 1921 George Pólya published a celebrated proof that a random walk on the lattice of integers is recurrent in one or two dimensions, though not in higher dimensions. Recurrent means that the walk returns to each point along its length with probability 1, and indeed visits every point in its domain infinitely often. Is it possible that the prime path is also recurrent?

Pólya’s theorem is one of those mind-expanding results that seem impossible on first acquaintance, and then inevitable, and finally just so amazing that you want to go kiss a mathematician. I have to confess that I’ve never gotten all the way through Pólya’s original paper (it’s not long, but it’s in German). On the other hand, I can highly recommend a little book by Peter Doyle and Laurie Snell, Random Walks and Electric Networks, which gives several alternative proofs of the theorem; it was published in the MAA’s Carus Monograph series, and there’s a postprint available on the arXiv.

The key insight underlying Pólya’s result, as I understand it, is this: If you never revisit a former home, then you must be spending eternity somewhere else, and you can do that only if your universe has enough somewhere elses that you’ll never run out of new territories to visit. Suppose that, some eons after starting your journey, you find yourself at distance r from the origin. If you’re living in a one-dimensional universe, then there are just two places you could be at that moment, namely at +r or −r. It doesn’t matter how far you run; there are still just two points at any given distance from the origin. In two dimensions, a fugitive at distance r has a little more room to maneuver; the number of available points grows in proportion to r, forming a circle of radius r. But this is still not enough room to get lost in. Only in three dimensions or more is there a nonzero probability of escape. In three dimensions, the space available at radius r is proportional to r2. In this three-dimensional world, the volume of empty space grows faster than a random walker’s expected distance from home.

What does all this have to do with Gruenberger’s prime path? Well, it’s no secret that the distribution of prime numbers looks convincingly random–if you look at it in just the right way. And in particular the distribution of primes in various residue classes, such as 6K+1 and 6K−1, seems to behave at least approximately like a random variable. All this suggests we might consider viewing the Gruenberger prime trail as if it were a random walk through the two-dimensional lattice of integers. Because the space is two-dimensional, it’s a good guess that the walk should be recurrent.

The original recurrence results of Pólya refer to a simple random walk, where at each step the walker chooses randomly among the available directions and then moves one unit in that direction. For example, in the two-dimensional lattice of integers there are four possible directions: north, south, east, west. The simple random walk is not the best model of the Gruenberger process, which is more like a nonreversing random walk–a path where on each step the walker can turn left or turn right or go straight ahead but can never make a 180-degree about-face. We can further refine the random-walk model of the Gruenberger process by biasing the choice made at each step to reflect the changing abundance of prime numbers. Primes grow scarcer as their magnitude increases; in the vicinity of a given value of N, the probability that a randomly chosen number is prime is approximately 1/log N. Since the Gruenberger path goes straight for all composite numbers and turns only when N is prime, the trail will have longer and longer straight segments, and rarer turns, as N increases. A random walk can mimic this behavior by choosing an action at each step according to this logic:

    if random(1.0) > 2/log(N)
       then go straight
       elseif randomboolean()
           then turn left
           else turn right

(The proportion of primes is given as 2/log(N) rather than 1/log(N) because the Gruenberger process is defined on odd numbers only, which immediately eliminates half of the composites.)

One way to compare the various kinds of random walks is to measure the root-mean-square displacement–the distance from the origin to the final position of the walker, averaged over many realizations of the random process. For a simple random walk, the RMS displacement for an N-step walk converges to \(\sqrt{N}\); for the nonreversing random walk the average displacement is \(\sqrt{2N}\). The biased random walk based on the distribution of primes also appears to yield an RMS distance proportional to the square root of the number of steps; numerically the curve looks something like \(\sqrt{10N}\). I’m not entirely sure that’s the true form of the curve, but the geometric details don’t really make much difference. If I understand correctly, all three of these random processes should be recurrent in the sense of Pólya.

rms-graph5.png

Does the same reasoning apply to the Gruenberger prime path? There are two sides to this question.

The naysayer points out that Pólya’s theorem applies to random walks, but there’s nothing truly random about the sequence of primes. After all, we have a straightforward, deterministic algorithm for generating primes, as well as an efficient algorithm for testing whether any given integer is prime or composite. The essence of a random process is that every time you run it you get a different result, but there’s only one sequence of prime numbers, and so the Gruenberger prime path will come out exactly the same every time. According to this view of things, the kind of probabilistic reasoning that goes into the proof of Pólya’s recurrence theorem is out of bounds here. For randomness to make any sense, you need to average over some ensemble of independent instances. For example, you could average over the 50 salmon-pink paths in the graph below, which represent 50 independent realizations of a biased random walk; you can’t average over the prime path itself (green), because there’s only that one path.

random-prime-paths-51.png

The yeasayer retorts that a single path is all you need–if the path is infinitely long. Indeed, the salmon-colored trails above could be interpreted not as 50 distinct runs of a random process but as 50 segments of a single long path, which repeatedly loops around through the point at x=0, y=0, wanders off in various directions, and then comes back home yet again. In essence, everything that could possibly happen in an infinite set of random paths happens somewhere within a single infinite path; all possible variety is already present there.

I’m not sure how to settle this dispute between Dr. Yea and Professor Nay. When an argument hinges on the nature of randomness, the meaning of infinity and patterns in the distribution of the primes, I known I’m in over my head.

So I’ll leave that deep question unresolved and say a final word about a lesser curiosity. In the Gruenberger process, we’re using the congruence classes of prime numbers mod 6 as a kind of coin flip to decide which way to turn. Is it a fair coin flip? For small values of N, it certainly doesn’t look fair:

                               6x+1      6x−1
       primes <     100         11        12
       primes <    1000         80        86
       primes <   10000        611       616
       primes <  100000       4784      4806
       primes < 1000000      39231     39265

There’s a persistent excess of −1 primes, and the imbalance seems to be getting steadily larger. As a result, the prime path has a “winding number” that reaches 8.5 at N=106; that is, the path makes eight and a half net counterclockwise revolutions. Does the windup continue with still larger N? I gather that the definitive answer is “Yes and No.” For more see the masterful paper by Andrew Granville and Greg Martin cited above.

[Correction 2010-02-19: reflected the accent on Pólya.]

17 x 17: A nonprogress report

Tuesday, December 22nd, 2009

The question again: Is there a four-coloring of the 17-by-17 grid in which none of the 18,496 rectangles have the same color at all four corners? As I said last time, Bill Gasarch would not have put a bounty on this problem if it had an easy solution. Over the past couple of weeks I’ve invested some 1014 CPU cycles in the search, and a few neural cycles too. I have nothing to show for the effort, except maybe a slightly clearer intuition about the nature of the problem.

If you generate a bunch of four-colored n-by-n grids at random, the average number of monochromatic rectangles per grid increases quite smoothly with n:

random-grid-stats.png

This gradual progression might lead you to suspect that the difficulty of finding or producing an n-by-n grid that is totally devoid of monochrome rectangles would also be a smooth function of n. The truth is quite different.

success-graph.png

Finding a solution is easy for square grids of any size up to 15 by 15. The task suddenly becomes very hard at size 16 by 16. As for 17 by 17, it’s much harder still–and indeed is not yet known not to be impossible. (Details on the data behind this graph: For each size class from n=8 to n=17 I started with 1,000 randomly four-colored n-by-n grids. Then I applied a simple heuristic search (the first of the algorithms listed below) to each grid, running the program for 1,000 × n2 steps. The graph records the number of times this procedure succeeded–i.e., produced a grid with no monochrome rectangles–at each grid size. Up to n=14, the search never failed; at n=16 and beyond, it never succeeded.)

This kind of sudden transition from easy to hard is a familiar feature in the realm of constraint satisfaction. Well-known intractable problems such as graph coloring and boolean satisfiability have the same structure. That doesn’t bode well for any of the simple-minded computational methods I’ve tried. Here’s a brief catalog of my failures. These are algorithms that I’m pretty sure are never going to pay off.

  • Biased random walk. Start with a randomly colored grid. Repeatedly choose a site at random, then try changing its color; accept the move if it reduces the overall number of monochrome rectangles. This is the simplest of all the algorithms. None of the more-elaborate schemes is decisively better.
  • Whack-a-mole. Find all the monochrome rectangles in the grid; choose one of them and alter the color of one corner, thereby eliminating the rectangle. In the simplest version of this algorithm, you choose the rectangle and the corner and the new color at random; in more sophisticated versions, you might evaluate the alternatives and take the one that offers the greatest benefit.
  • Steepest descent. Examine all possible moves (for the 17-by-17 grid there are 867 of them) and choose one that minimizes the rectangle count.
  • Lookahead steepest descent. Examine all possible moves, and then all possible sequels to each such move (for the 17-by-17 grid there are 751,689 two-move sequences); choose a sequence that minimizes the rectangle count. In principle this method could be extended to chains of three or more moves, but the cost soon gets out of hand. (The lookahead technique is the mirror image of backtrack search; it explores the tree of possible moves breadth-first instead of depth-first.)
  • Color-balanced search. Allow only moves that maintain the overall balance of colors in the grid. For example, in a 16-by-16 four-colored grid, color balance implies 16 sites in each color. One way to maintain balance is to make moves that swap the colors of two sites. (There is no reason to think that a rectangle-free grid will have exact color balance; on the other hand, a solution for a large grid cannot depart too far from perfect balance. Thus a color-balanced search might be an effective trick for finding a neighborhood where solutions are more common.)
  • Row-and-column-balanced search. Allow only moves that maintain the balance of colors within each row and each column of the grid. In a 16-by-16 grid, each row and column should have four sites in each of four colors. A simple way to maintain this detailed color balance is to search for “harlequin rectangles” with the color pattern \(\begin{array}{cc}a & b\\b & a\end{array}\) and permute them to \(\begin{array}{cc}b & a\\a & b\end{array}\).

Most of these techniques are greedy: At each stage the algorithm chooses an action that maximizes some measure of progress. On hard instances, a pure greedy strategy almost always fails; the search gets stuck in some local optimum. Thus it’s usually best to temper the greediness to some extent, occasionally choosing a move other than the one that yields the best immediate return. (The family of methods known as simulated annealing are more elaborate variations on this idea, based on insights from thermal physics.)

greediness-traces.png

Here we see traces of three runs of the algorithm identified above as steepest descent, with differing values of a greediness parameter m. (The grid size is 15 &times 15.) At m=0 (no greediness at all), all moves are equally likely to be chosen, and the algorithm executes a random walk on the space of grid colorings. At m=1 (maximum greediness), the program always chooses the highest-ranked move, which works well until the system stumbles into a state where no move can reduce the rectangle count. A value of m=0.3 seems to be a good compromise. (I’ll say a little more below on the greediness parameter; indeed, I have a question about how best to define and implement it.)

After all this fussing with a dozen variations on local-search algorithms, I’m afraid the outlook for success is not promising. With a little patience and some tuning of parameters, any one of these algorithms can solve grids up to 15 &times 15. With a lot more patience and tuning, they’ll eventually yield answers for 16 &times 16. But none of the algorithms come even close to cracking the 17 &times 17 barrier. Solving that one is going to require a fundamentally new idea. Perhaps someone will find an analytic approach to constructing a solution, rather than blindly searching for one. Or perhaps someone will prove that no solution exists.

On the computational front, I suspect the best hope is a family of algorithms known in various contexts as belief propagation, survey propagation and the cavity method. I’ve been hoping that friends who are expert in these techniques might swoop in and solve the problem for me, but if not I may have to give it a try myself.

In the meantime, here’s the thing about greediness (an apt subject for this time of year?). We want to define a function greedy whose arguments are a vector of alternative moves ranked from best to worst and a number m such that 0 ≤ m ≤ 1. If the greediness parameter m is 0, the function returns a random element of the vector. If m = 1, the returned value is always the first (highest-ranked) move. Otherwise, we must somehow interpolate between these behaviors. One attractive notion is to return the first element of the vector with probability m, the second choice with probability m(1 − m), and so on. Thus for m = 1/2 the series of probabilities would begin 1/2, 1/4, 1/8…. For m = 1/3 the first few values are 1/3, 2/9, 4/27….

This scheme works just fine for a vector of infinite length, but there’s a problem with shorter vectors. Consider what happens with the procedure call greedy(v=[1, 2, 3, 4], m=0.5). We have the following table of probabilities:

     1 --> 1/2
     2 --> 1/4
     3 --> 1/8
     4 --> 1/16

But on adding up those values, we come up 1/16th short of 1. What happens to the missing probability? I took an easy way out, distributing the “extra” probability equally over all the elements of the vector. The code looks something like this:

function greedy(v, m)
  for i=0 to length(v)
     if (i==length(v))
        return v[random(length(v))]
     elseif (random(1.0) < m)
        return v[i]

This procedure seems to give sensible results, but I wonder if there might be a better or more natural definition of greedy probabilities. Also, the running time for my code is logarithmic in the length of the vector (assuming m < 1). Is there a constant-time algorithm that gives the same results? (We don’t know the length of the vector in advance, so merely precomputing the table of probabilities is not an option.)

The 17×17 challenge

Saturday, December 5th, 2009

William Gasarch is not the Clay Mathematics Institute. He isn’t paying a million bucks for proofs of famous conjectures. But Gasarch is putting up 172 of his own dollars for the solution to an intriguing little stumper. And the prize problem appears to be somewhat easier than the Riemann hypothesis or the P=NP question. (Unless it’s impossible!)

Gasarch sets forth his prize challenge in a blog post, with further background in a paper and in the slides from a talk. All of those works are well worth reading, but for those who don’t want to chase down the references, here’s the gist. Our mission, should we choose to accept it, is to color the nodes of an n-by-m grid, using only a specified number of colors, and observing a particular constraint: Nowhere in the grid may the four corners of a rectangle all have the same color. (Only rectangles with sides parallel to the x and y axes are considered.) For example, here is a four-colored 15-by-15 grid that satisfies the no-monochromatic-rectangles constraint:

grid15r0a.png

In this array of dots there are \( {15 \choose 2}^2=11025\) distinct rectangles. If you care to check through all of them one by one, you’ll find that in no case do all four corners have the same color. In contrast, here is a 16-by-16 grid that is almost but not quite rectangle-free:

grid16r2a.png

Gasarch offers his $289 bounty for any four-colored 17-by-17 grid with no monochromatic rectangles. Why is that grid of particular interest? It’s a border case. Among square grids, all those up through 16-by-16 have been shown to have rectangle-free four-colorings. For the 18-by-18 grid and all larger squares, rectangle-free four-coloring has been proved impossible. For squares larger than 18-by-18, four-coloring has been proved impossible. The status of the 17-by-17 and 18-by-18 grids remains unsettled, but Gasarch believes that both are four-colorable.

Gasarch has much more to say about the mathematics behind this problem. Here I would like to muse on some computational aspects of searching for a 17-by-17 four-coloring.

To state the obvious first, this is not a problem we can expect to solve by exhaustive enumeration. There are 4289 possible colorings of the grid. Casting out symmetries brings that down only to 2 × 4287. There’s not world enough or time for checking them all.

Testing grids at random is also hopeless in the 17-by-17 case. This nonstrategy actually works quite well for small grids. For example, you can readily find a four-coloring of an 8-by-8 grid just by generating a few hundred thousand random colorings. But the method fails for larger grids because the proportion of all colorings that are rectangle-free falls steeply with grid size. (Consider the 2-by-2 grid: There are 256 four-colorings, and all but four of them are rectangle-free.)

To make any progress toward the 17-by-17 case, we’ll have to do at least a little thinking, rather than expecting the computer to do all the work. Here’s one idea that’s very easy to implement: Find a monochrome rectangle somewhere within the grid, change the color of one of its corners, and repeat until you can’t find any more rectangles. This algorithm works reasonably well for grids up to about 12-by-12, but then it runs out of steam. On larger grids, changing the color of a node to eliminate one rectangle is likely to create another rectangle elsewhere (or several more of them). As a result, the system merely takes a random walk, with trendless fluctuations in the number of rectangles at any given moment. You discover a rectangle-free coloring only if the walk happens to stumble on the zero point.

I found the 15-by-15 four-coloring shown above with an algorithm that’s a little more effective even thought it’s no more sophisticated than the corner-twiddling method. The program repeatedly chooses a node at random and tries assigning it all four possible colors, tallying up the number of rectangles for each color choice. Some color or set of colors must minimize the rectangle count; from among these optimal colors the program chooses one at random and sets the node to that color before repeating the loop. This is a “greedy” method: At each step the number of rectangles can decrease or remain constant but can never increase. Greedy methods are notorious for getting stuck in local optima that are inferior to the global optimum. Maybe that’s what happens to my program when I try it on 16-by-16 and 17-by-17 grids. Or maybe the search space is just too large. In any case, when I woke up this morning and checked the results of an overnight run, I did not find a rectangle-free four-colored 17-by-17 grid awaiting me.

Of course I really didn’t have to do any algorithm analysis at all to know that I wasn’t going to win $289 and eternal fame with a day or two of idle hacking. If the problem were that easy, Gasarch and his students would have solved it for themselves long ago.

In spite of these various failures and frustrations, the grid-coloring problem still looks tantalizingly solvable. If a four-coloring of the Gasarch grid exists, it seems like we should be able to find it by some practical computation.

There are certainly lots of approaches more powerful than the blind dart game I’ve been playing. For example, if local optima are the major impediment, some variation on simulated annealing might help.

A more radical possibility is to try to construct an instance rather than merely search for it. If we assume that the four colors are represented as evenly as possible, then the 17-by-17 grid must have 72 nodes in each of three colors and 73 nodes in the fourth color. Starting from a blank grid, it’s easy to mark off 73 nodes in a single color without creating a forbidden rectangle. Adding 72 nodes of a second color is only a little harder. But then the job gets tricky. When you try to fill in a third color, you also by default choose nodes for the fourth color at the same time, and conflicts pile up in a hurry. Some kind of backtracking approach is probably needed here. Gasarch links to a paper by Elizabeth Kupin of Rutgers that explores these ideas in more detail. (If you want to prove the nonexistence of a four-coloring, this is presumably the way to go.)

Gasarch mentions two other promising avenues: integer programming (the discrete variant of linear programming) and SAT solvers–algorithms for the satisfiability problem. Having spent some time hanging out with a few master SAT solvers, I’m intrigued by the latter possibility. You can almost encode the grid-coloring problem as an instance of NAE-SAT, or not-all-equal SAT. Each node of the grid is represented by a variable that can take on any of four values. We group subsets of variables four at a time into clauses, where each clause includes the variables representing the four corners of a rectangle somewhere in the grid. For the 17-by-17 grid there are \({17 \choose 2}^2=18496\) clauses of this kind. The entire formula is satisfied if we can assign values to the 289 variables in such a way that none of the 18496 clauses has all four of its variables with the same value. After 40 years of work on SAT, there’s a highly developed technology for solving such problems. However, there’s a hitch. SAT problems are formulated in terms of boolean variables, with just two values each, but the grid-color variables have four values. Thus a further layer of encoding is likely to be needed, bringing a further explosion in the size of the problem instance.

One final hackerish note: What’s the best way to detect the presence of a monochromatic rectangle in a grid? My candidate goes like this. We encode the rows of the grid in a set of bit vectors–four vectors for each row, representing the four possible colors. For example, the red vector for a row has a 1 at each position where the corresponding node is red, and zeroes elsewhere. The blue vector has 1s for blue nodes, and so forth. Now we can detect a rectangle merely by taking the logical AND of two rows (an operation that could be a single machine instruction). A rectangle exists if and only if the result of the AND is nonzero. at least two bits are set in the resulting vector.

[Thanks to all the commenters for corrections and elaborations.]

The birth of the giant component

Friday, November 20th, 2009

The waste product of my document scanning project is a slag heap of extracted staples:

staples-in-bowl-2667.jpg

The other day I made a discovery: If you grab one of the discarded staples and lift it, the whole ball of tangled, mangled metal comes along, leaving behind only a few stragglers in the bottom of the bowl.

ball-of-old-staples-closer-2691.jpg

When I noticed this, my first thought was “Hmm, that’s funny.” My second thought was “Oh, of course: Erdős-Rényi.” And my third thought–well, I’m still working on my third thought, as well as thoughts four, five and six.

Erdős and Rényi are Paul Erdős and Alfréd Rényi, who wrote a big paper on “The Evolution of Random Graphs” 50 years ago (Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 1960, 5:17–61). Their paper wasn’t quite the debut appearance of random graphs in the mathematical literature, but it’s usually cited as the theory’s point of origin.

In one version of the Erdős-Rényi process, you start with a set of n isolated vertices and then add random edges one at a time; specifically, at each stage you choose two vertices at random from among all pairs that are not already connected, then you draw an edge between them. It turns out there’s a dramatic change in the nature of the graph when the number of edges reaches n/2. Below this threshold, the graph consists of many small, isolated components; above n/2, the fragments coalesce into one giant component that includes almost all the vertices. “The Birth of the Giant Component” was later described in greater detail in an even bigger paper–it filled an entire issue of Random Structures and Algorithms (1993, 4:233–358)–by Svante Janson, Donald E. Knuth, Tomasz Luczak and Boris Pittel.

What made me think of a connection between Erdős-Rényi graphs and my hairball of staples? What I had in mind was something like this: The long, straight, middle part of a staple corresponds to an edge of a graph, and the bent end pieces, which can grab on to each other, are the vertices. Thus a single staple is a graph consisting of two vertices connected by one edge. When two staples hook up, two of their vertices merge and we’re left with a connected graph of three vertices and two edges. Since each staple contributes one edge and at most two vertices to the graph, the number of edges must be at least half the number of vertices. Thus the graph is always over the threshold for forming a giant component, according to Erdős and Rényi.

The counting part of this analysis seems okay, but I’m afraid the rest of it doesn’t hold up very well. Whatever is going on in the staple ball, the evolution of the system is not well modeled by the Erdős-Rényi process of adding edges to a fixed set of vertices. Instead, each staple brings both an edge and two vertices. The crucial event that makes the cluster hang together is the merging of vertices when staples link hands; this merging has no counterpart in the Erdős-Rényi process.

The underlying problem here is that Erdős-Rényi graphs are purely topological–there’s no concept of distance, and any two vertices are equally likely to have an edge joining them. But the staple graph has important geometric constraints. Two vertices can be joined by an edge only if the distance between them is approximately equal to the length of a staple.

The geometric structure suggests trying a different kind of model–perhaps the kind that describes the molecular structure of liquids and solids. Water molecules, for example, are linked together by a network of hydrogen bonds; each hydrogen atom in one molecule can bond to the oxygen atom in another molecule. But the bonds cannot extend over arbitrary distances; they reach only between neighboring molecules. In the resulting three-dimensional structure the basic motif is a tetrahedron with an oxygen atom at its center and hydrogen atoms at the four corners. (There’s also a schematic two-dimensional model known as square ice.) We might imagine something similar going on with the staples, where the two bent ends can form hydrogen-bond-like links to other nearby staples.

But there’s a problem with such chemical models as well. Atoms have a fixed valence (more or less–let’s not quibble); old-staples-detail-2697.jpgin water, for example, each hydrogen atom can form a hydrogen bond with only one oxygen atom. But we have no reason to suppose that the hooked ends of a staple can attach to only one other staple. As a matter of fact, if such a restriction were enforced, then the staples could form only chains and rings, not dense clusters. In a close look at the actual clusters, it’s easy to find places where three or more staples are all hooked together at the same point. By carefully teasing apart the cluster, I have spotted vertices that appear to have a degree of at least six.

Yet another physical process that might provide a model for the staple graph is diffusion-limited aggregation. This is the mechanism responsible for the filigree pattern in the banner at the top of the bit-player web page. It is generated by sticky particles that drift at random until they wander onto the substrate or touch another particle that is already in contact (directly or indirectly) with the substrate. For staples, I suppose the drifters would be tumbling dumbbells with sticky ends–somewhat harder to simulate.

Another factor to keep in mind is that spatial dimensions are surely important here. For one thing, there’s just more room to maneuver in three dimensions, with more opportunities to glom onto a neighbor. But in the specific case of staples, there’s another reason: Confined to a plane, they have a hard time linking up:

staples-on-plate-detail-2681.jpg

Dispersed on a flat plate, they refuse to coagulate even when swirled vigorously. The reason, presumably, is that secure links form only when the staples can turn 90 degrees and interlock.

In this connection it would seem significant that these are used staples, somewhat varied in shape, with hooked ends that had been bent approximately 180 degrees in the process of stapling and that mostly retained an angle greater than 90 degrees after being pried out the papers. I wondered how shiny new staples would behave, and so I tried the experiment. (Materials and methods: 630 Stanley Bostitch chisel-point staples, model SBS191/4CP, freshly dispensed from an open-jaw Swingline stapler.)

ball-of-new-staples-crop-2700.jpg

I was mildly surprised at the result. Although the aggregation was somewhat looser and more delicate, it really wasn’t that much different. Again we witness the birth of a giant component.

Flights of fancy

Tuesday, October 27th, 2009

starlings-closeup-2058.JPG

As I have mentioned in the past, I’m fascinated by the acrobatics of bird flocks, especially the big congregations of European starlings that gather in the evening at this time of year. Evidently I’m not the only one with such an interest. In the past few years the subject has attracted the attention of quite a large flock of scientists, including not only biologists but also various luminaries in physics, mathematics and computer science.

Below are some notes on a few of the recent papers, but first I have to mention a classic from 20 years ago:

Reynolds, Craig W. 1987. Flocks, herds, and schools: a distributed behavioral model. Computer Graphics 21(4):25–33. Author archive.

This is the paper that began the modern era of flocking studies by proposing that animals could coordinate and synchronize their movements without any need for a leader or external cues. Others were thinking along the same lines at about the same time, but it was Reynolds who attracted wide notice with his enchanting computer animations of “boids” soaring through an imaginary three-dimensional space. Each individual in the flock acts according to simple, local, fixed rules, and the synchronized maneuvers emerge spontaneously.

Reynolds suggested three particular rules that might guide the behavior of each bird:

  • Avoid collisions.
  • Try to match the speed and heading of nearby birds.
  • Move toward the center of the group in which you are flying.

Reynolds was working in computer graphics, and his ideas were soon taken up by movie studios and by the makers of video games. In a sense, his simulations only had to look right; they didn’t have to reflect what actually goes on in a starling’s head. But whether or not the birds were paying attention, students of animal behavior certainly were.

starlings-wide-2064.jpg

Much of the recent activity arises out of new field studies, conducted mainly by physicists.

Cavagna, Andrea, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. The STARFLAG handbook on collective animal behaviour. 1: Empirical methods. Animal
Behaviour
76:217–236. Preprint.

Cavagna, Andrea, Irene Giardina, Alberto Orlandi, Giorgio Parisi and Andrea Procaccini. 2008. The STARFLAG handbook on collective animal behaviour. 2: Three-dimensional analysis. Animal Behaviour 76:237–248. Preprint.

This group, coordinated by Andrea Cavagna and Irene Giardina of the University of Rome La Sapienza, has been photographing starling flocks near the city’s main railroad station (the Termini), which is just a few blocks from the university. Using pairs of synchronized cameras, the observers have captured stereoscopic images and then applied special image-analysis software to reconstruct the three-dimensional trajectory of each bird. Similar techniques have been tried in the past, but only with small flocks (a few dozen birds). The Italian group has traced the motions of individual birds in groups of up to 2,600. The two papers cited above give technical details on how the data were gathered and analyzed.

Ballerini, Michele, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. Empirical investigation of starling flocks: a benchmark study in collective animal behaviour. Animal Behaviour 76:201–215. Preprint.

Ballerini, Michele, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Vivien Lecomte, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proceedings of the National Academy of Science of the USA 105:1232–1237. Open access.

And here the same authors (with a few additions) report their results and conclusions. They base their interpretation on a computational model that is recognizably a descendant of the Reynolds scheme, but with one crucial modification. Reynolds and others assumed that each bird is influenced by all other birds within some fixed distance (a “metric neighborhood”); Ballerini et al. get a closer match to the data by assuming that a bird attends to the motions of a fixed number of near neighbors, regardless of distance (a “topological neighborhood”). In other words, the graph of interacting birds has nearly constant vertex degree; the typical degree is probably six or seven. The main significance of this algorithmic change is that it helps maintain the cohesion of the flock in spite of large variations in density.

Hildenbrandt, Hanno, Claudio Carere and Charlotte K. Hemelrijk. 2009. Self-organised complex aerial displays of thousands of starlings: a model. arXiv:0908.2677v1

Those same flocks at Termini have a role in this study as well; the model presented here draws on data from Ballerini et al. as well as videotapes made at Termini by Carere. (Carere is another physicist at Sapienza; Hildenbrandt and Hemelrijk are biologists at the University of Groningen.)

The model works on the same essential principles, but it differs in intellectual style and emphasis. Hildenbrandt et al. want to account for specific details of a flock’s behavior—not just the general tendency to fly in close formation but also the particular shapes of starling flocks, the maneuvers they perform, the altitudes they prefer, and so on. Reaching for this verisimilitude leads to a rather complicated model with many parameters in need of fine tuning, such as aerodynamic properties of the bird’s wing and body and banking angles in turns. Hildenbrandt et al. report some success in explaining the geometry of flocks (they tend to be horizontally flattened rather than spherical). They do less well in an attempt to account for an extra-dense layer of birds observed at the periphery of a flock.

starlings-landing-2072.jpg

Cucker, Felipe, and Steve Smale. 2007. Emergent behavior in flocks. IEEE Transactions on Automatic Control 52:852–862.

Chazelle, Bernard. 2009. Natural algorithms. Proceedings of the 20th Symposium on Discrete Algorithms, pp. 422-431. Preprint.

Chazelle, Bernard. 2009. The convergence of bird flocking. arXiv:0905.4241v1

Leaving behind the breathy wing-beats of living starlings, we enter a world of mathematical abstractions.

Cucker and Smale, peripatetic mathematicians currently at the City University of Hong Kong, take a stripped-down model of flocking and ask this question: Is it guaranteed that all the birds in the flock will eventually settle on the same velocity, and thus fly together forever? Chazelle, a theoretical computer scientist at Princeton, asks a follow-on question: If the birds do converge on the same speed and heading, how long might it take for them to do so, in the worst case?

The answer to the Cucker-Smale question turns out the be yes: Given certain preconditions and parameter values, convergence is certain. But Chazelle shows that it can take quite a while for the flock to reach consensus. For n birds adjusting their velocities in discrete steps, the upper bound is 2 ↑↑ (4 log n) steps. As I was saying just the other day, this up-arrow notation denotes an exponential tower of 2s with, in this case, 4 log2 n levels. In other words, in a flock of a thousand birds, the convergence time is roughly

\[2^{2^{2^{\cdot^{\cdot^{\cdot^2}}}}}\]

with 40 levels of exponentiation. This is a ridiculous number, far exceeding the lifetime of a starling (or of a universe, for that matter). As Chazelle notes: “Our bounds obviously say nothing about physical birds in the real world. They merely highlight the exotic behavior of the mathematical models.”

It is rather wonderful to reflect—as you stand in a field of corn stubble admiring the flocks of birds wheeling overhead in the evening sky—that these avian entertainments should be the starting point for a line of reasoning that ventures so far into the wild blue yonder of inexpressible numbers.

Lebar Bajec, Iztok, and Frank H. Heppner. 2009. Organized flight in birds. Animal Behaviour 78:777–789. Preprint.

I mention this piece last, but it would actually be a good place to start if you want a primer on flocking. Frank Heppner, a biologist at the University of Rhode Island, is one of the pioneers of flocking-and-swarming studies; here, with a mathematical colleague from the University of Ljubljana, he reviews many of the recent contributions and puts them in historical context. The review includes a discussion of the more crystalline flying formations of large birds such as geese as well as the amorphous flocks of starlings.

A Wiki proof

Wednesday, October 21st, 2009

This morning’s list of new submissions to the mathematics section of the arXiv brings a paper signed by “D. H. J. Polymath.” The name is too good to be true, of course. The paper is the first fruit of a project instigated by Timothy Gowers of Cambridge. In a blog post last January, Gowers asked “Is massively collaborative mathematics possible?” and proposed a problem that might serve as a test case. There were more than 100 responses, and soon the game was on. Discussion began in the comments section of Gowers’s blog and was later supplemented and summarized in a Wiki maintained by Michael Nielsen.

The problem under attack is known as the density Hales-Jewett theorem (hence Dr. Polymath’s initials). The ordinary version of the Hales-Jewett theorem states that if you play tic-tac-toe on a board of high enough dimension, the game can never end in a draw. When you have filled in all the boxes, some row or column or diagonal in the multidimensional grid must consist entirely of x’s or o’s—even if you’ve been trying to reach a stalemate position. The theorem also holds with more than two players and more than two symbols, provided the dimension of space is high enough. The “density” version of the theorem says that sometimes you can’t avoid a winning play even if you stop before filling in all the boxes: A grid-spanning line of solid x’s or o’s is certain to appear as soon as the density, or fraction of filled boxes, reaches a threshold level.

The density version of the Hales-Jewett theorem was proved almost 20 years ago by Hillel Furstenberg and Yitzhak Katznelson, so this is not an open problem. But the Furstenberg-Katznelson proof drew on some results from ergodic theory, a branch of mathematics that seems slightly exotic for a problem stated in such simple, finite terms. Gowers asked if there might be a purely combinatorial proof. And he focused attention specifically on the case where each row and column of the tic-tac-toe board has three positions and there are three players. In other words, the grid is a 3 × 3 × 3 ×  . . .  × 3 hypercube where each vertex is to be marked with one of three symbols.

The combinatorial proof is given in the Polymath paper, which also puts a bound on how high a dimension is needed. The bound is 2 ↑↑ O(1/δ3), where δ is the density and the double-arrow notation indicates an “exponential tower” of 2s — namely O(1/δ3) of them.

A collective pseudonym such as Polymath immediately brings to mind the famous Nicolas Bourbaki. As in the writings of that French group, the Polymath paper includes no listing of contributors. But there’s a difference: The Bourbakists were a secretive bunch, a sort of sleeper cell within mathematics, and historians have pieced together who did what only in retrospect. The Polymath group describes their style of work as “open source” mathematics. It’s all there on the web.

Congruent numbers

Tuesday, October 6th, 2009

A press release from the American Institute of Mathematics two weeks ago announced that all the congruent numbers up to 1 trillion have been enumerated. Two questions leap to mind. What the heck is a congruent number? And who cares?

I’ll return to those questions. But first I’d like to pause just a moment to marvel at the idea of calculating anything up to 1012. A few decades ago, such a project would have been unthinkable. Today, counting to a trillion takes only an hour or so, even on plain vanilla hardware. This is truly one of the wonders of the age, and we shouldn’t grow too blasé about it. But the computation of all those congruent numbers involved a lot more than looping through “+1″ a trillion times, and it took considerably longer than an hour.

Okay, so what’s a congruent number? I would like to sidle up to that question rather than face it straightaway. The definition is not hard to understand—it’s all about right triangles with rational side lengths and integer areas—but when I started looking into this story, it took me a while to appreciate why those particular triangles might be worth thinking about.

Let’s begin with Pythagorean triples—sets of positive integers (a, b, c) that define a right triangle, with a and b giving the lengths of the legs and c the hypotenuse. The familiar (3, 4, 5) triangle is everybody’s favorite example. For which right triangles with integer side lengths is the area also an integer? That’s easy: All of them. The area of a right triangle is ab/2; in any Pythagorean triple either a or b (or both) must be an even number, which implies that ab/2 is a whole number. For the (3, 4, 5) triangle the area is (3 × 4)/2 = 6.

Knowing one such triangle, we can make more. Lots more. Just multiply a, b and c by any integer k, which has the effect of multiplying the area by k2. If k = 2, we get the (6, 8, 10) triangle with area 24; k = 3 yields the (9, 12, 15) triangle with area 54, and so on. The resulting sequence of triangles is infinite but not very interesting; all the larger triangles are just scaled-up versions of the original, like photographic enlargements. We can view the entire infinite series as a single class of triangles, and take the smallest member of the series as the prototype. That smallest triangle is given by a primitive Pythagorean triple—one where a, b and c have no factors in common. It turns out there are also infinitely many primitive triples. Euclid bequeathed us an algorithm that will generate all of them, if you let it run forever.

Given this infinite set of infinite series of triangles, it’s clear that infinitely many integers can be the area of a Pythagorean triangle. But that certainly doesn’t mean that every positive integer has this property. For example, there’s no Pythagorean triangle with an area of 5. To convince yourself of this fact, just measure the area of every integer-sided right triangle with legs no longer than 10. None of those triangles has an area of 5, and no triangle with longer integer legs can have an area that small. In principle, the same laborious but reliable procedure could be applied to any integer N to determine whether or not it is the area of some Pythagorean triangle. Here are the first few integers that appear in such an enumeration ( sequence A009112.):

6, 24, 30, 54, 60, 84, 96, 120, 150, 180, 210, 216, 240, 270, 294, 330

Rational triangles. Suppose we relax the constraints a little, allowing the sides of a right triangle to be rational numbers—either fractions or integers but not irrational values such as the square root of 2. The area is still required to be an integer.

A first question is whether right triangles with fractional sides and integer areas even exist. Maybe the cupboard is bare; maybe there are no such triangles. But no, they do exist. Here’s a proof by example:

35-12triangle.png

There’s no trickery here. If you’re in any doubt, do the arithmetic; you’ll find that the numbers define a genuine right triangle (the Pythagorean theorem holds) and the area really is exactly 7.

At last we have arrived in the realm of congruent numbers. A congruent number is an integer that’s the area of a right triangle with rational sides. All integer-sided right triangles are included in the category, along with those whose sides are rational fractions. Here’s a more formal definition:

A positive integer N is congruent if there exist rational numbers a, b and c such that a2 + b2 = c2 and ab/2 = N.

All the same questions we were asking about Pythagorean triples also come up in this new context. Where do we find rational triangles with integer area, or how can we manufacture them? Which integers N can be the area of such a triangle? Answers aren’t quite as easy to come by in this case. In particular, the strategy of enumerating triangles from the smallest up won’t work, because there is no smallest rational triangle. There are other ways of imposing an ordering on the rationals, but none of them lead to a good algorithm for enumerating congruent numbers.

So where did I get the example illustrated above? I didn’t just stumble upon it by trying random rational triangles. Note that multiplying each rational side length by the common denominator of the fractions will return us to the land of integer-sided triangles. For the example shown, multiplying through by 60 yields a (175, 288, 337) triangle; these numbers have no factor in common, so they form a primitive Pythagorean triple. The area of the new triangle is 7 × 60 × 60, or 25,200. Thus we’ve recovered an integer triangle from a rational one; what’s more important, the process can be reversed to derive rational triangles from integer-sided ones.

I mentioned a Euclidean method for generating primitive Pythagorean triples. It goes like this: Take any two positive integers m and n that satisfy the following conditions:

  • m > n;
  • one of the numbers is odd and the other even;
  • the numbers are coprime (no common factors other than 1).

Then (2mn, m2n2, m2 + n2) is a primitive Pythagorean triple. By counting through all suitable values of (m, n) starting with (2, 1), all primitive triples are generated.

A postprocessing step built atop Euclid’s procedure will generate congruent numbers. The plan is to produce a primitive Pythagorean triple and calculate the area N = ab/2 of the corresponding triangle. If the area is “square-free”—that is, it has no pairs of repeated prime factors and thus cannot be divided evenly by a perfect square—then we’re done; that value of N is a congruent number and cannot be reduced to a smaller congruent number. But if N is not square-free, we can divide out each square factor, leaving a smaller triangle with rational sides and integer area, and thereby generating another congruent number.

A worked example: The (m, n) pair (5, 4) produce the primitive triple (9, 40, 41), with area N = (9 × 40)/2 = 180. Thus 180 is identified as a congruent number, but it is not square-free; it factors as 22 × 32 × 5. We can therefore divide the area by 4 and the sides by 2, producing a shrunken (9/2, 20, 41/2) triangle of area N = 45. In this way we learn that 45 is also a congruent number, but again it is not square-free. Dividing the area by 9 and the sides by 3 yields the (3/2, 20/3 and 41/6) triangle with area N = 5. And so 5, too, is congruent; it’s also square-free and therefore cannot be reduced further. There is no smaller triangle similar to the (9, 40, 41) triangle that has integer area.

This scheme can produce an unlimited supply of congruent numbers. Unfortunately, it’s not so well suited to answering the question of whether a particular integer is congruent. The problem is that the congruent values are not generated in order from smallest to largest. If we turn the crank for a while and discover that a certain integer N appears in the algorithm’s output, then we know for sure that N is congruent. But if N has not shown up, we can’t conclude that it is not among the congruent numbers; we might simply have to wait longer for N’s turn to come.

When I turn the crank on my own little program for generating congruent numbers, these are the first 101 values to pop out, in order of appearance:

6, 30, 60, 15, 84, 21, 210, 180, 45, 5, 330, 630, 70, 924, 231, 546, 504, 126, 14, 1320, 1560, 390, 840, 1386, 154, 2340, 585, 65, 1224, 306, 34, 990, 110, 2730, 3570, 1710, 190, 2574, 286, 4620, 1155, 5610, 5016, 1254, 2310, 1716, 429, 7140, 1785, 7980, 1995, 3036, 759, 4290, 7956, 1989, 221, 10374, 10920, 8970, 3900, 975, 39, 7854, 11970, 1330, 14490, 1610, 11550, 462, 4914, 6630, 12540, 3135, 19320, 4830, 6090, 4080, 1020, 255, 11856, 2964, 741, 18480, 23184, 5796, 1449, 161, 25200, 6300, 1575, 175, 7

Here’s the same list sorted into ascending order of magnitude:

5, 6, 7, 14, 15, 21, 30, 34, 39, 45, 60, 65, 70, 84, 110, 126, 154, 161, 175, 180, 190, 210, 221, 231, 255, 286, 306, 330, 390, 429, 462, 504, 546, 585, 630, 741, 759, 840, 924, 975, 990, 1020, 1155, 1224, 1254, 1320, 1330, 1386, 1449, 1560, 1575, 1610, 1710, 1716, 1785, 1989, 1995, 2310, 2340, 2574, 2730, 2964, 3036, 3135, 3570, 3900, 4080, 4290, 4620, 4830, 4914, 5016, 5610, 5796, 6090, 6300, 6630, 7140, 7854, 7956, 7980, 8970, 10374, 10920, 11550, 11856, 11970, 12540, 14490, 18480, 19320, 23184, 25200

The problem, again, is that we can’t conclude anything about the numbers that don’t appear in this collection. For example, 13 is absent, even though it is in fact a congruent number; it just doesn’t turn up until we get well down the list—it comes from the 49,485th triangle examined, which has sides (780/323, 323/30, 106921/9690). On the other hand, 8 is unlisted because it is not congruent; it will never show up in the output hopper no matter how long we keep cranking away.

Here are the first 101 true congruent numbers (sequence A003273):

5, 6, 7, 13, 14, 15, 20, 21, 22, 23, 24, 28, 29, 30, 31, 34, 37, 38, 39, 41, 45, 46, 47, 52, 53, 54, 55, 56, 60, 61, 62, 63, 65, 69, 70, 71, 77, 78, 79, 80, 84, 85, 86, 87, 88, 92, 93, 94, 95, 96, 101, 102, 103, 109, 110, 111, 112, 116, 117, 118, 119, 120, 124, 125, 126, 127, 133, 134, 135, 136, 137, 138, 141, 142, 143, 145, 148, 149, 150, 151, 152, 154, 156, 157, 158, 159, 161, 164, 165, 166, 167, 173, 174, 175, 180, 181, 182, 183, 184, 188, 189

Ancient history. The search for congruent numbers does not stretch back all the way to Pythagorus or Euclid, although Diophantus apparently considered a couple of special cases. L. E. Dickson’s big history of number theory attributes the first full statement of the problem to “an anonymous Arab manuscript, written before 972.” Other sources cite the works of Abu Bakr al-Karaji, a mathematician who worked in Baghdad at the end of the 10th century. A couple of centuries later Fibonacci, who straddled the Arabic and European worlds, had more to say about the problem in his Liber Quadratorum. Later still, Pierre de Fermat proved (in a marginal note, for which he had sufficient room!) that 1 is not a congruent number. (The proof extends to 4 and 9 and 16 and all other square numbers, none of which are congruent.)

These early writers formulated the problem in a somewhat different way than the rational-triangle scheme explained above. Given an integer N, they asked whether it’s possible to find three perfect squares in arithmetic progression with the interval between the squares equal to N. In other words, they sought a number s2 such that s2 – N, s2, and s2 + N are all perfect squares. For example, if N = 24, then the solution is s = 5; the three perfect squares are 25 – 24 = 1, 25, and 25 + 24 = 49.

The two versions of the problem—the integer-area right triangles and the squares in arithmetic progression—are equivalent, although that’s not exactly obvious. Here’s the connection: If a2 + b2 = c2 and ab/2 = N, then setting s = c/2 guarantees that s2N, s2, and s2 + N are all perfect squares. (For the algebraic exercise proving this, see the book by Neal Koblitz cited below.)

By the way, the squares-in-arithmetic-progression version is where we get the term “congruent numbers.” The three numbers s2N, s2, and s2 + N are all congruent modulo N. For example, 1, 25 and 49 are all congruent to 1 modulo 24. If you ask me, “congruent numbers” is a poor excuse for a name, and is particularly confusing in the context of triangles, where “congruent” has another meaning altogether. But after a thousand years I suppose it’s too late to fix it. “Karaji numbers,” anyone?

By 1915, all the integers up to 100 had been classified as either congruent or noncongruent. But as recently as 1980, when Ronald Alter wrote a brief review of the status of the problem, there were still 189 square-free numbers less than 1,000 for which the question of congruence remained unsettled. Then everything changed in 1983. In that year the nature of the congruent-number problem was transformed by Jerrold B. Tunnell of Rutgers, who not only found a better way to calculate congruent numbers but also showed why it’s worth the effort to do so.

From right triangles to elliptic curves. Before I go on, a warning: I am about to walk up to the blackboard with a piece of chalk in my hand and impersonate one of those masterful, self-confident lecturers who rattle off long trains of equations, invent lemmas on the fly, and always come to QED just as the blackboard fills up and the class ends. In my case any such air of mastery is a complete illusion; I’m just learning this stuff as I go along. But I’ll do my best to make it an entertaining illusion.

So here goes. It’s not hard to see that any product of perfect squares is also a perfect square: a2b2c2 = (abc)2. Therefore, if N is a congruent number, and s2 – N, s2, and s2 + N are all squares, we can set the product of these three factors equal to another square; call it y2. Thus we get the equation:

y2  =  (s2N) s2(s2 + N).

Multiplying the three terms on the right, this becomes:

y2  =  (s2)3N2s2.

Now perform a simple substitution of variables, setting s2 = x:

y2 = x3N2x.

This is the equation of an elliptic curve—another mathematical object with a really unfortunate name. An elliptic curve looks nothing like an ellipse. Here’s the graph of a particular elliptic curve, the one for N = 6:

elliptic-curve-6.jpg

The two pieces of the blue line mark the locus of all points (x, y) that satisfy the equation

y2 = x3 – 36x.

And the hot pink dot? That marks a rational point on the curve—a point where x and y both take on rational values. (Insiders, I’m told, call it a ratpoint.) Specifically, the dot identifies the point x = 25/4, y = 35/8. If you care to plug those numbers into the equation, you’ll find that indeed 1225/64 = 15625/64 – 225, and so the point does lie on the curve.

Tunnell, elaborating on earlier work of Kurt Heegner, showed that N is congruent if the elliptic curve y2 = x3N2x passes through a certain kind of point on the (x, y) plane. Specifically, we have to search for points whose x and y coordinates are both rational numbers, but we ignore the three points with y = 0. Then the x coordinate of the point has to satisfy three more conditions. Letting x = u/v, we require that:

  • u and v are perfect squares,
  • v is even,
  • u has no factors in common with N.

If we can find just one point on the curve that matches all these properties, then we can generate an infinity of other rational points. Moreover, the existence of such a point implies, according to Tunnell’s theorem, that N is congruent. (The hot pink point above clearly qualifies.) Thus the congruent-number problem appears to be solved: We have an unambiguous criterion for deciding if a given N is congruent or not. Just construct the corresponding elliptic curve and check the ratpoints.

Regrettably, we’ve been set up for yet another disappointment. Searching for rational points on elliptic curves is a task for which we have no efficient general method. We’re really no better off than trying to generate all Pythagorean triples.

But wait! We’re not done yet. From the properties of elliptic curves, Tunnell derived yet another criterion—and this one turns out to be the key to a simple and practical test. It all hinges on the number of integer solutions to some quadratic equations that on first glance appear to be arbitrary strings of symbols plucked out of thin air. The criterion is this: If N is a square-free congruent number, and if N is odd, then the number of integer solutions to the equation N = 2x2 + y2 + 8z2 must be exactly double the number of integer solutions to N = 2x2 + y2 + 32z2. (If both equations have no integer solutions, the condition is satisfied, since 2 × 0 = 0.) For even N, the two equations are slightly different, but the test works in exactly the same way.

tunnell-criterion-1000.png

The graphic above shows the 361 square-free numbers up to 1,000 that pass the Tunnell test. The height of each dot indicates the number of integer solutions to the equation N = 2x2 + y2 + 8z2 (or the corresponding equation for even N). The two values of N with 160 solutions are 689 and 761. (It’s curious that in most cases—308 out of 361—the number of solutions is actually zero. I don’t know what this means or whether that pattern continues with larger N.) [See Update 2009-10-10, below.]

What sets the Tunnell criterion apart from all the others is that we can actually carry out the test in a reasonable and predictable amount of time. For any given N, counting the solutions should be doable in time proportional to N3/2, simply by trying all integer values of x, y and z less than the square root of N.

So that’s it, right? Problem solved at last? Well, no, there’s still a small hitch—a bit of awkward fine print. Tunnell proved that if N is square-free and congruent, then the criterion on the number of integer solutions must be satisfied. This much is unconditionally true. But what about the converse? If the criterion is satisfied, can we be certain that N is a congruent number? In this direction, the proof is not unconditional. It’s contingent on a proposition known as the Birch–Swinnerton-Dyer conjecture, which is widely believed, and supported by an abundance of empirical evidence, but still unproved. Which leaves just enough doubt to make the game still interesting.

Why should anyone care about this stuff? On first acquaintance, the search for congruent numbers sounds like one of those mathematical pastimes that appeal to amateurs (like me!) but don’t really command the attention of the research community. There are so many kinds of cutely named numbers out there—perfect, amicable, lucky, happy—and not all of them carry deep significance. The congruent numbers might well be just another amusing sideshow. But it turns out they’re not. There’s serious mathematics here, enough to engage the interest of serious mathematicians.

Elliptic curves have been a focus of intense scrutiny for decades. Henri Poincaré studied them in the early years of the 20th century. In the 1920s Louis Mordell proved a theorem about the rational points on elliptic curves: Even when there are infinitely many points, they all come from a finite set of “generators”; the number of generators and hence the number of infinite families is called the rank of the curve. In the 1950s Yutaka Taniyama and Goro Shimura, with later refinements by André Weil, worked out a conjecture about elliptic curves and another class of mathematical objects, called modular forms. Andrew Wiles and Richard Taylor proved part of that conjecture in the course of settling Fermat’s Last Theorem in the 1990s; the rest of the Taniyama-Shimura conjecture has since been proved as well.

And now we have the Birch–Swinnerton-Dyer conjecture, one of the famous million-dollar math problems. I’m not going to try to explain the conjecture in any detail—I’ve used up all my chalk, and probably my readers’ patience, too—but I think the basic idea goes something like this. On the one hand we have an elliptic curve, drawn in the (x, y) plane. On the other hand we have something called an L-function, which is defined on the plane of complex numbers. Think of the L-function as an undulating landscape with mountains rising above the plane and submarine canyons deep under it. The topography of this surface is determined in part by points selected from the elliptic curve, so there’s a connection between the two objects. The conjecture formulated in the 1960s by Bryan Birch and Peter Swinnerton-Dyer says that the shape of the L-function in the neighborhood where it passes through zero gives us information about the rank of the elliptic curve and thus about the number of rational points.

There’s an analogy between the BSD conjecture and an even more celebrated problem on the million-dollar list, the Riemann hypothesis. The zeros of the Riemann zeta function (which is much like an L-function), carry information about the distribution of prime numbers. The Birch–Swinnerton-Dyer conjecture invites us to suppose that the zeros of L-functions of certain elliptic curves tell us something about the distribution of congruent numbers.

Incidentally, the BSD conjecture must be one of the earliest products of computer-driven experimental mathematics. The numerical explorations that led to the conjecture were done with the EDSAC, the pioneering electronic computer built at the University of Cambridge.

The trillion triangles. Hand-waving about elliptic curves and L-functions might provide vague a rationale for interest in congruent numbers, but one part of the story I still didn’t get was why anyone would bother to compute mass quantities of congruent numbers. So I asked Michael Rubinstein of the University of Waterloo. Rubinstein had earlier computed the congruent numbers up to 109, and it was his challenge that provoked the recent thousandfold expansion of the inventory. Rubinstein explained:

I’m interested in the statistical distribution of congruent numbers. A few years ago, Brian Conrey, Jon Keating, myself, and Nina Snaith came up with a prediction for the asymptotic number of congruent numbers up to x, akin to the prime number theorem. This prediction grew out of remarkable models created by the same researchers and David Farmer for related elliptic curve L-functions that were inspired by random matrix theory and based on a number of unproved assumptions.

The prediction is that the number of congruent numbers less than x, arising from even-rank elliptic curves, is asymptotically

\[c x^{3/4} \log(x)^{11/8}\]

where c is a constant. Our model doesn’t let us completely nail down the constant c, and the data will help us understand what the correct constant is. The asymptotic behaviour stabilizes at a logarithmic rate, so going to 1012 is only 50 percent better than going to 108.

It will also provide a good amount of data for studying the statistics of ‘higher rank elliptic curves’ in the family of elliptic curves related to congruent numbers. The correct model to use for higher rank elliptic curves is not really understood, and the billions of congruent numbers found will end up yielding several million higher rank curves with which we hope to gain insight into the statistics of higher rank curves (compared to just thousands of higher rank curves from earlier computations).

The enumeration of the trillion triangles was done by two teams. William Hart of the University of Warwick and Gonzalo Tornaria of the Universidad de la Republica in Uruguay ran their program on a computer called Selmer at Warwick. Mark Watkins of the University of Sydney, David Harvey of the Courant Institute and Robert Bradshaw of the University of Washington used the SAGE computer at Washington.

The strategy of the computation is based on the Tunnell criterion, but it turns out there’s a better way to go about it than explicitly counting the number of integer solutions to those various quadratic equations in (x, y, z). Tunnell showed that all the information about the number of solutions could be encoded in a formal power series, which looks like this:

\[A(q) = a_{1}q^{1} + a_{2}q^{2} + a_{3}q^{3} + a_{4}q^{4} + ... \]

It’s a “formal” series in the sense that we don’t actually care about evaluating the sum for any particular value of q; instead, we just want to know the various coefficients ai. This is how the Tunnell criterion gets encoded in the series: If ai is zero in this series, and if i is a square-free number, then i is congruent. (Actually, there are two separate series, one for odd i and one for even i.)

The odd and even power series are generated by a couple of formidable-looking products. Here’s the one for the odd series:

\[A(q)=q\prod_{n=1}^\infty(1-q^{8n})(1-q^{16n})\left(1+\sum_{n=1}^\infty 2q^{2n^2}\right)\]

Since both the overall product and the sum in the third factor call for infinitely many values of n, we can’t multiply out this whole expression. Fortunately, though, it turns out that larger n contribute only to higher coefficients, and so we can compute early terms in the series without worrying about what might happen later. Kent Morrison notes that taking just the factors

\[q(1-q^8)(1-q^{16})(1-q^{16})(1+2q^2+2q^8+2q^{18})\]

is enough to generate all the odd congruent numbers up to 25. Those five factors expand to:

\[A(q)=q^1+2q^3+q^9-2q^{11}-4q^{17}-2q^{19}-3q^{25}+...\text{ higher terms}\]

The terms missing from this series—those with a zero coefficient—are just the odd square-free congruent numbers in this range: 5, 7, 13, 15, 21, 23.

The big computation by Bradshaw et al. used essentially this same scheme, with a few refinements and optimizations. The challenge is that when you’re trying to compute all the terms of the series out to a1,000,000,000,000q1,000,000,000,000, you wind up multiplying some enormous numbers. I don’t mean just that the numbers are too big to fit into the registers of a 32-bit or a 64-bit computer. They’re too big to fit into the main memory of a computer with 128 gigabytes of RAM. Moreover, the arithmetic has to be done exactly; approximations are useless. Thus a major part of the effort in setting up the computation was devising efficient “out of core” multiplication routines for ridiculously large numbers. The basic algorithm was a fast-fourier-transform method.

And the result? Up to 1012, the computation identified 3,148,379,694 square-free congruent numbers number candidates with N in the residue classes 1, 2, and 3 modulo 8. I look forward to reading more about the analysis of their distribution. [See Update 2009-10-10, below.]

References and resources. Two weeks of reading and tinkering have not made me the master of this subject. For those who would like to explore further I can offer some resources I’ve found helpful, listed in no particular order.

  • Ed Eikenberg has posted some lecture notes from a talk on elliptic curves and congruent numbers given in 2000.
  • William Stein offers slides from a Harvard lecture in 2001.
  • The web page for another William Stein course (this one at the University of Washington in 2006) provides notes and handouts and a collection of function definitions that will let you play with congruent numbers and elliptic curves in the Sage computer mathematics system.
  • Koblitz, Neal. 1993. Introduction to Elliptic Curves and Modular Forms. Second edition. New York: Springer-Verlag. Some kind soul has posted page images for the first chapter online.
  • Guy, Richard K. 2004. Unsolved Problems in Number Theory. Third edition. New York: Springer. (Section D27, p. 306, discusses congruent numbers.)
  • Tunnell, J. B. 1983. A classical diophantine problem and modular forms of weight 3/2. Inventiones Mathematicae 72:323–334. (Available online via the DigiZeitschriften.)
  • Barry Cipra writes on the recent trillion-trangle computation: Tallying the class of congruent numbers, ScienceNOW, September 23, 2009. (Get it now, while it’s free.)
  • Dickson, Leonard Eugene. 1952. History of the Theory of Numbers. New York: Chelsea Pub. Co. (For congruent numbers see Vol. 2, Chapter 16, p. 459.)
  • Brown, Ezra. 2000. Three Fermat trails to elliptic curves. The College Mathematics Journal 31:162–172. Free PDF available online (unlike most CMJ articles), apparently because it won a Polya prize.
  • Rubin, Karl, and Alice Silverberg. 2002. Ranks of elliptic curves. Bulletin of the American Mathematical Society 39:455–474. Online.
  • Silverberg, Alice. 2001. Open questions in arithmetic algebraic geometry. In Arithmetic Algebraic Geometry, Institute for Advanced Study/Park City Mathematics Series 9, pp. 83–142. Providence: American Mathematical Society. Postscript preprint.
  • The American Institute of Mathematics web page on congruent numbers, which includes several helpful appendices, such as a discussion of FFT multiplication, an essay on congruent numbers by Kent Morrison and a link to the paper by Bradshaw, Hart, Harvey,
    Tornaria and Watkins reporting the trillion-triangle computation. (At this writing, the posted version of the paper is still an incomplete draft.)
  • Conrey, J. B., J. P. Keating, M. O. Rubinstein and N. C. Snaith. 2000. On the frequency of vanishing of quadratic twists of modular L-functions. arXiv preprint. (This paper and the one listed below discuss conjectures about the distribution of congruent numbers, among other topics.)
  • Conrey, J. Brian, Jon P. Keating, Michael O. Rubinstein and Nina C. Snaith. 2004. Random matrix theory and the Fourier coefficients of half-integral weight forms. arXiv preprint.
  • Brown, Jim. 2007. Congruent numbers and elliptic curves. PDF (Brown refers to this text as lecture notes for an undergraduate course, but in fact it is a very polished and well-organized expository article. Includes some Sage code for working with rational points of elliptic curves.)
  • Alter, Ronald. 1980. The congruent number problem. American Mathematical Monthly 87:43–45. (Tables of what was known and unknown shortly before Tunnell’s breakthrough.)

Special thanks to David Farmer for alerting me to this story in the first place and to Michael Rubinstein for help understanding what it all means.

Update 2009-10-10: In the comments, Barry Cipra points out that the quadratic equations in the Tunnell criterion necessarily have no solution whenever N is congruent to 5, 6, or 7 modulo 8. In fact all of the zero-solution points in the graph I presented above come from such N. Numbers in the residue classes 0 and 4 modulo 8 cannot be square-free. If we plot only those N that are square-free and congruent to 1, 2 or 3 modulo 8, we are left with just 53 square-free congruent-number candidates up to 1,000:

tunnell-criterion-123.png

Only the numbers in these three residue classes were included in the trillion-triangle computation.

Nautical numeration

Tuesday, September 8th, 2009

I’ve been goofing off in Nova Scotia for a few days. In Halifax I climbed up to the Citadel, a hilltop fortress built to protect the city from the French and later rebuilt to fend off the Americans; now it welcomes both those nationalities and anyone else willing to pay $12 for the tour and the costume show.

signal-balls-vert.jpg In one room of the museum I discovered the curious diagrams reproduced at right. These figures are extracted from a poster on military codes designed for ship-to-shore communication. Signals were sent by hoisting large balls or disks on a mast reaching high above the fortress ramparts; ships in the harbor could reply by raising similar signal disks on their own masks.

The part of the code shown here obviously pertains to the transmission of numbers, but I’ve never seen a weirder system of numeration. Based on these diagrams and others, I infer that the signal disks were raised on three halyards. (A fourth halyard is present in the diagrams but always seems to be empty.) Each halyard could display zero or one or two disks, and each displayed disk could be in either an upper or a lower position. That comes to five possible patterns per halyard, and thus 53 = 125 patterns in all. What baffles me is why the ten decimal digits were assigned to the specific patterns shown in the diagrams. Who counts in the sequence 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 9, 9?

I’m not even sure I understand the basic signaling protocol. When I first looked at the poster, I assumed that the ten digits were represented thus:

positional-encoding.png

Presumably, the three encodings for ‘9′ were equivalent, and any of them could be used at the signaler’s option.

Later, after much musing over this puzzle, I decided that another interpretation of the diagrams seemed more plausible:

cumulative-encoding.png

In this scheme at least the numerals from 1 through 5 have some kind of logic to them, in that the number of disks shown matches the numeral being transmitted; in essence we have a unary notation within that limited range. On the other hand, if this interpretation of the diagrams is correct, we have to ask: Why did the designers stop at 5? They could have extended the same unary scheme to the range from 1 to 6, and then used doubled disks at the top of the mast for the numerals 7, 8 and 9.

If we were creating a code like this today, I’m sure everyone’s first impulse would be a system based on binary notation. It’s not terribly surprising that the British naval authorities circa 1850 didn’t think of that. But I still find the apparent arbitrariness of this numerical code utterly perplexing. Am I missing some pattern in the data, either obvious or subtle? (I suppose the encoding could be deliberately obscure, but there’s no evidence that this was meant to be a secret code, and the mere existence of the poster—which appears to be a 19th-century artifact—suggests otherwise.)

Added 2009-09-10: Below is the signal mask on which the disks were displayed, seen from the landward side. The disks (actually fabric-covered crossed hoops, which have an approximately circular cross section from any azimuth) were 36 inches in diameter. I think the diameter of the mast is about 12 inches at the base.

photo of Halifax military signal mast

* * *

The Googleverse has so far failed to solve this mystery for me, but in the process of searching I stumbled upon a remarkable book that suggests there was some lucid thinking in the 19th century on themes that we would now identify as information theory. The book is A Manual of Signals: For the Use of Signal Officers in the Field, and for Military and Naval Students, Military Schools, Etc., by Albert J. Myer. The full text is available on Google Books in either HTML or PDF.

Myer founded the U.S. Army Signal Corps just before the outbreak of the Civil War. He had studied medicine, writing a dissertation on “A New Sign Language for Deaf Mutes,” and had also worked as a telegraph operator. Then, in the army, he devised the signaling method commonly known as wigwag, using a single flag waved to the left and the right. Fort Myer in Virginia is named for him. (These biographical snippets come from a Signal Corps history, which offers much further detail.)

Myer’s book was first published in 1864 and then revised in 1868, but it takes quite a modern mathematical approach to problems of communication and information. He begins with a tutorial on permutations and combinations, then applies these ideas to the encoding of signals:

We wish for example to make a large number of signals…. We take any few different and simple known signs, sounds, motions, or indications, which we can easily make, and we join them together, twos or threes, or more at a time, making one after another into many and different and more complex signs or arrangements. Each of these new signs becomes, when a meaning is given to it, a signal. We can increase the number of such signals to any limit by continuing to join together the known signals in greater numbers or in new arrangements.

That’s a pretty good description of Σ*, the set of all strings over a finite alphabet.

* * *

Guest update 2009-09-10: The following comes from Barry Cipra.

I have an idea for an alternative interpretation of the way digits are meant to be represented by disks on halyards. My basic premise is that a distant viewer can tell the difference between a disk being up high and it being down low, but cannot accurately judge which halyard a disk is on unless there’s at least one disk on each halyard. (In particular, two disks on outer halyards could be confused with two disks on adjacent halyards if the mast is at an angle to the distant viewer.) In this case, the digits 0 through 9 can be unambiguously represented as follows:

disks-on-halyards encoding suggested by Barry Cipra

In other words, basically your second interpretation for 0 through 8, but with an extra “anchoring” pip for the numbers 1 through 5, but your first interpretation for the number 9. This makes it clear why the small-number format stops at 5.

If my premise is correct (or accepted as correct), then out of the 216 distinct patterns, there are only 1+5+25+125 = 156 unambiguous signals possible. That is, there is 1 signal with no halyards having any disks, 5 with exactly one halyard showing at least one disk, 25 with exactly two halyards showing at least one disk each, and 125 with disks showing on all three halyards.

But maybe we should also require the meaning of a pattern to be unambiguous when the orientation of the ship is uncertain — i.e., A,B,C should have the same meaning as C,B,A. If I’ve thought through things correctly, this cuts the number of unambiguous signals down to 1+5+15+75 = 96.

It would be great to find out how the Navy really did things!

* * *

Update 2009-09-17: Many thanks to Jim Ward (in the comments) for unearthing a 2007 document by Spurgeon G. Roscoe that illuminates the history of this signaling system, even if it doesn’t quite pin down how the code worked.

Roscoe describes a system of signaling towers founded by Edward, Duke of Kent, during his busy tenure as military commander in the Maritime Provinces at the end of the 18th century. Kent’s optical telegraph line was to run from Halifax to Annapolis in Nova Scotia and on to Fredericton in New Brunswick. It is not known with certainty how much of the network was completed or whether it ever served as a practical communications link. (Roscoe is skeptical, pointing out among other things that the territory around the Bay of Fundy is very foggy.)

The chart on exhibit at the Citadel in Halifax is not directly connected with the Kent telegraph system; it comes from a later era and is identified as a device for ship-to-shore communication; nevertheless, there is an unmistakable familial resemblance. Sketches reproduced by Roscoe give the following code system for the Duke of Kent’s overland telegraph:

numeric code scheme from Roscoe manuscript

This is merely a cyclic permutation of positions 0, 6, 7, 8 and 9 in the Citadel code. (On first glance there also seems to be a mirror reversal involved, but that’s not the case; we’re merely looking at the signal mast from the opposite direction.)

Roscoe says nothing about how to interpret these diagrams—whether, for example, the display for “6″ consisted of six disks or a single disk in the lower right corner, or one of the other schemes discussed above or in the comments. But his description of how messages were composed seems so impossibly cumbersome that it leaves me wondering if this plan was ever really put into practice. Apparently, alphabetic characters did not have codes of their own; each letter was encoded in a sequence of numerals. In one system, “A” was 2, 3, 0; “B” was 1, 4, 0; “Q” was 4; “T” was 2, 3. (The compact, single-digit encoding of “Q” makes this a kind of anti-Hamming code.) Can anything so maladaptive have survived the trial of use in the field?

Update 2009-09-22: This is a response to Carl Witty’s comment. I’m putting it here in the main text rather than in a further comment because I think Witty has essentially solved the mystery.

Witty’s interpretation of the Roscoe manuscript rules out all schemes in which the code is “cumulative” — where the signal for 3, for example, consists of the disks numbered 1, 2 and 3. Instead, digits 1 through 6 are each represented by a single displayed disk, and digits 7 through 9 each consist of a single pair of disks. (The triple code for 9 is to be interpreted as an “OR”: Any of the three positions can be used with the same meaning.) Then these digits — which are really just opaque symbols, with no numerical meaning — are combined in various ways to represent the letters of the alphabet and the numbers from 1 to 99. (Supplementary flags bring the range of numbers up to 499.)

Thus when Roscoe indicates that the letter S has the coding “1, 3, 5,” that means the disks labeled 1, 3 and 5 are all displayed simultaneously in order to transmit an S.

There’s a test for this hypothesis. If the scheme is to be workable, the alphabetic and numerical codes composed by displaying two or three or four digits at once must have a particular form. There can be no repeated digits in any encoding, and no two of the digits can use the same position on the mast. Specifically, in the Duke of Kent code illustrated in the 2009-09-17 update above, there can be no composite code that calls for both a 1 and a 7, or a 3 and an 8 or a 9 and a 5.

Does the code transcribed by Roscoe pass this test? Roscoe actually gives two codes, both apparently found in an 1802 “Signal Book” from Camperdowne Station in Nova Scotia. Roscoe’s second listing of alphabetic and numeric encodings does indeed satisfy the constraints. Indeed, it obeys a stronger restriction: No combination of symbols in the code calls for hoisting more than two disks on a single halyard. For example, the code avoids not only 1 and 7 but also 2 and 7.

The other code that Roscoe lists fails the test — or so it seems at first glance. There are patterns such as “1, 7″ encoding the number 54 and “3, 8″ for 64. But on looking closer, it appears that this code adheres to a different set of constraints: There are no instances where a 6 appears together with either a 1 or 2, or where 7 is combined with 3 or 4, or finally where 8 comes together with 5 or 0. These are exactly the restrictions that have to be applied in the Citadel code!

Outnumbered

Monday, August 10th, 2009

My column in the new issue of American Scientist is about the challenges of computing with very large numbers. The column ends with an open question, which I want to restate here as an invitation to commentary.

Any system of arithmetic in which numbers occupy a fixed amount of space can accommodate only a finite set of numbers. For example, if all numbers have to fit into a 32-bit register, then there are only 232 representable numbers. So here’s the question: If our numbering system can represent only 4,294,967,296 distinct values, which 4,294,967,296 values should we choose to represent?

There is an obvious trade-off here between range and precision: To reach farther out on the number line, we have to leave wider gaps between successive representable numbers. But that’s not the only choice to be made. We must also decide whether to sprinkle the numbers uniformly across the available range or to pack them densely in some regions while spreading them thinly elsewhere. I’m not at all sure what principles to adopt in trying to identify the best distribution of numbers.

To frame the question more clearly, I’d like to introduce a series of toy number systems. In each of these systems a number is represented by a block of six bits. There are 26=64 possible six-bit patterns, and so there can be no more than 64 distinct numbers. A function f(x) maps each bit pattern x to a number. Sometimes it’s convenient to regard the bit patterns themselves as numbers, ordered in the obvious way, in which case f(x) is a function from numbers to numbers.

In the simplest case, f(x) is just the identity function, and so the bit patterns from 000000 through 111111 map into the counting numbers from 0 through 63. A slightly more general scheme allows f(x) to be a linear transformation, f(x)=ax+b, where the parameters a and b are constants. If a=1 and b=0 we again get the nonnegative counting numbers. If a=1 and b=–32, we have a range of integers from –32 to +31. With a=1/64 and b=0, the entire spectrum of numbers is packed into the interval [0,1). Numbers formed in this way are called fixed-point numbers, since there’s an implied radix point at the same position in all the numbers. By adjusting the values of a and b, we can generate fixed-point numbers to cover any range, but they are always distributed uniformly across that range.

The well-known floating-point number system, modeled on scientific notation, is a tad more complicated. In my toy version, three bits are allocated to an exponent t, which therefore ranges from 0 to 7. The other three bits specify a fractional significand s, interpreted as having values in the range 8/16, 9/16, 10/16, …, 15/16. The mapping function is defined as f(t, s)=s×2t. The 64 numbers generated in this way range from 1/2 to 120. (Real floating-point systems, such as those defined by the IEEE standard, not only have more bits but also have a more elaborate structure, with allowance for positive and negative significands and exponents and various other doodads.)

Floating-point numbers are not distributed uniformly; they are densest at the bottom of their range and grow sparse toward the outskirts of the number line. Specifically, the density of binary floating-point numbers is reduced by half at every integer power of 2. In the toy model, the gap between successive representable numbers is 1/16 in the range between 1/2 and 1, then 1/8 between 1 and 2, then 1/4 between 2 and 4, and so on. At the top of the range—beyond 64—the only numbers that exist are those divisible by 8.

The distribution of floating-point numbers describes a piecewise-linear curve, which approximates the function 2x. In other words, if you squint and turn your head sideways, you can see that floating-point numbers look a lot like base-2 logarithms. This suggests another numbering scheme: Instead of approximating logarithms, why not use the real thing? We can simply interpret a binary pattern as a logarithm and generate numbers by exponentiating. For the six-bit model, we can take the first two bits as the integer part of the logarithm (the characteristic) and the last four bits as the fractional part (the mantissa). Assuming we continue to work in base 2, the number-defining function is just f(x)=2x. The resulting system of six-bit numbers has a range from 1 to about 235. (By simple rescaling, we could make the range of these logarithmic numbers match that of the floating-point numbers, 1/2 to 120. I’ve not done so for a trivial reason: So that in the graphs presented below the two curves do not overlap and obscure each other.)

Finally, I want to mention a fourth numbering scheme, called the level-index system. Indeed, what led me to this whole topic in the first place was an encounter with some papers on level-index numbering written in the 1980s by Charles W. Clenshaw and Frank W. J. Olver. The level-index system takes a step beyond logarithms to iterated logarithms and beyond exponents to towers of exponents. The idea is easiest to explain in terms of the mapping function f(x). As with logarithmic numbers, we view the bit pattern x as the concatenation of an integer part (the level) and a fractional part (the index). Then f(x) takes the following recursive form:

level-index-eqn.png

For any level greater than 0, this formula gives rise to a tower of exponentiations. For example, the level-index number 3.25 expands as follows:

tower-of-e.png

In the six-bit model, we can dedicate two bits to the integer level and four bits to the fractional index. The result is a numbering system that covers a very wide range: from 0 to almost 400,000 in 64 steps. The distribution of representable numbers in this range is extremely nonuniform: the gap between the two smallest numbers is 1/16, whereas that between the largest numbers is more than 320,000. (Note also that we have shifted from powers to 2 to powers of e. Actually, a level-index system could be defined on any base, but e has some pleasant properties.)

The graph below compares the distribution of numbers in the four six-bit systems.

linear-curves.png

For fixed-point numbers the graph is necessarily a straight line, since the numbers are distributed at equal intervals. The floating-point graph is a jointed sequence of straight-line segments, with the slope doubling at every power of 2. The logarithmic numbers produce a smooth curve, concave-upward. (Again, this curve could be made to coincide with that of the floating-point numbers.) The level-index curve is also smooth but has an even more pronounced elbow or hockey-stick form.

Plotting the same data on log-linear scales clarifies certain details:

log-curves.png

Now it’s the logarithmic system that yields a straight-line graph. The floating-point curve approximates a parallel straight line. The fixed-point graph is concave-downward, while the level-index curve is sigmoid.

By twiddling various knobs and dials, we could create a number system that would approximate any monotonic line or curve in this space, giving us fine-grained control over the distribution of numbers. For example, on the semilogarithmic graph a straight line of any positive slope can be generated by adjusting the base of the logarithmic number system or the radix of a floating-point system; a larger base or radix yields a steeper slope. The inflection of the level-index curve can be altered in the same way, by choosing different bases. If we wished, we could interpolate between the fixed-point curve and the logarithmic curve by creating number systems in which f(x) is some polynomial, such as x2. If we wanted a flatter curve than the fixed-point system, we could build numbers around a function such as f(x)=√2.

So many ways of counting! Let me count the ways.

And so I return to the main question. On what basis should we choose a number system for a digital computer? Let’s leave aside practicalities of implementation: Suppose we could do arithmetic with equal efficiency using any of these schemes. Which of the curves should we prefer? There are plausible arguments for allocating a greater share of our numerical resources to smaller numbers, on the grounds that we spend most of our time on that part of the number line—it’s like the middle-C octave on the piano keyboard. That would seem to favor the logarithmic or floating-point system over fixed-point numbers. But can we make the argument quantitative, so that it will tell us the ideal slope for the logarithmic line, and hence the mathematically optimal base or radix? (Does Benford’s law help at all in making such a decision?)

And what about the level-index system, which skews the distribution even further, providing higher resolution in the neighborhood of 1 in exchange for a sparser population out in the boondocks? An argument in favor of this strategy is that overflow is a more serious failure than loss of precision; the choice is between a less-accurate answer and no answer at all. Level-index numbers grow so fast that they can effectively eliminate overflow: Although it’s a finite system, you can’t fall off the end of the number line.

Some years ago Nick Trefethen, a numerical analyst, wrote down some predictions for the future of scientific computing, including these remarks:

One thing that I believe will last is floating point arithmetic. Of course, the details will change, and in particular, word lengths will continue their progression from 16 to 32 to 64 to 128 bits and beyond…. But I believe the two defining features of floating point arithmetic will persist: relative rather than absolute magnitudes, and rounding of all intermediate quantities. Outside the numerical community, some people feel that floating point arithmetic is an anachronism, a 1950s kludge that is destined to be cast aside as machines become more sophisticated. Computers may have been born as number crunchers, the feeling goes, but now that they are fast enough to do arbitrary symbolic manipulations, we must move to a higher plane. In truth, however, no amount of computer power will change the fact that most numerical problems cannot be solved symbolically. You have to make approximations, and floating-point arithmetic is the best general-purpose approximation idea ever devised.

At this point, predicting the continued survival of floating-point formats seems like a safe bet, if only because of the QWERTY factor—the system is deeply entrenched, and any replacement would have to overcome a great deal of inertia. But Trefethen’s “two defining features of floating point arithmetic” are not actually unique to the floating-point system; logarithmic and level-index numbers (and doubtless others as well) also employ “relative magnitudes” and require “rounding of all intermediate quantities.” It may well be that floating point is the best general-purpose approximation idea ever devised, but I’m not convinced that all the alternatives have been given serious evaluation.

Free Pi

Two further notes. Because of space constraints, my American Scientist column appeared with a painfully truncated bibliography. I have posted an ampler list of references here.

Also, would anyone like to have 10 million digits of pi on microfiche cards? While scrounging in my files for interesting lore on large numbers, I came upon a yellow envelope in which 10,015,000 digits of pi fill up nine fiche cards. The envelope bears the notation “rec’d from Y. Kanada, Univ. of Tokyo, 1983.” The 10 million digits set a record back then, which I noted in a brief Scientific American news item. It seems a shame to destroy such a curious artifact; on the other hand, I can think of no earthly use for it.

Kanada’s group has gone on to calculate more than a trillion digits of pi, but he didn’t send me the 900,000 microfiche cards it would take to hold the listing.