Archive for the ‘computing’ Category

Flights of fancy

Tuesday, October 27th, 2009

starlings-closeup-2058.JPG

As I have mentioned in the past, I’m fascinated by the acrobatics of bird flocks, especially the big congregations of European starlings that gather in the evening at this time of year. Evidently I’m not the only one with such an interest. In the past few years the subject has attracted the attention of quite a large flock of scientists, including not only biologists but also various luminaries in physics, mathematics and computer science.

Below are some notes on a few of the recent papers, but first I have to mention a classic from 20 years ago:

Reynolds, Craig W. 1987. Flocks, herds, and schools: a distributed behavioral model. Computer Graphics 21(4):25–33. Author archive.

This is the paper that began the modern era of flocking studies by proposing that animals could coordinate and synchronize their movements without any need for a leader or external cues. Others were thinking along the same lines at about the same time, but it was Reynolds who attracted wide notice with his enchanting computer animations of “boids” soaring through an imaginary three-dimensional space. Each individual in the flock acts according to simple, local, fixed rules, and the synchronized maneuvers emerge spontaneously.

Reynolds suggested three particular rules that might guide the behavior of each bird:

  • Avoid collisions.
  • Try to match the speed and heading of nearby birds.
  • Move toward the center of the group in which you are flying.

Reynolds was working in computer graphics, and his ideas were soon taken up by movie studios and by the makers of video games. In a sense, his simulations only had to look right; they didn’t have to reflect what actually goes on in a starling’s head. But whether or not the birds were paying attention, students of animal behavior certainly were.

starlings-wide-2064.jpg

Much of the recent activity arises out of new field studies, conducted mainly by physicists.

Cavagna, Andrea, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. The STARFLAG handbook on collective animal behaviour. 1: Empirical methods. Animal
Behaviour
76:217–236. Preprint.

Cavagna, Andrea, Irene Giardina, Alberto Orlandi, Giorgio Parisi and Andrea Procaccini. 2008. The STARFLAG handbook on collective animal behaviour. 2: Three-dimensional analysis. Animal Behaviour 76:237–248. Preprint.

This group, coordinated by Andrea Cavagna and Irene Giardina of the University of Rome La Sapienza, has been photographing starling flocks near the city’s main railroad station (the Termini), which is just a few blocks from the university. Using pairs of synchronized cameras, the observers have captured stereoscopic images and then applied special image-analysis software to reconstruct the three-dimensional trajectory of each bird. Similar techniques have been tried in the past, but only with small flocks (a few dozen birds). The Italian group has traced the motions of individual birds in groups of up to 2,600. The two papers cited above give technical details on how the data were gathered and analyzed.

Ballerini, Michele, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. Empirical investigation of starling flocks: a benchmark study in collective animal behaviour. Animal Behaviour 76:201–215. Preprint.

Ballerini, Michele, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Vivien Lecomte, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale and Vladimir Zdravkovic. 2008. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proceedings of the National Academy of Science of the USA 105:1232–1237. Open access.

And here the same authors (with a few additions) report their results and conclusions. They base their interpretation on a computational model that is recognizably a descendant of the Reynolds scheme, but with one crucial modification. Reynolds and others assumed that each bird is influenced by all other birds within some fixed distance (a “metric neighborhood”); Ballerini et al. get a closer match to the data by assuming that a bird attends to the motions of a fixed number of near neighbors, regardless of distance (a “topological neighborhood”). In other words, the graph of interacting birds has nearly constant vertex degree; the typical degree is probably six or seven. The main significance of this algorithmic change is that it helps maintain the cohesion of the flock in spite of large variations in density.

Hildenbrandt, Hanno, Claudio Carere and Charlotte K. Hemelrijk. 2009. Self-organised complex aerial displays of thousands of starlings: a model. arXiv:0908.2677v1

Those same flocks at Termini have a role in this study as well; the model presented here draws on data from Ballerini et al. as well as videotapes made at Termini by Carere. (Carere is another physicist at Sapienza; Hildenbrandt and Hemelrijk are biologists at the University of Groningen.)

The model works on the same essential principles, but it differs in intellectual style and emphasis. Hildenbrandt et al. want to account for specific details of a flock’s behavior—not just the general tendency to fly in close formation but also the particular shapes of starling flocks, the maneuvers they perform, the altitudes they prefer, and so on. Reaching for this verisimilitude leads to a rather complicated model with many parameters in need of fine tuning, such as aerodynamic properties of the bird’s wing and body and banking angles in turns. Hildenbrandt et al. report some success in explaining the geometry of flocks (they tend to be horizontally flattened rather than spherical). They do less well in an attempt to account for an extra-dense layer of birds observed at the periphery of a flock.

starlings-landing-2072.jpg

Cucker, Felipe, and Steve Smale. 2007. Emergent behavior in flocks. IEEE Transactions on Automatic Control 52:852–862.

Chazelle, Bernard. 2009. Natural algorithms. Proceedings of the 20th Symposium on Discrete Algorithms, pp. 422-431. Preprint.

Chazelle, Bernard. 2009. The convergence of bird flocking. arXiv:0905.4241v1

Leaving behind the breathy wing-beats of living starlings, we enter a world of mathematical abstractions.

Cucker and Smale, peripatetic mathematicians currently at the City University of Hong Kong, take a stripped-down model of flocking and ask this question: Is it guaranteed that all the birds in the flock will eventually settle on the same velocity, and thus fly together forever? Chazelle, a theoretical computer scientist at Princeton, asks a follow-on question: If the birds do converge on the same speed and heading, how long might it take for them to do so, in the worst case?

The answer to the Cucker-Smale question turns out the be yes: Given certain preconditions and parameter values, convergence is certain. But Chazelle shows that it can take quite a while for the flock to reach consensus. For n birds adjusting their velocities in discrete steps, the upper bound is 2 ↑↑ (4 log n) steps. As I was saying just the other day, this up-arrow notation denotes an exponential tower of 2s with, in this case, 4 log2 n levels. In other words, in a flock of a thousand birds, the convergence time is roughly

\[2^{2^{2^{\cdot^{\cdot^{\cdot^2}}}}}\]

with 40 levels of exponentiation. This is a ridiculous number, far exceeding the lifetime of a starling (or of a universe, for that matter). As Chazelle notes: “Our bounds obviously say nothing about physical birds in the real world. They merely highlight the exotic behavior of the mathematical models.”

It is rather wonderful to reflect—as you stand in a field of corn stubble admiring the flocks of birds wheeling overhead in the evening sky—that these avian entertainments should be the starting point for a line of reasoning that ventures so far into the wild blue yonder of inexpressible numbers.

Lebar Bajec, Iztok, and Frank H. Heppner. 2009. Organized flight in birds. Animal Behaviour 78:777–789. Preprint.

I mention this piece last, but it would actually be a good place to start if you want a primer on flocking. Frank Heppner, a biologist at the University of Rhode Island, is one of the pioneers of flocking-and-swarming studies; here, with a mathematical colleague from the University of Ljubljana, he reviews many of the recent contributions and puts them in historical context. The review includes a discussion of the more crystalline flying formations of large birds such as geese as well as the amorphous flocks of starlings.

Congruent numbers

Tuesday, October 6th, 2009

A press release from the American Institute of Mathematics two weeks ago announced that all the congruent numbers up to 1 trillion have been enumerated. Two questions leap to mind. What the heck is a congruent number? And who cares?

I’ll return to those questions. But first I’d like to pause just a moment to marvel at the idea of calculating anything up to 1012. A few decades ago, such a project would have been unthinkable. Today, counting to a trillion takes only an hour or so, even on plain vanilla hardware. This is truly one of the wonders of the age, and we shouldn’t grow too blasé about it. But the computation of all those congruent numbers involved a lot more than looping through “+1″ a trillion times, and it took considerably longer than an hour.

Okay, so what’s a congruent number? I would like to sidle up to that question rather than face it straightaway. The definition is not hard to understand—it’s all about right triangles with rational side lengths and integer areas—but when I started looking into this story, it took me a while to appreciate why those particular triangles might be worth thinking about.

Let’s begin with Pythagorean triples—sets of positive integers (a, b, c) that define a right triangle, with a and b giving the lengths of the legs and c the hypotenuse. The familiar (3, 4, 5) triangle is everybody’s favorite example. For which right triangles with integer side lengths is the area also an integer? That’s easy: All of them. The area of a right triangle is ab/2; in any Pythagorean triple either a or b (or both) must be an even number, which implies that ab/2 is a whole number. For the (3, 4, 5) triangle the area is (3 × 4)/2 = 6.

Knowing one such triangle, we can make more. Lots more. Just multiply a, b and c by any integer k, which has the effect of multiplying the area by k2. If k = 2, we get the (6, 8, 10) triangle with area 24; k = 3 yields the (9, 12, 15) triangle with area 54, and so on. The resulting sequence of triangles is infinite but not very interesting; all the larger triangles are just scaled-up versions of the original, like photographic enlargements. We can view the entire infinite series as a single class of triangles, and take the smallest member of the series as the prototype. That smallest triangle is given by a primitive Pythagorean triple—one where a, b and c have no factors in common. It turns out there are also infinitely many primitive triples. Euclid bequeathed us an algorithm that will generate all of them, if you let it run forever.

Given this infinite set of infinite series of triangles, it’s clear that infinitely many integers can be the area of a Pythagorean triangle. But that certainly doesn’t mean that every positive integer has this property. For example, there’s no Pythagorean triangle with an area of 5. To convince yourself of this fact, just measure the area of every integer-sided right triangle with legs no longer than 10. None of those triangles has an area of 5, and no triangle with longer integer legs can have an area that small. In principle, the same laborious but reliable procedure could be applied to any integer N to determine whether or not it is the area of some Pythagorean triangle. Here are the first few integers that appear in such an enumeration ( sequence A009112.):

6, 24, 30, 54, 60, 84, 96, 120, 150, 180, 210, 216, 240, 270, 294, 330

Rational triangles. Suppose we relax the constraints a little, allowing the sides of a right triangle to be rational numbers—either fractions or integers but not irrational values such as the square root of 2. The area is still required to be an integer.

A first question is whether right triangles with fractional sides and integer areas even exist. Maybe the cupboard is bare; maybe there are no such triangles. But no, they do exist. Here’s a proof by example:

35-12triangle.png

There’s no trickery here. If you’re in any doubt, do the arithmetic; you’ll find that the numbers define a genuine right triangle (the Pythagorean theorem holds) and the area really is exactly 7.

At last we have arrived in the realm of congruent numbers. A congruent number is an integer that’s the area of a right triangle with rational sides. All integer-sided right triangles are included in the category, along with those whose sides are rational fractions. Here’s a more formal definition:

A positive integer N is congruent if there exist rational numbers a, b and c such that a2 + b2 = c2 and ab/2 = N.

All the same questions we were asking about Pythagorean triples also come up in this new context. Where do we find rational triangles with integer area, or how can we manufacture them? Which integers N can be the area of such a triangle? Answers aren’t quite as easy to come by in this case. In particular, the strategy of enumerating triangles from the smallest up won’t work, because there is no smallest rational triangle. There are other ways of imposing an ordering on the rationals, but none of them lead to a good algorithm for enumerating congruent numbers.

So where did I get the example illustrated above? I didn’t just stumble upon it by trying random rational triangles. Note that multiplying each rational side length by the common denominator of the fractions will return us to the land of integer-sided triangles. For the example shown, multiplying through by 60 yields a (175, 288, 337) triangle; these numbers have no factor in common, so they form a primitive Pythagorean triple. The area of the new triangle is 7 × 60 × 60, or 25,200. Thus we’ve recovered an integer triangle from a rational one; what’s more important, the process can be reversed to derive rational triangles from integer-sided ones.

I mentioned a Euclidean method for generating primitive Pythagorean triples. It goes like this: Take any two positive integers m and n that satisfy the following conditions:

  • m > n;
  • one of the numbers is odd and the other even;
  • the numbers are coprime (no common factors other than 1).

Then (2mn, m2n2, m2 + n2) is a primitive Pythagorean triple. By counting through all suitable values of (m, n) starting with (2, 1), all primitive triples are generated.

A postprocessing step built atop Euclid’s procedure will generate congruent numbers. The plan is to produce a primitive Pythagorean triple and calculate the area N = ab/2 of the corresponding triangle. If the area is “square-free”—that is, it has no pairs of repeated prime factors and thus cannot be divided evenly by a perfect square—then we’re done; that value of N is a congruent number and cannot be reduced to a smaller congruent number. But if N is not square-free, we can divide out each square factor, leaving a smaller triangle with rational sides and integer area, and thereby generating another congruent number.

A worked example: The (m, n) pair (5, 4) produce the primitive triple (9, 40, 41), with area N = (9 × 40)/2 = 180. Thus 180 is identified as a congruent number, but it is not square-free; it factors as 22 × 32 × 5. We can therefore divide the area by 4 and the sides by 2, producing a shrunken (9/2, 20, 41/2) triangle of area N = 45. In this way we learn that 45 is also a congruent number, but again it is not square-free. Dividing the area by 9 and the sides by 3 yields the (3/2, 20/3 and 41/6) triangle with area N = 5. And so 5, too, is congruent; it’s also square-free and therefore cannot be reduced further. There is no smaller triangle similar to the (9, 40, 41) triangle that has integer area.

This scheme can produce an unlimited supply of congruent numbers. Unfortunately, it’s not so well suited to answering the question of whether a particular integer is congruent. The problem is that the congruent values are not generated in order from smallest to largest. If we turn the crank for a while and discover that a certain integer N appears in the algorithm’s output, then we know for sure that N is congruent. But if N has not shown up, we can’t conclude that it is not among the congruent numbers; we might simply have to wait longer for N’s turn to come.

When I turn the crank on my own little program for generating congruent numbers, these are the first 101 values to pop out, in order of appearance:

6, 30, 60, 15, 84, 21, 210, 180, 45, 5, 330, 630, 70, 924, 231, 546, 504, 126, 14, 1320, 1560, 390, 840, 1386, 154, 2340, 585, 65, 1224, 306, 34, 990, 110, 2730, 3570, 1710, 190, 2574, 286, 4620, 1155, 5610, 5016, 1254, 2310, 1716, 429, 7140, 1785, 7980, 1995, 3036, 759, 4290, 7956, 1989, 221, 10374, 10920, 8970, 3900, 975, 39, 7854, 11970, 1330, 14490, 1610, 11550, 462, 4914, 6630, 12540, 3135, 19320, 4830, 6090, 4080, 1020, 255, 11856, 2964, 741, 18480, 23184, 5796, 1449, 161, 25200, 6300, 1575, 175, 7

Here’s the same list sorted into ascending order of magnitude:

5, 6, 7, 14, 15, 21, 30, 34, 39, 45, 60, 65, 70, 84, 110, 126, 154, 161, 175, 180, 190, 210, 221, 231, 255, 286, 306, 330, 390, 429, 462, 504, 546, 585, 630, 741, 759, 840, 924, 975, 990, 1020, 1155, 1224, 1254, 1320, 1330, 1386, 1449, 1560, 1575, 1610, 1710, 1716, 1785, 1989, 1995, 2310, 2340, 2574, 2730, 2964, 3036, 3135, 3570, 3900, 4080, 4290, 4620, 4830, 4914, 5016, 5610, 5796, 6090, 6300, 6630, 7140, 7854, 7956, 7980, 8970, 10374, 10920, 11550, 11856, 11970, 12540, 14490, 18480, 19320, 23184, 25200

The problem, again, is that we can’t conclude anything about the numbers that don’t appear in this collection. For example, 13 is absent, even though it is in fact a congruent number; it just doesn’t turn up until we get well down the list—it comes from the 49,485th triangle examined, which has sides (780/323, 323/30, 106921/9690). On the other hand, 8 is unlisted because it is not congruent; it will never show up in the output hopper no matter how long we keep cranking away.

Here are the first 101 true congruent numbers (sequence A003273):

5, 6, 7, 13, 14, 15, 20, 21, 22, 23, 24, 28, 29, 30, 31, 34, 37, 38, 39, 41, 45, 46, 47, 52, 53, 54, 55, 56, 60, 61, 62, 63, 65, 69, 70, 71, 77, 78, 79, 80, 84, 85, 86, 87, 88, 92, 93, 94, 95, 96, 101, 102, 103, 109, 110, 111, 112, 116, 117, 118, 119, 120, 124, 125, 126, 127, 133, 134, 135, 136, 137, 138, 141, 142, 143, 145, 148, 149, 150, 151, 152, 154, 156, 157, 158, 159, 161, 164, 165, 166, 167, 173, 174, 175, 180, 181, 182, 183, 184, 188, 189

Ancient history. The search for congruent numbers does not stretch back all the way to Pythagorus or Euclid, although Diophantus apparently considered a couple of special cases. L. E. Dickson’s big history of number theory attributes the first full statement of the problem to “an anonymous Arab manuscript, written before 972.” Other sources cite the works of Abu Bakr al-Karaji, a mathematician who worked in Baghdad at the end of the 10th century. A couple of centuries later Fibonacci, who straddled the Arabic and European worlds, had more to say about the problem in his Liber Quadratorum. Later still, Pierre de Fermat proved (in a marginal note, for which he had sufficient room!) that 1 is not a congruent number. (The proof extends to 4 and 9 and 16 and all other square numbers, none of which are congruent.)

These early writers formulated the problem in a somewhat different way than the rational-triangle scheme explained above. Given an integer N, they asked whether it’s possible to find three perfect squares in arithmetic progression with the interval between the squares equal to N. In other words, they sought a number s2 such that s2 – N, s2, and s2 + N are all perfect squares. For example, if N = 24, then the solution is s = 5; the three perfect squares are 25 – 24 = 1, 25, and 25 + 24 = 49.

The two versions of the problem—the integer-area right triangles and the squares in arithmetic progression—are equivalent, although that’s not exactly obvious. Here’s the connection: If a2 + b2 = c2 and ab/2 = N, then setting s = c/2 guarantees that s2N, s2, and s2 + N are all perfect squares. (For the algebraic exercise proving this, see the book by Neal Koblitz cited below.)

By the way, the squares-in-arithmetic-progression version is where we get the term “congruent numbers.” The three numbers s2N, s2, and s2 + N are all congruent modulo N. For example, 1, 25 and 49 are all congruent to 1 modulo 24. If you ask me, “congruent numbers” is a poor excuse for a name, and is particularly confusing in the context of triangles, where “congruent” has another meaning altogether. But after a thousand years I suppose it’s too late to fix it. “Karaji numbers,” anyone?

By 1915, all the integers up to 100 had been classified as either congruent or noncongruent. But as recently as 1980, when Ronald Alter wrote a brief review of the status of the problem, there were still 189 square-free numbers less than 1,000 for which the question of congruence remained unsettled. Then everything changed in 1983. In that year the nature of the congruent-number problem was transformed by Jerrold B. Tunnell of Rutgers, who not only found a better way to calculate congruent numbers but also showed why it’s worth the effort to do so.

From right triangles to elliptic curves. Before I go on, a warning: I am about to walk up to the blackboard with a piece of chalk in my hand and impersonate one of those masterful, self-confident lecturers who rattle off long trains of equations, invent lemmas on the fly, and always come to QED just as the blackboard fills up and the class ends. In my case any such air of mastery is a complete illusion; I’m just learning this stuff as I go along. But I’ll do my best to make it an entertaining illusion.

So here goes. It’s not hard to see that any product of perfect squares is also a perfect square: a2b2c2 = (abc)2. Therefore, if N is a congruent number, and s2 – N, s2, and s2 + N are all squares, we can set the product of these three factors equal to another square; call it y2. Thus we get the equation:

y2  =  (s2N) s2(s2 + N).

Multiplying the three terms on the right, this becomes:

y2  =  (s2)3N2s2.

Now perform a simple substitution of variables, setting s2 = x:

y2 = x3N2x.

This is the equation of an elliptic curve—another mathematical object with a really unfortunate name. An elliptic curve looks nothing like an ellipse. Here’s the graph of a particular elliptic curve, the one for N = 6:

elliptic-curve-6.jpg

The two pieces of the blue line mark the locus of all points (x, y) that satisfy the equation

y2 = x3 – 36x.

And the hot pink dot? That marks a rational point on the curve—a point where x and y both take on rational values. (Insiders, I’m told, call it a ratpoint.) Specifically, the dot identifies the point x = 25/4, y = 35/8. If you care to plug those numbers into the equation, you’ll find that indeed 1225/64 = 15625/64 – 225, and so the point does lie on the curve.

Tunnell, elaborating on earlier work of Kurt Heegner, showed that N is congruent if the elliptic curve y2 = x3N2x passes through a certain kind of point on the (x, y) plane. Specifically, we have to search for points whose x and y coordinates are both rational numbers, but we ignore the three points with y = 0. Then the x coordinate of the point has to satisfy three more conditions. Letting x = u/v, we require that:

  • u and v are perfect squares,
  • v is even,
  • u has no factors in common with N.

If we can find just one point on the curve that matches all these properties, then we can generate an infinity of other rational points. Moreover, the existence of such a point implies, according to Tunnell’s theorem, that N is congruent. (The hot pink point above clearly qualifies.) Thus the congruent-number problem appears to be solved: We have an unambiguous criterion for deciding if a given N is congruent or not. Just construct the corresponding elliptic curve and check the ratpoints.

Regrettably, we’ve been set up for yet another disappointment. Searching for rational points on elliptic curves is a task for which we have no efficient general method. We’re really no better off than trying to generate all Pythagorean triples.

But wait! We’re not done yet. From the properties of elliptic curves, Tunnell derived yet another criterion—and this one turns out to be the key to a simple and practical test. It all hinges on the number of integer solutions to some quadratic equations that on first glance appear to be arbitrary strings of symbols plucked out of thin air. The criterion is this: If N is a square-free congruent number, and if N is odd, then the number of integer solutions to the equation N = 2x2 + y2 + 8z2 must be exactly double the number of integer solutions to N = 2x2 + y2 + 32z2. (If both equations have no integer solutions, the condition is satisfied, since 2 × 0 = 0.) For even N, the two equations are slightly different, but the test works in exactly the same way.

tunnell-criterion-1000.png

The graphic above shows the 361 square-free numbers up to 1,000 that pass the Tunnell test. The height of each dot indicates the number of integer solutions to the equation N = 2x2 + y2 + 8z2 (or the corresponding equation for even N). The two values of N with 160 solutions are 689 and 761. (It’s curious that in most cases—308 out of 361—the number of solutions is actually zero. I don’t know what this means or whether that pattern continues with larger N.) [See Update 2009-10-10, below.]

What sets the Tunnell criterion apart from all the others is that we can actually carry out the test in a reasonable and predictable amount of time. For any given N, counting the solutions should be doable in time proportional to N3/2, simply by trying all integer values of x, y and z less than the square root of N.

So that’s it, right? Problem solved at last? Well, no, there’s still a small hitch—a bit of awkward fine print. Tunnell proved that if N is square-free and congruent, then the criterion on the number of integer solutions must be satisfied. This much is unconditionally true. But what about the converse? If the criterion is satisfied, can we be certain that N is a congruent number? In this direction, the proof is not unconditional. It’s contingent on a proposition known as the Birch–Swinnerton-Dyer conjecture, which is widely believed, and supported by an abundance of empirical evidence, but still unproved. Which leaves just enough doubt to make the game still interesting.

Why should anyone care about this stuff? On first acquaintance, the search for congruent numbers sounds like one of those mathematical pastimes that appeal to amateurs (like me!) but don’t really command the attention of the research community. There are so many kinds of cutely named numbers out there—perfect, amicable, lucky, happy—and not all of them carry deep significance. The congruent numbers might well be just another amusing sideshow. But it turns out they’re not. There’s serious mathematics here, enough to engage the interest of serious mathematicians.

Elliptic curves have been a focus of intense scrutiny for decades. Henri Poincaré studied them in the early years of the 20th century. In the 1920s Louis Mordell proved a theorem about the rational points on elliptic curves: Even when there are infinitely many points, they all come from a finite set of “generators”; the number of generators and hence the number of infinite families is called the rank of the curve. In the 1950s Yutaka Taniyama and Goro Shimura, with later refinements by André Weil, worked out a conjecture about elliptic curves and another class of mathematical objects, called modular forms. Andrew Wiles and Richard Taylor proved part of that conjecture in the course of settling Fermat’s Last Theorem in the 1990s; the rest of the Taniyama-Shimura conjecture has since been proved as well.

And now we have the Birch–Swinnerton-Dyer conjecture, one of the famous million-dollar math problems. I’m not going to try to explain the conjecture in any detail—I’ve used up all my chalk, and probably my readers’ patience, too—but I think the basic idea goes something like this. On the one hand we have an elliptic curve, drawn in the (x, y) plane. On the other hand we have something called an L-function, which is defined on the plane of complex numbers. Think of the L-function as an undulating landscape with mountains rising above the plane and submarine canyons deep under it. The topography of this surface is determined in part by points selected from the elliptic curve, so there’s a connection between the two objects. The conjecture formulated in the 1960s by Bryan Birch and Peter Swinnerton-Dyer says that the shape of the L-function in the neighborhood where it passes through zero gives us information about the rank of the elliptic curve and thus about the number of rational points.

There’s an analogy between the BSD conjecture and an even more celebrated problem on the million-dollar list, the Riemann hypothesis. The zeros of the Riemann zeta function (which is much like an L-function), carry information about the distribution of prime numbers. The Birch–Swinnerton-Dyer conjecture invites us to suppose that the zeros of L-functions of certain elliptic curves tell us something about the distribution of congruent numbers.

Incidentally, the BSD conjecture must be one of the earliest products of computer-driven experimental mathematics. The numerical explorations that led to the conjecture were done with the EDSAC, the pioneering electronic computer built at the University of Cambridge.

The trillion triangles. Hand-waving about elliptic curves and L-functions might provide vague a rationale for interest in congruent numbers, but one part of the story I still didn’t get was why anyone would bother to compute mass quantities of congruent numbers. So I asked Michael Rubinstein of the University of Waterloo. Rubinstein had earlier computed the congruent numbers up to 109, and it was his challenge that provoked the recent thousandfold expansion of the inventory. Rubinstein explained:

I’m interested in the statistical distribution of congruent numbers. A few years ago, Brian Conrey, Jon Keating, myself, and Nina Snaith came up with a prediction for the asymptotic number of congruent numbers up to x, akin to the prime number theorem. This prediction grew out of remarkable models created by the same researchers and David Farmer for related elliptic curve L-functions that were inspired by random matrix theory and based on a number of unproved assumptions.

The prediction is that the number of congruent numbers less than x, arising from even-rank elliptic curves, is asymptotically

\[c x^{3/4} \log(x)^{11/8}\]

where c is a constant. Our model doesn’t let us completely nail down the constant c, and the data will help us understand what the correct constant is. The asymptotic behaviour stabilizes at a logarithmic rate, so going to 1012 is only 50 percent better than going to 108.

It will also provide a good amount of data for studying the statistics of ‘higher rank elliptic curves’ in the family of elliptic curves related to congruent numbers. The correct model to use for higher rank elliptic curves is not really understood, and the billions of congruent numbers found will end up yielding several million higher rank curves with which we hope to gain insight into the statistics of higher rank curves (compared to just thousands of higher rank curves from earlier computations).

The enumeration of the trillion triangles was done by two teams. William Hart of the University of Warwick and Gonzalo Tornaria of the Universidad de la Republica in Uruguay ran their program on a computer called Selmer at Warwick. Mark Watkins of the University of Sydney, David Harvey of the Courant Institute and Robert Bradshaw of the University of Washington used the SAGE computer at Washington.

The strategy of the computation is based on the Tunnell criterion, but it turns out there’s a better way to go about it than explicitly counting the number of integer solutions to those various quadratic equations in (x, y, z). Tunnell showed that all the information about the number of solutions could be encoded in a formal power series, which looks like this:

\[A(q) = a_{1}q^{1} + a_{2}q^{2} + a_{3}q^{3} + a_{4}q^{4} + ... \]

It’s a “formal” series in the sense that we don’t actually care about evaluating the sum for any particular value of q; instead, we just want to know the various coefficients ai. This is how the Tunnell criterion gets encoded in the series: If ai is zero in this series, and if i is a square-free number, then i is congruent. (Actually, there are two separate series, one for odd i and one for even i.)

The odd and even power series are generated by a couple of formidable-looking products. Here’s the one for the odd series:

\[A(q)=q\prod_{n=1}^\infty(1-q^{8n})(1-q^{16n})\left(1+\sum_{n=1}^\infty 2q^{2n^2}\right)\]

Since both the overall product and the sum in the third factor call for infinitely many values of n, we can’t multiply out this whole expression. Fortunately, though, it turns out that larger n contribute only to higher coefficients, and so we can compute early terms in the series without worrying about what might happen later. Kent Morrison notes that taking just the factors

\[q(1-q^8)(1-q^{16})(1-q^{16})(1+2q^2+2q^8+2q^{18})\]

is enough to generate all the odd congruent numbers up to 25. Those five factors expand to:

\[A(q)=q^1+2q^3+q^9-2q^{11}-4q^{17}-2q^{19}-3q^{25}+...\text{ higher terms}\]

The terms missing from this series—those with a zero coefficient—are just the odd square-free congruent numbers in this range: 5, 7, 13, 15, 21, 23.

The big computation by Bradshaw et al. used essentially this same scheme, with a few refinements and optimizations. The challenge is that when you’re trying to compute all the terms of the series out to a1,000,000,000,000q1,000,000,000,000, you wind up multiplying some enormous numbers. I don’t mean just that the numbers are too big to fit into the registers of a 32-bit or a 64-bit computer. They’re too big to fit into the main memory of a computer with 128 gigabytes of RAM. Moreover, the arithmetic has to be done exactly; approximations are useless. Thus a major part of the effort in setting up the computation was devising efficient “out of core” multiplication routines for ridiculously large numbers. The basic algorithm was a fast-fourier-transform method.

And the result? Up to 1012, the computation identified 3,148,379,694 square-free congruent numbers number candidates with N in the residue classes 1, 2, and 3 modulo 8. I look forward to reading more about the analysis of their distribution. [See Update 2009-10-10, below.]

References and resources. Two weeks of reading and tinkering have not made me the master of this subject. For those who would like to explore further I can offer some resources I’ve found helpful, listed in no particular order.

  • Ed Eikenberg has posted some lecture notes from a talk on elliptic curves and congruent numbers given in 2000.
  • William Stein offers slides from a Harvard lecture in 2001.
  • The web page for another William Stein course (this one at the University of Washington in 2006) provides notes and handouts and a collection of function definitions that will let you play with congruent numbers and elliptic curves in the Sage computer mathematics system.
  • Koblitz, Neal. 1993. Introduction to Elliptic Curves and Modular Forms. Second edition. New York: Springer-Verlag. Some kind soul has posted page images for the first chapter online.
  • Guy, Richard K. 2004. Unsolved Problems in Number Theory. Third edition. New York: Springer. (Section D27, p. 306, discusses congruent numbers.)
  • Tunnell, J. B. 1983. A classical diophantine problem and modular forms of weight 3/2. Inventiones Mathematicae 72:323–334. (Available online via the DigiZeitschriften.)
  • Barry Cipra writes on the recent trillion-trangle computation: Tallying the class of congruent numbers, ScienceNOW, September 23, 2009. (Get it now, while it’s free.)
  • Dickson, Leonard Eugene. 1952. History of the Theory of Numbers. New York: Chelsea Pub. Co. (For congruent numbers see Vol. 2, Chapter 16, p. 459.)
  • Brown, Ezra. 2000. Three Fermat trails to elliptic curves. The College Mathematics Journal 31:162–172. Free PDF available online (unlike most CMJ articles), apparently because it won a Polya prize.
  • Rubin, Karl, and Alice Silverberg. 2002. Ranks of elliptic curves. Bulletin of the American Mathematical Society 39:455–474. Online.
  • Silverberg, Alice. 2001. Open questions in arithmetic algebraic geometry. In Arithmetic Algebraic Geometry, Institute for Advanced Study/Park City Mathematics Series 9, pp. 83–142. Providence: American Mathematical Society. Postscript preprint.
  • The American Institute of Mathematics web page on congruent numbers, which includes several helpful appendices, such as a discussion of FFT multiplication, an essay on congruent numbers by Kent Morrison and a link to the paper by Bradshaw, Hart, Harvey,
    Tornaria and Watkins reporting the trillion-triangle computation. (At this writing, the posted version of the paper is still an incomplete draft.)
  • Conrey, J. B., J. P. Keating, M. O. Rubinstein and N. C. Snaith. 2000. On the frequency of vanishing of quadratic twists of modular L-functions. arXiv preprint. (This paper and the one listed below discuss conjectures about the distribution of congruent numbers, among other topics.)
  • Conrey, J. Brian, Jon P. Keating, Michael O. Rubinstein and Nina C. Snaith. 2004. Random matrix theory and the Fourier coefficients of half-integral weight forms. arXiv preprint.
  • Brown, Jim. 2007. Congruent numbers and elliptic curves. PDF (Brown refers to this text as lecture notes for an undergraduate course, but in fact it is a very polished and well-organized expository article. Includes some Sage code for working with rational points of elliptic curves.)
  • Alter, Ronald. 1980. The congruent number problem. American Mathematical Monthly 87:43–45. (Tables of what was known and unknown shortly before Tunnell’s breakthrough.)

Special thanks to David Farmer for alerting me to this story in the first place and to Michael Rubinstein for help understanding what it all means.

Update 2009-10-10: In the comments, Barry Cipra points out that the quadratic equations in the Tunnell criterion necessarily have no solution whenever N is congruent to 5, 6, or 7 modulo 8. In fact all of the zero-solution points in the graph I presented above come from such N. Numbers in the residue classes 0 and 4 modulo 8 cannot be square-free. If we plot only those N that are square-free and congruent to 1, 2 or 3 modulo 8, we are left with just 53 square-free congruent-number candidates up to 1,000:

tunnell-criterion-123.png

Only the numbers in these three residue classes were included in the trillion-triangle computation.

Hey Google, gimme back my widgets!

Friday, September 11th, 2009

Sometime this morning, a web page that I visit occasionally changed its appearance from this:

google-pretty.png

to this:

google-ugly.png

The images are grabbed from Firefox on OS X; same effect in Safari. Some CSS forensics reveals what’s gone wrong here: Google has added a “height:28px” property to the style for the two buttons, and boosted their type size from 13 pixels to 16. (Type inside the search box is also given the giant billboard treatment.) Apparently the height property prevents the browser from using those cute jelly-bean widgets for the buttons.

Can we strike a bargain, Google? I’ll let you take over the world and track my every movement; I’ll sign over all my copyrights and promise never to block your advertising, if you’ll just give me my widgets back. Go ahead and Be Evil; just Don’t Be Ugly.

Or am I going to have to override you in userContent.css?

Update: Marissa Mayer, Google’s Vice President for Search Products & User Experience, explains:

For us, search has always been our focus. And, starting today, you’ll notice on our homepage and on our search results pages, our search box is growing in size. Although this is a very simple idea and an even simpler change, we’re excited about it — because it symbolizes our focus on search and because it makes our clean, minimalist homepage even easier and more fun to use. The new, larger Google search box features larger text when you type so you can see your query more clearly. It also uses a larger text size for the suggestions below the search box, making it easier to select one of the possible refinements.

For Firefox users who find the “supersized” page a nonimprovement, I offer the following quick hack:

@-moz-document domain(google.com) {
  .lsb, .gac_sb {
	font-size:13px ! important;
	height:inherit ! important;
	}

  .lst, .gac_m {
	font-size:14px ! important;
	}

}

Add these lines to your userContent.css file (see the Customizing Mozilla site for further instructions). This seems to fix the widget problem and the input-field text size, but all the drop-down gadgetry is still a mess and maybe the dropdown menu of suggestions. This will break if Google’s next whim is to change the class names “lsb,” “lst,” etc. If anyone has a cleaner solution, please pass it on.

Outnumbered

Monday, August 10th, 2009

My column in the new issue of American Scientist is about the challenges of computing with very large numbers. The column ends with an open question, which I want to restate here as an invitation to commentary.

Any system of arithmetic in which numbers occupy a fixed amount of space can accommodate only a finite set of numbers. For example, if all numbers have to fit into a 32-bit register, then there are only 232 representable numbers. So here’s the question: If our numbering system can represent only 4,294,967,296 distinct values, which 4,294,967,296 values should we choose to represent?

There is an obvious trade-off here between range and precision: To reach farther out on the number line, we have to leave wider gaps between successive representable numbers. But that’s not the only choice to be made. We must also decide whether to sprinkle the numbers uniformly across the available range or to pack them densely in some regions while spreading them thinly elsewhere. I’m not at all sure what principles to adopt in trying to identify the best distribution of numbers.

To frame the question more clearly, I’d like to introduce a series of toy number systems. In each of these systems a number is represented by a block of six bits. There are 26=64 possible six-bit patterns, and so there can be no more than 64 distinct numbers. A function f(x) maps each bit pattern x to a number. Sometimes it’s convenient to regard the bit patterns themselves as numbers, ordered in the obvious way, in which case f(x) is a function from numbers to numbers.

In the simplest case, f(x) is just the identity function, and so the bit patterns from 000000 through 111111 map into the counting numbers from 0 through 63. A slightly more general scheme allows f(x) to be a linear transformation, f(x)=ax+b, where the parameters a and b are constants. If a=1 and b=0 we again get the nonnegative counting numbers. If a=1 and b=–32, we have a range of integers from –32 to +31. With a=1/64 and b=0, the entire spectrum of numbers is packed into the interval [0,1). Numbers formed in this way are called fixed-point numbers, since there’s an implied radix point at the same position in all the numbers. By adjusting the values of a and b, we can generate fixed-point numbers to cover any range, but they are always distributed uniformly across that range.

The well-known floating-point number system, modeled on scientific notation, is a tad more complicated. In my toy version, three bits are allocated to an exponent t, which therefore ranges from 0 to 7. The other three bits specify a fractional significand s, interpreted as having values in the range 8/16, 9/16, 10/16, …, 15/16. The mapping function is defined as f(t, s)=s×2t. The 64 numbers generated in this way range from 1/2 to 120. (Real floating-point systems, such as those defined by the IEEE standard, not only have more bits but also have a more elaborate structure, with allowance for positive and negative significands and exponents and various other doodads.)

Floating-point numbers are not distributed uniformly; they are densest at the bottom of their range and grow sparse toward the outskirts of the number line. Specifically, the density of binary floating-point numbers is reduced by half at every integer power of 2. In the toy model, the gap between successive representable numbers is 1/16 in the range between 1/2 and 1, then 1/8 between 1 and 2, then 1/4 between 2 and 4, and so on. At the top of the range—beyond 64—the only numbers that exist are those divisible by 8.

The distribution of floating-point numbers describes a piecewise-linear curve, which approximates the function 2x. In other words, if you squint and turn your head sideways, you can see that floating-point numbers look a lot like base-2 logarithms. This suggests another numbering scheme: Instead of approximating logarithms, why not use the real thing? We can simply interpret a binary pattern as a logarithm and generate numbers by exponentiating. For the six-bit model, we can take the first two bits as the integer part of the logarithm (the characteristic) and the last four bits as the fractional part (the mantissa). Assuming we continue to work in base 2, the number-defining function is just f(x)=2x. The resulting system of six-bit numbers has a range from 1 to about 235. (By simple rescaling, we could make the range of these logarithmic numbers match that of the floating-point numbers, 1/2 to 120. I’ve not done so for a trivial reason: So that in the graphs presented below the two curves do not overlap and obscure each other.)

Finally, I want to mention a fourth numbering scheme, called the level-index system. Indeed, what led me to this whole topic in the first place was an encounter with some papers on level-index numbering written in the 1980s by Charles W. Clenshaw and Frank W. J. Olver. The level-index system takes a step beyond logarithms to iterated logarithms and beyond exponents to towers of exponents. The idea is easiest to explain in terms of the mapping function f(x). As with logarithmic numbers, we view the bit pattern x as the concatenation of an integer part (the level) and a fractional part (the index). Then f(x) takes the following recursive form:

level-index-eqn.png

For any level greater than 0, this formula gives rise to a tower of exponentiations. For example, the level-index number 3.25 expands as follows:

tower-of-e.png

In the six-bit model, we can dedicate two bits to the integer level and four bits to the fractional index. The result is a numbering system that covers a very wide range: from 0 to almost 400,000 in 64 steps. The distribution of representable numbers in this range is extremely nonuniform: the gap between the two smallest numbers is 1/16, whereas that between the largest numbers is more than 320,000. (Note also that we have shifted from powers to 2 to powers of e. Actually, a level-index system could be defined on any base, but e has some pleasant properties.)

The graph below compares the distribution of numbers in the four six-bit systems.

linear-curves.png

For fixed-point numbers the graph is necessarily a straight line, since the numbers are distributed at equal intervals. The floating-point graph is a jointed sequence of straight-line segments, with the slope doubling at every power of 2. The logarithmic numbers produce a smooth curve, concave-upward. (Again, this curve could be made to coincide with that of the floating-point numbers.) The level-index curve is also smooth but has an even more pronounced elbow or hockey-stick form.

Plotting the same data on log-linear scales clarifies certain details:

log-curves.png

Now it’s the logarithmic system that yields a straight-line graph. The floating-point curve approximates a parallel straight line. The fixed-point graph is concave-downward, while the level-index curve is sigmoid.

By twiddling various knobs and dials, we could create a number system that would approximate any monotonic line or curve in this space, giving us fine-grained control over the distribution of numbers. For example, on the semilogarithmic graph a straight line of any positive slope can be generated by adjusting the base of the logarithmic number system or the radix of a floating-point system; a larger base or radix yields a steeper slope. The inflection of the level-index curve can be altered in the same way, by choosing different bases. If we wished, we could interpolate between the fixed-point curve and the logarithmic curve by creating number systems in which f(x) is some polynomial, such as x2. If we wanted a flatter curve than the fixed-point system, we could build numbers around a function such as f(x)=√2.

So many ways of counting! Let me count the ways.

And so I return to the main question. On what basis should we choose a number system for a digital computer? Let’s leave aside practicalities of implementation: Suppose we could do arithmetic with equal efficiency using any of these schemes. Which of the curves should we prefer? There are plausible arguments for allocating a greater share of our numerical resources to smaller numbers, on the grounds that we spend most of our time on that part of the number line—it’s like the middle-C octave on the piano keyboard. That would seem to favor the logarithmic or floating-point system over fixed-point numbers. But can we make the argument quantitative, so that it will tell us the ideal slope for the logarithmic line, and hence the mathematically optimal base or radix? (Does Benford’s law help at all in making such a decision?)

And what about the level-index system, which skews the distribution even further, providing higher resolution in the neighborhood of 1 in exchange for a sparser population out in the boondocks? An argument in favor of this strategy is that overflow is a more serious failure than loss of precision; the choice is between a less-accurate answer and no answer at all. Level-index numbers grow so fast that they can effectively eliminate overflow: Although it’s a finite system, you can’t fall off the end of the number line.

Some years ago Nick Trefethen, a numerical analyst, wrote down some predictions for the future of scientific computing, including these remarks:

One thing that I believe will last is floating point arithmetic. Of course, the details will change, and in particular, word lengths will continue their progression from 16 to 32 to 64 to 128 bits and beyond…. But I believe the two defining features of floating point arithmetic will persist: relative rather than absolute magnitudes, and rounding of all intermediate quantities. Outside the numerical community, some people feel that floating point arithmetic is an anachronism, a 1950s kludge that is destined to be cast aside as machines become more sophisticated. Computers may have been born as number crunchers, the feeling goes, but now that they are fast enough to do arbitrary symbolic manipulations, we must move to a higher plane. In truth, however, no amount of computer power will change the fact that most numerical problems cannot be solved symbolically. You have to make approximations, and floating-point arithmetic is the best general-purpose approximation idea ever devised.

At this point, predicting the continued survival of floating-point formats seems like a safe bet, if only because of the QWERTY factor—the system is deeply entrenched, and any replacement would have to overcome a great deal of inertia. But Trefethen’s “two defining features of floating point arithmetic” are not actually unique to the floating-point system; logarithmic and level-index numbers (and doubtless others as well) also employ “relative magnitudes” and require “rounding of all intermediate quantities.” It may well be that floating point is the best general-purpose approximation idea ever devised, but I’m not convinced that all the alternatives have been given serious evaluation.

Free Pi

Two further notes. Because of space constraints, my American Scientist column appeared with a painfully truncated bibliography. I have posted an ampler list of references here.

Also, would anyone like to have 10 million digits of pi on microfiche cards? While scrounging in my files for interesting lore on large numbers, I came upon a yellow envelope in which 10,015,000 digits of pi fill up nine fiche cards. The envelope bears the notation “rec’d from Y. Kanada, Univ. of Tokyo, 1983.” The 10 million digits set a record back then, which I noted in a brief Scientific American news item. It seems a shame to destroy such a curious artifact; on the other hand, I can think of no earthly use for it.

Kanada’s group has gone on to calculate more than a trillion digits of pi, but he didn’t send me the 900,000 microfiche cards it would take to hold the listing.

The Oracle of Wolfram

Thursday, June 25th, 2009

In a comment on my earlier note about Wolfram Alpha, Daniel Asimov takes me to task for failing to explain “what Wolfram Alpha is.” I’ll accept the criticism, but I have to add that the question he raises is a real toughie. What, indeed, is Wolfram Alpha? Much of the prerelease hype (e.g., CIO, ZDNet, Telegraph) suggested it was to be some kind of search engine—a “Google killer,” or else, as Steven Levy wrote, “more like the anti-Google.” Another common theme (Infotoday, Guardian) suggests that Alpha is a manifestation of the semantic web, “that thing that Sir Tim Berners-Lee has been banging on about.”

There have been lots of other attempts to answer the ontological question. Jonathan Zittrain (via the New York Times) calls Alpha a “computable almanac.” Larry Greenemeier, writing for Scientific American, says: “Think of it as Ask Jeeves with [a] PhD.” Stephen Wolfram, the creator of Alpha, tells Rudy Rucker: “If anything, you might call it a platonic search engine, unearthing eternal truths that may never have been written down before.” Yuri Alkin, in a blog called Connections, had the wit to present the question to Alpha itself: “Who are you?” he asked. The polite reply, worthy of HAL or Commander Data, was “I am a computational knowledge engine.”

After a few weeks of sporadic poking around with Alpha, I’m finally ready to take my own shot at answering the big question. If you ask me, Wolfram Alpha is an oracle. Not an oracle in the computer-science sense—a hypothetical black box that simplifies complexity analysis by always giving the correct answer for queries of a specific form. I mean an oracle in the Greek-mythology sense—a sybil in a cave or a temple, whose responses to questions are often helpful but tend to be enigmatic and require careful interpretation. Sometimes you get just the answer you were looking for. Sometimes you get no answer at all. Sometimes the answer leaves you more perplexed than when you began.

All in all, perhaps it’s better to set aside the question of what Alpha is and ask what it can do.

It can do your homework (or your students’ homework):

Query: Limit (x^p)^(1/p) as p->0

Answer: \(\lim_{p \to 0}(x^p)^{1/p} = x\)

It can graph a function:

Query: plot sin(x)/x from -10 to 10

Answer:

sinxoverxgraph.gif

It makes a handy desk calculator:

Query: 12 choose 3

Answer: 220

Query: factor 8549176323

Answer: 3 × 127 × 22438783 (3 distinct factors)

It provides access to a rich trove of “curated” data:

Query: molecular weight vanadium dioxide

Answer: 82.9403 (grams per mole)

It offers links to “live” data, updated in real time, on topics such as the weather and financial markets:

Query: weather Buenos Aires

Answer:

weatherAlpha.png

But the big payoff of a service like this lies in combining factual queries with mathematical or algorithmic analysis. Surely that’s what a “computational knowledge engine” should be good at, no? I’ve been trying hard to make Alpha perform in this way. So far I’ve found the process pretty frustrating.

Here’s a case study. Remembering an old story about Kansas being flatter than a pancake, I submitted the query, “flattest state in the U.S.” The response was another question: “Did you mean ‘fastest state in the U.S.?’”

Well, no, I didn’t mean that, but out of curiosity I clicked the link to learn which is the fastest state in the U.S. The reply, in its entirety, was this:

fastestUSstate.png

The oracle was in deep enigma mode. I decided to go back to the “flattest” question. Let me add that I hadn’t really expected my first query to work; a ranking of states by flatness is not something you’d find in an almanac (computable or otherwise), and indeed the concept of flatness has various possible definitions. I thought I could give Alpha some help by being more explicit.

Query: All US states maximum elevation - minimum elevation

Answer: Did you mean: US states maximum elevation minimum elevation

I wasn’t quite sure how to respond here, but it doesn’t cost anything to try, so I accepted Alpha’s rephrasing of the query. What I got back was not the answer I was looking for, but it was not entirely without interest:

Query: US states maximum elevation minimum elevation

Answer:

elevationscatterplot.png

The scatterplot of highest and lowest elevations by states tipped me off that the data I’m looking for are in the system somewhere. Indeed, one of those dots in the lower left corner, with both lowest and highest elevations near zero, is probably the answer to my question (at least if we define flatness as the difference between maximum and minimum elevation). But how to identify the dot? Or, for that matter, how to identify the conspicuous outlier—the one state with a minimum elevation well above 1,000 feet?

I allowed myself to be distracted by the latter question. There are a couple of obvious guesses for the state with the highest lowest elevation, so I tried one:

Query: Colorado minimum elevation

Answer: 3314 feet

Hmm. That’s not the outlier in the scatterplot; 3300 feet is well off the chart. That means at least one state was clipped from the graph. Another query makes this more obvious:

Query: US states minimum elevation

Answer:

elevationranks.png

It appears that ranks 1 through 4 lie somewhere above the top edge of the graph. Is there some way to force Alpha to plot the complete data set, without arbitrary cropping? For some kinds of plotting, I’ve figured out how to control the range of the independent variable (see the command “plot sin(x)/x from -10 to 10″ above), but in this context I’ve not discovered the key, if there is one. And, as far as I can tell, there is no warning given when a plot is chopped.

Nevertheless, I was able to identify the four missing states. Accompanying the rank-order graph above was a helpful list, whose first entries were: Colorado 3314, Wyoming 3100, New Mexico 2844, and Utah 2001. The highest visible dot in the graph represents the fifth state in the sequence—Montana, with a minimum elevation of 1801 feet. The list gave the first five states and the last five in the ranking. This looked promising. If I could get a complete list of minimum elevations for all the states, and then the corresponding list of maximum elevations, perhaps Alpha could also give me the differences. I would ask it to alphabetize both lists, then subtract them element by element, and finally take the minimum of the result, or else sort again according to magnitude.

A button next to the truncated list of states promised “More.” I pressed it. Now I had the first 10 and the last 10 states, but I was still missing the 30 in the middle. Something else had changed as well: All the numbers were different, with the list of elevations beginning 1010, 945, 867. After a moment’s perplexity, I realized that Alpha had decided to shift from feet to meters. No matter. Three more presses of the “More” button finally got me a complete list of minimum state elevations (in meters). And the same rigmarole soon produced the analogous list of maximum elevations (again in meters).

But now I was stumped. How do I sort the list alphabetically? Can I subtract one list from another? Can I do anything to transform the output of a command? Is there any way to compose commands, so that the output of one routine becomes the input of another? Not a clue.

But perhaps I could do it the other way, slicing the salami crossways instead of longitudinally. Instead of compiling a list of maxes and a list of mins and then subtracting, I could subtract lowest point from highest point state by state and then list the results. Searching through various help files and lists of examples, I eventually came to a page on “Elevation Data,” with a subcategory “Minimum and Maximum Elevations.” And there, at the bottom of the page, was this suggested query: “Montana maximum elevation - minimum elevation.” Clicking on it gave me the result “11,007 feet.” So I could get the elevation range for a single state. All that remained was to persuade Alpha to map the same computation over all the states….

But wait. That’s where this story began, with the query “All US states maximum elevation - minimum elevation.” It didn’t work when I tried it before, and it still doesn’t work now.

I tried some minor variations in phrasing and punctuation, such as this one:

Query: (US states maximum elevation) - (US states minimum elevation)

Answer: 4341 feet

What does the number 4341 mean? A “Show Details” button led to the explanation:

medianelevations.png

Instead of subtracting the vectors element by element, the program is taking the median of each elevation list and then subtracting. (If I had wanted to do that, I wouldn’t have known how to ask for it.)

Finally, shown below in full detail is what came back after one further attempt to formulate the “flattest state” query:

albanianlekfeet.png

Who asked about Albanian currency? I guess this is what the sybil says when she’s tired of listening to all of my questions.

*   *   *

Wolfram Alpha is an ambitious project, as its makers would be the first to proclaim. Here’s what the “About” page tells us:

Wolfram|Alpha’s long-term goal is to make all systematic knowledge immediately computable and accessible to everyone. We aim to collect and curate all objective data; implement every known model, method, and algorithm; and make it possible to compute whatever can be computed about anything.

It’s hard to resist making fun of these lofty and all-encompassing aims, especially when a fairly simple geographic query returns a result expressed in units of Albanian Lek-feet. All the same, I still applaud the attempt to create such a service, and I hope that Stephen Wolfram and his colleagues achieve some reasonable fraction of their goals.

The main sticking point, it seems pretty obvious, is not in collecting and curating data or in formulating models, methods and algorithms. It’s the access part. How am I to communicate with the system? How am I to specify which bits of systematic knowledge I’d like to retrieve, and how do I tell Alpha which models, methods and algorithms to apply? For more than 50 years the answer to this question has generally been a programming language of some kind. The designers of Wolfram Alpha have deliberately turned their back on that option, in favor of a natural-language interface. I’m sure they made this choice with the best of motives, in order to reach out to a wider audience that might be intimidated by formal notation. Unfortunately, the natural-language interface is so limited that we’re effectively left with no notation at all.

In a way, talking to Wolfram Alpha is rather like communicating in a natural language—a foreign language you don’t happen to speak. With grunts and gestures and a few stray nouns you may be able to get across the most rudimentary touristic needs—”Where toilet?” or “How much?”—but if you want to carry on a real conversation, you need more vocabulary and, most of all, you need grammar. I’m skeptical that Wolfram Alpha will ever be of much use without such a linguistic structure.

Alpha bravo

Sunday, May 17th, 2009

WolframAlpha, the new whatchamacallit from Stephen Wolfram, is up and running. It’s also stumbling: I’ve gotten a fair number of error messages. But now is not the time for nitpicky griping. I just want to say bravo to the whole idea.

I applaud because Alpha invites us to use a computer for actual computing. Did you know that this machine that brings you YouTube and Facebook can also evaluate integrals and calculate correlation coefficients? What a thought!

I have ranted elsewhere about the sad decline of casual, inquisitive programming, and the lack of tools that will draw more people into that mode of exploring their world. WolframAlpha is not exactly the answer to my plea. It is a powerful computational tool cleverly disguised as a mild-mannered search engine. You’re not writing programs; you’re just “searching” for the roots of x2 + x – 1 or the limit of (1 + 1/n)n as n goes to infinity or the sine of 1050. I have a feeling that this search-engine metaphor may wear thin pretty quickly, but it may also serve its obvious purpose—breaking down barriers for those who wouldn’t dream of writing a line of Python or Lisp but who are masters of Google.

WolframAlpha is not the first thing of its kind. (My Sage friends will be quick to point out that they have had an online notebook interface up and running for a couple of years.) I’ve only just begun to explore Alpha, and I’m sure there are both treasures and pitfalls I haven’t glimpsed yet. Still, for the moment, I just want to cheer it on.

Himalayan mathematics

Monday, April 20th, 2009

Browsing through some notes I jotted down sometime in 1988, I come upon this sentence:

Mathematicians feel about computers much as the Nepalese feel about jet aircraft.

Did I read that somewhere, or did I make it up? My notes give no clue to the provenance of the thought, and Google is unhelpful.

Bits from its

Wednesday, April 15th, 2009

Sometime in the early 1980s my friend Greg Chaitin decided to go digital. He wanted to have all of his own writings, along with some other documents he particularly valued, ready at hand in machine-readable form. This was long before Postscript or PDF; scanners and OCR were exotic technology. So Greg hired a typist. I know about all this because he sent me a printout of his digitized oeuvre—an impressively thick sheaf of fanfold paper, with sprocket strips still attached. (The irony of this mode of distribution was not lost on either of us.)

I’m finally trying to catch up with Greg. Instead of hiring a typist, though, I’ve bought a cute little document scanner, which has been busily transforming heaps of moldering cellulose into shiny new bits. The scanner produces PDFs, which then run through an OCR program to build an index, making the page images searchable. (Bring on the Googlebot!) The result falls short of ideal—files are obese, graphics are low-res, there’s no opportunity to edit or reformat—but still it’s better than rummaging through cobwebby filing cabinets in my basement. And I don’t have to FedEx cartons of fanfold to my friends.

I’ll be adding links to my publications list as I upload the files. Here are a few teasers:

The HP-41C: A Literate Calculator? Byte, January 1981, pages 118-138. PDF (3.3 MB)

40,000 Points of Light. Pixel, Vol. 1, No. 2, May-June 1990, pages 36-41. [On a graphics language in which an image is defined as a locus of points.] PDF (2.4 MB)

Rank-and-File Thinking. Lotus, June 1985, pages 73-77. PDF (1.8 MB)

Theory & Practice: On the bathtub algorithm for dot-matrix holograms. Computer Language, October 1986, pages 21-32. [On patterns generated by severely oversampled signals.] PDF (4.6 MB)

The Information Age: The Numbering Crisis in World Zone 1. The Sciences, Vol. 32, No. 6, November-December 1992, pages 12-15. [On the impending shortage of telephone numbers.] PDF (1.8 MB)

I note for the record that every one of the publications mentioned in the list above is now defunct. I wonder if there’s any significance in that?

Computing in the classroom

Wednesday, April 8th, 2009

As the students file into the classroom, each of them reaches into a basket by the door and selects a ticket, on which a number is printed. For convenience, let’s assume the numbers are integers, they’re of reasonable size, and no two tickets bear the same number. Once everyone has settled down, the instructor assigns the class this exercise: Consult among yourselves and show me the ticket with the largest number.

How will the students solve this problem? Perhaps they pass all the tickets to one nimble kid in the first row, who thumbs through them looking for the maximum. This is a sequential algorithm, where the running time ought to be roughly proportional to N, the number of tickets. An obvious parallel solution is based on a tournament tree: Pairs of students compare their numbers, then the winners of each of these matches get together in pairs, and so on, until only one student remains. This routine yields an answer in about log2 N rounds of comparisons.

Here’s another idea. The students all stand up and shout out their numbers repeatedly, like commodity dealers in the trading pit. If a student hears a number larger than her own, she sits down and remains quiet thereafter. The last student standing has the largest number.

The efficiency of this shouting-match algorithm is hard to quantify. If we assume that everyone can yell and listen at the same time without confusion, then the problem of identifying the maximum number is solved instantaneously. For large N, however, that assumption is surely unrealistic.

A variant of the outcry algorithm is easier to analyze: A single student chosen at random calls out his number, and all those students with smaller numbers sit down. Then another student is selected from among those still standing, and the procedure is repeated until only one student remains. This algorithm also seems to have an expected running time of log2 N, since each round of eliminations should exclude (on average) half the students still in contention.

Suppose the students are asked to deliver the sum of the numbers, rather than the maximum. The tournament-tree computation adapts readily to this task, again requiring depth log2 N. On the other hand, I don’t know any reliable way to do addition by shouting.

What about calculating the average, or arithmetic mean, of the numbers? Of course this could be done by summing the numbers and then dividing by N—or by dividing first and then summing—but either of these strategies requires the students to count themselves before they can compute the average. Can’t they just use the tournament-tree algorithm to compute the average directly, as they did for the maximum and the sum? Suppose they replace the ‘max’ or ‘+’ operator with a new binary average operator, ‘•’, defined as (a • b) = (a + b)/2. Plugging this operator into the tree algorithm works just fine if N happens to be an exact power of 2, but otherwise it fails. In the case of N=5, for example, the computational tree might look like this:

averagetree.png

The four values on the left are combined in a regular, symmetrical tree to yield the correct average, but then the fifth value has to be inserted in an extra, ad hoc operation. Applying the binary average operator at the top node of the tree would give a result of 3.75, whereas of course the true average is 3. The underlying problem here is that the averaging operator is not associative: (a • (b • c)) ≠ ((a • b) • c). [Forgive me if all this is thunderingly obvious to everyone but me. It actually took me a while to figure it out.]

One could fix this problem by passing pairs of numbers up the tree; one member of each pair is the running average, and the other member is a weight equal to the number of numbers that went into calculating that average. But this is just the sum-and-divide procedure in disguise. Let me note that there is a very simple algorithm for approximating the average. Have the students form random pairs repeatedly; in each pair, both students replace the value on their ticket with the average of the two values (in other words, they apply the ‘•’ operator). In this procedure the students never actually calculate either N or the sum, but both of those quantities are conserved as invariants of the computation. As the random process continues, all of the tickets eventually converge on the same value, namely the overall average. Experiments suggest that about log2 N rounds of random coupling yield a reasonably good approximation.

In judging the merits of these algorithms, my criteria have more to do with style than with computational complexity. My dislike for the sum-and-divide method probably has no firmer basis than stubborn prejudice. All the same, a computer in which the processors are people does seem to call for a particular style of program. Ideally (in my view) an algorithm for the classroom computer would be fully parallel (every student gets an equal chance to play), homogeneous (all the students get the same instructions) and local (students interact in small groups, without needing to know much about the global state of the system—such as the value of N). An algorithm gets extra points if it exploits the mobility and flexible communication patterns of human computers.

So here’s another challenge for the class. As before, the students take numbered tickets, but now the instructor asks to see the ticket with the median number—the value that is larger than half the other numbers and smaller than half. (Let’s make life easy and assume that N is odd.) Here is a median-finding algorithm—a variation on a method sometimes called Quickselect—that I would like to believe a group of students might invent. First, all the students raise their right hand. Then they repeat the following series of steps until the median is identified.

  • Choose a student at random from among those who have a hand raised. This student’s ticket number is designated the pivot.
  • All students whose number is smaller than the pivot line up on the left and those with larger numbers line up on the right.
  • If the two lines are of equal length, then the pivot is the median. Stop.
  • Otherwise, all students in the shorter line lower their hand (if it’s not down already); the pivot student also lowers her hand.
  • Repeat.

The expected number of times the loop will be executed again appears to be approximately log2 N. Note that in this hand-waving algorithm the students never have to sort themselves. The method also requires no counting—not even for the comparison of line lengths. But we do assume that all the students can simultaneously compare their ticket numbers with the pivot.

The diagram below shows an example calculation. The green dots at the top of each configuration are the successive pivots; students holding lesser and greater numbers are queued up to the left and right of the pivot; blue dots represent numbers still eligible for consideration as pivots (i.e., students with raised hands); red dots are numbers that have been excluded because at some stage of the computation they were in the shorter queue or were pivots that turned out not to be the median.

mediandiagram.png

One could dream up plenty of other exercises to keep the students busy. They could search for subsets of numbers with special properties, such as Pythagorean triples. They could be asked to identify the longest sequence of consecutive integers in the set of numbers they hold. And then there’s the balanced subset-sum problem: The students divide themselves in two groups in such a way that the sums of the numbers in the groups are equal, or as close as possible to equal. (This is an NP-complete problem, but many instances yield easily to the right heuristics.)

The various marching-band patterns and parliamentary shouting matches described above are my own fantasies about how groups of people might “enact” algorithms. I’m curious to know how real students, in a real classroom setting, would actually go about solving such problems. Would they adopt the kinds of local and homogeneous algorithms I find aesthetically pleasing? Or would they favor a simpler and more pragmatic approach? In some cases I suspect that the ruse of passing all the tickets to the class whiz might well be fastest and most reliable way to get an answer. Moreover, for any other approach the time needed for the students to choose an algorithm and organize themselves may well exceed the execution time. Indeed, the very process of coming to agreement on how to solve a problem calls for significant group dynamics; this is not something we see in other kinds of parallel computers.

Another interesting point about people as processors is that their capabilities are not very sharply defined. For example, most algorithms assume that comparisons are strictly pairwise; a predicate such as ‘>’ or ‘=’ is applied to exactly two numbers, and the efficiency of the algorithm is measured by counting those operations. But it seems likely that three or four students could huddle together and find the maximum of their numbers about as quickly as two students could. Thus the complexity of some algorithms might be reduced to log3 N or log4 N.

On the other hand, I’ve assumed the existence of some primitive operations that may be difficult to realize. In particular, the axiom of random choice seems a little dubious in this context. If a group of students is asked to choose one their members at random, how do they go about it? Can they always do it in unit time?

There’s also the question of how such algorithms scale as N increases. A scheme that works well in a classroom with 30 students might not be ideal in a lecture hall with 300. And surely the process would go very differently in a stadium holding 30,000.

I assume that many experiments of this kind have been tried over the years, although I haven’t found much literature on the subject. (Perhaps the closest match is a 1994 article by Bachelis, James, Maxim and Stout; they also note the scarcity of published material.) If anyone has reports from the field, I’d be delighted to learn about them.

Update 2009-04-14: Thanks to the commenters for some particularly helpful and lucid responses. I would like to explore one of these themes a little further.

Mario Klingemann’s “small optimization” to the outcry algorithm is more than just a minor improvement; it corrects an outright bug in the procedure I described. Klingemann notes that when a student calls out a number and all the students with smaller numbers sit down, the calling student must also sit unless no one else remains standing. This is important! Consider the difference it makes when N=2.

So which version of the algorithm—my buggy one or Klingemann’s correct one—matches up with JeffE’s analytic result showing that the expected number of rounds is the Nth harmonic number? In the graph below the harmonic numbers H(N) are represented by the red line; the green and blue lines are empirical results averaged over 100,000 repetitions of programs implementing the two algorithms.

outcry.png

For a given N, the Klingemann result is H(N) – 1/2; the buggy algorithm yields H(N) + 1 (asymptotically, for large N).

Wrong number

Wednesday, February 25th, 2009

Is it just me, or does it happen to everyone? When I try to correct someone else’s arithmetic, I can count on making an error of my own. For instance: I was reviewing William Goldbloom Bloch’s clever and provocative book The Unimaginable Mathematics of Borges’ Library of Babel. I noted in passing a numerical error, trivial in significance even though it’s enormous in magnitude. Bloch had occasion to introduce the number:

blochnumber.png

To give a sense of this number’s size, he mentioned that it has 33 million digits. I pointed out that the true digit count is exponentially larger: 1033,013,740. Now I’ve received a correction to my correction, in a note from Michael Rosenthal:

The author [of the book] stated 33 million digits as the length, but Hayes getting closer states that the length is (10 to the 33,013,730). Isn’t the actual number of digits ((10 to the 33,013,730) + 1)?

Of course Rosenthal is right. There are 10n n-digit decimal numbers, but 10n is not one of them. On the other hand, Rosenthal also gives a double-barrelled demonstration of my point about the perils of correcting other people’s arithmetic: In the course of setting me straight, he makes a typographical slip of his own, twice mistranscribing 33,013,740.

My correspondence with Rosenthal provoked me to take a closer look at where that enormous number came from in the first place. This has led me into some diverting adventures with factorials and logarithms. But it has also put me in the position of proposing another correction to a published number, which means I am in grave danger of making still more mistakes.

Bloch’s book is a mathematical companion to “The Library of Babel,” a brief ficción by the Argentinian fabulist Jorge Luis Borges. In the Library of Babel all books are written in an alphabet of just 25 symbols. Each book has 410 pages, and each page has 40 lines of 80 characters, for a total of 1,312,000 characters per volume. We are told that the library possesses one copy of every possible volume composed in this way. Thus the number of volumes on the library’s shelves is 251,312,000. For convenience of reference let us call this number B (for Book, or Babel, or Borges, or Bloch).

The much larger number mentioned in my review enters Bloch’s story when he asks how many ways we might arrange the books on the library shelves. The answer, of course, is the factorial of B. If we imagine that the shelves have discrete slots that hold one book each, then we have B choices for the first slot, B–1 choices for the second slot, B–2 for the third, and so on, until we come to the final slot where there’s just one book left to choose. The product of all these numbers is B!.

So what sort of a number is B!? Can we have a look at it? In this age of computational wonders, surely we can just fire up Sage or Mathematica, or write a four-liner in Lisp, and have the digits of that number served to us on a silver platter, no?

No. We can’t.

After three decades of playing with computers, I am still childishly agog at the ease with which I can tap a few keys and calculate a gargantuan quantity such as 52!, the number of permutations of a standard deck of playing cards:

8065817517094387857166063685640376
6975289505440883277824000000000000

I can even calculate 10,000!, unleashing a Niagara of digits that spill out onto my screen in a long gray torrent—some 35,660 of them by the time the flood abates. (Digression: How many of those digits are trailing zeros? This is a question Fred Gruenberger would always ask when he talked about factorials in his old Popular Computing newsletter, back in the days when computing that many digits took a bit of doing.)

Computing has come a long way, but even with so much numerical horsepower at our fingertips, the direct calculation of B! is still totally unthinkable. If Bloch had been right about the 33 million digits, the computation might be feasible, if tedious, but computing a number with 1033,013,740 digits is just ridiculous, with or without the extra digit I forgot.

If we can’t compute B! outright, maybe we can at least get its logarithm. For a rough estimate of the magnitude—in other words, for counting the digits—the realm of logarithms is the right place to be. Since the factorial of n is a product,

factorial.png

the logarithm of the factorial has to be a sum:

logfactorial.png

I don’t know a quick and easy way to perform this summation, but I do know how to evaluate a slightly different sum, one where every term on the right hand side of the equation is log n. Since there are n of these terms, the sum is obviously just n log n. What we have calculated in this way is the logarithm of nn. Thus nn can be taken as a crude approximation—a very sloppy upper bound—to the value of n!. For our particular case we have log BB = B log B. When I plug in the numerical value of B, and use base-10 logarithms, I come up with

logbtob.png

Exponentiating both sides gives this answer:

btob.png

Hmm. There’s a problem here. This is supposed to be a loose upper bound, but it’s smaller than the value we were expecting for B!. Clearly, BB can’t be less than B!.

Bloch reports that he came to his estimate of B! via Stirling’s approximation. If you look up this bit of mathematical technology, you’ll find a formula something like this:

stirlingformula.png

It would be easy to translate the formula directly into computer code, but it’s pointless to do so if the aim is to calculate B!. Again the only practical way to proceed is to take logarithms of both sides. Then the product in the formula becomes a sum of three terms. The first term, log nn, we already know: It’s n log n. The second term, log e–n, is simply –n if we choose to work with natural logarithms; if we want to stick with base-10 logs, it’s –n/log 10. And the third term we don’t need to worry about; at the level of precision we can hope to achieve in these calculations, ½ log 2πn is too small to notice. So, keeping just the first two terms, we have a rough-and-ready approximation to Stirling’s approximation, which ought to work well enough for very large n. B certainly seems to qualify as a large n; when I plug in its value, here’s what I get:

blogbminusb.png

I have plodded my way slowly and methodically through the steps of this computation in an effort to persuade myself that I might finally have it right. At this point I’m fairly sure that log B! is closer to 101,834,103 than it is to 1033,013,740, but I’m chary of making a stronger claim. With all my crossing back and forth between the safe world of logarithms and the uninhabitable wasteland of exponentials, I worry that I’ve misplaced a factor of e or, worse, mistaken the logarithm of a number for the number itself. If I’ve goofed again somewhere in this calculation, I trust that readers will let me know.

I want to emphasize that my aim in telling this story is certainly not to embarrass Bloch, whose work I admire. His book does a splendid job of explaining ideas in combinatorics, number theory and even topology, in a literary context where such themes don’t get much exposure. His little numerical error—if it is an error—has no impact on the meaning of the passage where it appears.

Indeed, it’s hard to imagine a circumstance where the difference between the two proposed values of B! could have any earthly consequence. Both numbers are always going to be ghostly presences in our universe—actors whose voice we hear from offstage but who never show their face under the lights. A certain breed of radical constructivist would argue that the numbers we’re talking about don’t exist at all. I don’t find that doctrine helpful or persuasive, but I do think there’s a distinction worth making between numbers we can actually hold in our hands and these strange, outsized objects that come equipped with their own exclamation points. Computing with numbers this large is like working with toxic or radioactive materials. You never get to touch anything directly; all manipulations are performed by remote control from behind a blast shield. The experience can be exciting, but afterwards you don’t come away with a sense that you really know what you’re doing.

Fred Gruenberger would have wanted to know how many trailing zeros there are in B!. Can we answer?