Archive for May, 2007

Easy as abc

Thursday, May 24th, 2007

It all starts with the equation a + b = c, which looks straightforward enough. Assume that a, b and c are positive integers that have no divisors in common other than 1; for example, the triple {1, 2, 3} meets this condition, and so does {4, 5, 9}.

Now let’s take the product of the three numbers abc, list all the prime factors of this product, and cast out any duplicates, so that each prime in the list appears just once. The product of these unique primes is called the radical of abc; it is necessarily a “square-free” number, one that cannot be divided by any perfect square. For the triple {1, 2, 3} the product is 1 × 2 × 3 = 6 and the prime factor list is (2, 3); since this list has no duplicates, the radical rad(6) is 6. In the case of {4, 5, 9}, the product is 4 × 5 × 9 = 180 and the factor list is (2, 2, 3, 3, 5). Removing the duplicated 2s and 3s leaves the unique factor list (2, 3, 5), so that rad(180) = 30.

Note that in both of these examples, rad(abc) > c. Can it ever happen otherwise? Can rad(abc) be less than c? Yes, it can. The triple {5, 27, 32} has the product 5 × 27 × 32 = 4320; it’s easy to see that the unique primes in this product are again (2, 3, 5), and so rad(4320) = 30, which is less than c = 32. Triples with this property are called abc-hits. There are infinitely many of them, and yet they are quite rare. Among all abc triples with c ≤ 10000, there are just 120 abc-hits.

If rad(abc) can be less than c, just how much less? Can we find an example where rad(abc) is so much smaller that the square of rad(abc) is also less than c? No one knows of such a triple, but there are some examples where [rad(abc)]α < c for some exponent α greater than 1 though less than 2. Take the triple {2, 6436341, 6436343}. Here b is equal to 310 × 109 and c is equal to 235, and so rad(abc) = 2 × 3 × 23 × 109 = 15042. And 15042α < 6436343 for any α < 1.6299.

So who cares? Number theorists, that’s who.

In 1985 Joseph Oesterlé of the University of Paris and David W. Masser of the University of Basel conjectured that there are only finitely many such exceptional triples. Given a positive number ε that can be made arbitrarily small, [rad(abc)]1 + ε can be less than c only in a finite number of cases (which will depend on ε). That’s the abc conjecture. Although it may seem like a rather arbitrary game with numbers, if it turns out to be true, there’s a long list of important consequences.

Why am I going on about this just now? A friend has passed along a rumor that we may soon hear big news about the abc conjecture.

Some resources:

Update 2007-05-26: Time to lift the cloak of mystery. At a conference held last week at Columbia University, Lucien Szpiro of the City University of New York gave the last talk on the last day, and quietly let it be known that he had proved at least part of the abc conjecture. I’m grateful to Kevin O’Bryant for giving me an early heads-up on this news. There’s now a little more on Peter Woit’s blog.

Amazon poker

Thursday, May 10th, 2007

Investors are constantly checking the stock ticker, gamblers check the point spread, and everybody is forever checking their e-mail. For a writerly type like me, however, the unshakeable obsession is checking my Amazon sales rank. Amazon.com calculates a sales rank for every book listed on its Web site, and updates the ranking hourly. Here’s a graph of the hourly fluctuations in the ranking of my book Infrastructure: A Field Guide to the Industrial Landscape over the course of a single day last week:

Rankforest graph of Infrastructure sales rank 2007-05-03

At any given moment the current rank is listed in the “Product Details” section of the Amazon page for both the hardcover and the paperback editions. The graph above comes from a service called Rankforest, which also tracks both the hardcover and the paperback versions.

From an author’s point of view, there’s a lot of mystery in these numbers. How are the hourly rankings calculated, and what (if anything) do they mean? How do the rankings correlate with actual sales of the book? Amazon is not telling, and so authors and other interested parties have been left to speculate and experiment. The obvious experiment is to order a copy of the book and observe the effect of this purchase on the ranking. I welcome such experimentation with my book. (I would prefer that you do it with the hardcover edition, for which I get a more generous royalty.)

Morris Rosenthal of Foner Books seems to be the leading scryer of signs in this field. The interpretation of the rankings has also been discussed by Chris Anderson, the editor of Wired, in an article, a blog, and a book (current Amazon rank = 463, way above mine, dammit). It’s clear that the hourly ranking cannot simply reflect the number of copies sold within the past hour. After all, in any given hour the vast majority of books sell zero copies, and so there would be a gigantic tie for last place. The rankings also can’t be based in any simple way on total sales since publication, because the standings are much too volatile. The received wisdom is that each sale produces an uptick, followed by an exponential decay until the next sale.

I find all these speculations fascinating, but they are not what I want to write about today. My topic is even more trivial. I’ve been tracking my Amazon ranking for more than a year and a half now, ever since the hardcover edition was first listed in September 2005. (The paperback came out about a year later.) Here’s what the trend looks like:

Amazon rankings of Infrastructure since publication

The graph records the highest (i.e., numerically smallest) rank noted on each day, with rare gaps when I happened to be offline all day. Without question it would be fairer to use the daily average rather than the daily peak. But an author’s ego is a fragile thing, and so I have gone out of my way to make the outlook as rosy as I could.

Here are some of the actual numbers I recorded, for the month of April 2007:

   date          HB      PB
2007-04-01     138263   47878
2007-04-02     192152   22728
2007-04-03      29146   41862
2007-04-04      73628   57155
2007-04-05      37172   16948
2007-04-06     127858   12779
2007-04-07     171363   25212
2007-04-08      55256   20770
2007-04-09
2007-04-10      36333    7121
2007-04-11     112423   19015
2007-04-12     183063   40015
2007-04-13     225457   46781
2007-04-14     239879   29259
2007-04-15     142252   16030
2007-04-16      24803   39200
2007-04-17      93485   18939
2007-04-18     173691   26434
2007-04-19     217440   19360
2007-04-20      44426   17276
2007-04-21     213765   26652
2007-04-22      39014   21699
2007-04-23     150012   18598
2007-04-24      61301   46268
2007-04-25      33800   22474
2007-04-26      10603   39335
2007-04-27      19746   15984
2007-04-28      14027   19324
2007-04-29      16527   25999
2007-04-30       5844   63439

Notice anything out of the ordinary? What I’m looking at is not the overall pattern of rising and falling magnitudes but rather the inner patterns of digits within the numbers. As I’ve been writing down these rankings over the past 20 months, I have had the persistent impression of a peculiar overabundance of repeated digits. Just in this small sample we have 47878, 22728, 192152, 57155, 25212, 36333, 44426, 33800, 39335, 25999, and lots more. Is there something going on here? Is Jeff Bezos broadcasting secret signals hidden in the digit patterns of the Amazon sales rankings?

I consider myself a reasonably sophisticated probabilist. I know I’m not supposed to be shocked when I bring together 23 people and find that two of them share a birthday. And I know that when people try to generate a random sequence of digits by plucking numbers out of their imagination, the result is almost always too homogeneous, with a deficiency of repetitions and other patterns of the kind I’m calling attention to. Thus I was prepared to believe that the patterns I perceived were conjured up out of nothing—phantom regularities in purely random data. Still, day after day, I would note numbers like 55256 and 20770 (which in fact appeared on the same day, one for the hardcover, one for the paperback), and wonder about the odds of such coincidences. Maybe I wasn’t just letting my imagination run away with me.

Finally, this past weekend, I could stand it no longer; I had to find out.

I’m going to have to give away the punchline right now, lest it come as a disappointment later: There is nothing unusual about those numbers. The distribution of digits—the number of pairs and triples and what-not—is well within the range expected for randomly generated numbers of the same size. In other words, I have confirmed the null hypothesis. Often, such a negative result is considered unpublishable, but this one cost me a fair amount of effort, and so I’m determined to get a blog item out of it for better or worse. If the conclusion itself is not very interesting, perhaps I can find something to say about the techniques and technologies that led to it.

How do you decide whether a number like 36333 is too unusual to be a product of random processes? I decided to analyze the numbers in my sample as if they were poker hands (with the rules of poker adapted to a deck of cards with ten ranks and no suits). I confined my attention to the five-digit numbers, which are the most numerous in my sample; of the 850 rankings I had recorded, 458 were in the range between 10,000 and 99,999. Then I wrote a five-line program that takes any such number and classifies it as one of seven types of poker hands:

  • five of a kind (e.g., 77777)
  • four of a kind (36333, 11119)
  • full house (44555, 28288)
  • three of a kind (57155, 20900)
  • two pairs (97097, 28002)
  • one pair (36739, 14912)
  • bust (53208, 16897)

I decided to ignore straights; the poker hand 56789 is simply a “bust” according to this scheme of classification. The concept of a flush—all cards of the same suit—doesn’t arise, since there are no suits.

When I ran my little program over the 458 five-digit Amazon sales rankings, here’s what I got:

   hand            number       frequency

five of a kind        0          0.0000
four of a kind        3          0.0066
full house            7          0.0153
three of a kind      34          0.0742
two pairs            51          0.1114
one pair            233          0.5087
bust                130          0.2838

   TOTAL            458          1.0000

The question now, of course, is what I should expect to see in such a data set, on the hypothesis that the digit patterns are random rather than contrived for some secret, nefarious purpose. Should I find it remarkable that half of the hands have a single pair, or that more than 70 percent have a pair or better? What about those seven full-house hands—should that abundance arouse suspicion? To begin answering questions like these, we need to calculate the probabilities of the various hands.

Having just boasted of my sophistication as a probabilist, I must now confess that the main thing I’ve learned about calculating probabilities is that getting the right answer is highly improbable. I understand the principle of the thing. It’s just a matter of counting. You count the “success” cases and divide by the total number of cases. But counting—even though it tends to come early in the mathematical curriculum—is not always easy.

In the Amazon poker problem, the first trap for the unwary is in counting the total number of cases. You might think that with five decimal digits, there would be 105 possible arrangements, but in fact 104 of those arrangements are excluded from the sample, because numbers in this context cannot have a leading digit of zero. Thus the denominator in the probability calculation will be 90,000 rather than 100,000.

I think I can trust myself to calculate the probability of a five-of-a-kind hand. The first card dealt must not be a zero but can be any of the other nine digits; thereafter, each subsequent digit must be identical to the first one. Thus the number of ways of forming a five-of-a-kind hand is 9×1×1×1×1, and the probability of this outcome is 9/90,000, or 0.0001. You can expect five of a kind in one hand out of every 10,000, if the deal is fair.

I believe this answer is correct, and I am proud of having obtained it; on the other hand, the five-of-a-kind calculation is by far the easiest case. Allow me to try to work out a harder problem: the odds of a full house. I invite you to listen in on my so-called thought process:

Well, a full house is a hand that matches the pattern aaabb. The first digit can be anything but zero, and so there are nine possibilities, but then the second and third digits have to match the first. The fourth digit must differ from the first three, and so there are eight candidates left…. No, wait…. This time zero is allowed, and so there are nine possibilities again. Then the fifth digit has to be identical to the fourth. That gives us the product 9×1×1×9×1, or 81 successes out of 90,000 total cases, for a probability of 0.0009…. Did I get that right?… Of course not. What I’ve calculated is the probability of seeing the pattern aaabb in that precise sequence; I am counting occurrences of 11100, 11122, 11133, …, 99988, but I am not including sequences such as 23233 or 45454. Okay. So one approach, now that I have the number of aaabb sequences, is to multiply by the number of ways of permuting that sequence. We can just take the aaa and intercalate the two b’s in all possible positions: bbaaa, babaa, baaba,…. No, wait…. The b could be a zero, and so it’s not always allowed to appear in the first position. Hmmm. This is not going well. For the time being, let’s forget about the prohibition of leading zeros, and we’ll correct for it later. That way we can consider all possible permutations of aaabb. As a first step, take aaa, where a can be any of the ten decimal digits, and place a b is all possible positions, allowing b to assume any value that differs from a, so that there are nine choices. There are four places to put the b: baaa, abaa, aaba and aaab. Now, for each of these four sequences, the second b can be placed in any of five positions, and so there are 4×5 = 20 permutations overall. Which means that the total number of full houses is…. Hold on. No, no, no, no, no. Not all those 20 permutations are distinguishable. If we start with baaa and insert a b in either the first or the second slot, we wind up with bbaaa in either case; this sequence should not be counted twice. So how many permutations are there, really? Offhand, I don’t see any way to count them that’s easier than direct enumeration: bbaaa, babaa, baaba, baaab, abbaa, ababa, abaab, aabba, aabab, aaabb. That’s ten cases. Let’s sum up. We know that a can take any of ten values and b has nine possible values, and there are ten ways of arranging three a’s and two b’s. Thus the total number of combinations is 10×9×10 = 900. But, don’t forget, now we have to subtract away all those sequences that start with a zero. For this purpose it doesn’t matter whether the first symbol is an a or a b; exactly 10 percent of the sequences will have a leading zero. Thus the number of full houses is 900–90=810. This gives a probability of 810/90,000 = 0.009.

I happen to know, from an independent calculation, that this answer is correct, and so it’s time to stop. If I didn’t know, however, I might well go on to ask whether we need to interchange the a’s and b’s—that is, consider the case of sequences aaabb where a has only nine allowed values and b has ten. (Why don’t we have to take that into account?)

I am mildly embarrassed to put this lurching, stumbling, caricature of a probability calculation on public exhibition, and yet it is a fair description of how I often struggle with a problem of this kind. Am I the only one who suffers so? Not everyone does. I know people who could carry out the same computation quite deftly; they command the intuition, the spürkraft, to zero in immediately on the right approach, like a chess player who doesn’t waste time considering fruitless moves. I admire that kind of finesse, but I don’t possess it.

On the other hand, I know something else. I know another way to solve the problem, with less fuss. As I mentioned above, I have already cooked up a little program that can take any five-digit Amazon rank and classify it into one of the seven categories of poker hands. In milliseconds I can run that program on all possible five-digit numbers; after all, there are only 90,000 of them, and it’s quite easy to generate all of them in sequence. Then I can just count the number of hands in each category, and all the probabilities come tumbling out in a neatly formatted table:

   hand            number       frequency

five of a kind        9          0.0001
four of a kind      405          0.0045
full house          810          0.0090
three of a kind    6480          0.0720
two pairs          9720          0.1080
one pair          45360          0.5040
bust              27216          0.3024

   TOTAL          90000          1.0000

If Fermat or Pascal or the Bernoullis were asked their opinion on this approach to probability, something tells me they would find it distasteful. I’m ambivalent myself. Returning to the chess analogy, this is the equivalent of the machine that beats a grandmaster by brute force, scanning millions of positions but knowing nothing of strategy. You can win that way, but you don’t get any style points. More important, the method is frustratingly opaque. I get the answers, and I have reasonable confidence that they’re correct, but the computation gives me no understanding of why they’re correct. Still another objection is that the method does not scale well. If my Amazon rankings had ten digits instead of five (perish the thought!), I’d have a hard time classifying the nine billion possible hands. (But then again I’m not sure the more analytic method scales all that well either.)

Setting aside these qualms, we can now take the predictions of theory and the observations of the Amazon sample and compare them side by side:

   hand            prediction       observation

five of a kind        0.0001          0.0000
four of a kind        0.0045          0.0066
full house            0.0090          0.0153
three of a kind       0.0720          0.0742
two pairs             0.1080          0.1114
one pair              0.5040          0.5087
bust                  0.3024          0.2838

   TOTAL              1.0000          1.0000

Some of these frequencies match quite closely (three of a kind, one pair); others are a bit off the mark (full house, bust). In a finite sample, of course, you would never expect an exact match to the theoretical frequencies—but are the discrepancies we’re seeing significant or not? My personal instinct says that there’s nothing amiss here, that the observed frequencies are consistent with the null hypothesis. In other words, the rankings could just as well be random numbers. The old rule of thumb that the variation should be less than the square root of the observation leads to the same conclusion. We could quantify these intuitions by calculating variances or standard deviations, doing a Χ2 test, and so on. But if I can barely calculate a simple probability, can I be trusted to navigate all those treacherous subtleties such as choosing the correct number of degrees of freedom?

Again there’s another way to go about it, relying on lots of ignorant computation to replace a little smart mathematics. The question we want to answer is this: Given a set of 458 randomly generated five-digit numbers, what is the probability that the random set will differ from the predicted frequencies by at least as much as the observed Amazon set? Suppose we measure distance from the theoretical prediction in terms of the sum of the squared differences:

S^2=\\sum_{i=1}^7(y_i-\\bar{y}_i)^2

where the index i ranges over the seven types of poker hand, and the expression in parentheses is the difference between the observed and predicted frequency for each type of hand. (Technical note: The seven numbers entering into this statistic are not independent. For example, if full-house hands are in surfeit, there has to be a compensating deficiency somewhere else. How much should I worry about this?) For the actual Amazon results, S2 works out to 4.267×10–4. Now we can generate lots of batches of random Amazon poker hands, each batch consisting of 458 five-digit numbers, and calculate S2 for each batch. What proportion of them will have an S2 value exceeding 4.267×10–4? The answer, based on half a million batches, is 80 percent. Thus all the anomalies I thought I was seeing in those numbers are pure delusion.

Needless to say, I was rooting for another outcome. I would have enjoyed finding something spooky and inexplicable in the Amazon rankings. Instead, all I’ve proved is that my intuition about what random numbers look like is not to be trusted. This is a disappointment, but maybe I can salvage something from the ruins. Perhaps I can claim the discovery that the Amazon rankings are a fairly good source of random numbers.

Large-scale differences and movements in the rankings are surely not random. It’s not purely a matter of chance that my book’s current rank is 43,888 (what an interesting number!), while some preposterous tale about a pubescent wizard occupies the top of the list—even though that book hasn’t actually been published yet. (Do I sound bitter?) The rankings are nonrandom in another way as well: The first digits are not uniformly distributed but have a Benford or Zipf distribution, with an excess of ones and a shortage of nines. It’s interesting that even though the randomly generated numbers do not share this property—the first digits have a uniform distribution over the range 1 through 9—the poker-hand analysis shows that the frequency of pairs, triples, and other patterns is identical in the two data sets. Thus the digits to the right of the first digit do seem to have the statistical properties of random numbers. The source of the randomness is presumably the hourly reshuffling of the rankings by the actions of thousands of Amazon shoppers—actions that are all too predictable in the aggregate (curse you, Harry Potter) but quite random in detail.

Perhaps it’s worth noting that when the RAND Corporation prepared their famous book A Million Random Digits with 100,000 Normal Deviates in 1955, they also employed the poker test as a measure of randomness. I’m relieved to find that their calculation of the theoretical frequency of the seven types of hands agrees with mine. (They don’t say how they performed the calculation.) The current Amazon sales rank for the book is 2,668,928 (hardcover) and 281,270 (paperback).

Welcome MAA readers

Tuesday, May 1st, 2007

I am pleased and proud to note that the previous posting in this space (A mathematical fable previsited) is doing double duty as a column for MAA Online, the Web presence of the Mathematical Association of America.

Just so I can’t be accused of favoritism, I’ll also mention the American Mathematical Society, where Tony Phillips’s Math in the Media column discusses another item of mine.

A mathematical fable previsited

Tuesday, May 1st, 2007

A year ago, in my American Scientist column, I was deconstructing the oft-told tale of Carl Friedrich Gauss and his boyhood triumph over a despotic schoolmaster. As the story goes, Gauss applied a bit of mathematical ingenuity to quickly solve a problem that the teacher had thought would keep the class busy for quite some time. I collected a hundred-and-some versions of this anecdote and tried to learn something about where the story came from in the first place and how it had evolved over a century and a half of telling and retelling. I remain on the lookout for new versions of the Gauss story, and I’m even more eager to find old versions. Lately I’ve had some success on both fronts.

First, here is the latest retelling I’ve come across. It can serve as a refresher for readers who may have forgotten some of the details. The passage comes from Ian Stewart’s new book, Why Beauty Is Truth: A History of Symmetry (Basic Books):

Gauss showed his talents early. At the age of three, he was watching his father, at that point a foreman in charge of a gang of laborers, handing out the weekly wages. Noticing a mistake in the arithmetic, the boy pointed it out to the amazed Gebhard. No one had taught the child numbers. He had taught himself.

A few years later, a schoolmaster named J. G. Büttner set Gauss’s class a task that was intended to occupy them for a good few hours, giving the teacher a well-earned rest. We don’t know the exact question, but it was something very similar to this: add up all the numbers from 1 to 100. Most likely, the numbers were not as nice as that, but there was a hidden pattern to them: they formed an arithmetic progression, meaning that the difference between any two consecutive numbers was always the same. There is a simple but not particularly obvious trick for adding the numbers in an arithmetic progression, but the class had not been taught it, so they had to laboriously add the numbers one at a time.

At least, that’s what Büttner expected. He instructed his pupils that as soon as they had finished the assignment, they should place their slate, with the answer, on his desk. While his fellow students sat scribbling things like

1+2 = 3
3+3 = 6
6+4 = 10

with the inevitable mistake

10+5 = 14

and running out of space to write in, Gauss thought for a moment, chalked one number on his slate, walked up to the teacher, and slapped the slate face down on the desk.

“There it lies,” he said, went back to his desk, and sat down.

At the end of the lesson, when the teacher collected all the slates, precisely one had the correct answer: Gauss’s.

Again, we don’t know exactly how Gauss’s mind worked, but we can come up with a plausible reconstruction. In all likelihood, Gauss had already thought about sums of that kind and had spotted a useful trick. (If not, he was entirely capable of inventing one on the spot.) An easy way to find the answer is to group the numbers in pairs: 1 and 100, 2 and 99, 3 and 98, and so on, all the way to 50 and 51. Every number from 1 to 100 occurs exactly once in some pair, so the sum of all those numbers is the sum of all the pairs. But each pair adds up to 101. And there are 50 pairs. So the total is 50 × 101 = 5050. This (or some equivalent) is what he chalked on his slate.

Stewart’s account includes most of the features that are now standard in tellings of this tale: the “busywork” assignment of summing the series 1+2+3+…+100, the trick of grouping the numbers in pairs, the ritual of piling up the slates on a table, and young Gauss’s bold declaration “There it lies.” As far as I know, all versions of the story derive ultimately from a single textual source, a memorial essay published in 1856 (a year after Gauss’s death) by Wolfgang Sartorius, Baron von Waltershausen, a colleague and friend of Gauss in his later years. But some of the most familiar elements of the modern narrative are missing from that first document. In particular, Sartorius never mentions the numbers 1 through 100, or any other specific series, and he gives no hint of how Gauss solved the problem. These details (and many others) have been filled in by later writers. (Ian Stewart, I would note, is careful to indicate that some parts of the story are conjectural; other authors are not so scrupulous.)

When I began looking into the history of this bit of mathematical lore, I was especially curious about the origin of the 1-to-100 example, and of the “pairing” method of explaining the summation shortcut. In the literature I surveyed last year, both of these ideas make their earliest appearance in a 1938 book by Ludwig Bieberbach, Carl Friedrich Gauß: Ein Deutsches Gelehrtenleben. But there were a few earlier works I had not been able to consult, and so I couldn’t be sure that Bieberbach was really the primary source. I’ve now tracked down two of those earlier tellings of the story. I won’t prolong your suspense: It turns out Bieberbach was not the first to mention 1+2+3+…+100. I have discovered an earlier appearance.

At the Brown University library I found a copy of a biographical sketch by Friedrich August Theodor Winnecke, Gauss: Ein Umriss seines Lebens und Wirkens. This was published in 1877, on the occasion of the 100th anniversary of Gauss’s birth. The passage dealing with the Büttner incident (German text here) closely follows Sartorius (who is acknowledged as the source). I note only one significant departure. Where Sartorius describes the assignment merely as “the summation of an arithmetic series,” Winnecke elaborates:

. . . die Summation einer arithmetischen Reihe, für deren Ausführung die Arithmetik eine sehr einfache, rasch zum Ziel führende Weise lehrt.

If I understand this correctly—and I would welcome help with translation—it means something like, “. . . the summation of an arithmetic series, for whose implementation arithmetic offers a very simple and quick means of reaching the goal.” Evidently Winnecke wanted to make clear that the shortcut was already well known—if not to schoolchildren then at least to their teachers. Sartorius says nothing one way or the other on this issue; later writers are sharply divided about whether or not Büttner already knew the method or learned it from his young pupil.

The second text that I have belatedly located is a pamphlet, Karl Friedrich Gauss, von Franz Mathé, published in 1906 as part of a series of portraits of masters in various fields. I found a copy, sadly rather tattered, at the New York Public Library. The German text of Mathé’s version of the schoolroom anecdote is here. What follows is my tentative attempt at a translation of the crucial passage. (Again, corrections and improvements are welcome.)

The reputable schoolmaster Büttner, who taught the arithmetic class, once gave the assignment of writing down and adding all the numbers in sequence from 100 down to 1. The problem had hardly been stated when young Gauss wrote the answer on his slate, put it on the table, and declared in his Braunschweig dialect, “There it lies.” The teacher, whip in hand, cast sardonic and pitying glances at the little smart-aleck who dared claim he had a solution that the rest of the class was still bitterly sweating to produce. However, the examination of the slates revealed that young Gauss alone had supplied the correct result. Furthermore, he was able to show the teacher how he had arrived at the solution. He explained: 100 plus 1 is 101; 99 plus 2 is 101; likewise 98 plus 3 is 101, and so on. Thus one gets as many times 101 as there are “pairs” formed from one hundred numbers. The result is therefore 50 × 101, or 5,050.

Clearly, we have a new precedent for the 1-to-100 series. Was Mathé the source from whom Bieberbach and so many later writers (all the way down to Ian Stewart today) obtained this example? Possibly, but the textual evidence is not compelling. The only phrases in Bieberbach that closely resemble Mathé’s words are passages that they both borrowed from Sartorius. And note that Mathé describes the series somewhat curiously as “100 down to 1” (“alle Zahlen der Reihe nach von 100 herab bis zur 1”), whereas Bieberbach, like all later authors, counts up from 1 to 100 (“die Zahlen von eins bis hundert”). Also, whereas Mathé has young Gauss showing his teacher the trick for summing an arithmetic progression, Bieberbach states clearly that Büttner already knew the method. We can’t rule out the possibility that Bieberbach had Mathé’s essay in front of him, but there are no signs of cribbing, apart from the numerical coincidence. And both authors might have invented the 1-to-100 example independently or under the influence of older sources.

Further light on such older sources has come to me in a letter from Don Knuth. I had mentioned that the “pairing” algorithm for summing a series can be traced back to an eighth-century manuscript titled “Propositiones ad acuendos juvenes,” or “Problems to Sharpen the Young,” where the example given is our old familiar series 1-to-100. Knuth points out that the pairing idea appears another millennium earlier in the works of Archimedes. And this time there might well be evidence of influence, or borrowing, or some form of the direct transmission of ideas.

The “Propositiones ad acuendos juvenes” are usually attributed to a monk known as Alcuin of York, who was associated with the court of Charlemagne around the year 800. (The Latin manuscript is available online here. An English translation by Peter J. Burkholder was published in 1993 in HOST: An Electronic Bulletin for the History and Philosophy of Science and Technology, which was then distributed as an e-mail list. The mailing list archives are available on the Web, though in an oddly garish context.) Here is Burkholder’s translation of the Alcuin story, which takes the form of a problem and solution.

There is a ladder which has 100 steps. One dove sat on the first step, two doves on the second, three on the third, four on the fourth, five on the fifth, and so on up to the hundredth step. Let him say, he who can, How many doves were there in all?

Solution. There will be as many as follows: Take the dove sitting on the first step and add to it the 99 doves sitting on the 99th step, thus getting 100. Do the same with the second and 98th steps and you shall likewise get 100. By combining all the steps in this order, that is, one of the higher steps with one of the lower, you shall always get 100. The 50th step, however, is alone and without a match; likewise, the 100th stair is alone. Add them all and you will find 5050 doves.

When I first read Alcuin’s solution, the asymmetry of the process struck me as very strange. Why would anyone choose to pair the first element of the series with the next-to-the-last element, leaving the final element on its own? The result seemed needlessly complicated. Instead of the simple formula 50 × 101—familiar from so many tellings of the Gauss tale—we have (49 × 100) + 50 + 100. Did Alcuin favor this convoluted method just because it allowed him to deal with even hundreds rather than multiples of 101? That was my first thought, but now I wonder if there might not be a historical explanation—an instance of the powerful influence of an example from antiquity.

Archimedes’ treatment of the summation problem appears in Proposition 11 of On Spirals, but in Thomas Heath’s edition it is given as a bracketed interpolation in another treatise, On Conoids and Spheroids. Here is how Heath presents the argument:

Heath\'s Archimedes, page 105

Note the interesting grouping of terms in the third equation: A1 + An–1 = A2 + An–2 = . . . = An. There’s at least a hint here of the pairing process found in Alcuin’s solution. The resemblance is not actually all that deep, as can be seen by working through a small example, such as n = 10. Alcuin would form the sum S = (1+9) + (2+8) + (3+7) + (4+6) + 5 + 10, whereas the calculation of Archimedes (or Heath) is 2S = (1+9) + (2+8) + (3+7) + (4+6) + (5+5) + (6+4) + (7+3) + (8+2) + (9+1) + 10 + 10. Still, in spite of this underlying conceptual difference, I can imagine that whoever wrote the youth-sharpening solution to the 100-rung ladder problem was inspired by a glance at the work of Archimedes. And it’s not too much of a stretch to suppose that an educated monk in the court of Charlemagne might have some aquaintance with Archimedes. (On the other hand, David Singmaster suggests that the origin of the 100-doves problem is Chinese.)

(Another reason for caution in constructing supposed chains of transmission: a Web posting by Saïd Boutiche, dated 2005, also represents 1+2+3+…+100 as (49 × 100) + 50 + 100, and yet there is no evidence at all of influence from Archimedes or Alcuin.)

Whatever the ultimate source of the summation anecdote, I find it fascinating that we are still telling ourselves this little story, over and over, across decades and centuries and perhaps millennia. It has become a cultural artifact, like a fairy tale or a nursery rhyme, but attached specifically to mathematical culture. We learn it early on from a teacher or a book, and later we pass it on to the next generation, perhaps with alterations or embellishments that adapt it to the needs and tastes of our own time.

For more on the Gauss anecdote and related subjects, please see:

Finally, I have a couple of questions that I would like to throw out to anyone else who has time to waste on these trivial pursuits:

First, who was Franz Mathé? The 1906 pamphlet tells us nothing about the author, and I have failed to find anything about him elsewhere. He’s not listed in the Mathematical Genealogy Project.

Second, I know of one other early biographical work on Gauss that might plausibly include a version of the schoolroom anecdote. Here is the reference: Wittstein, Theodor. 1877. Gedachtnissrede auf Carl Friedrich Gauss. Hannover: Hahn. If anyone has access to a copy of this (the Harvard catalogue says there’s one in the Widener Depository) and would like to check for a Büttner story, I would be delighted to receive the information and post it here.

Update 2007-05-05: Barry Cipra has found and helpfully transcribed three more tellings of the story:

  • A 1937 essay by G. Waldo Dunnungton. With this addition we now have four published accounts of Gauss’s childhood written by Dunnington, spanning more than 30 years. In two of those versions he says nothing about the incident in Büttner’s schoolroom. In his 1955 biography, he tells the standard story, including the 1-to-100 example and the pairing method. In the 1937 version, the assignment is just “a long, difficult exercise in calculation,” in which Gauss “noticed the law of the arithmetical series.”
  • A 1997 book by J. Munro, Heroes of the Telegraph (full text available online). This one has a unique twist: The assignment was intended only for the older pupils, and young Gauss asked to be included.
  • Stephen W. Hawking’s 2005 book, God Created the Integers: The Mathematical Breakthroughs that Changed History. Hawking is one of those who offer an alternative to the “busywork” hypothesis. In his rendition, the assignment was made “when the class began to be unruly.”

Thanks, Barry.

Update 2008-08-05: I still don’t know anything about Franz Mathé, but the memoir by Theodor Wittstein is no longer a loose end. I got ahold of it last October, through the kindness of Sally Bosken of the U.S. Naval Observatory library, but neglected to post an update here. Wittstein does not mention the Büttner story, or anything else about Gauss’s childhood.