Prime After Prime

Are the prime numbers sprinkled randomly along the number line, like windblown seeds?Of course not: Primality is not a matter of chance but a product of simple arithmetic. A number is prime if and only if no smaller positive integer other than 1 divides it evenly.

Yet that’s not the end of the story. The distribution of the primes looks random, with irregular gaps and clusters that seem quite haphazard. If there’s a pattern, it’s inscrutable. As a matter of fact, the primes look random enough that you could play dice with them. Make a list of consecutive prime numbers (perhaps starting with 11, 13, 17, 19, . . . ) and reduce them modulo 7. In other words, divide each prime by 7 and keep only the remainder. The result is a sequence of integers drawn from the set {1, 2, 3, 4, 5, 6} that looks much like the outcome of repeatedly rolling a fair die.

$$\begin{align*}
11 \bmod 7 & \rightarrow 4 \qquad 47 \bmod 7 \rightarrow 5 \\
13 \bmod 7 & \rightarrow 6 \qquad 53 \bmod 7 \rightarrow 4 \\
17 \bmod 7 & \rightarrow 3 \qquad 59 \bmod 7 \rightarrow 3 \\
19 \bmod 7 & \rightarrow 5 \qquad 61 \bmod 7 \rightarrow 5 \\
23 \bmod 7 & \rightarrow 2 \qquad 67 \bmod 7 \rightarrow 4 \\
29 \bmod 7 & \rightarrow 1 \qquad 71 \bmod 7 \rightarrow 1 \\
31 \bmod 7 & \rightarrow 3 \qquad 73 \bmod 7 \rightarrow 3 \\
37 \bmod 7 & \rightarrow 2 \qquad 79 \bmod 7 \rightarrow 2 \\
41 \bmod 7 & \rightarrow 6 \qquad 83 \bmod 7 \rightarrow 6 \\
43 \bmod 7 & \rightarrow 1 \qquad 89 \bmod 7 \rightarrow 5 \\
\end{align*}$$

Working with a larger sample (the first million primes greater than \(10^7\)), I have tallied up the number of primes with each of the six possible remainders mod 7 (otherwise known as the six possible congruence classes mod 7). I have also simulated a million rolls of a six-sided die. Looking at the results of these two exercises, can you tell which is which?

               1        2        3        4        5        6
         166,787  166,569  166,714  166,573  166,665  166,692
             120      -98       47      -94       -2       25

               1        2        3        4        5        6
         166,768  166,290  166,412  166,638  167,282  166,610
             101     -377     -255      -29      615      -57

In each table the first line counts the number of outcomes, \(x\), in each of the six classes; the second line shows the difference \(x - \bar{x}\), where \(\bar{x}\) is the mean value 1,000,000 / 6 = 166,667. In both cases the numbers seem to be spread pretty evenly, without any obvious biases. The first table represents the prime residues mod 7. They have a somewhat flatter distribution than the simulated die, with smaller departures from the mean; the standard deviations of the two samples are 84 and 346 respectively. On the evidence of these tables it looks like either process could supply the randomness needed for a casual game of dice.

There’s more to randomness, however, than just ensuring that the results are evenly distributed across the allowed range. Individual events in the series must also be independent of one another. One roll of a die should have no effect on the outcome of the next roll. As a test of independence, we can look at pairs of successive events. How many times is a 1 followed by another 1, by a 2, by a 3, and so on? A 6 × 6 matrix serves to record the counts of the 36 possible pairs. If the process is truly random, all 36 pairs should be equally frequent, apart from small statistical fluctuations. We can turn the matrix into a color-coded “heatmap,” where cells with higher-than-average counts are shown in warm shades of pink and red, while those below the mean are filled with cooler shades of blue. (The quantity plotted is not the actual count \(x\) but a normalized variable \(w = (x_{i,\,j} - \bar{x})\, /\, \bar{x}\), where \(\bar{x}\) is again the mean value—in this case 1,000,000 / 36 = 27,778.) Here is the heatmap for the simulated fair die:

Figure 1.

Heatmap fair die

Not much going on there. Almost all the counts are so close to the mean value that the matrix cells appear as a neutral gray; a few are very pale pink or blue. It’s just what you would expect if consecutive rolls of a die are uncorrelated, and all possible pairs are equally likely.

Now for the corresponding matrix of consecutive primes mod 7:

Figure 2.

Heatmap 8 digit primes 1000001 mod 7

Well! I guess we’re not in Randomland anymore; this is where the old gray movie turns into Technicolor. The heatmap has a blue streak along the main diagonal (upper left to lower right), indicating that consecutive pairs of primes that have the same value mod 7 are strongly suppressed. In other words, the pairs \((1, 1), (2, 2), \ldots (6, 6)\) appear less often than they would in a truly random sequence. The superdiagonal (just above the main diagonal) is a lighter blue, meaning that \((i, j)\) pairs with \(j=i+1\) are seen at a little less than average frequency; for example, \((2, 3)\) and \((5, 6)\) have slightly negative values of normalized frequency. On the other hand, the subdiagonal (below the main diagonal) is all pink and red; pairs such as \((3, 2)\) or \((5, 4)\), with \(j=i-1\), occur with higher than average frequency. Away from the diagonal, in the upper right and lower left corners, we see a pastel checkerboard pattern.

If you’d prefer to squint at numbers rather than colored squares, here’s the underlying matrix:

                 pairs of consecutive primes mod 7
                   1      2      3      4      5      6
          1    15656  24376  33891  29964  33984  28916
          2    37360  15506  22004  32645  25095  33959
          3    25307  41107  14823  22747  32721  30009
          4    32936  26183  37129  14681  21852  33791
          5    24984  34207  26231  41154  15560  24529
          6    30543  25190  32636  25382  37453  15488

The departures from uniformity are anything but subtle. The third row, for example, shows that if you have just seen a 3 in the sequence of primes mod 7, the next number is much more likely to be a 2 than another 3. If you were wagering on a game played with prime-number dice, this bias could make a huge difference in the outcome. The prime dice are rigged!


These remarkably strong correlations in pairs of consecutive primes were discovered by Robert J. Lemke Oliver and Kannan Soundararajan of Stanford University, who discuss them in a preprint posted to the arXiv in March. What I find most surprising about the discovery is that no one noticed these patterns long ago. They are certainly conspicuous enough once you know how to look for them.

I suppose we can’t fault Euclid for missing them; ideas about randomness and probability were not well developed in antiquity. But what about Gauss? He was a connoisseur of prime tables, and he compiled his own lists of thousands of primes. In his youth, he wrote, “one of my first projects was to turn my attention to the decreasing frequency of primes, to which end I counted the primes in several chiliads . . . .” Furthermore, Gauss more less invented the idea of congruence classes and modular arithmetic. But apparently he never suspected there might be anything odd lurking in the congruences of pairs of consecutive primes.

In the 1850s the Russian mathematician Pafnuty Lvovich Chebyshev pointed out a subtle bias in the primes. Reducing the odd primes modulo 4 splits them into two subsets. All primes in the sequence 5, 13, 17, 29, 37, . . . are congruent to 1 mod 4; those in the sequence 3, 7, 11, 19, 23, 31, . . . are congruent to 3 mod 4. Chebyshev observed that primes in the latter category seem to be more abundant. Among the first 10,000 odd primes, for example, there are 4,943 congruent to 1 and 5,057 congruent to 3. However, the effect is tiny compared with the disparities seen in pairs of consecutive primes.

In modern times a few authors have reported glimpses of the consecutive-primes phenomenon; Lemke Oliver and Soundararajan mention three such sightings. (See references at the end of this article.) In the 1950s and 60s, Stanislaw Knapowski and Paul Turán investigated various aspects of prime residues mod m; in one paper, published posthumously in 1977, they discuss consecutive primes mod 4, with residues of 1 or 3. They “guess” that consecutive pairs with the same residue and those with different residues “are not equally probable.” In 2002 Chung-Ming Ko looked at sequences of consecutive primes (not just pairs of them) and constructed elaborate fractal patterns based on their varying frequencies. Then in 2011 Avner Ash and colleagues published an extended analysis of “Frequencies of Successive Pairs of Prime Residues,” including some matrices in which the diagonal depression is clearly evident.

Given these precedents, are Lemke Oliver and Soundararajan really the discoverers of the consecutive prime correlations? In my view the answer is yes. Although others may have seen the patterns before, they did not describe them in a way that registered on the consciousness of the mathematical community. As a matter of fact, when Lemke Oliver and Soundararajan announced their findings, the response was surprise verging on incredulity. Erica Klarreich, writing in Quanta, cited the reaction of James Maynard, a number theorist at Oxford:

When Soundararajan first told Maynard what the pair had discovered, “I only half believed him,” Maynard said. “As soon as I went back to my office, I ran a numerical experiment to check this myself.”

Evidently that was a common reaction. Evelyn Lamb, writing in Nature, quotes Soundararajan: “Every single person we’ve told this ends up writing their own computer program to check it for themselves.”

Well, me too! For the past few weeks I’ve been noodling away at lots of code to analyze primes mod m. What follows is an account of my attempts to understand where the patterns come from. My methods are computational and visual more than mathematical; I can’t prove a thing. Lemke Oliver and Soundararajan take a more rigorous and analytical approach; I’ll say a little more about their results at the end of this article.

If you would like to launch your own investigation, you’re welcome to use my code as a starting point. It is written in the Julia programming language, packed up in a Jupyter notebook, and available on GitHub. (Incidentally, this program was my first nontrivial experiment with Julia. I’ll have more to say about my experience with the language in a later post.)


All the examples presented above concern primes taken modulo 7, but there’s nothing special about the number 7 here. I chose it merely because the six possible remainders {1, 2, 3, 4, 5, 6} match the faces of an ordinary cubical die. Other moduli give similar results. Lemke Oliver and Soundararajan do much of their analysis with primes modulo 3, where there are only two congruence classes: A prime greater than 3 must leave a remainder of either 1 or 2 when divided by 3. This is the matrix of pair counts for the first million primes above \(10^7\):

                             1       2
                    1   218578  281453
                    2   281454  218514

Figure 3.

Heatmap 8 digit primes 1000001 mod 3

The pattern is rather minimalist but still recognizable: The off-diagonal entries for sequences \((1, 2)\) and \((2, 1)\) are larger than the on-diagonal entries for \((1, 1)\) and \((2, 2)\).

Primes modulo 10 have four congruence classes: 1, 3, 7, 9. Working in decimal notation, we don’t even need to do any arithmetic to see this. When numbers are written in base 10, every prime greater than 5 has a trailing digit of 1, 3, 7, or 9. Here are the frequency counts for the 16 pairs of consecutive final digits:

                          1      3      7      9
                 1    43811  76342  78170  51644
                 3    58922  41148  71824  78049
                 7    64070  68542  40971  76444
                 9    83164  63911  59063  43924

Figure 4.

Heatmap 8 digit primes 1000001 mod 10

The blue stripe along the main diagonal is clearly present, although elsewhere in the matrix the pattern is somewhat muted and muddled.

I have found that the correlations between successive primes show through most clearly when the modulus itself is a prime and also is not too small. Take a look at the heatmaps for consecutive primes mod 13 and mod 17:

Figure 5.

Heatmap 8 digit primes 1000001 mod 13

Figure 6.

Heatmap 8 digit primes 1000001 mod 17

Or how about mod 31?

Figure 7.

Heatmap 8 digit primes 1000001 mod 31

These would make great patterns for a quilt or a tiled floor, no? And there are interesting regularities visible in all of them. Diagonal stripes are prominent not just on the main corner-to-corner diagonal but throughout the matrix. Those stripes also generate a checkerboard motif; along any row or column, cells tend to alternate between red and blue. A subtler feature is an approximate bilateral symmetry across the antidiagonal (which runs from lower left to upper right). If you were to fold the square along this line, the cells brought together would be closely matched in color. (This is a fact noticed by Ash and his co-authors.)

As a focus of further analysis I have settled on looking at consecutive primes mod 19, a modulus large enough to yield clearly differentiated stripes but not so large as to make the matrix unweildy.

Figure 8.

Heatmap 8 digit primes 1000001 mod 19

How to make sense of what we’re seeing? A starting point is the observation that all the primes in our sample are odd numbers, and hence all the intervals between those primes are even numbers. For any given prime \(p\), the next candidates for primehood are \(p+2, p+4, p+6, \ldots\). Could this have something to do with the checkerboard pattern? If the steps between primes must be multiples of 2, that could certainly create correlations between every second cell in a given column or row. (Indeed, the every-other-cell correlations would be starkly obvious—all even-numbered entries would be exactly zero—if the modulus were an even number. It is only by “wrapping around” the edge of the matrix at an odd boundary that any of the even-numbered cells can be populated.)

The diagonal stripes in the matrix suggest strong correlations between all pairs of primes separated by a certain numerical interval. For example, the deepest blue diagonal and the brightest red diagonal are formed by cells separated by six places along the j axis. In the first row are cells 1 and 7, then 2 and 8, 3 and 9, and so on. It occurred to me that this relationship would be easier to perceive if I could “twist” the matrix, so that diagonals became columns. The idea is to apply a cyclic shift to each row; all the values in the row slide to the left, and those that fall off the left edge are reinserted on the right. The first row shifts by zero positions, the next row by one position, and so on. (Is there a name for this transformation? I’m just calling it a twist.)

When I wrote the code to apply this transformation, the result was not quite what I expected:

Figure 9.

Twisted nz heatmap 8 digit primes 1000001 mod 19

What are those zigzags all along the antidiagonal? I guessed that I must have an off-by-one error. Indeed this was the nature of the problem, though the bug lay in the data, not the algorithm. The matrices I’ve displayed in all the figures above are only partial; they suppress empty congruence classes. In particular, the matrix for pairs of primes modulo 19 ignores all primes congruent to 0 mod 19—on the sensible-sounding grounds that there are no such primes. After all, if \(p > 19\) and \(p \equiv 0 \bmod 19\), then \(p\) cannot be prime because it is divisible by 19. Nevertheless, a row and a column for \(p \equiv 0 \bmod 19\) are properly part of the matrix. When they are included, the color-coded tableau looks like this:

Figure 10.

Heatmap z 8 digit primes 1000001 mod 19

The presence of the zero row and column makes the definition of the twist transformation somewhat tidier: For each row \(i\), apply a leftward cyclic shift of \(i\) places. The resulting twisted matrix is also tidier:

Figure 11.

Twisted z heatmap 8 digit primes 1000001 mod 19

What do those vertical stripes tell us? In the original matrix, entry \(i, j\) represents the frequency with which \(i \bmod 19\) is followed by \(j \bmod 19\). Here, the color in cell \(i, j\) indicates the frequency with which \(i \bmod 19\) is followed by \((i + j) \bmod 19\). In other words, each column brings together entries with same interval mod 19 between two primes. For example, the leftmost column includes all pairs separated by an interval of length \(0 \bmod 19\), and the bright red column at \(j = 6\) counts all the cases where successive primes are separated by \(6 \bmod 19\).

The color coding gives a qualitative impression of which intervals are seen more or less commonly. For a more precise quantitative measure, we can sum along the columns and display the totals in a bar chart:

Figure 12.

Column sums twisted 8 digit primes 1000001 mod 19

Three obervations:

  • The even-odd disparity noted above is clearly visible in this graphic as well. With the exception of 0, every even interval stands out above its odd neighbors.
  • An inter-prime interval of 6 mod 19 is the most popular of all. Multiples of 6 (namely 12 and 18) are also favored but a little less so.
  • The interval 0 mod 19 is remarkably unpopular. These are the entries along the main diagonal of the original matrix, so it’s no surprise their sum is on the low side, but the deficit is deeper than I had guessed.

I wanted to understand the origin of these patterns. What makes interval 6 such a magnet for pairs of consecutive primes, and why do almost all the primes shun poor interval 0?

For the popularity of 6, I already had an inkling. In the 1990s Andrew Odlyzko, Michael Rubinstein, and Marek Wolf undertook a computational study of prime “jumping champions”:

An integer D is called a jumping champion if it is the most frequently occurring difference between consecutive primes ≤ x for some x.

Among the smallest primes (x less than about 600), the jumping champion is usually 2, but then 6 takes over and dominates for quite a long stretch of the number line. Somewhere around \(x = 10^{35}\), 6 cedes the championship to 30, which eventually gives way to 210. Odlyzko et al. estimate that the latter transition takes place near \(x = 10^{425}\). The numbers in this sequence of jumping champions—2, 6, 30, 210, . . . —are the primorials; the nth primorial is the product of the first n primes.

Why should primorials be the favored intervals between consecutive primes? If \(p\) is a large enough prime, then \(p + 2\) cannot be divisible by 2, \(p + 6\) cannot be divisible by either 2 or 3, \(p + 30\) cannot be divisible by 2, 3, or 5, and in general \(p + P_{n}\), where \(P_{n}\) is the nth primorial, cannot be divisible by any of the first n primes. Of course \(p + P_{n}\) might still be divisible by some larger prime, or there might be another prime between \(p\) and \(p + P_{n}\), so that the interprime interval is certainly not guaranteed to be a primorial. But these intervals have an edge over other contenders.

We can see this reasoning in action by taking the differences between successive elements in our list of a million eight-digit primes, then plotting their frequencies:

Figure 13.

Diffs 8 digit primes 1000001 color

Again interval 6 is the clear standout, accounting for 13.7 percent of the total; higher multiples of 6 also poke above their immediate neighbors. And note the overall shape of the distribution: a lump at the left (with a peak at 6), followed by a steady decline. The trend looks a little like a Poisson distribution, and indeed this is thought to be the correct description.

The color scheme slices the data set into tranches of 19 values each. The blue tranche, which includes inter-prime intervals of length 0 to 18, accounts for 68 percent of all the intervals present in the sample of a million primes; the gold tranche adds another 23 percent. The remaining 9 percent are spread widely and thinly. Not all of the intervals are shown in the graph; the spectrum extends as far as 210. (A single pair of consecutive primes in the sample has a separation of 210, namely 20,831,323 and 20,831,533.)

Figure 13 seems to reveal a great deal about the patterns of consecutive primes mod 19. I can make the graph even more informative with a simple rearrangement. Slide each 19-element tranche to the left until it aligns with the 0 tranche, stacking up bars that fall in the same column. Thus the second (gold) tranche moves left until bar 19 lines up with bar 0, and the third (rose) tranche brings together bar 38 with bar 0. Physically, this process can be imagined as wrapping the graph around a cylinder of circumference 19; mathematically, it amounts to reducing the inter-prime intervals modulo 19.

Figure 14.

Diffs 8 digit primes 1000001 color stacked mod 19

If you ignore the garish colors, Figure 14 is identical to Figure 12: All the bar heights match up. This should not be a surprise. In Figure 12 we reduce the primes modulo 19 and then take the differences between successive reduced primes; in Figure 14 we take the differences between the primes themselves and then reduce those differences modulo 19. The two procedures are equivalent:

\[(p \bmod 19 - q \bmod 19) \bmod 19 = (p - q) \bmod 19.\]

Looking at the colors now, the pieces of the puzzle fall into place. Why are primes mod 19 so often separated by an interval of 6? Well, “mod 19” has very little to do with it; 6 itself is by far the most common interval between primes in this sample. The only other nonnegligible contribution to \(\delta \equiv 6 \bmod 19\) comes from the third tranche, specifically a few pairs of primes at a distance of 44.

The predominance of the first tranche also explains the disparity between odd and even intervals. All the intervals in the first tranche are necessarily even; odd intervals (mod 19) begin to appear only with the second tranche (intervals 19 to 37) and for that reason alone they are less well populated. For the eight-digit primes in this sample, more than two-thirds of consecutive pairs are closer than 19 units and thus wind up in the first tranche. (The median spacing between the primes is 12. The mean interval is 16.68, in close accord with the theoretical prediction of 16.72.)

Finally, Figure 14 also has something to say about the rarity of 0 intervals mod 19. No two consecutive primes can fall into the same congruence class mod 19 unless they are separated by a distance of 38 or some multiple of 38. Thus such pairs don’t enter the scene until the beginning of the third tranche, and there can’t be very many of them. The million-prime sample has 8,384 consecutive pairs at a distance of 38—less than 1 percent of the total. This is the main reason that a prime-number die so rarely shows the same face twice in a row. It’s the origin of the blue diagonal streak in all the matrices.


I find it interesting that we can explain so much about the pattern of consecutive primes mod m without delving into any of the deep and distinctive properties of prime numbers. In fact, we can replicate much of the pattern without introducing primes at all.

Two hundred years ago, Gauss and Legendre observed that in the neighborhood of a number \(x\), the fraction of all integers that are prime is about \(1 / \log x\). In 1936 the Swedish mathematician Harald Cramér suggested that we interpret this fraction as a probability. The idea is to go through the integers in sequence, accepting each \(x\) with probability \(1 / \log x\). The numbers in the accepted set will not be prime except by coincidence, but they will have the same large-scale distribution as the primes. Here are the first few entries in a list of a million such “Cramér primes,” where the random selection process started with \(x = 10^7\):

     10000007
     10000022
     10000042
     10000065
     10000068
     10000098
     10000110
     10000116
     10000119
     10000128
     10000166

Now suppose we put these numbers through the same machinery we applied to the primes. We’ll reduce each Carmér prime mod 19 and then construct the 19 × 19 matrix of successors:

Figure 15.

Heatmap 8 digit cramers 1000007 mod 19

The prominent diagonal features look familiar, but they are much simpler than those in the corresponding prime diagrams. For any Cramér prime p mod 19, the most likely successor is p + 1 mod 19, and the least likely is p + 19 mod 19. Between these extremes there’s a smooth gradient in frequency or probability, with just a few small fluctuations that can probably be attributed to statistical noise.

One thing that’s missing from this matrix is the checkerboard motif. We can restore some of that structure by generating a new set of numbers I call Cramér semiprimes. They are formed by the same kind of probabilistic sieving of the integers, but this time we consider only the odd numbers as candidates, and adjust the probability to \(2\, / \log x\) to keep the overall density the same:

Figure 16.

Heatmap 8 digit semicramers 1000001 mod 19

That’s more like it! With all even numbers excluded from the sequence, the minimum interval between semicramers is 2, and that is also the likeliest spacing.

With one further modification, we get an even closer imitation of the true prime matrix. In addition to excluding all integers divisible by 2, we knock out those divisible by 3, and adjust the probability of selection accordingly. Call the resulting numbers Cramér demisemiprimes.

Figure 17.

Heatmap 8 digit demisemicramers 1000001 mod 19

Note that 6 mod 19 is the likeliest interval between Cramér demisemiprimes, just as it is between true primes, and there are the same echoes at intervals of 12 and 18. Indeed, the only conspicuous difference between this matrix and Figure 10 (the corresponding diagram for true primes) is in the column and the row for numbers congruent to 0 mod 19. There can be no such numbers among the primes. If we eliminate them also from the Cramér numbers, the two matrices become hard to distinguish. Here they are side by side:

Figure 18.

Primes cramers comparison z

If you look closely, there are differences to be found—check out the diagonal extending southeast from row 1, column 15—but overall these modified Cramer numbers are shockingly good fake primes. Even the symmetry about the antidiagonal is visible in both diagrams. And keep in mind that the two sets have only about 19 percent of their values in common; the Cramérs include 189,794 genuine primes.


I have one more twist to add to the tale. All the examples above are based on primes (or prime analogues) of eight decimal digits, or in other words numbers in the vicinity of \(10^7\). Do the same conclusions hold up with larger primes? Consider the tableau created by consecutive pairs of a million 40-digit primes, taken mod 19. The pattern is familiar but faded:

Figure 19.

Heatmap 40 digit primes 9166432 mod 19

Going on to primes of 400 digits each, again reduced mod 19, we find the colors have been bleached almost to oblivion:

Figure 20.

Heatmap 400 digit primes 6567379 mod 19

The blue streak on the main diagonal is barely discernible, and other features amount to mere random mottling.

Thus it seems that size matters when it comes to pairs of consecutive primes. For a hint as to why, take a look at the tally of differences between successive primes for the 40-digit sample:

Figure 21.

Diffs 40 digit primes 9166432

Compared with the distribution of intervals for eight-digit primes (Figure 13), the spectrum is much broader and flatter. In this representation the graph is truncated at interval 240; the long tail actually stretches far to the right, with the largest gap between consecutive primes at 1,328. Also, as predicted by Odlyzko and his colleagues, the most frequent interval between 40-digit primes is not 6 but 30.

Because of the wider distribution of intervals, the first tranche cannot dominate the behavior of the system in the way it does with eight-digit primes. When we stack up the tranches mod 19 (Figure 22, below), the first six or eight tranches all make substantial contributions. The odd-even alternation is still present, but the amplitude of these oscillations is much reduced. The leftmost bar in the graph, representing intervals congruent to 0 mod 19, is stunted but not as severely.

Figure 22.

Diffs 40 digit primes 9166432 stacked mod 19

The flattening of the spectrum becomes even more pronounced in the sample of a million 400-digit primes:

Figure 23.

Diffs 400 digit primes 6567379

Now the gaps between primes extend all the way out to 15,080, creating almost 800 tranches mod 19 (though only 13 are shown). And there’s a lot of intriguing, comblike structure in the array. In general, bars at multiples of 6 stand out at almost double the height of their near neighbors, showing the continuing influence of the smallest prime factors 2 and 3. Multiples of 6 that are also multiples of 30 reach even greater heights. Values in the sequence 42, 84, 126, 168, 210, . . . are also enhanced; these numbers are multiples of 42 = 2 × 3 × 7. And notice that 210, which is a multiple of 6 and 30 and 42, is the new champion interval, again supporting an Odlyzko prediction.

Despite all this intricate structure, when the bars are stacked up mod 19, the mixing of the 800 tranches is so thorough that the heights are almost uniform. All that’s left is a tiny bit of even-odd disparity.

Figure 24.

Diffs 400 digit primes 6567379 stacked mod 19

And the chronically unpopular class of intervals congruent to 0 mod 19 has finally caught up with its peers. Most of the height of the bars comes not from the dozen early tranches but from the hundreds of later ones representing intervals between 228 and 15,080 (all lumped together in the teal green area of the graph).


The experiments with large primes suggest a plausible surmise: As the size of the primes goes to infinity, all traces of correlations will fade to gray, and consecutive pairs of primes will be as random as rolls of an ideal die. But is it so? There are several reasons to be skeptical of this hypothesis. First, if we scale the modulus m along with the size of the primes—making it comparable in magnitude to the median gap between primes—the correlations may still show through. For my 40-digit sample, the median gap between primes is 66, so let’s look at the successive-pairs matrix mod 61. (To limit statistical noise, I did this computation with a sample of 10 million 40-digit primes rather than 1 million.)

Figure 25.

Heatmap 40 digit primes 5053744 mod61

The stripes are back! Indeed, in addition to the familiar bright red stripes at intervals of 6, there’s a more diffuse pink-and-blue undulation with a period of 30. I would love to see a matrix for primes of 400 digits, which might well have even more complex features, with interacting waves at periods of 6, 30, and 210. Regrettably, I can’t show you that figure. The median gap between 400-digit primes is about 640, so we’d need to set m equal to a prime in this range, say 641. Filling that 641 × 641 matrix would require about a billion consecutive 400-digit primes, which is more than I’m prepared to calculate.

There are other reasons to doubt that the correlations disappear entirely as the primes grow larger. The comblike structure seen so clearly in Figures 21 and 23 suggests that rules of divisibility by small primes have a major influence on the distribution of large primes mod m—and this influence does not wane when the primes grow larger still. Furthermore, even when m is much smaller than the median inter-prime interval, the blue streak remains faintly visible. Here is the matrix for pairs of consecutive 400-digit primes mod 3:

                             1       2
                    1   248291  251128
                    2   251127  249453

Differences between on-diagonal and off-diagonal elements are much smaller than with eight-digit primes (compare Figure 3), but the discrepancies still don’t look like random variation.

To get a clearer picture of how the correlations vary as a function of the size of the primes, I set out to sample the sequence of primes over the entire range from 1-digit numbers to 400-digit numbers. In this project I decided to go Gauss one better: He tabulated primes by the chiliad (a group of 1,000), and I’ve been computing them by the myriad (a group of 10,000). To measure the correlations among primes mod m, I calculated the mean value of the diagonal elements of the matrix and the mean of the off-diagonal elements, then took the off/on ratio. If successive primes were totally uncorrelated, the ratio should converge to 1.

Figure 26 shows the result for 797 myriads of primes mod 3. The curve is concave upward, with a steep initial decline and then a much flatter segment. Starting at about 100 digits, there are samples with off/on ratios of less than 1, meaning that the diagonal is more densely populated than the off-diagonal regions. But even at 400 digits the majority of the ratios are still above 1. What are we looking at here? Does the curve slowly approach a ratio of 1, or is there a limiting value slightly greater than 1? Unfortunately, computational experiments will not give a definitive answer.

Figure 26.

Myriads diag ratios 3


The paper by Lemke Oliver and Soundararajan brings quite different tools to bear on this problem. Although they do some numerical exploration, their focus is on finding an analytic solution. The goal is a mathematical function or formula whose inputs are four positive integers: m is a modulus, a and b are congruence classes of primes mod m, and x is an upper limit on the size of the primes. The formula should tell us how often a is followed by b among all primes smaller than x. If we were in possession of such a formula, we could color in all the squares in the m × m successor matrix without ever having to compute the actual primes up to x.

Describing the behavior of all primes up to x is far more challenging than taking a sample of primes in the neighborhood of x. And the analytic approach is harder in other ways: It requires ideas rather than just cpu cycles. The reward is also potentially greater. The equation \(A = \pi r^2\) yields an exact truth about all circles, something no finite series of computations (with a finite approximation to \(\pi\)) can give us. There’s the promise not just of rigor but of insight.

Sadly, I’ve not yet been able to gain much insight from reading the analysis of Lemke Oliver and Soundararajan. The blame lies mainly with gaping holes in my knowledge of analytic number theory, but I think it’s also fair to say that the math gets pretty hairy at certain points in this discourse. The equation below constitutes the Main Conjecture of Lemke Oliver and Soundararajan. (I have made a minor change of notation and simplified one aspect of the equation: The original applies to sequences of r consecutive primes, but this version describes pairs only, i.e., \(r = 2\).)

\[\pi(x; m, a, b) = \frac{\mathrm{li}(x)}{\phi(m)^2}\left(1 + c_1(m; a, b)\frac{\log \log x}{\log x} + c_2(m; a, b) \frac{1}{\log x} + O\Big( \frac{1}{(\log x)^{7/4}} \Big) \right)\]

I think I understand enough of what’s going on here to at least offer a glossary. To the left of the equal sign, \(\pi(x; m, a, b)\) denotes a counting function; whereas \(\pi(x)\) counts the primes up to \(x\), \(\pi(x; m, a, b)\) is the number of pairs of consecutive primes mod \(m\) that fall into the congruence classes \(a\) and \(b\). To the right of the equal sign, the main coefficient \(\mathrm{li}(x) / \phi(m)^2\) is essentially the mean or expected number of pairs if the primes were distributed uniformly at random, with no correlations between successive primes; \(\mathrm{li}(x)\) is the logarithmic integral of \(x\), an approximation to \(\pi(x)\), and \(\phi(m)^2\) is the Euler totient function, which counts the square of the number of possible congruence classes for \(m\), or in other words the number of elements in the successor matrix.

The leading term inside the large parentheses is simply \(1\), and so it takes on the value of the main coefficient \(\mathrm{li}(x) / \phi(m)^2\); thus the mean number of pairs \((a, b)\) becomes the first approximation to the counting function. The three following terms act as corrections to this first approximation; for large \(x\) they should get progressively smaller, because \(\log \log x / \log x \gt 1 / \log x \gt 1 / (\log x)^{7/4}\) whenever \(x \gt e^e \approx 15\).

What about the coefficients of those three correction terms? The notation O(\cdot) for the smallest term indicates that we’re only going to worry about the term’s order of magnitude—which will be small for large \(x\). The coefficient \(c_1\) takes the following form in the case of \(r = 2\):

\[c_1(m; a, b) = \frac{1}{2} - \frac{\phi(m)}{2} (\#\{a \equiv b \bmod m\})\]

The expression \(\#\{a \equiv b \bmod m\}\) counts the number of cases where \(a\) and \(b\) lie in the same congruence class mod \(m\). Thus the effect of the term (if I understand correctly) is to reduce the overall count along the matrix diagonal, where \(a \equiv b \bmod m\).

As for coefficient \(c_2\), Lemke Oliver and Soundararajan remark that “in general, [it] seems complicated.” Indeed it does. And so this is the place where I should encourage those readers who want to know more to go read the original.

The complexity of the mathematical treatment leaves me feeling frustrated, but it’s hardly unusual for an easily stated problem to require a deep and difficult solution. I hang onto the hope that some of the technicalities will be brushed aside and the main ideas will emerge more clearly with further work. In the meantime, it’s still possible to explore a fascinating and long-hidden corner of number theory with the simplest of computational tools and a bit of graphics.

“God may not play dice with the universe, but something strange is going on with the prime numbers”—so said Paul Erdős and/or Mark Kac, though only with a little help from Carl Pomerance. The strangeness seems to be at its strangest when we play dice with the primes.


Addendum 2016-06-14. I noted above that the distribution of primes mod 7 seems flatter, or more nearly uniform, than the result of rolling a fair die. John D. Cook has taken a chi-squared test to the data and shows that the fit to uniform distribution is way too good to be the plausible outcome of a random process. His first post deals with the specific case of primes modulo 7; his second post considers other moduli.


References

Ash, Avner, Laura Beltis, Robert Gross, and Warren Sinnott. 2011. Frequencies of successive pairs of prime residues. Experimental Mathematics 20(4):400–411.

Chebyshev, Pafnuty Lvovich. 1853. Lettre de M. le Professeur Tchébychev à M. Fuss sur un nouveaux théorème relatif aux nombres premiers contenus dans les formes 4n + 1 et 4n + 3. Bulletin de la Class Physico-mathematique de l’Academie Imperiale des Sciences de Saint-Pétersbourg 11:208. Google Books

Cramér, Harald. 1936. On the order of magnitude of the difference between consecutive prime numbers. Acta Arithmetica 2:23–46. PDF

Derbyshire, John. 2002. Chebyshev’s bias.

Granville, Andrew. 1995. Harald Cramér and the distribution of prime numbers. Harald Cramér Symposium, Stockholm, 1993. Scandinavian Actuarial Journal 1:12–28. PDF

Granville, Andrew, and Greg Martin. 2004. Prime number races. arXiv

Hamza, Kais, and Fima Klebaner. 2012. On the statistical independence of primes. The Mathematical Scientist 37:97–105.

Klarreich, Erica. 2016. Mathematicians discover prime conspiracy. Quanta.

Knapowski, S., and P. Turán. 1977. On prime numbers ? 1 resp. 3 mod 4. In Number Theory and Algebra: Collected Papers Dedicated to Henry B. Mann, Arnold E. Ross, and Olga Taussky-Todd, pp. 157–165. Edited by Hans Zassenhaus. New York: Academic Press.

Ko, Chung-Ming. 2002. Distribution of the units digit of primes. Chaos Solitons Fractals 13(6):1295–1302.

Lamb, Evelyn. 2016. Peculiar pattern found in ‘random’ prime numbers. Nature doi:10.1038/nature.2016.19550.

Lemke Oliver, Robert J., and Kannan Soundararajan. 2016 preprint. Unexpected biases in the distribution of consecutive primes. arXiv

Odlyzko, Andrew, Michael Rubinstein, and Marek Wolf. 1999. Jumping champions. Experimental Mathematics 8(2):107–118.

Rubinstein, Michael, and Peter Sarnak. 1994. Chebyshev’s bias. Experimental Mathematics 3:173–197. Project Euclid

Tao, Terrence. Structure and randomness in the prime numbers. PDF

Posted in computing, mathematics | 3 Comments

Where’s My Petabyte Disk Drive?

Fourteen years ago I noted that disk drives were growing so fast I couldn’t fill them up. Between 1997 and 2002, storage capacity doubled every year, allowing me to replace a 3 gigabyte drive with a new 120 gigabyte model. I wrote:

Extrapolating the steep trend line of the past five years predicts a thousandfold increase in capacity by about 2012; in other words, today’s 120-gigabyte drive becomes a 120-terabyte unit.

Extending that same growth curve into 2016 would allow for another four doublings, putting us on the threshold of the petabyte disk drive (i.e., \(10^{15}\) bytes).

None of that has happened. The biggest drives in the consumer marketplace hold 2, 4, or 6 terabytes. A few 8- and 10-terabyte drives were recently introduced, but they are not yet widely available. In any case, 10 terabytes is only 1 percent of a petabyte. We have fallen way behind the growth curve.

The graph below extends an illustration that appeared in my 2002 article, recording growth in the areal density of disk storage, measured in bits per square inch:

Graph of hard disk areal density in bits per square inch from 1956 to 2016

The blue line shows historical data up to 2002 (courtesy of Edward Grochowski of the IBM Almaden Research Center). The bright green line represents what might have been, if the 1997–2002 trend had continued. The orange line shows the real status quo: We are three orders of magnitude short of the optimistic extrapolation. The growth rate has returned to the more sedate levels of the 1970s and 80s.


What caused the recent slowdown? I think it makes more sense to ask what caused the sudden surge in the 1990s and early 2000s, since that’s the kink in the long-term trend. The answers lie in the details of disk technology. More sensitive read heads developed in the 90s allowed information to be extracted reliably from smaller magnetic domains. Then there was a change in the geometry of the domains: the magnetic axis was oriented perpendicular to the surface of the disk rather than parallel to it, allowing more domains to be packed into the same surface area. As far as I know, there have been no comparable innovations since then, although a new writing technology is on the horizon. (It uses a laser to heat the domain, making it easier to change the direction of magnetization.)

As the pace of magnetic disk development slackens, an alternative storage medium is coming on strong. Flash memory, a semiconductor technology, has recently surpassed magnetic disk in areal density; Micron Technologies reports a laboratory demonstration of 2.7 terabits per square inch. And Samsung has announced a flash-based solid-state drive (SSD) with 15 terabytes of capacity, larger than any mechanical disk drive now on the market. SSDs are still much more expensive than mechanical disks—by a factor of 5 or 10—but they offer higher speed and lower power consumption. They also offer the virtue of total silence, which I find truly golden.

Flash storage has replaced spinning disks in about a quarter of new laptops, as well as in all phones and tablets. It is also increasingly popular in servers (including the machine that hosts bit-player.org). Do disks have a future?

open-case photo of a 15-year-old 2.5-inch disk drive

In my sentimental moments, I’ll be sorry to see spinning disks go away. They are such jewel-like marvels of engineering and manufacturing prowess. And they are the last link in a long chain of mechanical contrivances connecting us with the early history of computing—through Turing’s bombe and Babbage’s brass gears all the way back to the Antikythera mechanism two millennia ago. From here on out, I suspect, most computers will have no moving parts.

Maybe in a decade or two the spinning disk will make a comeback, the way vinyl LPs and vacuum tube amplifiers have. “Data that comes off a mechanical disk has a subtle warmth and presence that no solid-state drive can match,” the cogniscenti will tell us.


“You can never be too rich or too thin,” someone said. And a computer can never be too fast. But the demand for data storage is not infinitely elastic. If a file cabinet holds everything in the world you might ever want to keep, with room to spare, there’s not much added utility in having 100 or 1,000 times as much space.

In 2002 I questioned whether ordinary computer users would ever fill a 1-terabyte drive. Specifically, I expressed doubts that my own files would ever reach the million megabyte mark. Several readers reassured me that data will always expand to fill the space available. I could only respond “We’ll see.” Fourteen years later, I now have the terabyte drive of my dreams, and it holds all the words, pictures, music, video, code, and whatnot I’ve accumulated in a lifetime of obsessive digital hoarding. The drive is about half full. Or half empty. So I guess the outcome is still murky. I can probably fill up the rest of that drive, if I live long enough. But I’m not clamoring for more space.

One factor that has surely slowed demand for data storage is the emergence of cloud computing and streaming services for music and movies. I didn’t see that coming back in 2002. If you choose to keep some of your documents on Amazon or Azure, you obviously reduce the need for local storage. Moreover, offloading data and software to the cloud can also reduce the overall demand for storage, and thus the global market for disks or SSDs. A typical movie might take up 3 gigabytes of disk space. If a million people load a copy of the same movie onto their own disks, that’s 3 petabytes. If instead they stream it from Netflix, then in principle a single copy of the file could serve everyone.

In practice, Netflix does not store just one copy of each movie in some giant central archive. They distribute rack-mounted storage units to hundreds of internet exchange points and internet service providers, bringing the data closer to the viewer; this is a strategy for balancing the cost of storage against the cost of communications bandwidth. The current generation of the Netflix Open Connect Appliance has 36 disk drives of 8 terabytes each, plus 6 SSDs that hold 1 terabyte each, for a total capacity of just under 300 terabytes. (Even larger units are coming soon.) In the Netflix distribution network, files are replicated hundreds or thousands of times, but the total demand for storage space is still far smaller than it would be with millions of copies of every movie.

A recent blog post by Eric Brewer, Google’s vice president for infrastructure, points out:

The rise of cloud-based storage means that most (spinning) hard disks will be deployed primarily as part of large storage services housed in data centers. Such services are already the fastest growing market for disks and will be the majority market in the near future. For example, for YouTube alone, users upload over 400 hours of video every minute, which at one gigabyte per hour requires more than one petabyte (1M GB) of new storage every day or about 100x the Library of Congress.

Thus Google will not have any trouble filling up petabyte drives. An accompanying white paper argues that as disks become a data center specialty item, they ought to be redesigned for this environment. There’s no compelling reason to stick with the present physical dimensions of 2½ or 3½ inches. Moreover, data-center disks have different engineering priorities and constraints. Google would like to see disks that maximize both storage capacity and input-output bandwidth, while minimizing cost; reliability of individual drives is less critical because data are distributed redundantly across thousands of disks.

The white paper continues:

An obvious question is why are we talking about spinning disks at all, rather than SSDs, which have higher [input-output operations per second] and are the “future” of storage. The root reason is that the cost per GB remains too high, and more importantly that the growth rates in capacity/$ between disks and SSDs are relatively close . . . , so that cost will not change enough in the coming decade.


If the spinning disk is remodeled to suit the needs and the economics of the data center, perhaps flash storage can become better adapted to the laptop and desktop environment. Most SSDs today are plug-compatible replacements for mechanical disk drives. They have the same physical form, they expect the same electrical connections, and they communicate with the host computer via the same protocols. They pretend to have a spinning disk inside, organized into tracks and sectors. The hardware might be used more efficiently if we were to do away with this charade.

Or maybe we’d be better off with a different charade: Instead of dressing up flash memory chips in the disguise of a disk drive, we could have them emulate random access memory. Why, after all, do we still distinguish between “memory” and “storage” in computer systems? Why do we have to open and save files, launch and shut down applications? Why can’t all of our documents and programs just be everpresent and always at the ready?

In the 1950s the distinction between memory and storage was obvious. Memory was the few kilobytes of magnetic cores wired directly to the CPU; storage was the rack full of magnetic tapes lined up along the wall on the far side of the room. Loading a program or a data file meant finding the right reel, mounting it on a drive, and threading the tape through the reader and onto the take-up reel. In the 1970s and 80s the memory/storage distinction began to blur a little. Disk storage made data and programs instantly available, and virtual memory offered the illusion that files larger than physical memory could be loaded all in one go. But it still wasn’t possible to treat an entire disk as if all the data were all present in memory. The processor’s address space wasn’t large enough. Early Intel chips, for example, used 20-bit addresses, and therefore could not deal with code or data segments larger than \(2^{20} \approx 10^6\) bytes.

We live in a different world now. A 64-bit processor can potentially address \(2^{64}\) bytes of memory, or 16 exabytes (i.e., 16,000 petabytes). Most existing processor chips are limited to 48-bit addresses, but this still gives direct access to 281 terabytes. Thus it would be technically feasible to map the entire content of even the largest disk drive onto the address space of main memory.

In current practice, reading from or writing to a location in main memory takes a single machine instruction. Say you have a spreadsheet open; the program can get the value of any cell with a load instruction, or change the value with a store instruction. If the spreadsheet file is stored on disk rather than loaded into memory, the process is quite different, involving not single instructions but calls to input-output routines in the operating system. First you have to open the file and read it as a one-dimensional stream of bytes, then parse that stream to recreate the two-dimensional structure of the spreadsheet; only then can you access the cell you care about. Saving the file reverses these steps: The two-dimensional array is serialized to form a linear stream of bytes, then written back to the disk. Some of this overhead is unavoidable, but the complex conversions between serialized files on disk and more versatile data structures in memory could be eliminated. A modern processor could address every byte of data—whether in memory or storage—as if it were all one flat array. Disk storage would no longer be a separate entity but just another level in the memory hierarchy, turning what we now call main memory into a new form of cache. From the user’s point of view, all programs would be running all the time, and all documents would always be open.

Is this notion of merging memory and storage an attractive prospect or a nightmare? I’m not sure. There are some huge potential problems. For safety and sanity we generally want to limit which programs can alter which documents. Those rules are enforced by the file system, and they would have to be re-engineered to work in the memory-mapped environment.

Perhaps more troubling is the cognitive readjustment required by such a change in architecture. Do we really want everything at our fingertips all the time? I find it comforting to think of stored files as static objects, lying dormant on a disk drive, out of harm’s way; open documents, subject to change at any instant, require a higher level of alertness. I’m not sure I’m ready for a more fluid and frenetic world where documents are laid aside but never put away. But I probably said the same thing 30 years when I first confronted a machine capable of running multiple programs at once (anyone remember Multifinder?).

The dichotomy between temporary memory and permanent storage is certainly not something built into the human psyche. I’m reminded of this whenever I help a neophyte computer user. There’s always an incident like this:

“I was writing a letter last night, and this morning I can’t find it. It’s gone.”

“Did you save the file?”

“Save it? From what? It was right there on the screen when I turned the machine off.”


Finally the big questions: Will we ever get our petabyte drives? How long will it take? What sorts of stuff will we keep on them when the day finally comes?

The last time I tried to predict the future of mass storage, extrapolating from recent trends led me far astray. I don’t want to repeat that mistake, but the best I can suggest is a longer-term baseline. Over the past 50 years, the areal density of mass-storage media has increased by seven orders of magnitude, from about \(10^5\) bits per square inch to about \(10^{12}\). That works out to about seven years for a tenfold increase, on average. If that rate is an accurate predictor of future growth, we can expect to go from the present 10 terabytes to 1 petabyte in about 15 years. But I would put big error bars around that number.

I’m even less sure about how those storage units will be used, if in fact they do materialize. In 2002 my skepticism about filling up a terabyte of personal storage was based on the limited bandwidth of the human sensory system. If the documents stored on your disk are ultimately intended for your own consumption, there’s no point in keeping more text than you can possibly read in a lifetime, or more music than you can listen to, or more pictures than you can look at. I’m now willing to concede that a terabyte of information may not be beyond human capacity to absorb. But a petabyte? Surely no one can read a billion books or watch a million hours of movies.

This argument still seems sound to me, in the sense that the conclusion follows if the premise is correct. But I’m no longer so sure about the premise. Just because it’s my computer doesn’t mean that all the information stored there has to be meant for my eyes and ears. Maybe the computer wants to collect some data for its own purposes. Maybe it’s studying my habits or learning to recognize my voice. Maybe it’s gathering statistics from the refrigerator and washing machine. Maybe it’s playing go, or gossiping over some secret channel with the Debian machine across the alley.

We’ll see.

Posted in computing | 33 Comments

The Bug That Ate Thursday

As I was finishing up the previous post here at bit-player.org, I noticed something off-kilter about the appearance of a few mathematical expressions. Here’s an enlarged example:

Typeset expression a + b- c, with too little space between the b and the minus sign.

Notice the spacing around the minus sign. It’s too close to the argument on its left, whereas the plus sign lies right in the middle. The proper rendering of this expression looks like this:

Typeset expression a + b - c, with correct, symmetrical spacing around both plus and minus signs.

Closely comparing the two images, I realized that spacing isn’t the only issue. In the malformed version the minus sign is also a little too long, too low, and too skinny.

For typesetting mathematics, I rely on MathJax, an amazing JavaScript program created by Davide Cervone of Union College. It works like magic: I write in standard TeX (math mode only), and the typeset output appears beautifully formatted in your web browser, with no need to bother about installing fonts or downloading plugins. For the past few years MathJax has been totally reliable, so this spacing glitch came as an annoying surprise.

The notes that follow record both what I did and what I thought as I tried to track down the cause of this problem. If anyone else ever bumps into the bug, the existence of this document might save them some angst and agita. Besides, everybody likes a detective story—even if the detective turns out to be more bumbling than brilliant. (If you just want to know how it comes out, skip to the end.)


Hypothesis: My first thought on seeing the wayward minus sign was that I must have typed something wrong. The TeX source code for the expression shown above is so simple (just a + b - c) that there’s not much room for error, but accidents happen. Maybe one of those space characters is not an ordinary word space (ASCII 0x20) but a non-breaking space (HTML  ). Or maybe the hyphen that represents a minus sign is not really a hyphen (ASCII 0x2D) but an en-dash (HTML – or –) or a discretionary hyphen (HTML ­ or ­). Experiment 1: Try typing the expression again, very carefully. Result: No change. Experiment 2: Copy the original source text into an editor that shows raw hexadecimal byte values. Result: Nothing exotic. Experiment 3: Copy the source text into a different TeX system (Pierre-Yves Chatelier’s LaTeXiT). Result: Typesets correctly. Conclusion: Probably not a typo.

Question: Could it be a browser bug? Tests: Try it in Chrome, Firefox, Safari, Opera. Results: Same appearance in all of them. Conclusion: It’s not the browser.

Internet interlude: The most important debugging tools today are Google and Stack Overflow. Most likely the answer is already out there. But searches for “minus sign spacing MathJax” and “minus sign spacing TeX” turn up nothing useful. The most promising leads take me to discussions of the binary subtraction operator \(a - b\) vs. the unary negation operator \(-b\). That’s not the issue here, so I am thrown back on my own resources.

Question: Is it just my machine? Test: Try opening the same page on another laptop. Result: Same appearance. However, these two computers are very similar. In particular, they have the same fonts installed. Test: Try a third machine, with different fonts. Result: No change.

Question: Is the problem confined to the one article I’m currently writing, or does it show up in earlier blog posts as well? Research: Page back through the bit-player archives. I find several more instances of the bug. Followup question: Was the minus-sign spacing in those earlier articles already botched when I wrote and published them? Or were they correct then, and the bug was introduced by some later change in the software environment?

Clue: In the course of rummaging through old blog posts, I discover that the spacing anomaly appears only in “inline” math expressions (those that appear within the flow of a paragraph), not in “display” equations (which are set off on a line of their own). The two rendering modes are invoked by surrounding an expression with different sets of delimiters: \( ... \) for inline and \[ ... \] for display. By merely toggling between round and square brackets, I find I can turn the bug on and off. This discovery leads me to suppose there really might be something awry within MathJax. If it formats an expression correctly in one mode, why does it fail on the same input text in another mode?

Investigation: Using browser developer tools, I examine the HTML markup that MathJax writes into the document. In display mode (where the spacing is correct), here’s the coding for the minus sign:

<span class="mo" id="MathJax-Span-15"
style="font-family: STIXGeneral-Regular;
padding-left: 0.228em;">-</span>

The phrase I have highlighted in red is the crucial bit of styling that sets the spacing on the left side of the minus operator. Here’s the corresponding markup for the minus sign in the inline version of the same expression:

<span class="mo" id="MathJax-Span-22"
style="font-family: STIXGeneral-Regular;">–</span>

The padding-left statement is absent. This is the proximate cause of the incorrect spacing. But why does MathJax supply the appropriate spacing in display mode but omit it in inline mode? That’s the puzzle.

Inquiry: I turn to the MathJax source-code repository on GitHub, and browse the issues database. Nothing relevant turns up. Likewise the MathJax user group forum. Baffling. If the problem really is a MathJax bug, someone would surely have reported it, unless it’s quite new. I consider opening a new issue, but decide to wait until I know more.

Question: The bug seems to be everywhere on bit-player.org, but what about the rest of the web? On MathOverflow (which I know uses MathJax) it doesn’t take long to find an inline equation that includes a minus sign. It is formatted perfectly. David Mumford’s blog is another MathJax site; I poke around there and find another inline equation with a correctly spaced minus sign. Uh oh. The finger of blame is pointing back toward me and away from MathJax.

Question: Am I using the same version of MathJax as those other sites, and the same configuration file? Not exactly, but when I try several other versions (including older ones, in case this is a recently introduced bug), there’s no change.

Pause for reflection: MathJax seems to be behaving differently on bit-player than it does on other sites. What could account for that difference? There are dozens of possible factors, but I have a leading candidate: bit-player is built on the WordPress blogging platform, and the other sites I’m looking at are not. I have no idea how the interaction of WordPress and MathJax could lead to this particular outcome, but they are both complicated software systems, with lots going on behind the curtains.

Experiment: I can test the WordPress hypothesis by setting up a web page that has everything in common with the bit-player site—the same server hardware and software, and the same MathJax processor—but that lives outside the WordPress system. I do exactly that, and find that minus signs are correctly formatted in both display and inline equations. Conclusion: It sure looks like WordPress is messing with my TeX!

Revelation: Throughout this diagnostic adventure, I’ve been relying heavily on the developer tools in the Chrome and Firefox browsers. These tools provide a peek into a page’s HTML encoding as it is displayed by the browser, after MathJax and any other JavaScript programs have worked their transformations on the source text. Now, for sheer lack of any better ideas, I decide to try the View Source command, which shows the HTML as received from the server, before any JavaScript programs run, and in particular before MathJax has converted TeX source code into typeset mathematical output. Instantly, the root of the problem is staring me in the face. The display-mode TeX is exactly as I wrote it: \[a + b - c\]. But the inline-mode markup is this: \(a + b &#8211; c\). The HTML entity &#8211; specifies an en-dash. Where did that come from? Actually, I’m pretty sure I know where; what I don’t know is why. WordPress has built-in functions to “prettify” text, converting typewriter quote marks ('', "") to typographer’s quotes (‘ ’, “ ”). More to the point, the program also replaces a double hyphen (--) with an en-dash (–) and a triple hyphen (---) with an em-dash (—). Although I haven’t been typing double hyphens in the math expressions, I still suspect that the WordPress character substitution process has something to do with those troublesome en-dashes.

Confirmation: Before investing more effort in this hypothesis, I try to make sure I’m on the right track. Typing my test expression with an en-dash instead of a hyphen produces output identical to the buggy version, in display mode as well as inline mode. Performing the same experiment in LaTeXiT yields a very similar result.

The culprit exposed: Searching for #8211 in the WordPress source code takes me to the file formatting.php, where I find a function called wptexturize. PHP is not my favorite programming language, but it’s easy enough to guess what these lines are about (I have simplified and abbreviated the statements for clarity):

$static_characters = array( '---', ' -- ', '--', ' - ')
$static_replacements = array( $em_dash, ' ' . $em_dash . ' ',
    $en_dash, ' ' . $en_dash . ' ')

Note the fourth element of the $static_characters array: a hyphen surrounded by spaces. The corresponding element of $static_replacements is an en-dash surrounded by spaces. I call that a smoking gun. MathJax, like other TeX processors, expects an ASCII hyphen as a minus sign; if you feed it an en-dash, it’s not going to recognize it as a mathematical operator. (When Knuth was developing TeX, circa 1980, no standard character encoding existed beyond the 96 codes of plain ASCII.)

The fix: It could be as simple as writing a+b-c instead of a + b - c! When I make that minor change to the text, it works like a charm. Why didn’t I think of trying that sooner? I guess because TeX in math mode promises to ignore whitespace in the source code, and it never occurred to me that WordPress doesn’t have to honor that promise. Thus I can solve the immediate problem just by removing spaces around minus signs. As a permanent remedy, however, changing my writing habits is not appealing. Nor is sifting through all my earlier posts to remove those spaces. The fact is, I don’t want hyphens to magically become en-dashes while I’m not looking. It may be a feature for some people, but for me it’s a bug.

What I did. The first commandment of WordPress development is “Thou shall not modify the core files.” But in that respect I’m already a sinner, and unrepentant. Yeah, I edited those two arrays in the formatting.php file, and it felt good.

Lessons learned. In hindsight, I see that I missed several opportunities to root out the problem more quickly. Next time I’ll remember View Source. And if I had done a better job of early-stage analysis, I would have been able to find help more efficiently. I am not the only one to confront this glitch, but I needed better search terms to follow the breadcrumbs of those who went before. Also, along the way I misinterpreted some important clues. When I discovered that the bug affects only inline mode and not display mode, I was quite sure that fact implicated MathJax, but I was wrong. (As it happens, I still don’t really understand why display mode is immune to the bug. Why is the hyphen converted to an en-dash when I enclose it in slashed round brackets, but not when it appears in slashed square brackets? Evidently the wptexturizing treatment is skipped in the latter case, but I lack the stamina to slog through all that PHP to figure out why.)

The big picture: I’m not mad at WordPress. I still believe it is a wonder of the age, making millions of people into instant, pushbutton publishers. According to some reports, it powers a quarter of all web sites. In this respect it may well be the most important application-layer software for fulfilling the original promise of the World Wide Web: allowing all of us to be contributors and creators rather than merely consumers of mass media. But there’s a cost: Keeping WordPress easy on the outside seems to require a dense thicket of thorns and briers on the inside. As the years go by I find I spend too much time fighting against its automation, which is a joyless task. I would prefer something simpler. I have Jekyll envy.

Yet my main takeaway after this episode is gratitude for open-source software. If MathJax and WordPress had been sealed, blackbox applications, I would have been helpless to help myself, unable to do anything about the problem beyond whining and pleading.

Posted in computing | 5 Comments

Number Factoids

The web sites numbersaplenty.com and numberworld.info dish up a smorgasbord of facts about every natural number from 1 to 999,999,999,999,999. Type in your favorite positive integer (provided it’s less than 1015) and you’ll get a list of prime factors, a list of divisors, the number’s representation in various bases, its square root, and lots more.

I first stumbled upon these sites (and several others like them) about a year ago. I revisited them recently while putting together the Carnival of Mathematics. I was looking for something cute to say about the new calendar year and was rewarded with the discovery that 2016 is a triangular number: 2016 bowling pins can be arranged in an equilateral triangle with 63 pins per side.

This incident set me to thinking: What does it take to build a web site like these? Clearly, the sites do not have 999,999,999,999,999 HTML files sitting on a disk drive waiting to be served up when a visitor arrives. Everything must be computed on the fly, in response to a query. And it all has to be done in milliseconds. The question that particularly intrigued me was how the programs recognize that a given number has certain properties or is a member of a certain class—a triangular number, a square, a Fibonacci, a factorial, and so on.

I thought the best way to satisfy my curiosity would be to build a toy number site of my own. Here it is:

Number Factoids




Prime factors:
Prime number ?
Square-free number ?
Square-root-smooth number ?
Square number ?
Triangular number ?
Factorial number ?
Fibonacci number ?
Catalan number ?
Somos-4 number ?

Elapsed time:

This one works a little differently from the number sites I’ve found on the web. The computation is done not on my server but on your computer. When you type a number into the input field above, a JavaScript program running in your web browser computes the prime factors of the number and checks off various other properties. (The source code for this program is available on GitHub, and there’s also a standalone version of the Number Factoids calculator.)

Because the computation is being done by your computer, the performance depends on what hardware and software you bring to the task. Especially important is the JavaScript engine in your browser. As a benchmark, you might try entering the number 999,999,999,999,989, which is the largest prime less than 1015. The elapsed time for the computation will be shown at the bottom of the panel. On my laptop, current versions of Chrome, Firefox, Safari, and Opera give running times in the range of 150 to 200 milliseconds. (But an antique iPad takes almost 8 seconds.)

Most of that time is spent in factoring the integer (or attempting to factor it in the case of a prime). Factoring is reputed to be a hard problem, and so you might suppose it would make this whole project infeasible. But the factoring computation bogs down only with really big numbers—and a quadrillion just isn’t that big anymore. Even a crude trial-division algorithm can do the job. In the worst case we need to try dividing by all the odd numbers less than \(\sqrt{10^{15}}\). That means the inner loop runs about 16 million times—a mere blink of the eye.

Once we have the list of prime factors for a number N, other properties come along almost for free. Primality: We can tell whether or not N is prime just by looking at the length of the factor list. Square-freeness: N is square-free if no prime appears more than once in the list. Smoothness: N is said to be square-root smooth if the largest prime factor is no greater than \(\sqrt{N}\). (For example, \(12 = 2 \times 2 \times 3\) is square-root smooth, but \(20 = 2 \times 2 \times 5\) is not.)

The factor list could also be used to detect square numbers. N is a perfect square if every prime factor appears in the list an even number of times. But there are lots of other ways to detect squares that don’t require factorization. Indeed, running on a machine that has a built-in square-rooter, the JavaScript code for recognizing perfect squares can be as simple as this:

function isSquare(N) {
  var root = Math.floor(Math.sqrt(N));
  return root * root === N;
}

If you want to test this code in the Number Factoids calculator, you might start with 999,999,961,946,176, which is the largest perfect square less than \(10^{15}\).

Note that the isSquare function is a predicate: The return statement in the last line yields a boolean value, either true or false. The program might well be more useful if it could report not only that 121 is a square but also what it’s the square of. But the Number Factoids program is just a proof of concept, so I have stuck to yes-or-no questions.

Metafactoids: The Factoids calculator tests nine boolean properties. No number can possess all of these properties, but 1 gets seven green checkmarks. Can any other number equal this score? The sequence of numbers that exhibit none of the nine properties begins 20, 44, 52, 68, 76, 88, 92,… Are they all even numbers? (Hover for answer.)


What about detecting triangular numbers? N is triangular if it is the sum of all the integers from 1 through k for some integer k. For example, \(2016 = 1 + 2 + 3 + \dots + 63\). Given k, it’s easy enough to find the kth triangular number, but we want to work in the opposite direction: Given N, we want to find out if there is a corresponding k such that \(1 + 2 + 3 + \cdots + k = N\).

Young Carl Friedrich Gauss knew a shortcut for calculating the sums of consecutive integers: \(1 + 2 + 3 + \cdots + k = N = k\,(k+1)\,/\,2\). We need to invert this formula, solving for the value of k that yields a specified N (if there is one). Rearranging the equation gives \(k^2 + k - 2N = 0\), and then we can crank up the trusty old quadratic formula to get this solution:

\[k = \frac{-1 \pm \sqrt{1 + 8N}}{2}.\]

Thus k is an integer—and N is triangular—if and only if \(8N + 1\) is an odd perfect square. (Let’s ignore the negative root, and note that if \(8N + 1\) is a square at all, it will be an odd one.) Detecting perfect squares is a problem we’ve already solved, so the predicate for detecting triangular numbers takes this simple form:

function isTriangular(N) {
  return isSquare(8 * N + 1);
}

Try testing it with 999,999,997,764,120, the largest triangular number less than \(10^{15}\).


Factorials are the multiplicative analogues of triangular numbers: If N is the kth factorial, then \(N = k! = 1 \times 2 \times 3 \times \cdots \times k\). Is there a multiplicative trick that generates factorials in the same way that Gauss’s shortcut generates triangulars? Well, there’s Stirling’s approximation:

\[k! \approx \sqrt{2 \pi k} \left( \frac{k}{e} \right)^k.\]

We might try to invert this formula to get a function of \(k!\) whose value is \(k\), but I don’t believe this is a promising avenue to explore. The reason is that Stirling’s formula is only an approximation. It predicts, for example, that 5! is equal to 118.02, whereas the true value is 120. Thus taking the output of the inverse function and rounding to the nearest integer would produce wrong answers. We could add correction terms to get a closer approximation—but surely there’s a better way.

One approach is to work with the gamma (\(\Gamma\)) function, which extends the concept of a factorial from the integers to the real and complex numbers; if \(n\) is an integer, then \(\Gamma(n+1) = n!\), but the \(\Gamma\) function also interpolates between the factorial values. A recent paper by Mitsuru Uchiyama gives an explicit, analytic, inverse of the gamma function, but I understand only fragments of the mathematics, and I don’t know how to implement it algorithmically.

Fifteen years ago David W. Cantrell came up with another inverse of the gamma function, although this one is only approximate. Cantrell’s version is much less intimidating, and it is based on one of my favorite mathematical gadgets, the Lambert W function. A Mathematica implementation of Cantrell’s idea works as advertised—when it is given the kth factorial number as input, it returns a real number very close to \(k+1\). However, the approximation is not good enough to distinguish true factorials from nearby numbers. Besides, JavaScript doesn’t come with a built-in Lambert W function, and I am loath to try writing my own.

On the whole, it seems better to retreat from all this higher mathematics and go back to the definition of the factorial as a product of successive integers. Then we can reliably detect factorials with a simple linear search, expressed in the following JavaScript function:

function isFactorial(N) {
  var d = 2, q = N, r = 0;
  while (q > 1 && r === 0) {
    r = q % d;
    q = q / d;
    d += 1;
  }
  return (q === 1 && r === 0);
}

A factorial is built by repeated multiplication, so this algorithm takes it apart by repeated division. Initially, we set \(q = N\) and \(d = 2\). Then we replace \(q\) by \(q / d\), and \(d\) by \(d + 1\), while keeping track of the remainder \(r = q \bmod d\). If we can continue dividing until \(q\) is equal to 1, and the remainder of every division is 0, then N is a factorial. This is not a closed-form solution; it requires a loop. On the other hand, the largest factorial less than \(10^{15}\) is 17! = 355,687,428,096,000, so the program won’t be going around the loop more than 17 times.


The Fibonacci numbers are dear to the hearts of number nuts everywhere (including me). The sequence is defined by the recursion \(F_0 = 0, F_1 = 1, F_k = F_{k-1} + F_{k-2}\). How best to recognize these numbers? There is a remarkable closed-form formula, named for the French mathematician J. P. M. Binet:

\[F_k = \frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^k - \left(\frac{1 - \sqrt{5}}{2}\right)^k\right]\]

I call it remarkable because, unlike Stirling’s approximation for factorials, this is an exact formula; if you give it an integer k and an exact value of \(\sqrt{5}\)), it returns the kth Fibonacci number as an integer.

One afternoon last week I engaged in a strenuous wrestling match with Binet’s formula, trying to turn it inside out and thereby create a function of N that returns k if and only if \(N\) is the kth Fibonacci number. With some help from Mathematica I got as far as the following expression, which gives the right answer some of the time:

\[k(N) = \frac{\log \frac{1}{2} \left( \sqrt{5 N^2 + 4} - \sqrt{5} N \right)}{\log \frac{1}{2} \left( \sqrt{5} - 1 \right)}\]

Plugging in a few values of N yields the following table of values for \(k(N)\):

N k(N)
1 2.000000000000001
2 3.209573979673092
3 4.0000000000000036
4 4.578618254581733
5 5.03325648737724
6 5.407157747499656
7 5.724476891770392
8 6.000000000000018
9 6.243411773788614
10 6.4613916654135615
11 6.658737112471047
12 6.8390081849422675
13 7.00491857188792
14 7.158583717787527
15 7.3016843734535035
16 7.435577905992959
17 7.561376165404197
18 7.680001269357004
19 7.792226410280063
20 7.8987062604216005
21 7.999999999999939

In each of the green rows, the function correctly recognizes a Fibonacci number \(F_k\), returning the value of k as an integer. (Or almost an integer; the value would be exact if we could calculate exact square roots and logarithms.) Specifically, 1 is the second Fibonacci number (though also the first), 3 is the fourth, 8 is the sixth, and 21 is the eighth Fibonacci number. So far so good. But there’s something weird going on with the other Fibonacci numbers in the table, namely those with odd-numbered indices (red rows). For N = 2, 5, and 13, the inverse Binet function returns numbers that are close to the correct k values (3, 5, 7), but not quite close enough. What’s that about?

If I had persisted in my wrestling match, would I have ultimately prevailed? I’ll never know, because in this era of Google and MathOverflow and StackExchange, a spoiler lurks around every cybercorner. Before I could make any further progress, I stumbled upon pointers to the work of Ira Gessel of Brandeis, who neatly settled the matter of recognizing Fibonacci numbers more than 40 years ago, when he was an undergraduate at Harvard. Gessel showed that N is a Fibonacci number iff either \(5N^2 + 4\) or \(5N^2 - 4\) is a perfect square. Gessel introduced this short and sweet criterion and proved its correctness in a problem published in The Fibonacci Quarterly (1972, Vol. 10, No. 6, pp. 417–419). Phillip James, in a 2009 paper, presents the proof in a way I find somewhat easier to follow.

It is not a coincidence that the expression \(5N^2 + 4\) appears in both Gessel’s formula and in my attempt to construct an inverse Binet function. Furthermore, substituting Gessel’s \(5N^2 - 4\) into the inverse function (with a few other sign adjustments) yields correct results for the odd-indexed Fibonacci numbers. Implementing the Gessel test in JavaScript is a cinch:

function gessel(N) {
  var s = 5 * N * N;
  return isSquare(s + 4) || isSquare(s - 4);
}

So that takes care of the Fibonacci numbers, right? Alas, no. Although Gessel’s criterion is mathematically unassailable, it fails computationally. The problem arises from the squaring of \(N\). If \(N\) is in the neighborhood of \(10^{15}\), then \(N^2\) is near \(10^{30}\), which is roughly \(2^{100}\). JavaScript does all of its arithmetic with 64-bit double-precision floating-point numbers, which allow 53 bits for representing the mantissa, or significand. With values above \(2^{53}\), not all integers can be represented exactly—there are gaps between them. In this range the mapping between \(N\) and \(N^2\) is no longer a bijection (one-to-one in both directions), and the gessel procedure returns many errors.

I had one more hope of coming up with a closed-form Fibonacci recognizer. In the Binet formula, the term \(((1 - \sqrt{5})\,/\,2)^k\) becomes very small in magnitude as k grows large. By neglecting that term we get a simpler formula that still yields a good approximation to Fibonacci numbers:

\[F_k \approx \frac{1}{\sqrt{5}} \left(\frac{1 + \sqrt{5}}{2}\right)^k.\]

For any integer k, the value returned by that expression is within 0.5 of the Fibonacci number \(F_k\), and so simple rounding is guaranteed to yield the correct answer. But the inverse function is not so well-behaved. Although it has no \(N^2\) term that would overflow the 64-bit format, it relies on square-root and logarithm operations whose limited precision can still introduce errors.

So how does the Factoids calculator detect Fibonacci numbers? The old-fashioned way. It starts with 0 and 1 and iterates through the sequence of additions, stopping as soon as N is reached or exceeded:

function isFibo(N) {
  var a = 0, b = 1, tmp;
  while (a < N) {
    tmp = a;
    a = b;
    b = tmp + b;
  }
  return a === N;
}

As with the factorials, this is not a closed-form solution, and its computational complexity scales in linear proportion to N rather than being constant regardless of N. There are tricks for speeding it up to \(\log N\); Edsger Dijkstra described one such approach. But optimization hardly seems worth the bother. For N < \(10^{15}\), the while loop cannot be executed more than 72 times.


I’ve included two more sequences in the Factoids calculator, just because I’m especially fond of them. The Catalan numbers (1, 1, 2, 5, 14, 42, 132, 429…) are famously useful for counting all sorts of things—ways of triangulating a polygon, paths through the Manhattan street grid, sequences of properly nested parentheses. The usual definition is in terms of binomial coefficients or factorials:

\[C_k = \frac{1}{k+1} \binom{2k}{k} = \frac{(2k)!}{(k+1)! k!}\]

But there is also a recurrence relation:

\[C_0 = 1,\qquad C_k = \frac{4k-2}{k+1} C_{k-1}\]

The recognizer function in the Factoids Calculator does a bottom-up iteration based on the recurrence relation:

function isCatalan(N) {
  var c = 1, k = 0;
  while (c < N) {
    k += 1;
    c = c * (4 * k - 2) / (k + 1);
  }
  return c === N;
}

The final sequence in the calculator is one of several discovered by Michael Somos around 1980. It is defined by this recurrence:

\[S_0 = S_1 = S_2 = S_3 = 1,\qquad S_k = \frac{S_{k-1} S_{k-3} + S_{k-2}^2}{S_{k-4}}\]

The surprise here is that the elements of the sequence are all integers, beginning 1, 1, 1, 1, 2, 3, 7, 23, 59, 314, 1529, 8209. In writing a recognizer for these numbers I have made no attempt to be clever; I simply generate the sequence from the beginning and check for equality with the given N:

function isSomos(N) {
  var next = 1, S = [1, 1, 1, 1];
  while (next < N) {
    next = (S[3] * S[1] + S[2] * S[2]) / S[0];
    S.shift();
    S.push(next);
  }
  return next === N;
}

But there’s a problem with this code. Do you see it? As N approaches the \(10^{15}\) barrier, the subexpression S[3] * S[1] + S[2] * S[2] will surely break through that barrier. In fact, the version of the procedure shown above fails for 32,606,721,084,786, the largest Somos-4 number below \(10^{15}\). For the version of the program that’s actually running in the Factoids calculator I have repaired this flaw by rearranging the sequence of operations. (For details see the GitHub repository.)


The factorial, Fibonacci, Catalan, and Somos sequences all exhibit exponential growth, which means they are sprinkled very sparsely along the number line. That’s why a simple linear search algorithm—which just keeps going until it reaches or exceeds the target—can be so effective. For the same reason, it would be easy to precompute all of these numbers up to \(10^{15}\) and have the JavaScript program do a table lookup. I have ruled out this strategy for a simple reason: It’s no fun. It’s not sporting. I want to do real computing, not just consult a table.

Other number series, such as the square and triangular numbers, are more densely distributed. There are more than 30 million square and triangular numbers up to \(10^{15}\); downloading a table of that size would take longer than recomputing quite a few squares and triangulars. And then there are the primes—all 29,844,570,422,669 of them.

What would happen if we broke out of the 64-bit sandbox and offered to supply factoids about larger numbers? A next step might be a Megafactoids calculator that doubles the digit count, accepting integers up to \(10^{30}\). Computations in this system would require multiple-precision arithmetic, capable of handling numbers with at least 128 bits. Some programming languages offer built-in support for numbers of arbitrary size, and libraries can add that capability to other languages, including JavaScript. Although there is a substantial speed penalty for extended precision, most of the algorithms running in the Factoids program would still give correct results in acceptable time. In particular, there would no problem recognizing squares and triangulars, factorials, Fibonaccis, Catalans and Somos-4 numbers.

The one real problem area in a 30-digit factoid calculator is factoring. Trial division would be useless; instead of milliseconds, the worst-case running time would be months or years. However, much stronger factoring algorithms have been devised in the past 40 years. The algorithm that would be most suitable for this purpose is called the elliptic curve method, invented by Hendrik Lenstra in the 1980s. An implementation of this method built into PARI/GP, which in turn is built into Sage, can factor 30-digit numbers in about 20 milliseconds. A JavaScript implementation of the elliptic curve method seems quite doable. Whether it’s worth doing is another question. The world is not exactly clamoring for more and better factoids.

Addendum 2016-02-05: I’ve just learned (via Hacker News) that I may need to add a few more recognition predicates: detectors for “artisanal integers” in flavors hand-crafted in the Mission District, Brooklyn, and London.


Posted in computing, mathematics | 5 Comments

Carnival of Mathematics #130

Welcome to the 130th monthly Carnival of Mathematics, a roving blog organized by The Aperiodical. The previous edition was hosted by GanitCharcha. This month’s collection includes material appearing on the Web between December 10 and January 9 (with a bit of spillover on both ends).

Tradition obliges me to say something interesting about the number 130. I could mention that 130 = 5 × 5 × 5 + 5. Is that interesting? Meh, eh? The Wikipedia page for 130 reports the following curious observation:

\(130\) is the only integer that is the sum of the squares of its first four divisors, including \(1\): \(1^2 + 2^2 + 5^2 + 10^2 = 130\).

What’s interesting here is not the fact that 130 has this property. The provocative part is the statement that 130 is the only such integer. How can one know this? It would be easy enough to check that no smaller number qualifies, but how do you rule out the possibility that somewhere far out on the number line there lurks another example? Clearly, what’s needed is a proof, but Wikipedia offers none. I spent some time trying to come up with a proof myself but failed to put the pieces together. Maybe you’ll do better. If not, Robert Munafo explains all, and traces the question to the Russian website dxdy.ru.


Enough of 130. I’d like to celebrate another number of the season: 2016. As midnight approached on December 31, Twitter brought this news:

11111011111 + 1 = 11111100000. Can't wait for 11111111111 (2047)

In other words, the year we are now beginning is equal to \(2^{11} - 2^{5}\), and in another 32 years we’ll be celebrating the turn of a binary millennium, marking the passage of two kibiyears. (Year-zero deniers are welcome to wait an extra year.)

Fernando Juan, with an animated GIF, pointed out another nonobvious fact: \(2016 = 3^3 + 4^3 + 5^3 + 6^3 + 7^3 + 8^3 + 9^3\).

Even more noteworthy, May-Li Khoe and Federico Ardillo gave graphic proof that 2016 is a triangular number:

2016 dots arranged to form a equilateral triangle

Specifically, 2016 is the sum of the integers \(1 + 2 + 3 + \cdots + 63\), making it the \(63\)rd element of the sequence \(1, 3, 6, 10, 15, \ldots\). You could go bowling with \(2016\) pins!

According to a well-known anecdote, Carl Friedrich Gauss was a schoolboy when he discovered that the nth triangular number is equal to \(n(n + 1) / 2\), which is also the value of the binomial coefficient \(\binom{n + 1}{2}\). Hence 2016 is the number of ways of choosing two items at a time from a collection of 64 objects. A tweet by John D. Cook offers this interpretation: “2016 = The number of ways to place two pawns on a chessboard.”

Patrick Honner expressed the same idea another way: 2016 is the number of edges in a complete graph with 64 vertices.

The complete graph on 64 vertices, with 2016 edges.

If you attended a New Years Eve party with 64 people present, and at midnight every guest clinked glasses with every other guest, there were 2016 clinks in all. (And probably a lot of broken glassware.)

In a New Years story of another kind, Tim Harford warns of the cost of overconfidence when it comes to making resolutions, as when you buy a year’s gym membership and stop going after six weeks. “Some companies base their business models on our tendency to overestimate our willpower.”


Triangular numbers turn up in another holiday item as well. A video by Tipping Point Math offers a quantitative analysis of the gift-giving extravaganza in “The Twelve Days of Christmas.” On the \(n\)th day of Christmas you receive \(1 + 2 + 3 + \cdots + n\) gifts, which is clearly the triangular number \(T_n\). But what’s the total number of gifts when you add up the largesse over \(n\) days? These sums of triangular numbers are the tetrahedral numbers: \(1, 4, 10, 20, 35 \ldots\); the twelfth tetrahedral number is \(364\). The video shows where to find all these numbers in Pascal’s triangle.

Vince Knight, in Un Peu de Math, approaches the Christmas gift exchange as a problem in game theory. Two friends agree not to exchange gifts, but then they are both tempted to renege on the agreement and buy a present afterall. This situation is a variant of the game-theory puzzle known as prisoner’s dilemma—which sounds like a rather grim view of holiday tradition. But in fact the conclusion is cheery: “People enjoy giving gifts a lot more than receiving them.”

The December holiday season also brings Don Knuth’s annual Christmas lecture, which has been posted on YouTube. In previous years, this talk was the Christmas Tree lecture, because it touched on some aspect of trees as a data structure or a concept in graph theory. Now Knuth has branched out from the world of trees. His subject is comma-free codes—sets of words that can be jammed together without commas or spaces to mark the boundaries between words, and yet still be read without ambiguity. A set that lacks the comma-free property is {tea, ate, eat}, because a concatenation such as teateatea has within it not just tea,tea,tea but also eat,eat and ate,ate. But the words {sat, set, tea} do qualify as comma-free, because no sequence of the words can be partitioned in more than one way to yield words in the set.

The idea of comma-free codes arose in the late 1950s, when it was thought that the genetic code might have a comma-free structure. Experiments soon showed otherwise, and interest in comma-free codes waned. But Knuth has rediscovered a paper by Willard Eastman, published in 1965, that constructs an algorithm for generating a comma-free code (if possible) from a given set of symbols. Knuth’s lecture demonstrates the algorithm and gives a computer implementation.


Stephen Wolfram with a replica of Babbage’s difference
engine and a portrait of Ada Lovelace. Image courtesy
of Stephen Wolfram.

December 10, 2015, was the 200th birthday of Ada, Countess of Lovelace, who collaborated with Charles Babbage on the proposed computing machine called the analytic engine. The anniversary was commemorated in several ways, including an exhibition at the Science Museum in London, but here I want to call attention to a 12,000-word biographical essay by Stephen Wolfram, the creator of Mathematica.

The stories of Lovelace and Babbage have been told many times, but Wolfram’s account is worth reading even if you already know how the plot comes out. The principles of the machines and the algorithm that Lovelace devised as a demo (a computation of Bernoulli numbers) are described in detail, but the heart of the story is the human drama. Today we see these two figures as pioneers and heroes, but their lives were tinged with frustration and disappointment. Lovelace, who died at age 36, never got a proper opportunity to show off her ideas and abilities; in her one published work, all of her original contibutions were relegated to footnotes. Babbage in his later years felt aggrieved with the world for failing to support his vision. Wolfram makes an interesting biographer of this pair, never hesitating to see his subjects reflected in his own experience, and vice versa.


Back in the summer of 2012 Shinichi Mochizuki of Kyoto University released four long papers that claim to resolve an important problem in number theory called the abc conjecture. (I’m not going to try to explain the conjecture here; I did so in an earlier post.) More than three years later, no one in the mathematical community has been able to understand Mochizuki’s work well enough to verify that it is indeed a proof. In December more than 50 experts gathered at the University of Oxford to try to make progress on breaking the impasse. Brian Conrad of Stanford University, one of the workshop participants, wrote up his notes as a guest post in Cathy O’Neil’s Mathbabe blog. This is an insider’s account, and parts of the discussion will not make much sense unless you have fairly deep background in modern number theory. (I don’t.) But that inaccesibility illustrates the point, in a way. Number theorists themselves are in the same situation with regard to the Mochizuchi papers, which are clotted with idiosyncratic concepts such as Inter-universal Teichmuller Theory (IUT). Conrad writes:

After [Kiran] Kedlaya’s lectures, the remaining ones devoted to the IUT papers were impossible to follow without already knowing the material: there was a heavy amount of rapid-fire new notation and language and terminology, and everyone not already somewhat experienced with IUT got totally lost…. Persistent questions from the audience didn’t help to remove the cloud of fog that overcame many lectures in the final two days. The audience kept asking for examples (in some instructive sense, even if entirely about mathematical structures), but nothing satisfactory to much of the audience along such lines was provided.

It’s still not clear when or how the status of the proof will be resolved. O’Neil herself has taken a stern position on the issue of inpenetrable purported proofs. When the Mochizuchi papers were first released, she wrote: “If I claim to have proved something, it is my responsibility to convince others I’ve done so; it’s not their responsibility to try to understand it (although it would be very nice of them to try).”

Anthony Bonato, the Intrepid Mathematician, offers a friendlier introduction to the abc conjecture and its consequences. Also see comments on the conjecture and the workshop by Evelyn Lamb in the Scientific American blog Roots of Unity and by Anna Haensch in an American Mathematical Society blog.


Another number-theory event that attracted wide notice in recent weeks was the successful defense and publication of a doctoral thesis at Princeton University by Piper Harron. The title is “The Equidistribution of Lattice Shapes of Rings of Integers of Cubic, Quartic, and Quintic Number Fields: an Artist’s Rendering,” and it’s a great read (PDF). Here is the abstract:

A fascinating tale of mayhem, mystery, and mathematics. Attached to each degree \(n\) number field is a rank \(n\, - 1\) lattice called its shape. This thesis shows that the shapes of \(S_n\)-number fields (of degree \(n = 3, 4,\) or \(5)\) become equidistributed as the absolute discriminant of the number field goes to infinity. The result for \(n = 3\) is due to David Terr. Here, we provide a unified proof for \(n = 3, 4,\) and \(5\) based on the parametrizations of low rank rings due to Bhargava and Delone–Faddeev. We do not assume any of those words make any kind of sense, though we do make certain assumptions about how much time the reader has on her hands and what kind of sense of humor she has.

This is not your grandmother’s doctoral thesis! Harron interleaves sections of “laysplaining” and “mathsplaining,” and illustrates her work with an abundance of metaphors, jokes, asides to the reader, cartoons, and commentary on the course of her life during the 10 years she spent writing the thesis (e.g., a brief interruption to give birth). In terms of expository style, Mochizuki and Harron stand at opposite poles—a fact noted by at least two bloggers. Both of the posts by Evelyn Lamb and by Anna Haensch mentioned above in connection with Mochizuki also discuss Harron’s work.

Harron has a blog of her own with a post titled “Why I Do Not Talk About Math,” and she has written a guest post for Mathbabe with this description of her thesis:

My thesis is this thing that was initially going to be a grenade launched at my ex-prison, for better or for worse, and instead turned into some kind of positive seed bomb where flowers have sprouted beside the foundations I thought I wanted to crumble.

Reading her accounts of life as “an escaped graduate student,” I am in equal measures amused and horrified (not a comfortable combination). If nothing else, Harron’s “thesis grenade” promises to broaden—to diversify—the discussion of diversity in mathematical culture. It’s not just about race and gender (although Harron cares passionately about those issues). Human diversity also ranges across many other dimensions: modes of reasoning, approaches to learning, cultural contexts, styles of explaining, and ways of living. Harron’s thesis is a declaration that one can do original research-level mathematics without adopting the vocabulary, the styles, the attitudes, and the mental apparatus of one established academic community.


If I show you a cube, you can easily place it in a three-dimensional cartesian coordinate system in such a way that all the vertices have rational \(x,y,z\) coordinates. By scaling the cube, you can make all the coordinates integers. The same is true of the regular tetrahedron and the regular octahedron, but for these objects the scaling factor includes an irrational number, \(\sqrt{2}\). For the dodecahedron and the icosahedron, some of the vertex coordinates are themselves irrational no matter how the figure is scaled; the irrational number \(\varphi = (1 + \sqrt{5})\, /\, 2\) plays an essential role in the geometry.

This intrusion of irrationality into geometry troubled the ancients, but we seem to have gotten used to it by now. However, David Eppstein, a.k.a \(0\)xDE, writes about a class of polyhedra I still find deeply disconcerting: “Polyhedra whose vertex coordinates have no closed form formula.” In this context a closed-form formula is a mathematical expression that can be evaluated in a finite number of operations. There’s no universal agreement on exactly what operations are allowed; Eppstein works with computational models that allow the familiar operations \(+, -, \times, \div\) as well as finding roots of polynomials. He constructs a polyhedron—it looks something like the teepee his children used to play with—whose vertex coordinates cannot all be calculated by a finite sequence of these operations.

With this development we land in territory even stranger than that of the irrational numbers. I cannot draw an exact equilateral triangle on a computer screen because at least one of the coordinates is irrational; nevertheless, I can tell you that the vertex ought to have the coordinate \(y = 1 / \sqrt{3}\). In Eppstein’s polyhedron I can’t give you any such compact, digestible description of where the vertex belongs; the best anyone can offer is a program for approximating it.


Brent Yorgey in The Math Less Traveled offers an unlikely question: What’s the best way to read a manuscript printed on three-sided paper? If you have a sheaf of unbound pages printed on just one side, the obvious procedure is to read the top sheet, then shuffle it to the bottom of the heap, and continue until you come back to page 1. If the pages are printed on two sides, life gets a little more complicated. You can read the top page, flip it over and read the back, then flip it again and move it to the bottom of the sheaf. An alternative (attributed to John Horton Conway) is to read the front page, move it to the bottom of the heap, then flip over the entire stack to read the back side; then flip the stack again to read the front of the following page. In either case, you must alternate two distinct operations, which means you must somehow keep track of which move comes next.

The unsolved problem is to find the best algorithm when the pages are printed on three sides. “You may not be familiar with triple-sided sheets of paper,” Yorgey writes, “so here’s how they work: they stack nicely, just like regular sheets of paper, but you have to flip one over three times before you get back to the original side.”

Yorgey gives some criteria that a successful algorithm ought to satisfy, but the question remains open.


Mr. Honner takes up a problem encountered at a Math for America banquet:

Suppose you are standing several miles from the Pentagon. What is the probability you can see three sides of the building?

He discovers that it’s one of those cases where interpreting the problem is as much of a challenge as finding the answer. “In particular, it’s a reminder of how the different ways we model random selection can make for big differences in our solutions!”


Long-tailed data distributions—where extreme values are more common than they would be in, say, a normal distribution—are notoriously tricky. John D. Cook points out that long tails become even more treacherous when the variables are discrete (e.g., integers) rather than continuous values.

Suppose \(n\) data points are drawn at random from a distribution where each possible value \(x\) appears with frequency proportional to \(x^{-\alpha}\). Working from the data sample, we want to estimate the value of the exponent \(\alpha\). If \(x\) is a continuous variable, there’s a maximum-likelihood formula that generally works well: \(\hat{\alpha} = 1 + n\, / \sum \log x\). But with discrete variables, the same method leads to disastrous errors. In a test case with \(\alpha = 3\), the formula yields \(\hat{\alpha} = 6.87\). But we needn’t lose hope. Cook presents another approach that gives quite reliable results.


A post on DRHagen.com tackles a mystery found in an xkcd cartoon:

Calendar of meaningful dates, by Randall Munroe (xkcd). License: Creative Commons Attribution-NonCommercial 2.5.

The size of each date is proportional to its frequency of occurrence in the Google Books ngram database for English-language books published since 2000. September 11 is clearly an outlier. That’s not the mystery. The mystery is that the 11th of most other months is noticeably less common than other dates. The cartoonist, Randall Munroe, was puzzled by this; hover over the image to see his message. Hagen convincingly solves the mystery. I’m not going to give away the solution, and I urge you to try to come up with a hypothesis of your own before following the link to DRHagen.com. And if you get that one right, you might work on another calendrical anomaly—that the 2nd, 3rd, 22nd, and 23rd are underrepresented in books printed before the 20th century—which Hagen solves in a followup post.


A shiny new blog called Off the Convex Path, with contributions from Sanjeev Arora, Moritz Hardt, and Nisheeth Vishnoi, “is dedicated to the idea that optimization methods—whether created by humans or nature, whether convex or nonconvex—are exciting objects of study and often lead to useful algorithms and insights into nature.” A December 21 post by Vishnoi looks at optimization mthods in the guise of dynamical systems—specifically, systems that converge to some set of fixed points. Vishnoi gives two examples, both from the life sciences: a version of Darwinian evolution, and a biological computer in which slime molds solve linear programming problems.


In the Comfortably Numbered blog, hardmath123 writes engagingly and entertainingly about numerical coincidences, like this one:

\(1337^{47168026} \approx \pi \cdot 10^{147453447}\) to within \(0.00000001\%\). It begins with the digits \(31415926 \ldots\).

The existence of such coincidences should not be a big surprise. As hardmath123 writes, “Since the rationals are dense in the reals, we can find a rational number arbitrarily close to any real number.” The question is how to find them. The answer involves continued fractions and the Dirichlet approximation theorem.


For the final acts of this carnival, we have a few items on mathematics in the arts, crafts, games, and other recreations.

In the Huffington Post arts and culture department, Dan Rockmore takes a mathematical gallery walk in New York. The first stop is the Whitney Museum of American Art, showing a retrospective of the paintings of Frank Stella, some of whose canvases are nonrectilinear, and even nonconvex. In another exhibit mathematics itself becomes art, with 10 equations and other expressions calligraphically rendered by 10 noted mathematicians and scientists.

Katherine, writing on the blog Will Knit for Math, tells how “Being a mathematician improves my knitting.” It’s not just a matter of counting stitches (although a scheme for counting modulo 18 does enter into the story). I hope we’ll someday get a followup post explaining how “Being a knitter improves my mathematics.”

Chelsea VanderZwaag, a student majoring in mathematics and elementary education, visits an arts-focused school school, where a lesson on paper-folding and fractions blends seamlessly into the curriculum. (An earlier post on the problem of creating a shape by folding paper and making a single cut with scissors provides a little more background.)

Dobble cardOn Medium, A Woman in Technology writes about Dobble, a combinatorial card game. “My children got this game for Christmas. We haven’t played it yet. We got as far as me opening the tin and reading the rules, at which point I got distracted by the maths and forgot about the game.”

There are 55 cards, each bearing eight symbols, and each card shares exactly one symbol with each of the other cards. How many distinct symbols do you need to construct such a deck of cards. The game instructions say there are 50, but the author determines through hard work, scribbling, and spreadsheets that the instructions are in error. It all comes to a happy end with the formula \(n^2 - n + 1\) and a suggested improvement to the game that the manufacturer should heed.


That’s it for Carnival of Math 130. My apologies to a few contributors whose interesting work I just couldn’t squeeze in here.

Next month the carnival makes a hometown stop with Katie at The Aperiodical.

Posted in mathematics | 4 Comments