Pumping the Primes

On your desktop is a black box. Actually it’s an orange box, because black boxes are usually painted “a highly visible vermilion colour known as international orange.” In any case, it’s an opaque box: You can’t see the whirling gears or the circuit boards or whatever else might be inside. There’s another, secret input: The tiny hole in the bottom right corner of the box. If you poke a paperclip into the hole, it will reset the mechanism.You have access only to the inputs and outputs. The input is a button labeled Next. The output is an adding machine tape.

Go ahead: Press the button. A number is printed on the tape. Press again and another number appears. Keep going. A few more. Notice anything special about those numbers? The sequence begins:

5, 3, 11, 3, 23, 3, 47, 3, 5, 3, 101, 3, 7, 11, 3, 13, 233, 3, 467, 3, 5, 3, . . .oeis.org/A137613

Yep, they’re all primes. They are not in canonical order, and some of them appear more than once, but every number in the list is certifiably indivisible by any number other than 1 and itself. Does the pattern continue? Yes, there’s a proof of that. Do all primes eventually find a place in the sequence? The very first prime, 2, is missing. Whether all odd primes eventually turn up remains a matter of conjecture. On the other hand, it’s been proved that infinitely many distinct primes are included.


So what’s inside the box? Here’s the JavaScript function that calculates the numbers printed on the tape. There’s not much to it:

  var n = 2, a = 7;   // initial values
  
  function nextG() {
    var g = gcd(n, a);
    n = n + 1;
    a = a + g;
    return g;
  }

The function gcd(n, a) computes the greatest common divisor of n and a. As it happens, gcd is not a built-in function in JavaScript, but there’s a very famous algorithm we can easily implement:

  function gcd(x, y) {
    while (y > 0) {
      var rem = x % y;    // remainder operator
      x = y;
      y = rem;
    }
    return x;
  }

The value returned by nextG is not always a prime, but it’s always either \(1\) or a prime. To see the primes alone, we can simply wrap nextG in a loop that filters out the \(1\)s. The following function is called every time you press the Next button on the orange black boxSee the rest of the source code on GitHub.:

  function nextPrime() {
    var g;
    do g = nextG() while (g === 1);     // skip 1s
    return g;
  }

For a clearer picture of where those primes (and \(1\)s) are coming from, it helps to tabulate the successive values of the three variables n, a, and g.

n    2   3   4   5   6   7   8   9  10   11   12   13   14   15   16   17   18   19   20   21   22   23
a    7   8   9  10  15  18  19  20  21   22   33   36   37   38   39   40   41   42   43   44   45   46
g    1   1   1   5   3   1   1   1   1   11    3    1    1    1    1    1    1    1    1    1    1   23

From the given initial values \(n = 2\), \(a = 7\), we first calculate \(g = \gcd(2, 7) = 1\). Then \(n\) and \(a\) are updated: \(n = n + 1\), \(a = a + g\). On the next round the gcd operation again yields a \(1\): \(g = \gcd(3, 8) = 1\). But on the fourth iteration we finally get a prime: \(g = \gcd(5, 10) = 5\). The assertion that \(g\) is always either \(1\) or a prime is equivalent to saying that \(n\) and \(a\) have at most one prime factor in common.


This curious generator of primes was discovered in 2003, during a summer school exploring Stephen Wolfram’s “New Kind of Science.” A group led by Matthew Frank investigated various nested recursions, including this one:

\[a(n) = a(n-1) + gcd(n, a(n-1)).\]

With the initial condition \(a(1) = 7\), the sequence begins:

7, 8, 9, 10, 15, 18, 19, 20, 21, 22, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 69, . . .oeis.org/A106108

Participants noticed that the sequence of first differences — \(a(n) - a(n-1)\) — seemed to consist entirely of \(1\)s and primes:

1, 1, 1, 5, 3, 1, 1, 1, 1, 11, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 23, 3, . . .oeis.org/A132199

Stripping out the \(1\)s, the sequence of primes is the same as that generated by the orange black box:

5, 3, 11, 3, 23, 3, 47, 3, 5, 3, 101, 3, 7, 11, 3, 13, 233, 3, 467, 3, . . .oeis.org/A137613

During the summer school, Frank and his group computed 150 million elements of the sequence and observed no composite numbers, but their conjecture that the value is always \(1\) or prime remained unproved. One of the students present that summer was Eric S. Rowland, who had just finished his undergraduate studies and was about undertake graduate work with Doron Zeilberger at Rutgers. In 2008 Rowland took another look at the gcd-based prime generator and proved the conjecture.

The sequence beginning with \(a(1) = 7\) is not unique in this respect. Rowland’s proof applies to sequences with many other initial conditions as well—but not to all of them. For example, with the initial condition \(a(1) = 3765\), the list of “primes” begins:

53, 5, 57, 5, 9, 13, 7, 71, 3, 41, 3, 4019, 3, 8039, . . .

Neither 57 nor 9 is a prime.

A number of other mathematicians have since elaborated on this work. Vladimir Shevelev gave an alternative proof and clarified the conditions that must be met for the proof to apply. Fernando Chamizo, Dulcinea Raboso, and Serafín Ruiz-Cabello showed that even if a sequence includes composites, there is a number \(k\) beyond which all entries \(a(k)\) are \(1\) or prime. Benoit Cloitre explored several variations on the sequence, including one that depends on the least common multiple (lcm) rather than the greatest common factor; the lcm sequence is discussed further in a recent paper by Ruiz-Cabello.


Should we be surprised that a simple arithmetic procedure—two additions, a gcd, and an equality test—can pump out an endless stream of pure primality? I have been mulling over this question ever since I first heard about the Rowland sequence. I’m of two minds.

Part of the mystique of the primes is their unpredictability. We can estimate how many primes will be found in any given interval of the number line, and we can compile various other summary statistics, but no obvious rule or algorithm tells us exactly where the individual primes fall within the interval. For more on randomness and the primes, see the erudite essay of Terry Tao.There’s nothing random or indeterminate about the distribution of the primes, and yet, like a random sequence, the sequence of primes seems incompressible. The most concise way of describing the primes is just to list them. Thus a simple and compact formula that generates only primes seems a bit like magic.

But there’s another side to this story.See Underwood Dudley, “Formulas for Primes,” Mathematics Magazine, Vol. 56, No. 1 (Jan., 1983), pp. 17–22. It turns out the woods are full of prime-generating formulas. There’s this mysterious-looking formula for the nth prime, published in 1971 by J. M. Gandhi:

\[p_n = \left\lfloor 1 - \log_2 \left( -\frac{1}{2} + \sum_{d|P_{n-1}} \frac{\mu(d)}{2^d - 1} \right) \right \rfloor.\]

Here \(P_n\) is the primorial product \(p_{1}p_{2}p_{3} \ldots p_{n}\) and \(\mu\) is the Möbius function. (If you don’t know what the Möbius function is or why you should care, Peter Sarnak explains it all.)

Way back in 1947, W. H. Mills offered a formula with just three symbols and a pair of floor brackets. He proved that a real number \(A\) exists such that

\[\left \lfloor A^{3^{n}}\right \rfloor\]

is prime for all positive integers \(n\). One possible value oeis.org/A051021 of \(A\) lies somewhere in the neighborhood of 1.306377883863. The sequence of primes derived from that value begins:

2, 11, 1361, 2521008887, 16022236204009818131831320183, . . .oeis.org/A051254

A third example brings us back to the gcd function. For all \(n > 1\), \(n\) is prime if and only if \[\gcd((n - 1)!, n) = 1.\]

From this fact we can craft an algorithm that generates all the primes (and only the primes) in sequence.

The trouble with all these formulas is that they require prior knowledge of the primes, or else they have such knowledge already hidden away inside them. Solomon Golomb showed that Gandhi’s formula is just a disguised version of the sieve of Eratosthenes. The Mills formula requires us to calculate the constant \(A\) to very high accuracy, and the only known way to do that is to work backward from knowledge of the primes. As for \(\gcd((n - 1)!, n) = 1\), it’s really more of a joke than a formula; it just restates the definition that n is prime iff no integer greater than 1 divides it.

Underwood Dudley opined that formulas for the primes range “from worthless, to interesting, to astonishing.” That was back in 1983, before the Rowland sequence was known. Where shall we place this new formula on the Dudley spectrum?

Rowland argues that the sequence differs from the Gandhi and Mills formulas because it “is ‘naturally occurring’ in the sense that it was not constructed to generate primes but simply discovered to do so.” This statement is surely true historically. The group at the Wolfram summer school did not set out to find a prime generator but just stumbled upon it. However, perhaps the manner of discovery is not the most important criterion.

Let’s look again at what happens when the procedure NextG is invoked repeatedly, each time returning either \(1\) or a prime.

n    2   3   4   5   6   7   8   9  10   11   12   13   14   15   16   17   18   19   20   21   22   23
a    7   8   9  10  15  18  19  20  21   22   33   36   37   38   39   40   41   42   43   44   45   46
g    1   1   1   5   3   1   1   1   1   11    3    1    1    1    1    1    1    1    1    1    1   23

In the \(g\) row we find occasional primes or groups of consecutive primes separated by runs of \(1\)s. If the table were extended, some of these runs would become quite long. A good question to consider is how many times NextG must be called before you can expect to see a prime of a certain size—say, larger than 1,000,000. There’s an obvious lower bound. The value of \(gcd(n, a)\) cannot be greater than either \(n\) or \(a\), and so you can’t possibly produce a prime greater than a million until \(n\) is greater than a million. Since \(n\) is incremented by \(1\) on each call to NextG, at least a million iterations are needed. And that’s just a lower bound. As it happens, the Rowland sequence first produces a prime greater than 1,000,000 at \(n =\) 3,891,298; the prime is 1,945,649.

The need to invoke NextG at least \(k\) times to find a prime greater than \(k\) means that the Rowland sequence is never going to be a magic charm for generating lots of big primes with little effort. As Rowland remarks, “a prime \(p\) appears only after \(\frac{p - 3}{2}\) consecutive \(1\)s, and indeed the primality of \(p\) is being established essentially by trial division.”

Rowland also points out a shortcut, which is best explained by again printing out our table of successive \(n, a, g\) values, with an extra row for some \(a - n\) values:

n    2   3   4   5   6   7   8   9  10   11   12   13   14   15   16   17   18   19   20   21   22   23
a    7   8   9  10  15  18  19  20  21   22   33   36   37   38   39   40   41   42   43   44   45   46
g    1   1   1   5   3   1   1   1   1   11    3    1    1    1    1    1    1    1    1    1    1   23

a–n  5   5   5          11  11  11  11             23   23   23   23   23   23   23   23   23   23

Within each run of \(1\)s, \(a-n\) is a constant—necessarily so, because both \(a\) and \(n\) are being incremented by \(1\) on each iteration. What’s more, based on what we can see in this segment of the sequence, the value of \(a-n\) during a run of \(1\)s is equal to the next value of \(n\) that will yield a nontrivial gcd. This observation suggests a very simple way of skipping over all those annoying little \(1\)s. Whenever \(gcd(a, n)\) delivers a \(1\), set \(n\) equal to \(a-n\) and increment \(a\) by \(a - 2n\). Here are the first few values returned by this procedure:

5, 3, 11, 3, 23, 3, 47, 3, 95, 3, . . .

Uh oh. 95 is not my idea of a prime number. It turns out the shortcut only works when \(a-n\) is prime. To repair the defect, we could apply a primality test to each value of \(a-n\) before taking the shortcut. But if we’re going to build a primality test into our prime generator, we might as well use it directly to choose the primes.

It seems we are back where we began, and no closer to having a practical prime generator. Nevertheless, on Dudley’s scale I would not rank this idea as “worthless.” When you take a long hike around the shore of a lake, you eventually wind up exactly where you started, but that does not make the trip worthless. Along the way you may have seen something interesting, or even astonishing.

Posted in computing, mathematics | 11 Comments