Gruenberger’s prime path

Fred Gruenberger may well have been the first blogger on computational topics. When he was writing, back in the 1970s, there was no RSS, and so he distributed his musings in a monthly newsletter called Popular Computing. A typical issue was 16 or 20 typewritten pages–stapled, folded, stamped and delivered by mail. It was always worth reading.

Gruenberger had been working and playing with computers since the 1940s. For a long stretch he was at the RAND Corporation, the famous think tank in Santa Monica. Later he taught at Cal State Northridge. In addition to Popular Computing he was involved in the startup of Datamation magazine and published at least a dozen books. I haven’t been able to learn much about his later years; he died in 1998.

A slogan that appeared in some issues of Popular Computing proclaimed: “The way to learn computing is to compute.” I took this advice to heart, although I was hampered by a total lack of hardware. Later on I acquired a programmable calculator, which helped on some of the problems and exercises.

Problem 149, from Popular Computing Vol. 4 No. 12, December 1976

The problem reproduced above appeared in the December 1976 issue of Popular Computing (Vol. 4, No. 12). At the time, I made no attempt to work this one out, but evidently the problem seemed interesting enough to be worth filing away. When I came upon the old clipping recently, I gave it a closer look and realized I have no idea how to answer Gruenberger’s question, though the impediment now is not lack of hardware.

Gruenberger asks us to trace a planar path whose steps are indexed by the odd integers starting at 3. For each number N we turn right 90 degrees before taking a step if N is a prime congruent to 1 mod 6; we turn left 90 degrees before moving one unit if N is a prime congruent to −1 mod 6; otherwise we continue straight ahead in whatever direction we happen to be facing.
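The rule is simple enough to sketch in a few lines of Python. This is only an illustration, not Gruenberger's program or the one behind the figures: the names `is_prime`, `turn_for` and `prime_path` are invented here, and the conventions are assumptions inferred from the coordinate tables later in this post (N=3 sits at the origin, the walker starts out facing east, y increases upward, and the turn chosen by N applies to the step that leaves the cell holding N).

```python
def is_prime(n):
    """Trial division; adequate for the small N used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

# Headings in counterclockwise order: east, north, west, south.
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def turn_for(n):
    """+1 for a left turn, -1 for a right turn, 0 for straight ahead."""
    if is_prime(n):
        if n % 6 == 5:       # prime congruent to -1 mod 6
            return 1
        if n % 6 == 1:       # prime congruent to +1 mod 6
            return -1
    return 0                 # composites, and the prime 3

def prime_path(limit):
    """Yield (N, x, y) for odd N from 3 through limit."""
    x, y, heading = 0, 0, 0  # assumed start: origin, facing east
    for n in range(3, limit + 1, 2):
        yield n, x, y
        heading = (heading + turn_for(n)) % 4
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy

for n, x, y in prime_path(13):
    print(n, x, y)
```

Under these assumptions the walk puts 5 at (1, 0), 37 at (6, 9) and 137 at (−11, 28), which agrees with the tables further down.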

In his typewriter graphics, Gruenberger plotted the trajectory from N=3 through 97. Below I continue the path through N=199.

But something’s amiss here. Gruenberger wrote:

Eventually the path will cross itself, so that the cell containing 111 will also contain 147. Similarly, one cell will contain both 91 and 179.

Those two self-intersections are nowhere to be found in the diagram. When I first noticed this discrepancy, I assumed I must have made a mistake somewhere. (This eagerness to blame myself is not mere knee-jerk humility; I have years of experience to back it up.) Eventually, though, I concluded that it was Gruenberger who had made the wrong turn. I believe he mistakenly went left at 127, as shown in the brown trail below:

The brown continuation of the red path includes the two coincidences mentioned in Gruenberger’s problem statement. But the left turn at N=127 is incorrect, because 127 is a prime equal to (6×21)+1, and thus it should specify a right turn. The error is of no great consequence, but it does reveal something interesting: Gruenberger must have been plotting these paths by hand. Most likely he wrote a program to compute the series of residue classes, then traced out the trajectory on squared paper.

Setting aside this anomaly, Gruenberger was quite right that the path does intersect itself. Here’s the trail continued through N=1,001:

And if that’s not tangled enough, here’s what it looks like at N=10,001:

Gruenberger asks for “a list of the contents of those cells containing more than one number, arranged in the order of the smallest number in the cell.” It’s not hard to identify some cells that belong on such a list. The table below includes all multiply-occupied cells discovered when tracing the path up to N=1,001, sorted as Gruenberger requests:

                   x    y    values of N
                 -11   28    (137 337)
                 -15   27    (147 683)
                 -16   27    (149 349 685)
                 -18   26    (155 355)
                 -19   27    (159 691)
                 -19   28    (161 693)
                 -19   29    (163 695)
                 -17   31    (171 319)
                 -18   32    (175 315)
                 -19   32    (177 701)
                 -20   32    (179 703)
                 -22   31    (185 769)
                 -23   31    (187 771)
                 -24   31    (189 773)
                 -30   41    (245 269)
                 -30   42    (247 271)
                 -27   40    (281 733)
                 -26   40    (283 735)
                 -26   37    (289 725)
                 -23   35    (299 715)
                 -22   35    (301 761)
                 -21   35    (303 759)
                 -20   35    (305 757)
                 -17   27    (351 687)
                 -18   27    (353 689)
                 -17   24    (361 673)
                 -16   24    (363 675)
                 -15   24    (365 677)
                 -17   21    (379 667)
                 -17   22    (381 669)
                 -17   23    (383 671)
                 -20   22    (391 631)
                 -20   21    (393 633)
                 -20   20    (395 635)
                 -20   19    (397 637)
                 -22   19    (401 593)
                 -22   18    (403 591)
                 -22   17    (405 589)
                 -22   16    (407 587)
                 -27   15    (419 575)
                 -27   14    (421 573)
                 -28   14    (423 819)
                 -29   14    (425 549)
                 -32   14    (431 539)
                 -32   13    (433 537)
                 -26   10    (563 831)
                 -26   11    (565 829)
                 -27   13    (571 823)
                 -28   18    (607 811)
                 -22   32    (707 767)
                   4   -6    (923 971)
                   4   -7    (925 969)
                   4   -8    (927 967)
                   4   -9    (929 989)
                   5   -9    (931 991)
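
A tally of this kind can be produced along the following lines. This is a minimal Python sketch; the helper names (`is_prime`, `walk`, `shared_cells`) are invented for illustration, and the start conventions (N=3 at the origin, facing east, the turn for N applied to the step that leaves N's cell) are assumptions chosen because they reproduce the coordinates in the table.

```python
from collections import defaultdict

def is_prime(n):
    """Trial division; fine for N in the thousands."""
    if n < 2 or n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S

def walk(limit):
    """Yield (N, (x, y)) for odd N from 3 through limit."""
    x, y, heading = 0, 0, 0
    for n in range(3, limit + 1, 2):
        yield n, (x, y)
        if is_prime(n) and n % 6 == 5:
            heading = (heading + 1) % 4   # left turn
        elif is_prime(n) and n % 6 == 1:
            heading = (heading - 1) % 4   # right turn
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy

def shared_cells(limit):
    """Cells holding more than one N, ordered by the smallest N in the cell."""
    visits = defaultdict(list)
    for n, cell in walk(limit):
        visits[cell].append(n)            # N arrives in increasing order
    shared = [(cell, ns) for cell, ns in visits.items() if len(ns) > 1]
    shared.sort(key=lambda item: item[1][0])
    return shared

for (x, y), ns in shared_cells(1001):
    print(x, y, ns)
```

Because the whole path up to the chosen limit has to be traced before sorting, the output depends on where the trace stops.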

Is this list the answer to Gruenberger’s question? No, it’s not, because there’s no reason to stop at an arbitrary limit such as N=1,001. Indeed, the list above is not even a prefix of the complete answer. The smallest value of N appearing in the list is 137, but the trail will eventually revisit cells occupied by smaller values of N. For example, continuing the experiment to N=10,001 reveals a bunch of intersections quite close to the beginning of the path, including a site that’s visited five times:

                   x    y    values of N
                   1    0    (5 1621)
                   1    1    (7 1623)
                   2    1    (9 4725)
                   3    1    (11 1263)
                   3    2    (13 1265)
                   5    3    (19 1635)
                   6    3    (21 1637)
                   7    4    (25 7537)
                   7    5    (27 7319 7539)
                   7    6    (29 7505 7541)
                   6    6    (31 1643 7323 7503 7543)
                   6    7    (33 1645 7325)
                   6    8    (35 1647 7327)
                   6    9    (37 1649 7329)

One point still missing from this list is the origin–the site at x=0, y=0, N=3. Does the path ever revisit its starting point? If so, at what value (or values) of N does it come back home? Since I don’t know the answer to this question, I guess I’ll have to leave it as an exercise for the reader.

I suspect that the problem Gruenberger meant to pose (or thought he was posing) was to generate a list of self-intersection sites arranged in their natural order of occurrence–that is, the order in which the crossings are created when you construct the path starting from the origin. This natural-order list is not at all the same as a list “arranged in the order of the smallest number in the cell.” The natural-order list is easy to generate step by step. All you need to do is obey the left/right/straight rules, plot the resulting sequence of positions on the xy lattice, and leave behind a trail of breadcrumbs so you can check at each step to see if the site has been visited before. This task is a matter of straightforward computation–just the kind of assignment that Gruenberger favored. The natural-order list begins:

                   x    y    values of N
                 -30   41    (269 245)
                 -30   42    (271 247)
                 -18   32    (315 175)
                 -17   31    (319 171)
                 -11   28    (337 137)
                 -16   27    (349 149)
                 -18   26    (355 155)
                 -32   13    (537 433)
                 -32   14    (539 431)
                 -29   14    (549 425)
                 -27   14    (573 421)

Thus the prime path first crosses itself when N=269, a value that shares the same coordinates as N=245, namely x=−30, y=41. There are 56 such crossings up to N=1,001, and 112,988 self-intersections up to N=10⁶.
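
The breadcrumb procedure might look like this in Python. It is a sketch under assumptions, not the program used for the list above: the function names are made up, a dictionary stands in for the trail of breadcrumbs, and the walk is assumed to start at the origin facing east, with the turn for N applied to the step that leaves N's cell.

```python
def is_prime(n):
    """Trial division; fine for N in the thousands."""
    if n < 2 or n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S

def crossings(limit):
    """Return (N, first earlier N, cell) for each revisit, in order of occurrence."""
    first_visit = {}                      # breadcrumbs: cell -> first N seen there
    found = []
    x, y, heading = 0, 0, 0
    for n in range(3, limit + 1, 2):
        cell = (x, y)
        if cell in first_visit:
            found.append((n, first_visit[cell], cell))
        else:
            first_visit[cell] = n
        if is_prime(n) and n % 6 == 5:
            heading = (heading + 1) % 4   # left turn
        elif is_prime(n) and n % 6 == 1:
            heading = (heading - 1) % 4   # right turn
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy
    return found

for n, earlier, (x, y) in crossings(400):
    print(x, y, n, earlier)
```

One design choice to note: when a cell is visited a third time, this sketch pairs the new N with the first visitor only, and every revisit counts as one crossing.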

* * *

There is a wilder, conjectural answer to Gruenberger’s challenge–which I’m pretty sure he did not have in mind. It goes like this: Maybe the complete list of revisited values of N is simply the list of all N. In other words, maybe the Gruenberger prime path fills up the entire lattice of integers, crossing over itself everywhere many times.

In 1921 George Pólya published a celebrated proof that a random walk on the lattice of integers is recurrent in one or two dimensions, though not in higher dimensions. Recurrent means that the walk returns to each point along its length with probability 1, and indeed visits every point in its domain infinitely often. Is it possible that the prime path is also recurrent?

Pólya’s theorem is one of those mind-expanding results that seem impossible on first acquaintance, and then inevitable, and finally just so amazing that you want to go kiss a mathematician. I have to confess that I’ve never gotten all the way through Pólya’s original paper (it’s not long, but it’s in German). On the other hand, I can highly recommend a little book by Peter Doyle and Laurie Snell, Random Walks and Electric Networks, which gives several alternative proofs of the theorem; it was published in the MAA’s Carus Monograph series, and there’s a postprint available on the arXiv.

The key insight underlying Pólya’s result, as I understand it, is this: If you never revisit a former home, then you must be spending eternity somewhere else, and you can do that only if your universe has enough somewhere elses that you’ll never run out of new territories to visit. Suppose that, some eons after starting your journey, you find yourself at distance r from the origin. If you’re living in a one-dimensional universe, then there are just two places you could be at that moment, namely at +r or −r. It doesn’t matter how far you run; there are still just two points at any given distance from the origin. In two dimensions, a fugitive at distance r has a little more room to maneuver; the number of available points grows in proportion to r, forming a circle of radius r. But this is still not enough room to get lost in. Only in three dimensions or more is there a nonzero probability of escape. In three dimensions, the space available at radius r is proportional to r². In this three-dimensional world, the volume of empty space grows faster than a random walker’s expected distance from home.
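
The counting argument above is a heuristic. One standard way to make it precise (a sketch of the textbook criterion, not taken from Pólya's paper) uses the probability \(p_{2n}\) that a simple random walk on the \(d\)-dimensional lattice is back at the origin after \(2n\) steps. The walk is recurrent exactly when the expected number of returns is infinite:

\[ \sum_{n \ge 1} p_{2n} = \infty, \qquad \text{where } p_{2n} \sim \frac{C_d}{n^{d/2}} \text{ as } n \to \infty, \]

and \(C_d\) is a constant depending only on the dimension. For \(d=1\) the terms decay like \(n^{-1/2}\) and for \(d=2\) like \(1/n\), so the sum diverges in both cases; for \(d \ge 3\) the terms decay like \(n^{-3/2}\) or faster and the sum converges.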

What does all this have to do with Gruenberger’s prime path? Well, it’s no secret that the distribution of prime numbers looks convincingly random–if you look at it in just the right way. And in particular the distribution of primes in various residue classes, such as 6K+1 and 6K−1, seems to behave at least approximately like a random variable. All this suggests we might consider viewing the Gruenberger prime trail as if it were a random walk through the two-dimensional lattice of integers. Because the space is two-dimensional, it’s a good guess that the walk should be recurrent.

The original recurrence results of Pólya refer to a simple random walk, where at each step the walker chooses randomly among the available directions and then moves one unit in that direction. For example, in the two-dimensional lattice of integers there are four possible directions: north, south, east, west. The simple random walk is not the best model of the Gruenberger process, which is more like a nonreversing random walk–a path where on each step the walker can turn left or turn right or go straight ahead but can never make a 180-degree about-face. We can further refine the random-walk model of the Gruenberger process by biasing the choice made at each step to reflect the changing abundance of prime numbers. Primes grow scarcer as their magnitude increases; in the vicinity of a given value of N, the probability that a randomly chosen number is prime is approximately 1/log N. Since the Gruenberger path goes straight for all composite numbers and turns only when N is prime, the trail will have longer and longer straight segments, and rarer turns, as N increases. A random walk can mimic this behavior by choosing an action at each step according to this logic:

    if random(1.0) > 2/log(N)
        then go straight
    elseif randomboolean()
        then turn left
    else turn right

(The proportion of primes is given as 2/log(N) rather than 1/log(N) because the Gruenberger process is defined on odd numbers only, which immediately eliminates half of the composites.)
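
For anyone who wants to try it, here is one way the logic might be turned into runnable Python. It is a sketch, not the code behind the figures: `biased_walk` is an invented name, `log` is taken to be the natural logarithm, and the seed is arbitrary.

```python
import math
import random

HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S

def biased_walk(limit, rng):
    """Return the cells visited for odd N = 3, 5, ..., limit."""
    x, y, heading = 0, 0, 0               # assumed start: origin, facing east
    cells = []
    for n in range(3, limit + 1, 2):
        cells.append((x, y))
        p_turn = 2.0 / math.log(n)        # exceeds 1 for N <= 7, so those steps always turn
        if rng.random() > p_turn:
            pass                          # go straight
        elif rng.random() < 0.5:
            heading = (heading + 1) % 4   # turn left
        else:
            heading = (heading - 1) % 4   # turn right
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy
    return cells

rng = random.Random(2010)                 # arbitrary seed, for reproducibility only
path = biased_walk(1001, rng)
print(len(path), path[-1])
```

Passing in the generator rather than using the module-level functions makes it easy to produce many independent realizations, each from its own seed.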

One way to compare the various kinds of random walks is to measure the root-mean-square displacement: the square root of the mean squared distance from the origin to the final position of the walker, taken over many realizations of the random process. For a simple random walk, the RMS displacement for an N-step walk is \(\sqrt{N}\); for the nonreversing random walk it approaches \(\sqrt{2N}\) for large N. The biased random walk based on the distribution of primes also appears to yield an RMS distance proportional to the square root of the number of steps; numerically the curve looks something like \(\sqrt{10N}\). I’m not entirely sure that’s the true form of the curve, but the geometric details don’t really make much difference. If I understand correctly, all three of these random processes should be recurrent in the sense of Pólya.
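
The first two figures are easy to check by simulation. The Python sketch below is an illustration only: the function names, the walk length, the number of trials and the seed are all arbitrary choices, and the nonreversing walker is assumed to pick left, right or straight with equal probability at every step.

```python
import math
import random

HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S

def simple_walk_end(steps, rng):
    """Final position of a walk that picks one of four directions each step."""
    x = y = 0
    for _ in range(steps):
        dx, dy = rng.choice(HEADINGS)
        x, y = x + dx, y + dy
    return x, y

def nonreversing_walk_end(steps, rng):
    """Final position of a walk that turns left, right or not at all each step."""
    x = y = 0
    heading = rng.randrange(4)
    for _ in range(steps):
        heading = (heading + rng.choice((-1, 0, 1))) % 4
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy
    return x, y

def rms_displacement(walk_end, steps, trials, rng):
    """Square root of the mean squared end-to-end distance over many trials."""
    total = 0
    for _ in range(trials):
        x, y = walk_end(steps, rng)
        total += x * x + y * y
    return math.sqrt(total / trials)

rng = random.Random(1921)                 # arbitrary seed
steps, trials = 200, 2000
for name, fn in (("simple", simple_walk_end), ("nonreversing", nonreversing_walk_end)):
    rms = rms_displacement(fn, steps, trials, rng)
    print(name, round(rms * rms / steps, 2))   # should land near 1 and near 2
```

The printed ratio is the mean squared displacement divided by the number of steps, so values close to 1 and 2 correspond to \(\sqrt{N}\) and \(\sqrt{2N}\).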

Does the same reasoning apply to the Gruenberger prime path? There are two sides to this question.

The naysayer points out that Pólya’s theorem applies to random walks, but there’s nothing truly random about the sequence of primes. After all, we have a straightforward, deterministic algorithm for generating primes, as well as an efficient algorithm for testing whether any given integer is prime or composite. The essence of a random process is that every time you run it you get a different result, but there’s only one sequence of prime numbers, and so the Gruenberger prime path will come out exactly the same every time. According to this view of things, the kind of probabilistic reasoning that goes into the proof of Pólya’s recurrence theorem is out of bounds here. For randomness to make any sense, you need to average over some ensemble of independent instances. For example, you could average over the 50 salmon-pink paths in the graph below, which represent 50 independent realizations of a biased random walk; you can’t average over the prime path itself (green), because there’s only that one path.

The yeasayer retorts that a single path is all you need–if the path is infinitely long. Indeed, the salmon-colored trails above could be interpreted not as 50 distinct runs of a random process but as 50 segments of a single long path, which repeatedly loops around through the point at x=0, y=0, wanders off in various directions, and then comes back home yet again. In essence, everything that could possibly happen in an infinite set of random paths happens somewhere within a single infinite path; all possible variety is already present there.

I’m not sure how to settle this dispute between Dr. Yea and Professor Nay. When an argument hinges on the nature of randomness, the meaning of infinity and patterns in the distribution of the primes, I know I’m in over my head.

So I’ll leave that deep question unresolved and say a final word about a lesser curiosity. In the Gruenberger process, we’re using the congruence classes of prime numbers mod 6 as a kind of coin flip to decide which way to turn. Is it a fair coin flip? For small values of N, it certainly doesn’t look fair:

                               6x+1      6x−1
       primes <     100         11        12
       primes <    1000         80        86
       primes <   10000        611       616
       primes <  100000       4784      4806
       primes < 1000000      39231     39265

There’s a persistent excess of −1 primes, and the imbalance seems to be getting larger. As a result, the prime path has a “winding number” that reaches 8.5 at N=10⁶; that is, the path makes eight and a half net counterclockwise revolutions. Does the windup continue with still larger N? I gather that the definitive answer is “Yes and No.” For more see the masterful paper “Prime Number Races” by Andrew Granville and Greg Martin.
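
The counts in the table, and the winding number, can be reproduced with a small sieve. Again this is a sketch in Python rather than the code I actually ran: `sieve` and `race` are invented names, and the winding number is taken to be the excess of left turns over right turns divided by four.

```python
import math

def sieve(limit):
    """Sieve of Eratosthenes: flags[n] is 1 when n < limit is prime (limit >= 2)."""
    flags = bytearray([1]) * limit
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # Zero out p*p, p*p + p, ... in one slice assignment.
            flags[p * p::p] = bytes(len(range(p * p, limit, p)))
    return flags

def race(limit):
    """Count primes below limit of the forms 6x+1 and 6x-1."""
    flags = sieve(limit)
    plus = sum(1 for n in range(7, limit, 6) if flags[n])    # 6x+1: right turns
    minus = sum(1 for n in range(5, limit, 6) if flags[n])   # 6x-1: left turns
    return plus, minus

for limit in (100, 1000, 10000):
    plus, minus = race(limit)
    print(limit, plus, minus, (minus - plus) / 4)            # last column: net revolutions
```

Raising the limit to 1,000,000 should give the last row of the table; the difference of 34 between the two columns there is what produces the winding number of 8.5.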

[Correction 2010-02-19: reflected the accent on Pólya.]

8 Responses to Gruenberger’s prime path

Russ Abbott says:

16 February 2010 at 11:04 pm

I joined the CS Department at Cal State Northridge while Fred was there. One of the things that stands out in my mind was his advice to beginning programming students. When writing a program, first make it work; then make it pretty. Elegance and abstraction weren’t among his primary considerations when writing software.

Those were the days of the key punch and batch processing. Fred thought it was good discipline to have to wait an hour or more for each compile/execute run. That way one would look more carefully at one’s code before submitting it.

The good old days!

Cody says:

17 February 2010 at 11:20 am

An interesting problem to lift my spirits this morning. Thanks!

Craig says:

18 February 2010 at 10:12 am

Cute question. For the record, I’ve verified that the path doesn’t return to (0,0) for N up to 2,000,000,000 (which is far enough for now; my sieve will start to fail after 31 bits). Upon reaching N=2,000,000,000, we find that the path has taken 49,112,581 left turns and 49,109,703 right turns, and finds itself at position (113,156, 248,033). I want to say “it doesn’t look like it’s coming back home”, but that’s the usual naivete with respect to random walks.

I’d like to compute which location in the plane is visited most often for N up to 2,000,000,000, but that’s a bit more of a hassle (it demands much more memory).

Nate says:

18 February 2010 at 11:14 am

I like the visual style of your graphics in this post (and others). Is there a particular graphics package that you use? Thanks. Nate.

Stephan Mertens says:

18 February 2010 at 12:36 pm

My guess is that the RMS distance of the prime-based, non-reversing random walk is proportional to \(\sqrt{n \log n}\).

brian says:

18 February 2010 at 1:47 pm

@Craig: 2 x 10^9 is exactly twice as far as I got. (I’m curious about your sieve. Indeed, I’m curious in general about the best practical primality test in a problem like this. When I tested a couple of simple-to-code alternatives, the results were not quite what I expected.)

@Nate: Thanks for asking about the artwork. The short answer is that it’s all home-baked. The longer answer is that I should write a separate post on this subject, since a couple of people have asked recently. I’ll try to do that in the next few days.

Craig says:

18 February 2010 at 8:34 pm

Brian, my sieve is nothing special. It’s about a dozen lines of C++. The only things worth noting are that I use a vector, for which the STL cleverly packs the bits, and that I don’t store the primality of even numbers (particularly apropos here). I can send you my code if you want.

I know there are fancier sieves, but I’m not enough of a number theorist to drop in anything more clever than what I’ve got. It seems like you’d want some sort of windowed approach, in which you deduce the primality of a billion numbers at a time as you head out on your walk.

For the record, no return to (0,0) for N <= 4e9. The walk ends at 115220,257903, and made 94983231 left turns and 94978580 right turns.

Craig says:

18 February 2010 at 10:16 pm

Sorry, that’s vector<bool>. C++’s STL includes special-case code that represents that type using a packed bit array (a clever trick).