Sorry. My program and your browser are not getting along. None of the interactive elements of this page will work. Could you try a different browser? Current versions of Chrome, Firefox, and Safari seem to work.

The disks are scattered randomly, except that no disk is allowed to overlap another disk or extend beyond the boundary of the square. Once a disk has been placed, it never moves, so each later disk has to find a home somewhere in the nooks and crannies between the earlier arrivals. Can this go on forever?

The search for a vacant spot would seem to grow harder as the square gets more crowded, so you might expect the process to get stuck at some point, with no open site large enough to fit the next disk. On the other hand, because the disks get progressively smaller, later ones can squeeze into tighter quarters. In the specific filling protocol shown here, these two trends are in perfect balance. The process of adding disks, one after another, never seems to stall. Yet as the number of disks goes to infinity, they completely fill the box provided for them. There’s a place for every last dot, but there’s no blank space left over.

Or at least that’s the mathematical ideal. The computer program that fills the square above never attains this condition of perfect plenitude. It shuts down after placing just 5,000 disks, which cover about 94 percent of the square’s area. This early exit is a concession to the limits of computer precision and human patience, but we can still dream of how it would work in a world without such tiresome constraints.

This scheme for filling space with randomly placed objects is the invention of John Shier, a physicist who worked for many years in the semiconductor industry and who has also taught at Normandale Community College near Minneapolis. He explains the method and the mathematics behind it in a recent book, *Fractalize That! A Visual Essay on Statistical Geometry*. (For bibliographic details see the links and references at the end of this essay.) I learned of Shier’s work from my friend Barry Cipra.

Shier hints at the strangeness of these doings by imagining a set of 100 round tiles in graduated sizes, with a total area approaching one square meter. He would give the tiles to a craftsman with these instructions:

“Mark off an area of one square meter, either a circle or a square. Start with the largest tile, and attach it permanently anywhere you wish in the marked-off area. Continue to attach the tiles anywhere you wish, proceeding always from larger to smaller.

There will always be a place for every tile regardless of how you choose to place them.” How many experienced tile setters would believe this?

Shier’s own creations go way beyond squares and circles filled with simple shapes such as disks. He has shown that the algorithm also works with an assortment of more elaborate designs, including nonconvex figures and even objects composed of multiple disconnected pieces. We get snowflakes, nested rings, stars, butterflies, fish eating lesser fish, faces, letters of the alphabet, and visual salads bringing together multiple ingredients. Shier’s interest in these patterns is aesthetic as well as mathematical, and several of his works have appeared in art exhibits; one of them won a best-of-show award at the 2017 Joint Mathematics Meeting.

Shier and his colleagues have also shown that the algorithm can be made to work in three-dimensional space. The book’s cover is adorned with a jumble of randomly placed toruses filling the volume of a transparent cube. If you look closely, you’ll notice that some of the rings are linked; they cannot be disentangled without breaking at least one ring. (The 3D illustration was created by Paul Bourke, who has more examples online, including 3D-printed models.)

After reading Shier’s account of his adventures, and admiring the pictures, I had to try it for myself. The experiments I’m presenting in this essay have no high artistic ambitions. I stick with plain-vanilla circular disks in a square frame, all rendered with the same banal blue-to-red color scheme. My motive is merely to satisfy my curiosity—or perhaps to overcome my skepticism. When I first read the details of how these graphics are created, I couldn’t quite believe it would work. Writing my own programs and seeing them in action has helped persuade me. So has a proof by Christopher Ennis, which I’ll return to below.

Filling a region of the plane with disks is not in itself such a remarkable trick. One well-known way of doing it goes by the name Apollonian circles. Start with three disks that are all tangent to one another, leaving a spiky three-pointed vacancy between them. Draw a new disk in the empty patch, tangent to all three of the original disks; this is the largest disk that can possibly fit in the space. Adding the new disk creates three smaller triangular voids, where you can draw three more triply tangent disks. There’s nothing to stop you from going on in this way indefinitely, approaching a limiting configuration where the entire area is filled.

There are randomized versions of the Apollonian model. For example, you might place zero-diameter seed disks at random unoccupied positions and then allow them to grow until they touch one (or more) of their neighbors. This process, too, is space-filling in the limit. And it can never fail: Because the disks are custom-fitted to the space available, you can never get stuck with a disk that can’t find a home.

Shier’s algorithm is different. You are given disks one at a time in a predetermined order, starting with the largest, then the second-largest, and so on. To place a disk in the square, you choose a point at random and test to see if the disk will fit at that location without bumping into its neighbors or poking beyond the boundaries of the square. If the tests fail, you pick another random point and try again. It’s not obvious that this haphazard search will always succeed—and indeed it works only if the successive disks get smaller according to a specific mathematical rule. But if you follow that rule, you can keep adding disks forever. Furthermore, as the number of disks goes to infinity, the fraction of the area covered approaches \(1\). It’s convenient to have a name for series of disks that meet these two criteria; I have taken to calling them *fulfilling* series.

In exploring these ideas computationally, it makes sense to start with the simplest case: disks that are all the same size. This version of the process clearly *cannot* be fulfilling. No matter how the disks are arranged, their aggregate area will eventually exceed that of any finite container. Click in the gray square below to start filling it with equal-size disks. The square box has area \(A_{\square} = 4\). The slider in the control panel determines the area of the individual disks \(A_k\), in a range from \(0.0001\) to \(1.0\).

Sorry, the program will not run in this browser.

If you play with this program for a while, you’ll find that the dots bloom quickly at first, but the process invariably slows down and eventually ends in a state labeled “Jammed,” indicating that the program has been unable

The densest possible packing of equal-size disks places the centers on a triangular lattice with spacing equal to the disk diameter. The resulting density (for an infinite number of disks on an infinite plane) is \(\pi \sqrt{3}\, /\, 6 \approx 0.9069\), which means more than 90 percent of the area is covered. A random filling in a finite square is much looser. My first few trials all halted with a filling fraction fairly close to one-half, and so I wondered if that nice round number might be the expectation value of the probabilistic process. Further experiments suggested otherwise. Over a broad range of disk sizes, from \(0.0001\) up to about \(0.01\), the area covered varied from one run to the next, but the average was definitely above one-half—perhaps \(0.54\). After some rummaging through the voluminous literature on circle packing, I think I may have a clue to the exact expectation value: \(\pi / (3 + 2 \sqrt{2}) \approx 0.539012\). Where does that weird number come from? The answer has nothing to do with Shier’s algorithm, but I think it’s worth a digression.

Consider an adversarial process: Alice is filling a unit square with \(n\) equal-size disks and wants to cover as much of the area as possible. Bob, who wants to minimize the area covered, gets to choose \(n\). If Bob chooses \(n = 1\), Alice can produce a single disk that just fits inside the square and covers about \(79\) percent of the space. Can Bob do better? Yes, if Bob specifies \(n = 2\), Alice’s best option is to squeeze the two disks into diagonally opposite corners of the square as shown in the diagram at right. These disks are bounded by right isosceles triangles, which makes it easy to calculate their radii as \(r = 1 / (2 + \sqrt{2}) \approx 0.2929\). Their combined area works out to that peculiar number \(\pi / (3 + 2 \sqrt{2}) \approx 0.54\).

If two disks are better than one (from Bob’s point of view), could three be better still? Or four, or some larger number? Apparently not. In 2010, Erik Demaine, Sándor Fekete and Robert Lang conjectured that the two-disk configuration shown above represents the worst case for any number of equal-size disks. In 2017 Fekete, Sebastian Morr, and Christian Scheffer proved this result.

Is it just a coincidence that the worst-case density for packing disks into a square also appears to be the expected density when equal-size disks are placed randomly until no more will fit? Wish I knew.

Let us return to the questions raised in Shier’s *Fractalize That!* If we want to fit infinitely many disks into a finite square, our only hope is to work with disks that get smaller and smaller as the process goes on. The disk areas must come from some sequence of ever-diminishing numbers. Among such sequences, the one that first comes to mind is \(\frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\) These fractions have been known since antiquity as the harmonic numbers. (They are the wavelengths of the overtones of a plucked string.)

To see what happens when successive disks are sized according to the harmonic sequence, click in the square below.

Sorry, the program will not run in this browser.

Again, the process halts when no open space is large enough to accommodate the next disk in the sequence. If you move the slider all the way to the right, you’ll see a sequence of disks with areas drawn from the start of the full harmonic sequence, \(\frac{1}{1} , \frac{1}{2}, \frac{1}{3}, \dots\); at this setting, you’ll seldom get beyond eight or nine disks. Moving the slider to the left omits the largest disks at the beginning of the sequence, leaving the infinite tail of smaller disks. For example, setting the slider to \(1/20\) skips all the disks from \(\frac{1}{1}\) through \(\frac{1}{19}\) and begins filling the square with disks of area \(\frac{1}{20}, \frac{1}{21}, \frac{1}{22}, \dots\) Such truncated series go on longer, but eventually they also end in a jammed configuration.

The slider goes no further than 1/50, but even if you omitted the first 500 disks, or the first 5 million, the result would be the same. This is a consequence of the most famous property of the harmonic numbers: Although the individual terms \(1/k\) dwindle away to zero as \(k\) goes to infinity, the sum of all the terms,

\[\sum_{k = 1}^{\infty}\frac{1}{k} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots,\]

does not converge to a finite value. As long as you keep adding terms, the sum will keep growing, though ever more slowly. This curious fact was proved in the 14th century by the French bishop and scholar Nicole Oresme. The proof is simple but ingenious. Oresme pointed out that the harmonic sequence

\[\frac{1}{1} + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots\]

is greater than

\[\frac{1}{1} + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \cdots\]

The latter series is equivalent to \(1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} \cdots\), and so it is clearly divergent. Since the grouped terms of the harmonic series are even greater, they too must exceed any finite bound.

The divergence of the harmonic series implies that disks whose areas are generated by the series will eventually overflow any enclosing container. Dropping a finite prefix of the sequence, such as the first 50 disks, does not change this fact.

Let me note in passing that just as the filling fraction for fixed-size disks seems to converge to a specific constant, 0.5390, disks in harmonic series also seem to have a favored filling fraction, roughly 0.71. Can this be explained by some simple geometric argument? Again, I wish I knew.

Evidently we need to make the disks shrink faster than the harmonic numbers do. Here’s an idea: Square each element of the harmonic series, yielding this:

\[\sum_{k = 1}^{\infty}\frac{1}{k^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots.\]

Click below (or press the Start button) to see how this one turns out, again in a square of area 4.

Sorry, the program will not run in this browser.

At last we have a process that won’t get stuck in a situation where there’s no place to put another disk. *could* run forever, but of course it doesn’t. It quits when the area of the next disk shrinks down to about a tenth of the size of a single pixel on a computer display. The stopped state is labeled “Exhausted” rather than “Jammed.”*fulfilling*. The disks are scattered sparsely in the square, leaving vast open spaces unoccupied. The configuration reminds me of deep-sky images made by large telescopes.

Why does this outcome look so different from the others? Unlike the harmonic numbers, the infinite series \(1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots\) converges to a finite sum. In the 18th century the task of establishing this fact (and determining the exact sum) was known as the Basel Problem, after the hometown of the Bernoulli family, who put much effort into the problem but never solved it. The answer came in 1735 from Leonhard Euler (another native of Basel, though he was working in St. Petersburg), who showed that the sum is equal to \(\pi^2 / 6\). This works out to about \(1.645\); since the area of the square we want to fill is \(4\), even an infinite series of disks would cover only about \(41\) percent of the territory.

Given that the numbers \(\frac{1}{1^1}, \frac{1}{2^1}, \frac{1}{3^1}, \dots\) diminish too slowly, whereas \(\frac{1}{1^2}, \frac{1}{2^2}, \frac{1}{3^2}, \dots\) shrink too fast, it makes sense to try an exponent somewhere between \(1\) and \(2\) in the hope of finding a Goldilocks solution. The computation performed below in Program 4 is meant to facilitate the search for such a happy medium. Here the disk sizes are elements of the sequence \(\frac{1}{1^s}, \frac{1}{2^s}, \frac{1}{3^s}, \dots\), where the value of the exponent \(s\) is determined by the setting of the slider, with a range of \(1 \lt s \le 2\). We already know what happens at the extremes of this range. What is the behavior in the middle?

Sorry, the program will not run in this browser.

If you try the default setting of \(s = 1.5\), you’ll find you are still in the regime where the disks dwindle away so quickly that the box never fills up; if you’re willing to wait long enough, the program will end in an exhausted state rather than a jammed one. Reducing the exponent to \(s = 1.25\) puts you on the other side of the balance point, where the disks remain too large and at some point one of them will not fit into any available space. By continuing to shuttle the slider back and forth, you could carry out a binary search, closing in, step by step, on the “just right” value of \(s\). This strategy can succeed, but it’s not quick. As you get closer to the critical value, the program will run longer and longer before halting. (After all, running forever is the behavior we’re seeking.) To save you some tedium, I offer a spoiler: the optimum setting is between 1.29 and 1.30.

At this point we have wandered into deeper mathematical waters. A rule of the form \(A_k = 1/k^s\) is called a power law, since each \(k\) is raised to the same power. And series of the form \(\sum 1/k^s\) are known as zeta functions, denoted \(\zeta(s)\). Zeta functions have quite a storied place in mathematics. The harmonic numbers correspond to \(\zeta(1) = \sum 1/k^1\), which does not converge.

Today, Riemann’s version of the zeta function is the engine (or enigma!) driving a major mathematical industry. Shier’s use of this apparatus in making fractal art is far removed from that heavy-duty research enterprise—but no less fascinating. Think of it as the zeta function on vacation.

If a collection of disks are to fill a square exactly, their aggregate area must equal the area of the square. This is a necessary condition though not a sufficient one. In all the examples I’ve presented so far, the containing square has an area of 4, so what’s needed is to find a value of \(s\) that satisfies the equation:

\[\zeta(s) = \sum_{k = 1}^{\infty}\frac{1}{k^s} = 4\]

Except for isolated values of \(s\),

Having this result in hand solves one part of the square-filling problem. It tells us how to construct an infinite set of disks whose total area is just enough to cover a square of area \(4\), with adequate precision for graphical purposes. We assign each disk \(k\) (starting at \(k = 1\)) an area of \(1/k^{1.2939615}.\) This sequence begins 1.000, 0.408, 0.241, 0.166, 0.125, 0.098,…

In the graph above, the maroon curve with \(s = 1.29396\) converges to a sum very close to 4. Admittedly, the rate of convergence is not quick. More than 3 million terms are needed to get within 1 percent of the target.

Our off-label use of the zeta function defines an infinite sequence of disks whose aggregate area is equal to \(4\). The disks in this unique collection will exactly fill our square box (assuming they can be properly arranged). It’s satisfying to have a way of reliably achieving this result, after our various earlier failures. On the other hand, there’s something irksome about that number \(4\) appearing in the equation. It’s so arbitrary! I don’t dispute that \(4\) is a perfectly fine and foursquare number, but there are many other sizes of squares we might want to fill with dots. Why give all our attention to the \(2 \times 2\) variety?

This is all my fault. When I set out to write some square-filling programs, I knew I couldn’t use the unit square—which seems like the obvious default choice—because of the awkward fact that \(\zeta(s) = 1\) has no finite solution. The unit square is also troublesome in the case of the harmonic numbers; the first disk, with area \(A_1 = 1\), is too large to fit. So I picked the next squared integer for the box size in those first programs. Having made my choice, I stuck with it, but now I feel hemmed in by that decision made with too little forethought.

We have all the tools we need to fill squares of other sizes (as long as the size is greater than \(1\)). Given a square of area \(A_{\square}\), we just solve for \(s\) in \(\zeta(s) = A_{\square}\). A square of area 8 can be covered by disks sized according to the rule \(A_k = 1/k^s\) with \(s = \zeta(8) \approx 1.1349\). For \(A_{\square} = 100\), the corresponding value of \(s\) is \(\zeta(100) \approx 1.0101\). For any \(A_{\square} \gt 1\) there is an \(s\) that yields a fulfilling set of disks, and vice versa for any \(s \gt 1\).

This relation between the exponent \(s\) and the box area \(A_{\square}\) suggests a neat way to evade the whole bother of choosing a specific container size. We can just scale the disks to fit the box, or else scale the box to accommodate the disks. Shier adopts the former method. Each disk in the infinite set is assigned an area of

\[A_k = \frac{A_{\square}}{\zeta(s)} \frac{1}{k^s},\]

where the first factor is a scaling constant that adjusts the disk sizes to fit the container. In my first experiments with these programs I followed the same approach. Later, however, when I began writing this essay, it seemed easier to think about the scaling—and explain it—if I transformed the size of the box rather than the sizes of the disks. In this scheme, the area of disk \(k\) is simply \(1 / k^s\), and the area of the container is \(A_{\square} = \zeta(s)\). (The two scaling procedures are mathematically equivalent; it’s only the ratio of disk size to container size that matters.)

Program 5 offers an opportunity to play with such scaled zeta functions.

Sorry, the program will not run in this browser.

At the other end of the scale, if you push the value of \(s\) up beyond about \(1.40\), you’ll discover something else: The program more often than not halts after placing just a few disks. At \(s = 1.50\) or higher, it seldom gets beyond the first disk. This failure is similar to what we saw with the harmonic numbers, but more interesting. In the case of the harmonic numbers, the total area of the disks is unbounded, making an overflow inevitable. With this new scaled version of the zeta function, the total area of the disks is always equal to that of the enclosing square. In principle, all the disks could all be made to fit, if you could find the right arrangement. I’ll return below to the question of why that doesn’t happen.

In *Fractalize That!* Shier introduces another device for taming space-filling sets. He not only scales the object sizes so that their total area matches the space available; he also adopts a variant zeta function that has two adjustable parameters rather than just one:

This is the Hurwitz zeta function, named for the German mathematician Adolf Hurwitz (1859–1919). Before looking into the details of the function, let’s play with the program and see what happens. Try a few settings of the \(s\) and \(a\) controls:

Sorry, the program will not run in this browser.

Different combinations of \(s\) and \(a\) produce populations of disks with different size distributions. The separate contributions of the two parameters are not always easy to disentangle, but in general decreasing \(s\) or increasing \(a\) leads to a pattern dominated by smaller disks. Here are snapshots of four outcomes:

Within the parameter range shown in these four panels, the filling process always continues to exhaustion, but at higher values of \(s\) it can jam, just as it does with the scaled Riemann zeta function.

Hurwitz wrote just one paper on the zeta function. It was published in 1882, when he was still quite young and just beginning his first academic appointment, at the University of Göttingen. (The paper is available from the Göttinger Digitalisierungszentrum; see pp. 86–101.)

Hurwitz modified the Riemann zeta function in two ways. First, the constant \(a\) is added to each term, turning \(1/k^s\) into \(1/(a + k)^s\). Second, the summation begins with \(k = 0\) rather than \(k = 1\). By letting \(a\) take on any value in the range \(0 \lt a \le 1\) we gain access to a continuum of zeta functions. The elements of the series are no longer just reciprocals of integers but reciprocals of real numbers. Suppose \(a = \frac{1}{3}\). Then \(\zeta(s, a)\) becomes:

\[\frac{1}{\left(\frac{1}{3} + 0\right)^s} + \frac{1}{\left(\frac{1}{3} + 1\right)^s} + \frac{1}{\left(\frac{1}{3} + 2\right)^s} + \cdots\ = \left(\frac{3}{1}\right)^s + \left(\frac{3}{4}\right)^s + \left(\frac{3}{7}\right)^s + \cdots\]

The Riemann zeta function and the Hurwitz zeta function differ substantially only for small values of \(k\) or large values of \(a\). When \(k\) is large, adding a small \(a\) to it makes little difference in the value of the function. Thus as \(k\) grows toward infinity, the two functions are asymptotically equal, as suggested in the graph at right. When the Hurwitz function is put to work packing disks into a square, a rule with \(a > 1\) causes the first several disks to be smaller than they would be with the Riemann rule. A value of \(a\) between \(0\) and \(1\) enlarges the early disks. In either case, the later disks in the sequence are hardly affected at all.

If \(a\) is a positive integer, the interpretation of \(\zeta(s, a)\) is even simpler. The case \(a = 1\) corresponds to the Riemann zeta sum. When \(a\) is a larger integer, the effect is to omit the first \(a - 1\) entries, leaving only the tail of the series. For example,

\[\zeta(s, 5) = \frac{1}{5^s} + \frac{1}{6^s} + \frac{1}{7^s} + \cdots.\]

In his fractal artworks, Shier chooses various values of \(a\) as a way of controlling the size distribution of the placed objects, and thereby fine-tuning the appearance of the patterns. Having this adjustment knob available is very convenient, but in the interests of simplicity, I am going to revert to the Riemann function in the rest of this essay.

Before going on, however, I also have to confess that I don’t really understand the place of the Hurwitz zeta function in modern mathematical research, or what Hurwitz himself had in mind when he formulated it. Zeta functions have been an indispensable tool in the long struggle to understand how the prime numbers are sprinkled among the integers. The connection between these two realms was made by Euler, with his remarkable equation linking a sum of powers of integers with a product of powers of primes:

*my* motor.

Riemann went further, showing that everything we might want to know about the distribution of primes is encoded in the undulations of the zeta function over the complex plane. Indeed, if we could simply pin down all the complex values of \(s\) for which \(\zeta(s) = 0\), we would have a master key to the primes. Hurwitz, in his 1882 paper, was clearly hoping to make some progress toward this goal, but I have not been able to figure out how his work fits into the larger story. The Hurwitz zeta function gets almost no attention in standard histories and reference works (in contrast to the Riemann version, which is everywhere). Wikipedia notes: “At rational arguments the Hurwitz zeta function may be expressed as a linear combination of Dirichlet *L*-functions and vice versa”—which sounds interesting, but I don’t know if it’s useful or important. A recent article by Nicola Oswald and Jörn Steuding puts Hurwitz’s work in historical context, but it does not answer these questions—at least not in a way I’m able to understand.

But again I digress. Back to dots in boxes.

If a set of circular disks and a square container have the same total area, can you always arrange the disks so that they completely fill the square without overflowing? Certainly not! Suppose the set consists of a single disk with area equal to that of the square; the disk’s diameter is greater than the side length of the square, so it will bulge through the sides while leaving the corners unfilled. A set of two disks won’t work either, no matter how you apportion the area between them. Indeed, when you are putting round pegs in a square hole, no finite set of disks can ever fill all the crevices.

Only an infinite set—a set with no smallest disk—can possibly fill the square completely. But even with an endless supply of ever-smaller disks, it seems like quite a delicate task to find just the right arrangement, so that every gap is filled and every disk has a place to call home. It’s all the more remarkable, then, that simply plunking down the disks at random locations seems to produce exactly the desired result. This behavior is what intrigued and troubled me when I first saw Shier’s pictures and read about his method for generating them. If a *random* arrangement works, it’s only a small step to the proposition that *any* arrangement works. Could that possibly be true?

Computational experiments offer strong hints on this point, but they can never be conclusive. What we need is a proof. *Math Horizons*, a publication of the Mathematical Association of America, which keeps it behind a paywall. If you have no library access and won’t pay the $50 ransom, I can recommend a video of Ennis explaining his proof in a talk at St. Olaf College.

As a warm-up exercise, Ennis proves a one-dimensional version of the area-filling conjecture, where the geometry is simpler and some of the constraints are easier to satisfy. In one dimension a disk is merely a line segment; its area is its length, and its radius is half that length. As in the two-dimensional model, disks are placed in descending order of size at random positions, with the usual proviso that no disk can overlap another disk or extend beyond the end points of the containing interval. In Program 7 you can play with this scheme.

Sorry, the program will not run in this browser.

I have given the line segment some vertical thickness to make it visible. The resulting pattern of stripes may look like a supermarket barcode or an atomic spectrum, but please imagine it as one-dimensional.

If you adjust the slider in this program, you’ll notice a difference from the two-dimensional system. In 2D, the algorithm is fulfilling only if the exponent \(s\) is less than a critical value, somewhere in the neighborhood of 1.4. In one dimension, the process continues without impediment for all values of \(s\) throughout the range \(1 \lt s \lt 2\). Try as you might, you won’t find a setting that produces a jammed state. (In practice, the program halts after placing no more than 10,000 disks, but the reason is exhaustion rather than jamming.)

Ennis titles his *Math Horizons* article “(Always) room for one more.” He proves this assertion by keeping track of the set of points where the center of a new disk can legally be placed, and showing the set is never empty. Suppose \(n - 1\) disks have already been randomly scattered in the container. The next disk to be placed, disk \(n\), will have an area (or length) of \(A_n = 1 / n^s\). Since the geometry is one-dimensional, the corresponding disk radius is simply \(r_n = A_n / 2\). The center of this new disk cannot lie any closer than \(r_n\) to the perimeter of another disk. It must also be at a distance of at least \(r_n\) from the boundary of the containing segment. We can visualize these constraints by adding bumpers, or buffers, of thickness \(r_n\) to the outside of each existing disk and to the inner edges of the containing segment. A few stages of the process are illustrated below.

Placed disks are blue, the excluded buffer areas are orange, and open areas—the set of all points where the center of the next disk could be placed—are black. In the top line, before any disks have been placed, the entire containing segment is open except for the two buffers at the ends. Each of these buffers has a length equal to \(r_1\), the radius of the first disk to be placed; the center of that disk cannot lie in the orange regions because the disk would then overhang the end of the containing segment. After the first disk has been placed *(second line)*, the extent of the open area is reduced by the area of the disk itself and its appended buffers. On the other hand, all of the buffers have also shrunk; each buffer is now equal to the radius of disk \(2\), which is smaller than disk \(1\). The pattern continues as subsequent disks are added. Note that although the blue disks cannot overlap, the orange buffers can.

For another view of how this process evolves, click on the *Next* button in Program 8. Each click inserts one more disk into the array and adjusts the buffer and open areas accordingly.

Sorry, the program will not run in this browser.

Because the blue disks are never allowed to overlap, the total blue area must increase monotonically as disks are added. It follows that the orange and black areas, taken together, must steadily decrease. But there’s nothing steady about the process when you keep an eye on the separate area measures for the orange and black regions. Changes in the amount of buffer overlap cause erratic, seesawing tradeoffs between the two subtotals. If you keep clicking the *Next* button (especially with \(s\) set to a high value), you may see the black area falling below \(1\) percent. Can we be sure it will never vanish entirely, leaving no opening at all for the next disk?

Ennis answers this question through worst-case analysis. He considers only configurations in which no buffers overlap, thereby squeezing the black area to its smallest possible extent. If the black area is always positive under these conditions, it cannot be smaller when buffer overlaps are allowed.

The basic idea of the proof

\[A_{\square} = \zeta(s), \quad A_{\color{blue}{\mathrm{blue}}} = \sum_{k=1}^{k = n - 1} \frac{1}{k^s}, \quad A_{\color{orange}{\mathrm{orange}}} = 2(n-1)r_{n}.\]

Then we need to prove that

\[A_{\square} - (A_{\color{blue}{\mathrm{blue}}} + A_{\color{orange}{\mathrm{orange}}}) \gt 0.\]

A direct proof of this statement would require an exact, closed-form expression for \(\zeta(s)\), which we already know is problematic. Ennis evades this difficulty by turning to calculus. He needs to evaluate the remaining tail of the zeta series, \(\sum_{k = n}^\infty 1/k^s\), but this discrete sum is intractable. On the other hand, by shifting from a sum to an integral, the problem becomes an exercise in undergraduate calculus. Exchanging the discrete variable \(k\) for a continuous variable \(x\), we want to find the area under the curve \(1/x^s\) in the interval from \(n\) to infinity; this will provide a lower bound on the corresponding discrete sum. Evaluating the integral yields:

\[\int_{x = n}^{\infty} \frac{1}{x^{s}} d x = \frac{1}{(s-1) n^{s-1}}.\]

Some further manipulation reveals that the area of the black regions is never smaller than

\[\frac{2 - s}{(s - 1)n^{s - 1}}.\]

If \(s\) lies strictly between \(1\) and \(2\), this expression must be greater than zero, since both the numerator and the denominator will be positive. Thus for all \(n\) there is at least one black point where the center of a new disk can be placed.

Ennis’s proof is a stronger one than I expected. When I first learned there was a proof, I guessed that it would take a probabilistic approach, showing that although a jammed configuration may exist, it has probability zero of turning up in a random placement of the disks. Instead, Ennis shows that no such arrangement exists at all. Even if you replaced the randomized algorithm with an adversarial one that tries its best to block every disk, the process would still run to fulfillment.

The proof for a two-dimensional system follows the same basic line of argument, but it gets more complicated for geometric reasons. In one dimension, as the successive disk areas get smaller, the disk radii diminish in simple proportion: \(r_k = A_k / 2\). In two dimensions, disk radius falls off only as the square root of the disk area: \(r_k = \sqrt{A_k / \pi}\). As a result, the buffer zone surrounding a disk excludes neighbors at a greater distance in two dimensions than it would in one dimension. There is still a range of \(s\) values where the process is provably unstoppable, but it does not extend across the full interval from \(s \gt 1\) to \(s \lt 2\).

Program 9, running in the panel below, is one I find very helpful in gaining intuition into the behavior of Shier’s algorithm. As in the one-dimensional model of Program 8, each press of the *Next* button adds a single disk to the containing square, and shows the forbidden buffer zones surrounding the disks.

Sorry, the program will not run in this browser.

Move the \(s\) slider to a position somewhere near 1.40. *Next*. Shier describes this phenomenon as “infant mortality”: If the placement process survives the high-risk early period, it is all but immortal.

There’s a certain whack-a-mole dynamic to the behavior of this system. Maybe the first disk covers all but one small corner of the black zone. It looks like the next disk will completely obliterate that open area. And so it does—but at the same time the shrinking of the orange buffer rings opens up another wedge of black elsewhere. The third disk blots out that spot, but again the narrowing of the buffers allows a black patch to peek out from still another corner. Later on, when there are dozens of disks, there are also dozens of tiny black spots where there’s room for another disk. You can often guess which of the openings will be filled next, because the random search process is likely to land in the largest of them. Again, however, as these biggest targets are buried, many smaller ones are born.

Ennis’s two-dimensional proof addresses the case of circular disks inside a circular boundary, rather than a square one. (The higher symmetry and the absence of corners streamlines certain calculations.) The proof strategy, again, is to show that after \(n - 1\) disks have been placed, there is still room for the \(n\)th disk, for any value of \(n \ge 1\). The argument follows the same logic as in one dimension, relying on an integral to provide a lower bound for the sum of a zeta series. But because of the \(\pi r^2\) area relation, the calculation now includes quadratic as well as linear terms. As a result, the proof covers only a part of the range of \(s\) values. The black area is provably nonempty if \(s\) is greater than \(1\) but less than roughly \(1.1\); outside that interval, the proof has nothing to say.

As mentioned above, Ennis’s proof applies only to circular disks in a circular enclosure. Nevertheless, in what follows I am going to assume the same ideas carry over to disks in a square frame, although the location of the boundary will doubtless be somewhat different. I have recently learned that Ennis has written a further paper on the subject, expected to be published in the *American Mathematical Monthly*. Perhaps he addresses this question there.

With Program 9, we can explore the entire spectrum of behavior for packing disks into a square. The possibilities are summarized in the candybar graph below.

- The leftmost band, in darker green, is the interval for which Ennis’s proof might hold. The question mark at the upper boundary line signifies that we don’t really know where it lies.
- In the lighter green region no proof is known, but in Shier’s extensive experiments the system never jams there.
- The transition zone sees the probability of jamming rise from \(0\) to \(1\) as \(s\) goes from about \(1.3\) to about \(1.5\).
- Beyond \(s \approx 1.5\), experiments suggest that the system
*always*halts in a jammed configuration. - At \(s \approx 1.6\) we enter a regime where the buffer zone surrounding the first disk invariably blocks the entire black region, leaving nowhere to place a second disk. Thus we have a simple proof that the system always jams.
- Still another barrier arises at \(s \approx 2.7\). Beyond this point, not even one disk will fit. The diameter of a disk with area \(1\) is greater than the side length of the enclosing square.

Can we pin down the exact locations of the various threshold points in the diagram above? This problem is tractable in those situations where the placement of the very first disk determines the outcome. At high values of \(s\) (and thus low values of \(\zeta(s)\), the first disk can obliterate the black zone and thereby preclude placement of a second disk. What is the lowest value of \(s\) for which this can happen? As in the image at right, the disk must lie at the center of the square box, and the orange buffer zone surrounding it must extend just far enough out to cover the corners of the inner black square, which defines the locus of all points that could accommodate the center of the second disk. Finding the value of \(s\) that satisfies this condition is a messy but straightforward bit of geometry and algebra. With the help of SageMath I get the answer \(s = 1.282915\). This value—let’s call it \(\overline{s}\)—is an upper bound on the “never jammed” region. Above this limit there is always a nonzero probability that the filling process will end after placing a single disk.

The value of \(\overline{s}\) lies quite close to the experimentally observed boundary between the never-jammed range and the transition zone, where jamming first appears. Is it possible that \(\overline{s}\) actually marks the edge of the transition zone—that below this value of \(s\) the program can never fail? To prove that conjecture, you would have to show that when the first disk is successfully placed, the process never stalls on a subsequent disk. That’s certainly not true in higher ranges of \(s\). Yet the empirical evidence near the threshold is suggestive. In my experiments I have yet to see a jammed outcome at \(s \lt \overline{s}\), not even in a million trials just below the threshold, at \(s = 0.999 \overline{s}\). In contrast, at \(s = 1.001 \overline{s}\), a million trials produced 53 jammed results—all of them occuring immediately after the first disk was placed.

The same kind of analysis leads to a lower bound on the region where *every* run ends after the first disk *(medium pink in the diagram above)*. In this case the critical situation puts the first disk as close as possible to a corner of the square frame, rather than in the middle. If the disk and its orange penumbra are large enough to block the second disk in this extreme configuration, then they will also block it in any other position. Putting a number on this bound again requires some fiddly equation wrangling; the answer I get is \(\underline{s} = 1.593782\). No process with higher \(s\) can possibly live forever, since it will die with the second disk. In analogy with the lower-bound conjecture, one might propose that the probability of being jammed remains below \(1\) until \(s\) reaches \(\underline{s}\). If both conjectures were true, the transition region would extend from \(\overline{s}\) to \(\underline{s}\).

The final landmark, way out at \(s \approx 2.7\), marks the point where the first disk threatens to burst the bounds of the enclosing square. In this case the game is over before it begins. In program 9, if you push the slider far to the right, you’ll find that the black square in the middle of the orange field shrinks away and eventually winks out of existence. This extinction event comes when the diameter of the disk equals the side length of the square. Given a disk of area \(1\), and thus radius \(1/\sqrt{\pi}\), we want to find the value of \(s\) that satisfies the equation

\[\frac{2}{\sqrt{\pi}} = \sqrt{\zeta(s)}.\]

Experiments with Program 9 show that the value is just a tad more than 2.7. That’s an interesting numerical neighborhood, no? A famous number lives nearby. Do you suppose?

Another intriguing set of questions concerns the phenomenon that Shier calls infant mortality. If you scroll back up to Program 5 and set the slider to \(s = 1.45\), you’ll find that roughly half the trials jam. The vast majority of these failures come early in the process, after no more than a dozen disks have been placed. At \(s = 1.50\) death at an early age is even more common; three-fourths of all the trials end with the very first disk. On the other hand, if a sequence of disks does manage to dodge all the hazards of early childhood, it may well live on for a very long time—perhaps forever.

Should we be surprised by this behavior? I am. As Shier points out, the patterns formed by our graduated disks are fractals, and one of their characteristic properties is self-similarity, or scale invariance. If you had a fully populated square—one filled with infinitely many disks—you could zoom in on any region to any magnification, and the arrangement of disks would look the same as it does in the full-size square. By “look the same” I don’t mean the disks would be in the same positions, but they would have the same size distribution and the same average number of neighbors at the same distances. This is a statistical concept of identity. And since the pattern looks the same and has the same statistics, you would think that the challenge of finding a place for a new disk would also be the same at any scale. Slipping in a tiny disk late in the filling operation would be no different from plopping down a large disk early on. The probability of jamming ought to be constant from start to finish.

But there’s a rejoinder to this argument: Scale invariance is broken by the presence of the enclosing square. The largest disks are strongly constrained by the boundaries, whereas most of the smaller disks are nowhere near the edges and are little influenced by them. The experimental data offer some support for this view. The graph below summarizes the outcomes of \(20{,}000\) trials at \(s = 1.50\). The red bars show the absolute numbers of trials ending after placing \(n\) disks, for each \(n\) from \(0\) through \(35\). The blue lollipops indicate the proportion of trials reaching disk \(n\) that halted after placing disk \(n\). This ratio can be interpreted (if you’re a frequentist!) as the probability of stopping at \(n\).

It certainly looks like there’s something odd happening on the left side of this graph. More than three fourths of the trials end after a single disk, but none at all jam at the second or third disks, and very few (a total of \(23\)) at disks \(4\) and \(5\). Then, suddenly, \(1{,}400\) more fall by the wayside at disk \(6\), and serious attrition continues through disk \(11\).

Geometry can explain some of this weirdness. It has to do with the squareness of the container; other shapes would produce different results.

At \(s = 1.50\) we are between \(\overline{s}\) and \(\underline{s}\), in a regime where the first disk is large enough to block off the entire black zone but not so large that it *must* do so. This is enough to explain the tall red bar at \(n = 1\): When you place the first disk randomly, roughly \(75\) percent of the time it will block the entire black region, ending the parade of disks. If the first disk *doesn’t* foreclose all further action, it must be tucked into one of the four corners of the square, leaving enough room for a second disk in the diagonally opposite corner. The sequence of images below (made with Program 9) tells the rest of the story.

The placement of the second disk blocks off the open area in that corner, but the narrowing of the orange buffers also creates two tiny openings in the cross-diagonal corners. The third and fourth disks occupy these positions, and simultaneously allow the black background to peek through in two other spots. Finally the fifth and sixth disks close off the last black pixels, and the system jams.

This stereotyped sequence of disk placements accounts for the near absence of mortality at ages \(n = 2\) through \(n = 5\), and the sudden upsurge at age \(6\). The elevated levels at \(n = 7\) through \(11\) are part of the same pattern; depending on the exact positioning of the disks, it may take a few more to expunge the last remnants of black background.

At still higher values of \(n\)—for the small subset of trials that get there—the system seems to shift to a different mode of behavior. Although numerical noise makes it hard to draw firm conclusions, it doesn’t appear that any of the \(n\) values beyond \(n = 12\) are more likely jamming points than others. Indeed, the data are consistent with the idea that the probability of jamming remains constant as each additional disk is added to the array, just as scale invariance would suggest.

A much larger data set would be needed to test this conjecture, and collecting such data is painfully slow. Furthermore, when it comes to rare events, I don’t have much trust in the empirical data. During one series of experiments, I noticed a program run that stalled after \(290\) disks—unusually late. The 290-disk configuration, produced at \(s = 1.47\), is shown at left below.

I wondered if it was *truly* jammed. My program gives up on finding a place for a disk after \(10^7\) random attempts. Perhaps if I had simply persisted, it would have gone on. So I reset the limit on random attempts to \(10^9\), and sat back to wait. After some minutes the program discovered a place where disk \(291\) would fit, and then another for disk \(292\), and kept going as far as 300 disks. The program had an afterlife! Could I revive it again? Upping the limit to \(10^{10}\) allowed another \(14\) disks to squueze in. The final configuration is shown at right above (with the original \(290\) disks faded, in order to make the \(24\) posthumous additions more conspicuous).

Is it really finished now, or is there still room for one more? I have no reliable way to answer that question. Checking \(10\) billion random locations sounds like a lot, but it is still a very sparse sampling of the space inside the square box. Using 64-bit floating-point numbers to define the coordinate system allows for more than \(10^{30}\) distinguishable points. And to settle the question mathematically, we would need unbounded precision.

We know from Ennis’s proof that at values of \(s\) not too far above \(1.0\), the filling process can always go on forever. And we know that beyond \(s \approx 1.6\), every attempt to fill the square is doomed. There must be some kind of transition between these two conditions, but the details are murky. The experimental evidence gathered so far suggests a smooth transition along a sigmoid curve, with the probability of jamming gradually increasing from \(0\) to \(1\). As far as I can tell, however, nothing we know for certain rules out a single hard threshold, below which all disk sequences are immortal and above which all of them die. Thus the phase diagram would be reduced to this simple form:

The softer transition observed in computational experiments would be an artifact of our inability to perform infinite random searches or place infinite sequences of disks.

Here’s a different approach to understanding the random dots-in-a-box phenomenon. It calls for a mental reversal of figure and ground. Instead of placing disks on a square surface, we drill holes in a square metal plate. And the focus of attention is not the array of disks or holes but rather the spaces between them. Shier has a name for the perforated plate: the gasket.

Program 10 allows you to observe a gasket as it evolves from a solid black square to a delicate lace doily with less than 1 percent of its original substance.

Sorry, the program will not run in this browser.

The gasket is quite a remarkable object. When the number of holes becomes infinite, the gasket must disappear entirely; its area falls to zero. Up until that very moment, however, it retains its structural integrity.

As the gasket is etched away, can we measure the average thickness of the surviving wisps and tendrils? I can think of several methods that involve elaborate sampling schemes. Shier has a much simpler and more ingenious proposal: To find the average thickness of the gasket, divide its area by its perimeter. It was not immediately obvious to me why this number would serve as an appropriate measure of the width, but at least the units come out right: We are dividing a length squared by a length and so we get a length. And the operation does make basic sense: The area of the gasket represents the amount of substance in it, and the perimeter is the distance over which it is stretched. (The widths calculated in Program 10 differ slightly from those reported by Shier. The reason, I think, is that I include the outer boundary of the square in the perimeter, and he does not.)

Calculating the area and perimeter of a complicated shape such as a many-holed gasket looks like a formidable task, but it’s easy if we just keep track of these quantities as we go along. Initially (before any holes are drilled), the gasket area \(A_0^g\) is the area of the full square, \(A_\square\). The initial gasket perimeter \(P_0^g\) is four times the side length of the square, which is \(\sqrt{A_\square}\). Thereafter, as each hole is drilled, we subtract the new hole’s area from \(A^g\) and add its perimeter to \(P^g\). The quotient of these quantities is our measure of the average gasket width after drilling hole \(k\): \(\widehat{W}_k^g\). Since the gasket area is shrinking while the perimeter is growing, \(\widehat{W}_k^g\) must dwindle away as \(k\) increases.

The importance of \(\widehat{W}_k^g\) is that it provides a clue to how large a vacant space we’re likely to find for the next disk or hole. If we take the idea of “average” seriously, there must always be at least one spot in the gasket with a width equal to or greater than \(\widehat{W}_k^g\). From this observation Shier makes the leap to a whole new space-filling algorithm. Instead of choosing disk diameters according to a power law and then measuring the resulting average gasket width, he determines the radius of the next disk from the observed \(\widehat{W}_k^g\):

\[r_{k+1} = \gamma \widehat{W}_k^g = \gamma \frac{A_k^g}{P_k^g}.\]

Here \(\gamma\) is a fixed constant of proportionality that determines how tightly the new disks or holes fit into the available openings.

The area-perimeter algorithm has a recursive structure, in which each disk’s radius depends on the state produced by the previous disks. This raises the question of how to get started: What is the size of the first disk? Shier has found that it doesn’t matter very much. Initial disks in a fairly wide range of sizes yield jam-proof and aesthetically pleasing results.

Graphics produced by the original power-law algorithm and by the new recursive one look very similar. One way to understand why is to rearrange the equation of the recursion:

On the right side of this equation we are dividing the average gasket width by the diameter of the next disk to be placed. The result is a dimensionless number—dividing a length by a length cancels the units. More important, the quotient is a constant, unchanging for all \(k\). If we calculate this same dimensionless gasket width when using the power-law algorithm, it also turns out to be nearly constant in the limit of karge \(k\), showing that the two methods yield sequences with similar statistics.

Setting aside Shier’s recursive algorithm, all of the patterns we’ve been looking at are generated by a power law (or zeta function), with the crucial requirement that the series must converge to a finite sum. The world of mathematics offers many other convergent series in addition to power laws. Could some of them also create fulfilling patterns? The question is one that Ennis discusses briefly in his talk at St. Olaf and that Shier also mentions.

Among the obvious candidates are geometric series such as \(\frac{1}{1}, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots\) A geometric series is a close cousin of a power law, defined in a similar way but exchanging the roles of \(s\) and \(k\). That is, a geometric series is the sum:

\[\sum_{k=0}^{\infty} \frac{1}{s^k} = \frac{1}{s^0} + \frac{1}{s^1} + \frac{1}{s^2} + \frac{1}{s^3} + \cdots\]

For any \(s > 1\), the infinite geiometric series has a finite sum, namely \(\frac{s}{s - 1}\). Thus our task is to construct an infinite set of disks with individual areas \(1/s^k\) that we can pack into a square of area \(\frac{s}{s - 1}\). Can we find a range of \(s\) for which the series is fulfilling? As it happens, this is where Shier began his adventures; his first attempts were not with power laws but with geometric series. They didn’t turn out well. You are welcome to try your own hand in Program 11.

Sorry, the program will not run in this browser.

There’s a curious pattern to the failures you’ll see in this program. No matter what value you assign to \(s\) (within the available range \(1 \lt s \le 2\)), the system jams when the number of disks reaches the neighborhood of \(A_\square = \frac{s}{s-1}\). For example, at \(s = 1.01\), \(\frac{s}{s - 1}\) is 101 and the program typically gets stuck somewhere between \(k = 95\) and \(k = 100\). At \(s = 1.001\), \(\frac{s}{s - 1}\) is \(1{,}001\) and there’s seldom progress beyond about \(k = 1,000\).

For a clue to what’s going wrong here, consider the graph at right, plotting the values of \(1 / k^s\) *(red)* and \(1 / s^k\) *(blue)* for \(s = 1.01\). These two series converge on nearly the same sum (roughly \(100\)), but they take very different trajectories in getting there. On this log-log plot, the power-law series \(1 / s^k\) is a straight line. The geometric series \(1 / s^k\) falls off much more slowly at first, but there’s a knee in the curve at about \(k = 100\) *(dashed mauve line)*, where it steepens dramatically. If only we could get beyond this turning point, it looks like the rest of the filling process would be smooth sledding, but in fact we never get there. Whereas the first \(100\) disks of the power-law series fill up only about \(5\) percent of the available area, they occuy 63 percent in the geometric case. This is where the filling process stalls.

Even in one dimension, the geometric series quickly succumbs. (This is in sharp contrast to the one-dimensional power-law model, where any \(s\) between \(1\) and \(2\) yields a provably infinite progression of disks.)

Sorry, the program will not run in this browser.

And just in case you think I’m pulling a fast one here, let me demonstrate that those same one-dimensional disks will indeed fit in the available space, if packed efficiently. In Program 13 they are placed in order of size from left to right.

Sorry, the program will not run in this browser.

I have made casual attempts to find fulfillment with a few other convergent series, such as the reciprocals of the Fibonacci numbers (which converge to about \(3.36\)) and the reciprocals of the factorials (whose sum is \(e \approx 2.718\)). Both series jam after the first disk. There are plenty of other convergent series one might try, but I doubt this is a fruitful line of inquiry.

All the variations discussed above leave one important factor unchanged: The objects being fitted together are all circular. Exploring the wider universe of shapes has been a major theme of Shier’s work. He asks: What properties of a shape make it suitable for forming a statistical fractal pattern? And what shapes (if any) refuse to cooperate with this treatment? (The images in this section were created by John Shier and are reproduced here with his permission.)

Shier’s first experiments were with circular disks and axis-parallel squares; the filling algorithm worked splendidly in both cases. He also succeeded with axis-parallel rectangles of various aspect ratios, even when he mixed vertical and horizontal orientations in the same tableau. In collaboration with Paul Bourke he tried randomizing the orientation of squares as well as their positions. Again the outcome was positive, as the illustration above left shows.

Equilateral triangles were less cooperative, and at first Shier believed the algorithm would consistently fail with this shape. The triangles tended to form orderly arrays with the sharp point of one triangle pressed close against the broad side of another, leaving little “wiggle room.” Further efforts showed that the algorithm was not truly getting stuck but merely slowing down. With an appropriate choice of parameters in the Hurwitz zeta function, and with enough patience, the triangles did come together in boundlessly extendable space-filling patterns.

The casual exploration of diverse shapes eventually became a deliberate quest to stake out the limits of the space-filling process. Surely there must be *some* geometric forms that the algorithm would balk at, failing to pack an infinite number of objects into a finite area. Perhaps nonconvex shapes such as stars and snowflakes and flowers would expose a limitation—but no, the algorithm worked just fine with these figures, fitting smaller stars into the crevices between the points of larger stars. The next obvious test was “hollow” objects, such as annular rings, where an internal void is not part of the object and is therefore available to be filled with smaller copies. The image at right is my favorite example of this phenomenon. The bowls of the larger nines have smaller nines within them. It’s nines all the way down. When we let the process continue indefinitely, we have a whimsical visual proof of the proposition that \(.999\dots = 1\).

These successes with nonconvex forms and objects with holes led to an *Aha* moment, as Shier describes it. The search for a shape that would break the algorithm gave way to a suspicion that no such shape would be found, and then the suspicion gradually evolved into a conviction that any “reasonably compact” object is suitable for the *Fractalize That!* treatment. The phrase “reasonably compact” would presumably exclude shapes that are in fact dispersed sets of points, such as Cantor dust. But Shier has shown that shapes formed of disconnected pieces, such as the words in the pair of images below, present no special difficulty.

*Fractalize That!* is not all geometry and number theory. Shier is eager to explain the mathematics behind these curious patterns, but he also presents the algorithm as a tool for self-expression. MATH and ART both have their place.

Finally, I offer some notes on what’s needed to turn these algorithms into computer programs. Shier’s book includes a chapter for do-it-yourselfers that explains his strategy and provides some crucial snippets of code (written in C). My own source code (in JavaScript) is available on GitHub. And if you’d like to play with the programs without all the surrounding verbiage, try the GitHub Pages version.

The inner loop of a typical program looks something like this:

```
let attempt = 1;
while (attempt <= maxAttempts) {
disk.x = randomCoord();
disk.y = randomCoord();
if (isNotOverlapping(disk)) {
return disk;
}
attempt++;
}
return false;
```

We generate a pair of random \(x\) and \(y\) coordinates, which mark the center point of the new disk, and check for overlaps with other disks already in place. If no overlaps are discovered, the disk stays put and the program moves on. Otherwise the disk is discarded and we jump back to the top of the loop to try a new \(xy\) pair.

The main computational challenge lies in testing for overlaps. For any two specific disks, the test is easy enough: They overlap if the sum of their radii is greater than the distance between their centers. The problem is that the test might have to be repeated many millions of times. My program makes \(10\) million attempts to place a disk before giving up. If it has to test for overlap with \(100{,}000\) other disks on each attempt, that’s a trillion tests. A trillion is too many for an interactive program where someone is staring at the screen waiting for things to happen. To speed things up a little I divide the square into a \(32 \times 32\) grid of smaller squares. The largest disks—those whose diameter is greater than the width of a grid cell—are set aside in a special list, and all new candidate disks are checked for overlap with them. Below this size threshold, each disk is allocated to the grid cell in which its center lies. A new candidate is checked against the disks in its own cell and in that cell’s eight neighbors. The net result is an improvement by two orders of magnitude—lowering the worst-case total from \(10^{12}\) overlap tests to about \(10{10}\).

All of this works smoothly with circular disks. Devising overlap tests for the variety of shapes that Shier has been working with is much harder.

From a theoretical point of view, the whole rigmarole of overlap testing is hideously wasteful and unnecessary. If the box is already 90 percent full, then we know that 90 percent of the random probes will fail. A smarter strategy would be to generate random points only in the “black zone” where new disks can legally be placed. If you could do that, you would never need to generate more than one point per disk, and there’d be no need to check for overlaps. But keeping track of the points that comprise the black zone—scattered throughout multiple, oddly shaped, transient regions—would be a serious exercise in computational geometry.

For the actual drawing of the disks, Shier relies on the technology known as SVG, or scalable vector graphics. As the name suggests, these drawings retain full resolution at any size, and they are definitely the right choice if you want to create works of art. They are less suitable for the interactive programs embedded in this document, mainly because they consume too much memory. The images you see here rely on the HTML *canvas* element, which is simply a fixed-size pixel array.

Another point of possible interest is the evaluation of the zeta function. If we want to scale the disk sizes to match the box size (or vice versa), we need to compute a good approximation of the Riemann function \(\zeta(s)\) or the Hurwitz function \(\zeta(s, a)\). I didn’t know how to do that, and most of the methods I read about seemed overwhelming. Before I could get to zeta, I’d have to hack my way through thickets of polygamma functions and Stieltjes constants. For the Riemann zeta function I found a somewhat simpler algorithm published by Peter Borwein in 1995. It’s based on a polynomial approximation that yields ample precision and runs in less than a millisecond. For the Hurwitz zeta function I stayed with a straightforward translation of Shier’s code, which takes more of a brute-force approach. (There are alternatives for Hurwitz too, but I couldn’t understand them well enough to make them work.)

The JavaScript file in the GitHub repository has more discussion of implementation details.

Shier, John. 2018. *Fractalize That! A Visual Essay on Statistical Geometry*. Singapore: World Scientific. Publisher’s website.

Shier, John. Website: http://www.john-art.com/

Shier, John. 2011. The dimensionless gasket width \(b(c,n)\) in statistical geometry. http://www.john-art.com/gasket_width.pdf

Shier, John. 2012. Random fractal filling of a line segment. http://www.john-art.com/gasket_width.pdf

Dunham, Douglas, and John Shier. 2014. The art of random fractals. In *Proceedings of Bridges 2014: Mathematics, Music, Art, Architecture, Culture* pp. 79–86. PDF.

Shier, John. 2015. A new recursion for space-filling geometric fractals. http://www.john-art.com/gasket_width.pdf

Dunham, Douglas, and John Shier. 2015. An algorithm for creating aesthetic random fractal patterns. Talk delivered at the Joint Mathematics Meetings January 2015, San Antonio, Texas.

Dunham, Douglas, and John Shier. 2018. A property of area and perimeter. In *ICGG 2018: Proceedings of the 18th International Conference on Geometry and Graphics*, Milano, August 2018, pp. 228–237.

Dunham, Douglas, and John Shier. 2017. New kinds of fractal patterns. In *Proceedings of Bridges 2017: Mathematics, Art, Music, Architecture, Education, Culture*,

pp. 111–116. Preprint.

Shier, John, and Paul Bourke. 2013. An algorithm for random fractal filling of space. *Computer Graphics Forum* 32(8):89–97. PDF. Preprint.

Ennis, Christopher. 2016. (Always) room for one more. *Math Horizons* 23(3):8–12. PDF (paywalled).

Dodds, Peter Sheridan, and Joshua S. Weitz. 2002. Packing-limited growth. Physical Review E 65: 056108.

Lagarias, Jeffrey C., Colin L. Mallows, and Allan R. Wilks. 2001. Beyond the Descartes circle theorem. https://arxiv.org/abs/math/0101066. (Also published in *American Mathematical Monthly*, 2002, 109:338–361.)

Mackenzie, Dana. 2010. A tisket, a tasket, an Apollonian gasket. *American Scientist* 98:10–14. https://www.americanscientist.org/article/a-tisket-a-tasket-an-apollonian-gasket.

Manna, S. S. 1992. Space filling tiling by random packing of discs. *Physica A* 187:373–377.

Bailey, David H., and Jonathan M. Borwein. 2015. Crandall’s computation of the incomplete Gamma function and the Hurwitz zeta function, with applications to Dirichlet L-series. *Applied Mathematics and Computation*, 268, 462–477.

Borwein, Peter. 1995. An efficient algorithm for the Riemann zeta function. http://www.cecm.sfu.ca/personal/pborwein/PAPERS/P155.pdf

Coffey, Mark W. 2009. An efficient algorithm for the Hurwitz zeta and related functions. *Journal of Computational and Applied Mathematics* 225:338–346.

Hurwitz, Adolf. 1882. Einige Eigenschaften der Dirichletschen Funktionen \(F(s) = \sum \left(\frac{D}{n} \frac{1}{n^s}\right)\), die bei der Bestimmung der Klassenzahlen binärer quadratischer Formen auftreten. *Zeitschrift für Mathematik und Physik* 27:86–101. https://gdz.sub.uni-goettingen.de/id/PPN599415665_0027.

Oswald, Nicola, and Jörn Steuding. 2015. Aspects of zeta-function theory in the mathematical works of Adolf Hurwitz. https://arxiv.org/abs/1506.00856.

Xu, Andy. 2018. Approximating the Hurwitz zeta function. PDF.

Disclaimer: The investigations of the MAX 8 disasters are in an early stage, so much of what follows is based on secondary sources—in other words, on leaks and rumors and the speculations of people who may or may not know what they’re talking about. As for my own speculations: I’m not an aeronautical engineer, or an airframe mechanic, or a control theorist. I’m not even a pilot. Please keep that in mind if you choose to read on.

Early on the morning of October 29, 2018, Lion Air Flight 610 departed Jakarta, Indonesia, with 189 people on board. The airplane was a four-month-old 737 MAX 8—the latest model in a line of Boeing aircraft that goes back to the 1960s. Takeoff and climb were normal to about 1,600 feet, where the pilots retracted the flaps (wing extensions that increase lift at low speed). At that point the aircraft unexpectedly descended to 900 feet. In radio conversations with air traffic controllers, the pilots reported a “flight control problem” and asked about their altitude and speed as displayed on the controllers’ radar screens. Cockpit instruments were giving inconsistent readings. The pilots then redeployed the flaps and climbed to 5,000 feet, but when the flaps were stowed again, the nose dipped and the plane began to lose altitude. Over the next six or seven minutes the pilots engaged in a tug of war with their own aircraft, as they struggled to keep the nose level but the flight control system repeatedly pushed it down. In the end the machine won. The airplane plunged into the sea at high speed, killing everyone aboard.

The second crash happened March 8, when Ethiopian Airlines Flight 302 went down six minutes after taking off from Addis Ababa, killing 157. The aircraft was another MAX 8, just two months old. The pilots reported control problems, and data from a satellite tracking service showed sharp fluctuations in altitude. The similarities to the Lion Air crash set off alarm bells: If the same malfunction or design flaw caused both accidents, it might also cause more. Within days, the worldwide fleet of 737 MAX aircraft was grounded. Data recovered since then from the Flight 302 wreckage has reinforced the suspicion that the two accidents are closely related.

The grim fate of Lion Air 610 can be traced in brightly colored squiggles extracted from the flight data recorder. (The chart was published in November in a preliminary report from the Indonesian National Committee on Transportation Safety.)

The outline of the story is given in the altitude traces at the bottom of the chart. The initial climb is interrupted by a sharp dip; then a further climb is followed by a long, erratic roller coaster ride. At the end comes the dive, as the aircraft plunges 5,000 feet in a little more than 10 seconds. (Why are there two altitude curves, separated by a few hundred feet? I’ll come back to that question at the end of this long screed.)

All those ups and downs were caused by movements of the horizontal stabilizer, the small winglike control surface at the rear of the fuselage. The stabilizer controls the airplane’s pitch attitude—nose-up vs. nose-down. On the 737 it does so in two ways. A mechanism for pitch *trim* tilts the entire stabilizer, whereas pushing or pulling on the pilot’s control yoke moves the elevator, a hinged tab at the rear of the stabilizer. In either case, moving the trailing edge of the surface upward tends to force the nose of the airplane up, and vice versa. Here we’re mainly concerned with trim changes rather than elevator movements.

Commands to the pitch-trim system and their effect on the airplane are shown in three traces from the flight data, which I reproduce here for convenience:

The line labeled “trim manual” *(light blue)* reflects the pilots’ inputs, “trim automatic” *(orange)* shows commands from the airplane’s electronic systems, and “pitch trim position” *(dark blue)* represents the tilt of the stabilizer, with higher position on the scale denoting a nose-up command. This is where the tug of war between man and machine is clearly evident. In the latter half of the flight, the automatic trim system repeatedly commands nose down, at intervals of roughly 10 seconds. In the breaks between those automated commands, the pilots dial in nose-up trim, using buttons on the control yoke. In response to these conflicting commands, the position of the horizontal stabilizer oscillates with a period of 15 or 20 seconds. The see-sawing motion continues for at least 20 cycles, but toward the end the unrelenting automatic nose-down adjustments prevail over the briefer nose-up commands from the pilots. The stabilizer finally reaches its limiting nose-down deflection and stays there as the airplane plummets into the sea.

What’s to blame for the perverse behavior of the automatic pitch trim system? The accusatory finger is pointing at something called MCAS, a new feature of the 737 MAX series. MCAS stands for Maneuvering Characteristics Augmentation System—an impressively polysyllabic name that tells you nothing about what the thing is or what it does. As I understand it, MCAS is not a piece of hardware; there’s no box labeled MCAS in the airplane’s electronic equipment bays. MCAS consists entirely of software. It’s a program running on a computer.

MCAS has just one function. It is designed to help prevent an aerodynamic stall, a situation in which an airplane has its nose pointed up so high with respect to the surrounding airflow that the wings can’t keep it aloft. A stall is a little like what happens to a bicyclist climbing a hill that keeps getting steeper and steeper: Eventually the rider runs out of oomph, wobbles a bit, and then rolls back to the bottom. Pilots are taught to recover from stalls, but it’s not a skill they routinely practice with a planeful of passengers. In commercial aviation the emphasis is on *avoiding* stalls—forestalling them, so to speak. Airliners have mechanisms to detect an imminent stall and warn the pilot with lights and horns and a “stick shaker” that vibrates the control yoke. On Flight 610, the captain’s stick was shaking almost from start to finish.

Some aircraft go beyond mere warnings when a stall threatens. If the aircraft’s nose continues to pitch upward, an automated system intervenes to push it back down—if necessary overriding the manual control inputs of the pilot. MCAS is designed to do exactly this. It is armed and ready whenever two criteria are met: The flaps are up (generally true except during takeoff and landing) and the airplane is under manual control (not autopilot). Under these conditions the system is triggered whenever an aerodynamic quantity called angle of attack, or AoA, rises into a dangerous range.

Angle of attack is a concept subtle enough to merit a diagram:

The various angles at issue are rotations of the aircraft body around the pitch axis, a line parallel to the wings, perpendicular to the fuselage, and passing through the airplane’s center of gravity. If you’re sitting in an exit row, the pitch axis might run right under your seat. Rotation about the pitch axis tilts the nose up or down. *Pitch attitude* is defined as the angle of the fuselage with respect to a horizontal plane. The *flight-path angle* is measured between the horizontal plane and the aircraft’s velocity vector, thus showing how steeply it is climbing or descending. *Angle of attack* is the difference between pitch attitude and flight-path angle. It is the angle at which the aircraft is moving through the surrounding air (assuming the air itself is motionless, *i.e.*, no wind).

AoA affects both lift (the upward force opposing the downward tug of gravity) and drag (the dissipative force opposing forward motion and the thrust of the engines). As AoA increases from zero, lift is enhanced because of air impinging on the underside of the wings and fuselage. For the same reason, however, drag also increases. As the angle of attack grows even steeper, the flow of air over the wings becomes turbulent; beyond that point lift diminishes but drag continues increasing. That’s where the stall sets in. The critical angle for a stall depends on speed, weight, and other factors, but usually it’s no more than 15 degrees.

Neither the Lion Air nor the Ethiopian flight was ever in danger of stalling, so if MCAS was activated, it must have been by mistake. The working hypothesis mentioned in many press accounts is that the system received and acted upon erroneous input from a failed AoA sensor.

A sensor to measure angle of attack is conceptually simple. It’s essentially a weathervane poking out into the airstream. In the photo below, the angle-of-attack sensor is the small black vane just forward of the “737 MAX” legend. Hinged at the front, the vane rotates to align itself with the local airflow and generates an electrical signal that represents the vane’s angle with respect to the axis of the fuselage. The 737 MAX has two angle-of-attack vanes, one on each side of the nose. (The protruding devices above the AoA vane are pitot tubes, used to measure air speed. Another device below the word MAX is probably a temperature sensor.)

Angle of attack was not among the variables displayed to the pilots of the Lion Air 737, but the flight data recorder did capture signals derived from the two AoA sensors:

There’s something dreadfully wrong here. The left sensor is indicating an angle of attack about 20 degrees steeper than the right sensor. That’s a huge discrepancy. There’s no plausible way those disparate readings could reflect the true state of the airplane’s motion through the air, with the left side of the nose pointing sky-high and the right side near level. One of the measurements must be wrong, and the higher reading is the suspect one. If the true angle of attack ever reached 20 degrees, the airplane would already be in a deep stall. Unfortunately, on Flight 610 MCAS was taking data only from the left-side AoA sensor. It interpreted the nonsensical measurement as a valid indicator of aircraft attitude, and worked relentlessly to correct it, up to the very moment the airplane hit the sea.

The tragedies in Jakarta and Addis Ababa are being framed as a cautionary tale of automation run amok, with computers usurping the authority of pilots. The *Washington Post* editorialized:

A second fatal airplane accident involving a Boeing 737 MAX 8 may have been a case of man vs. machine…. The debacle shows that regulators should apply extra review to systems that take control away from humans when safety is at stake.

Tom Dieusaert, a Belgian journalist who writes often on aviation and computation, offered this opinion:

What can’t be denied is that the Boeing of Flight JT610 had serious computer problems. And in the hi-tech, fly-by-wire world of aircraft manufacturers, where pilots are reduced to button pushers and passive observers, these accidents are prone to happen more in the future.

The button-pushing pilots are particularly irate. Gregory Travis, who is both a pilot and software developer, summed up his feelings in this acerbic comment:

“Raise the nose, HAL.”

“I’m sorry, Dave, I can’t do that.”

Even Donald Trump tweeted on the issue:

Airplanes are becoming far too complex to fly. Pilots are no longer needed, but rather computer scientists from MIT. I see it all the time in many products. Always seeking to go one unnecessary step further, when often old and simpler is far better. Split second decisions are….

….needed, and the complexity creates danger. All of this for great cost yet very little gain. I don’t know about you, but I don’t want Albert Einstein to be my pilot. I want great flying professionals that are allowed to easily and quickly take control of a plane!

There’s considerable irony in the complaint that the 737 is too automated; in many respects the aircraft is in fact quaintly old-fashioned. The basic design goes back more than 50 years, and even in the latest MAX models quite a lot of 1960s technology survives. The primary flight controls are hydraulic, with a spider web of high-pressure tubing running directly from the control yokes in the cockpit to the ailerons, elevator, and rudder. If the hydraulic systems should fail, there’s a purely mechanical backup, with cables and pulleys to operate the various control surfaces. For stabilizer trim the primary actuator is an electric motor, but again there’s a mechanical fallback, with crank wheels near the pilots’ knees pulling on cables that run all the way back to the tail.

Other aircraft are much more dependent on computers and electronics. The 737′s principal competitor, the Airbus A320, is a thoroughgoing fly-by-wire vehicle. The pilot flies the computer, and the computer flies the airplane. Specifically, the pilot decides where to go—up, down, left, right—but the computer decides how to get there, choosing which control surfaces to deflect and by how much. Boeing’s own more recent designs, the 777 and 787, also rely on digital controls. Indeed, the latest models from both companies go a step beyond fly-by-wire to fly-by-network. Most of the communication from sensors to computers and onward to control surfaces consists of digital packets flowing through a variant of Ethernet. The airplane is a computer peripheral.

Thus if you want to gripe about the dangers and indignities of automation on the flight deck, the 737 is not the most obvious place to start. And a Luddite campaign to smash all the avionics and put pilots back in the seat of their pants would be a dangerously off-target response to the current predicament. There’s no question the 737 MAX has a critical problem. It’s a matter of life and death for those who would fly in it and possibly also for the Boeing Company. But the problem didn’t start with MCAS. It started with earlier decisions that made MCAS necessary. Furthermore, the problem may not end with the remedy that Boeing has proposed—a software update that will hobble MCAS and leave more to the discretion of pilots.

The 737 flew its first passengers in 1968. It was (and still is) the smallest member of the Boeing family of jet airliners, and it is also the most popular by far. More than 10,000 have been sold, and Boeing has orders for another 4,600. Of course there have been changes over the years, especially to engines and instruments. A 1980s update came to be known as 737 Classic, and a 1997 model was called 737 NG, for “next generation.” (Now, with the MAX, the NG has become the *previous* generation.) Through all these revisions, however, the basic structure of the airframe has hardly changed.

Ten years ago, it looked like the 737 had finally come to the end of its life. Boeing announced it would develop an all-new design as a replacement, with a hull built of lightweight composite materials rather than aluminum. Competitive pressures forced a change of course. Airbus had a head start on the A320neo, an update that would bring more efficient engines to their entry in the same market segment. The revised Airbus would be ready around 2015, whereas Boeing’s clean-slate project would take a decade. Customers were threatening to defect. In particular, American Airlines—long a Boeing loyalist—was negotiating a large order of A320neos.

In 2011 Boeing scrapped the plan for an all-new design and elected to do the same thing Airbus was doing: bolt new engines onto an old airframe. This would eliminate most of the up-front design work, as well as the need to build tooling and manufacturing facilities. Testing and certification by the FAA would also go quicker, so that the first deliveries might be made in five or six years, not too far behind Airbus.

*(left)* Bryan via Wikimedia, CC BY 2.0; *(right)* Steve Lynes via Wikimedia, CC BY 2.0.

The original 1960s 737 had two cigar-shaped engines, long and skinny, tucked up under the wings *(left photo above)*. Since then, jet engines have grown fat and stubby. They derive much of their thrust not from the jet exhaust coming out of the tailpipe but from “bypass” air moved by a large-diameter fan. Such engines would scrape on the ground if they were mounted under the wings of the 737; instead they are perched on pylons that extend forward from the leading edge of the wing. The engines on the MAX models *(right photo)* are the fattest yet, with a fan 69 inches in diameter. Compared with the NG series, the MAX engines are pushed a few inches farther forward and hang a few inches lower.

A New York Times article by David Gelles, Natalie Kitroeff, Jack Nicas, and Rebecca R. Ruiz describes the plane’s development as hurried and hectic.

Months behind Airbus, Boeing had to play catch-up. The pace of the work on the 737 Max was frenetic, according to current and former employees who spoke with

The New York Times…. Engineers were pushed to submit technical drawings and designs at roughly double the normal pace, former employees said.

The *Times* article also notes: “Although the project had been hectic, current and former employees said they had finished it feeling confident in the safety of the plane.”

Sometime during the development of the MAX series, Boeing got an unpleasant surprise. The new engines were causing unwanted pitch-up movements under certain flight conditions. When I first read about this problem, soon after the Lion Air crash, I found the following explanation is an article by Sean Broderick and Guy Norris in *Aviation Week and Space Technology* (Nov. 26–Dec. 9, 2018, pp. 56–57):

Like all turbofan-powered airliners in which the thrust lines of the engines pass below the center of gravity (CG), any change in thrust on the 737 will result in a change of flight path angle caused by the vertical component of thrust.

In other words, the low-slung engines not only push the airplane forward but also tend to twirl it around the pitch axis. It’s like a motorcycle doing wheelies. Because the MAX engines are mounted farther below and in front of the center of gravity, they act through a longer lever arm and cause more severe pitch-up motions.

I found more detail on this effect in an earlier *Aviation Week* article, a 2017 pilot report by Fred George, describing his first flight at the controls of the new MAX 8.

The aircraft has sufficient natural speed stability through much of its flight envelope. But with as much as 58,000 lb. of thrust available from engines mounted well below the center of gravity, there is pronounced thrust-versus-pitch coupling at low speeds, especially with aft center of gravity (CG) and at light gross weights. Boeing equips the aircraft with a speed-stability augmentation function that helps to compensate for the coupling by automatically trimming the horizontal stabilizer according to indicated speed, thrust lever position and CG. Pilots still must be aware of the effect of thrust changes on pitching moment and make purposeful control-wheel and pitch-trim inputs to counter it.

The reference to an “augmentation function” that works by “automatically trimming the horizontal stabilizer” sounded awfully familiar, but it turns out this is *not* MCAS. The system that compensates for thrust-pitch coupling is known as *speed-trim*. Like MCAS, it works “behind the pilot’s back,” making adjustments to control surfaces that were not directly commanded. There’s yet another system of this kind called *mach-trim* that silently corrects a different pitch anomally when the aircraft reaches transonic speeds, at about mach 0.6. Neither of these systems is new to the MAX series of aircraft; they have been part of the control algorithm at least since the NG came out in 1997. MCAS runs on the same computer as speed-trim and mach-trim and is part of the same software system, but it is a distinct function. And according to what I’ve been reading in the past few weeks, it addresses a different problem—one that seems more sinister.

Most aircraft have the pleasant property of static stability. When an airplane is properly trimmed for level flight, you can let go of the controls—at least briefly—and it will continue on a stable path. Moreover, if you pull back on the control yoke to point the nose up, then let go again, the pitch angle should return to neutral. The layout of the airplane’s various airfoil surfaces accounts for this behavior. When the nose goes up, the tail goes down, pushing the underside of the horizontal stabilizer into the airstream. The pressure of the air against this tail surface provides a restoring force that brings the tail back up and the nose back down. (That’s why it’s called a *stabilizer*!) This negative feedback loop is built in to the structure of the airplane, so that any departure from equilibrium creates a force that opposes the disturbance.

However, the tail surface, with its helpful stablizing influence, is not the only structure that affects the balance of aerodynamic forces. Jet engines are not designed to contribute lift to the airplane, but at high angles of attack they can do so, as the airstream impinges on the lower surface of each engine’s outer covering, or nacelle. When the engines are well forward of the center of gravity, the lift creates a pitch-up turning moment. If this moment exceeds the counterbalancing force from the tail, the aircraft is unstable. A nose-up attitude generates forces that raise the nose still higher, and positive feedback takes over.

Is the 737 MAX vulnerable to such runaway pitch excursions? The possibility had not occurred to me until I read a commentary on MCAS on the Boeing 737 Technical Site, a web publication produced by Chris Brady, a former 737 pilot and flight instructor. He writes:

MCAS is a longitudinal stability enhancement. It is not for stall prevention or to make the MAX handle like the NG; it was introduced to counteract the non-linear lift of the LEAP-1B engine nacelles and give a steady increase in stick force as AoA increases. The LEAP engines are both larger and relocated slightly up and forward from the previous NG CFM56-7 engines to accommodate their larger fan diameter. This new location and size of the nacelle cause the vortex flow off the nacelle body to produce lift at high AoA; as the nacelle is ahead of the CofG this lift causes a slight pitch-up effect (ie a reducing stick force) which could lead the pilot to further increase the back pressure on the yoke and send the aircraft closer towards the stall. This non-linear/reducing stick force is not allowable under

FAR = Federal Air Regulations. Part 25 deals with airworthiness standards for transport category airplanes. FAR §25.173 “Static longitudinal stability”. MCAS was therefore introduced to give an automatic nose down stabilizer input during steep turns with elevated load factors (high AoA) and during flaps up flight at airspeeds approaching stall.

Brady cites no sources for this statement, and as far as I know Boeing has neither confirmed nor denied. But *Aviation Week*, which earlier mentioned the thrust-pitch linkage, has more recently (issue of March 20) gotten behind the nacelle-lift instability hypothesis:

The MAX’s larger CFM Leap 1 engines create more lift at high AOA and give the aircraft a greater pitch-up moment than the CFM56-7-equipped NG. The MCAS was added as a certification requirement to minimize the handling difference between the MAX and NG.

Assuming the Brady account is correct, an interesting question is when Boeing noticed the instability. Were the designers aware of this hazard from the outset? Did it emerge during early computer simulations, or in wind tunnel testing of scale models? A story by Dominic Gates in the *Seattle Times* hints that Boeing may not have recognized the severity of the problem until flight tests of the first completed aircraft began in 2015.

According to Gates, the safety analysis that Boeing submitted to the FAA specified that MCAS would be allowed to move the horizontal stabilizer by no more than 0.6 degree. In the airplane ultimately released to the market, MCAS can go as far as 2.5 degrees, and it can act repeatedly until reaching the mechanical limit of motion at about 5 degrees. Gates writes:

That limit was later increased after flight tests showed that a more powerful movement of the tail was required to avert a high-speed stall, when the plane is in danger of losing lift and spiraling down.

The behavior of a plane in a high angle-of-attack stall is difficult to model in advance purely by analysis and so, as test pilots work through stall-recovery routines during flight tests on a new airplane, it’s not uncommon to tweak the control software to refine the jet’s performance.

The high-AoA instability of the MAX appears to be a property of the aerodynamic form of the entire aircraft, and so a direct way to suppress it would be to alter that form. For example, enlarging the tail surface might restore static stability. But such airframe modifications would have delayed the delivery of the airplane, especially if the need for them was discovered only after the first prototypes were already flying. Structural changes might also jeopardize inclusion of the new model under the old type certificate. Modifying software instead of aluminum must have looked like an attractive alternative. Someday, perhaps, we’ll learn how the decision was made.

By the way, according to Gates, the safety document filed with the FAA specifying a 0.6 degree limit has yet to be amended to reflect the true range of MCAS commands.

Instability is not necessarily the kiss of death in an airplane. There have been at least a few successful unstable designs, starting with the 1903 Wright Flyer. The Wright brothers deliberately put the horizontal stabilizer in front of the wing rather than behind it because their earlier experiments with kites and gliders had shown that what we call stability can also be described as sluggishness. The Flyer’s forward control surfaces (known as canards) tended to amplify any slight nose-up or nose-down motions. Maintaining a steady pitch attitude demanded high alertness from the pilot, but it also allowed the airplane to respond more quickly when the pilot *wanted* to pitch up or down. (The pros and cons of the design are reviewed in a 1984 paper by Fred E. C. Culick and Henry R. Jex.)

Another dramatically unstable aircraft was the Grumman X-29, a research platform designed in the 1980s. The X-29 had its wings on backwards; to make matters worse, the primary surfaces for pitch control were canards mounted in front of the wings, as in the Wright Flyer. The aim of this quirky project was to explore designs with exceptional agility, sacrificing static stability for tighter maneuvering. No unaided human pilot could have mastered such a twitchy vehicle. It required a digital fly-by-wire system that sampled the state of the airplane and adjusted the control surfaces up to 80 times per second. The controller was successful—perhaps too much so. It allowed the airplane to be flown safely, but in taming the instability it also left the plane with rather tame handling characteristics.

I have a glancing personal connection with the X-29 project. In the 1980s I briefly worked as an editor with members of the group at Honeywell who designed and built the X-29 control system. I helped prepare publications on the control laws and on their implementation in hardware and software. That experience taught me just enough to recognize something odd about MCAS: It is way too slow to be suppressing aerodynamic instability in a jet aircraft. Whereas the X-29 controller had a response time of 25 milliseconds, MCAS takes 10 seconds to move the 737 stabilizer through a 2.5-degree adjustment. At that pace, it cannot possibly keep up with forces that tend to flip the nose upward in a positive feedback loop.

There’s a simple explanation. MCAS is not meant to control an unstable aircraft. It is meant to restrain the aircraft from entering the regime where it becomes unstable. This is the same strategy used by other mechanisms of stall prevention—intervening before the angle of attack reaches the critical point. However, if Brady is correct about the instability of the 737 MAX, the task is more urgent for MCAS. Instability implies a steep and slippery slope. MCAS is a guard rail that bounces you back onto the road when you’re about to drive over the cliff.

Which brings up the question of Boeing’s announced plan to fix the MCAS problem. Reportedly, the revised system will not keep reactivating itself so persistently, and it will automatically disengage if it detects a large difference between the two AoA sensors. These changes should prevent a recurrence of the recent crashes. But do they provide adequate protection against the kind of mishap that MCAS was designed to prevent in the first place? With MCAS shut down, either manually or automatically, there’s nothing to stop an unwary or misguided pilot from wandering into the corner of the flight envelope where the MAX becomes unstable.

Without further information from Boeing, there’s no telling how severe the instability might be—if indeed it exists at all. The Brady article at the Boeing 737 Technical Site implies the problem is partly pilot-induced. Normally, to make the nose go higher and higher you have to pull harder and harder on the control yoke. In the unstable region, however, the resistance to pulling suddenly fades, and so the pilot may unwittingly pull the yoke to a more extreme position.

Is this human interaction a *necessary* part of the instability, or is it just an exacerbating factor? In other words, without the pilot in the loop, would there still be positive feedback causing runaway nose-up pitch? I have yet to find answers.

Another question: If the root of the problem is a deceptive change in the force resisting a nose-up movements of the control yoke, why not address that issue directly?

Even after the spurious activation of MCAS on Lion Air 610, the crash and the casualties would have been avoided if the pilots had simply turned the damn thing off. Why didn’t they? Apparently because they had never heard of MCAS, and didn’t know it was installed on the airplane they were flying, and had not received any instruction on how to disable it. There’s no switch or knob in the cockpit labeled “MCAS ON/OFF.” The Flight Crew Operation Manual does not mention it (except in a list of abbreviations), and neither did the transitional training program the pilots had completed before switching from the 737 NG to the MAX. The training consisted of either one or two hours (reports differ) with an iPad app.

Boeing’s explanation of these omissions was captured in a *Wall Street Journal* story:

One high-ranking Boeing official said the company had decided against disclosing more details to cockpit crews due to concerns about inundating average pilots with too much information—and significantly more technical data—than they needed or could digest.

To call this statement disingenuous would be disingenuous. What it is is preposterous. In the first place, Boeing did not withhold “more details”; they failed to mention the very existence of MCAS. And the too-much-information argument is silly. I don’t have access to the Flight Crew Operation Manual for the MAX, but the NG edition runs to more than 1,300 pages, plus another 800 for the Quick Reference Handbook. A few paragraphs on MCAS would not have sunk any pilot who wasn’t already drowning in TMI. Moreover, the manual carefully documents the speed-trim and mach-trim features, which seem to fall in the same category as MCAS: They act autonomously, and offer the pilot no direct interface for monitoring or adjusting them.

In the aftermath of the Lion Air accident, Boeing stated that the procedure for disabling MCAS was spelled out in the manual, even though MCAS itself wasn’t mentioned. That procedure is given in a checklist for “runaway stabilizer trim.” It is not complicated: Hang onto the control yoke, switch off the autopilot and autothrottles if they’re on; then, if the problem persists, flip two switches labeled “STAB TRIM” to the “CUTOUT” position. Only the last step will actually matter in the case of an MCAS malfunction.

This checklist is considered a “memory item”; pilots must be able to execute the steps without looking it up in the handbook. The Lion Air crew should certainly have been familiar with it. But could they recognize that it was the right checklist to apply in an airplane whose behavior was unlike anything they had seen in their training or previous 737 flying experience? According to the handbook, the condition that triggers use of the runaway checklist is “Uncommanded stabilizer trim movement occurs continuously.” The MCAS commands were not continuous but repetitive, so some leap of inference would have been needed to make this diagnosis.

By the time of the Ethiopian crash, 737 pilots everywhere knew all about MCAS and the procedure for disabling it. A preliminary report issued last week by Ethiopian Airlines indicates that after a few minutes of wrestling with the control yoke, the pilots on Flight 302 did invoke the checklist procedure, and moved the STAB TRIM switches to CUTOUT. The stabilizer then stopped responding to MCAS nose-down commands, but the pilots were unable to regain control of the airplane.

It’s not entirely clear why they failed or what was going on in the cockpit in those last minutes. One factor may be that the cutout switch disables not only automatic pitch trim movements but also manual ones requested through the buttons on the control yoke. The switch cuts all power to the electric motor that moves the stabilizer. In this situation the only way to adjust the trim is to turn the hand crank wheels near the pilots’ knees. During the crisis on Flight 302 that mechanism may have been too slow to correct the trim in time, or the pilots may have been so fixated on pulling the control yoke back with maximum force that they did not try the manual wheels. It’s also possible that they flipped the switches back to the NORMAL setting, restoring power to the stabilizer motor. The report’s narrative doesn’t mention this possibility, but the graph from the flight data recorder suggests it *(see below)*.

There’s room for debate on whether the MCAS system is a good idea when it is operating correctly, but when it activates *mistakenly* and sends an airplane diving into the sea, no one would defend it. By all appearances, the rogue behavior in both the Lion Air and the Ethiopian accidents was triggered by a malfunction in a single sensor. That’s not supposed to happen in aviation. It’s unfathomable that any aircraft manufacturer would knowingly build a vehicle in which the failure of a single part would lead to a fatal accident.

Protection against single failures comes from redundancy, and the 737 is so committed to this principle that it almost amounts to two airplanes wrapped up in a single skin. *three* of everything—sensors, computers, and actuators.

There’s one asterisk in this roster of redundancy: A device called the flight control computer, or FCC, apparently gets special treatment. There are two FCCs, but according to the Boeing 737 Technical Site only one of them operates during any given flight. All the other duplicated components run in parallel, receiving independent inputs, doing independent computations, emitting independent control actions. But for each flight just one FCC does all the work, and the other is put on standby. The scheme for choosing the active computer seems strangely arbitrary. Each day when the airplane is powered up, the left side FCC gets control for the first flight, then the right side unit takes over for the second flight of the day, and the two sides alternate until the power is shut off. After a restart, the alternation begins again with the left FCC.

Aspects of this scheme puzzle me. I don’t understand why redundant FCC units are treated differently from other components. If one FCC dies, does the other automatically take over? Can the pilots switch between them in flight? If so, would that be an effective way to combat MCAS misbehavior? I’ve tried to find answers in the manuals, but I don’t trust my interpretation of what I read.

I’ve also had a hard time learning anything about the FCC itself. I don’t know who makes it, or what it looks like, or how it is programmed. On a website called Closet Wonderfuls an item identified as a 737 flight control computer is on offer for $43.82, with free shipping.

In the context of the MAX crashes, the flight control computer is important for two reasons. First, it’s where MCAS lives; this is the computer on which the MCAS software runs. Second, the curious procedure for choosing a different FCC on alternating flights also winds up choosing which AoA sensor is providing input to MCAS. The left and right sensors are connected to the corresponding FCCs.

If the two FCCs are used in alternation, that raises an interesting question about the history of the aircraft that crashed in Indonesia. The preliminary crash report describes trouble with various instruments and controls on five flights over four days (including the fatal flight). All of the problems were on the left side of the aircraft or involved a disagreement between the left and right sides.

date | route | trouble reports | maintenance |
---|---|---|---|

Oct 26 | Tianjin → Manado | left side: no airspeed or altitude indications |
test left Stall Management and Yaw Damper computer; passed |

? | Manado → Denpasar | ? | ? |

Oct 27 | Denpasar → Manado | left side: no airspeed or altitude indications speed trim and mach trim warning lights |
test left Stall Management and Yaw Damper computer; failed reset left Air Data and Inertial Reference Unit retest left Stall Management and Yaw Damper computer; passed clean electrical connections |

Oct 27 | Manado → Denpasar | left side: no airspeed or altitude indications speed trim and mach trim warning lights autothrottle disconnect |
test left Stall Management and Yaw Damper computer; failed reset left Air Data and Inertial Reference Unit replace left AoA sensor |

Oct 28 | Denpasar → Jakarta | left/right disagree warning on airspeed and altitude stick shaker [MCAS activation] |
flush left pitot tube and static port clean electrical connectors on elevator “feel” computer |

Oct 29 | Jakarta → Pangkal Pinang | stick shaker [MCAS activation] |

Which of the five flights had the left-side FCC as active computer? The final two flights *(red)*, where MCAS activated, were both first-of-the-day flights and so presumably under control of the left FCC. For the rest it’s hard to tell, especially since maintenance operations may have entailed full shutdowns of the aircraft, which would have reset the alternation sequence.

The revised MCAS software will reportedly consult signals from both AoA sensors. What will it do with the additional information? Only one clue has been published so far: If the readings differ by more than 5.5 degrees, MCAS will shut down. What if the readings differ by 4 or 5 degrees?

The present MCAS system, with its alternating choice of left and right, has a 50 percent chance of disaster when a single random failure causes an AoA sensor to spew out falsely high data. With the same one-sided random failure, the updated MCAS will have a 100 percent chance of ignoring a pilot’s excursion into stall territory. Is that an improvement?

Although a faulty sensor should not bring down an airplane, I would still like to know what went wrong with the AoA vane.

It’s no surprise that AoA sensors can fail. They are mechanical devices operating in a harsh environment: winds exceeding 500 miles per hour and temperatures below –40. A common failure mode is a stuck vane, often caused by ice (despite a built-in de-icing heater). But a seized vane would produce a constant output, regardless of the real angle of attack, which is not the symptom seen in Flight 610. The flight data recorder shows small fluctuations in the signals from both the left and the right instruments. Furthermore, the jiggles in the two curves are closely aligned, suggesting they are both tracking the same movements of the aircraft. In other words, the left-hand sensor appears to be functioning; it’s just giving measurements offset by a constant deviation of roughly 20 degrees.

Is there some other failure mode that might produce the observed offset? Sure: Just bend the vane by 20 degrees. Maybe a catering truck or an airport jetway blundered into it. Another creative thought is that the sensor might have been installed wrong, with the entire unit rotated by 20 degrees. Several writers on a website called the Professional Pilots Rumour Network explored this possibility, but they ultimately concluded it was impossible. The manufacturer, doubtless aware of the risk, placed the mounting screws and locator pins asymmetrically, so the unit will only go into the hull opening one way.

You might get the same effect through an assembly error during the manufacture of the sensor. The vane could be incorrectly attached to the shaft, or else the internal transducer that converts angular position into an electrical signal might be mounted wrong. Did the designers also ensure that such mistakes are impossible? I don’t know; I haven’t been able to find any drawings or photographs of the sensor’s innards.

Looking for other ideas about what might have gone wrong, I made a quick, scattershot survey of FAA airworthiness directives that call for servicing or replacing AoA sensors. I found dozens of them, including several that discuss the same sensor installed on the 737 MAX (the Rosemount 0861). But none of the reports I read describes a malfunction that could cause a consistent 20-degree error.

For a while I thought that the fault might lie not in the sensor itself but farther along the data path. It could be something as simple as a bad cable or connector. Signals from the AoA sensor go to the Air Data and Inertial Reference Unit (ADIRU), where the sine and cosine components are combined and digitized to yield a number representing the measured angle of attack. The ADIRU also receives inputs from other sensors, including the pitot tubes for measuring airspeed and the static ports for air pressure. And it houses the gyroscopes and accelerometers of an inertial guidance system, which can keep track of aircraft motion without reference to external cues. (There’s a separate ADIRU for each side of the airplane.) Maybe there was a problem with the digitizer—a stuck bit rather than a stuck vane.

Further information has undermined this idea. For one thing, the AoA sensor removed by the Lion Air maintenance crew on October 27 is now in the hands of investigators. According to news reports, it was “deemed to be defective,” though I’ve heard no hint of what the defect might be. Also, it turns out that one element of the control system, the Stall Management and Yaw Damper (SMYD) computer, receives the raw sine and cosine voltages directly from the sensor, not a digitized angle calculated by the ADIRU. It is the SMYD that controls the stick-shaker function. On both the Lion Air and the Ethiopian flights the stick shaker was active almost continuously, so those undigitized sine and cosine voltages must have been indicating a high angle of attack. In other words the error already existed before the signals reached the ADIRU.

I’m still stumped by the fixed angular offset in the Lion Air data, but the question now seems a little less important. The release of the preliminary report on Ethiopian Flight 302 shows that the left-side AoA sensor on that aircraft also failed badly, but in a way that looks totally different. Here are the relevant traces from the flight data recorder:

The readings from the AoA sensors are the uppermost lines, red for the left sensor and blue for the right. At the left edge of the graph they differ somewhat when the airplane has just begun to move, but they fall into close coincidence once the roll down the runway has built up some speed. At takeoff, however, they suddenly diverge dramtically, as the left vane begins reading an utterly implausible 75 degrees nose up. Later it comes down a few degrees but otherwise shows no sign of the ripples that would suggest a response to airflow. At the very end of the flight there are some more unexplained excursions.

By the way, in this graph the light blue trace of automatic trim commands offers another clue to what might have happened in the last moments of Flight 302. Around the middle of the graph, the STAB TRIM switches were pulled, with the result that an automatic nose-down command had no effect on the stabilizer position. But at the far right, another automatic nose-down command does register in the trim-position trace, suggesting that the cutout switches may have been turned on again.

There’s so much I still don’t understand.

Puzzle 1. If the Lion Air and Ethiopian accidents were both caused by faulty AoA sensors, then there were three parts with similar defects in brand new aircraft (including the replacement sensor installed by Lion Air on October 27). A recent news item says the replacement was not a new part but one that had been refurbished by a Florida shop called XTRA Aerospace. This fact offers us somewhere else to point the accusatory finger, but presumably the two sensors installed by Boeing were not retreads, so XTRA can’t be blamed for all of them.

There are roughly 400 MAX aircraft in service, with 800 AoA sensors. Is a failure rate of 3 out of 800 unusual or unacceptable? Does that judgment depend on whether or not it’s the same defect in all three cases?

Puzzle 2. Let’s look again at the traces for pitch trim and angle of attack in the Lion Air 610 data. The conflicting manual and automatic commands in the second half of the flight have gotten lots of attention, but I’m also baffled by what was going on in the first few minutes.

During the roll down the runway, the pitch trim system was set near its maximum pitch-up position *(dark blue line)*. Immediately after takeoff, the automatic trim system began calling for further pitch-up movement, and the stabilizer probably reached its mechanical limit. At that point the pilots manually trimmed it in the pitch-down direction, and the automatic system replied with a rapid sequence of up adjustments. In other words, there was already a tug-of-war underway, but the pilots and the automated controls were pulling in directions opposite to those they would choose later on. All this happened while the flaps were still deployed, which means that MCAS could not have been active. Some other element of the control system must have been issuing those automatic pitch-up orders. Deepening the mystery, the left side AoA sensor was already feeding its spurious high readings to the left-side flight control computer. If the FCC was acting on that data, it should not have been commanding nose-up trim.

Puzzle 3. The AoA readings are not the only peculiar data in the chart from the Lion Air preliminary report. Here are the altitude and speed traces:

The left-side altitude readings *(red)* are low by at least a few hundred feet. The error looks like it might be multiplicative rather than additive, perhaps 10 percent. The left and right computed airspeeds also disagree, although the chart is too squished to allow a quantitative comparison. It was these discrepancies that initially upset the pilots of Flight 610; they could see them on their instruments. (They had no angle of attack indicators in the cockpit, so that conflict was invisible to them.)

Altitude, airspeed, and angle of attack are all measured by different sensors. Could they all have gone haywire at the same time? Or is there some common point of failure that might explain all the weird behavior? In particular, is it possible a single wonky AoA sensor caused all of this havoc? My guess is yes. The sensors for altitude and airspeed and even temperature are influenced by angle of attack. The measured speed and pressure are therefore adjusted to compensate for this confounding variable, using the output of the AoA sensor. That output was wrong, and so the adjustments allowed one bad data stream to infect all of the air data measurements.

Six months ago, I was writing about another disaster caused by an out-of-control control system. In that case the trouble spot was a natural gas distribution network in Massachusetts, where a misconfigured pressure-regulating station caused fires and explosions in more than 100 buildings, with one fatality and 20 serious injuries. I lamented: “The special pathos of technological tragedies is that the engines of our destruction are machines that we ourselves design and build.”

In a world where defective automatic controls are blowing up houses and dropping aircraft out of the sky, it’s hard to argue for *more* automation, for adding further layers of complexity to control systems, for endowing machines with greater autonomy. Public sentiment leans the other way. Like President Trump, most of us trust pilots more than we trust computer scientists. We don’t want MCAS on the flight deck. We want Chesley Sullenberger III, the hero of USAir Flight 1549, who guided his crippled A320 to a dead-stick landing in the Hudson River and saved all 155 souls on board. No amount of cockpit automation could have pulled off that feat.

Nevertheless, a cold, analytical view of the statistics suggests a different reaction. The human touch doesn’t always save the day. On the contrary, pilot error is responsible for more fatal crashes than any other cause. One survey lists pilot error as the initiating event in 40 percent of fatal accidents, with equipment failure accounting for 23 percent. No one is (yet) advocating a pilotless cockpit, but at this point in the history of aviation technology that’s a nearer prospect than a computer-free cockpit.

The MCAS system of the 737 MAX represents a particularly awkward compromise between fully manual and fully automatic control. The software is given a large measure of responsibility for flight safety and is even allowed to override the decisions of the pilot. And yet when the system malfunctions, it’s entirely up to the pilot to figure out what went wrong and how to fix it—and the fix had better be quick, before MCAS can drive the plane into the ground.

Two lost aircraft and 346 deaths are strong evidence that this design was not a good idea. But what to do about it? Boeing’s plan is a retreat from automatic control, returning more responsibility and authority to the pilots:

- Flight control system will now compare inputs from both AOA sensors. If the sensors disagree by 5.5 degrees or more with the flaps retracted, MCAS will not activate. An indicator on the flight deck display will alert the pilots.
- If MCAS is activated in non-normal conditions, it will only provide one input for each elevated AOA event. There are no known or envisioned failure conditions where MCAS will provide multiple inputs.
- MCAS can never command more stabilizer input than can be counteracted by the flight crew pulling back on the column. The pilots will continue to always have the ability to override MCAS and manually control the airplane.

A statement from Dennis Muilenburg, Boeing’s CEO, says the software update “will ensure accidents like that of Lion Air Flight 610 and Ethiopian Airlines Flight 302 never happen again.” I hope that’s true, but what about the accidents that MCAS was designed to prevent? I also hope we will not be reading about a 737 MAX that stalled and crashed because the pilots, believing MCAS was misbehaving, kept hauling back on the control yokes.

If Boeing were to take the opposite approach—not curtailing MCAS but enhancing it with still more algorithms that fiddle with the flight controls—the plan would be greeted with hoots of outrage and derision. Indeed, it seems like a terrible idea. MCAS was installed to prevent pilots from wandering into hazardous territory. A new supervisory system would keep an eye on MCAS, stepping in if it began acting suspiciously. Wouldn’t we then need another custodian to guard the custodians, ad infinitum? Moreoever, with each extra layer of complexity we get new side effects and unintended consequences and opportunities for something to break. The system becomes harder to test, and impossible to prove correct.

Those are serious objections, but the problem being addressed is also serious.

Suppose the 737 MAX didn’t have MCAS but did have a cockpit indicator of angle of attack. On the Lion Air flight, the captain would have felt the stick-shaker warning him of an incipient stall and would have seen an alarmingly high angle of attack on his instrument panel. His training would have impelled him to do the same thing MCAS did: Push the nose down to get the wings working again. Would he have continued pushing it down until the plane crashed? Surely not. He would have looked out the window, he would have cross-checked the instruments on the other side of the cockpit, and after some scary moments he would have realized it was a false alarm. (In darkness or low visibility, where the pilot can lose track of the horizon, the outcome might be worse.)

I see two lessons in this hypothetical exercise. First, erroneous sensor data is dangerous, whether the airplane is being flown by a computer or by Chesley Sullenberger. A prudently designed instrument and control system would take steps to detect (and ideally correct) such errors. At the moment, redundancy is the only defense against these failures—and in the unpatched version of MCAS even that protection is compromised. It’s not enough. One key to the superiority of human pilots is that they exercise judgment and sometimes skepticism about what the instruments tell them. That kind of reasoning is not beyond the reach of automated systems. There’s plenty of information to be exploited. For example, inconsistencies between AoA sensors, pitot tubes, static pressure ports, and air temperature probes not only signal that something’s wrong but can offer clues about *which* sensor has failed. The inertial reference unit provides an independent check on aircraft attitude; even GPS signals might be brought to bear. Admittedly, making sense of all this data and drawing a valid conclusion from it—a problem known as sensor fusion—is a major challenge.

Second, a closed-loop controller has yet another source of information: an implicit model of the system being controlled. If you change the angle of the horizontal stabilizer, the state of the airplane is expected to change in known ways—in angle of attack, pitch angle, airspeed, altitude, and in the rate of change in all these parameters. If the result of the control action is not consistent with the model, something’s not right. To persist in issuing the same commands when they don’t produce the expected results is not reasonable behavior. Autopilots include rules to deal with such situations; the lower-level control laws that run in manual-mode flight could incorporate such sanity checks as well.

I don’t claim to have the answer to the MCAS problem. And I don’t want to fly in an airplane I designed. (Neither do you.) But there’s a general principle here that I believe should be taken to heart: If an autonomous system makes life-or-death decisions based on sensor data, it ought to verify the validity of the data.

Boeing continues to insist that MCAS is “not a stall-protection function and not a stall-prevention function. It is a handling-qualities function. There’s a misconception it is something other than that.” This statement comes from Mike Sinnett, who is vice president of product development and future airplane development at Boeing; it appears in an *Aviation Week* article by Guy Norris published online April 9.

I don’t know exactly what “handling qualities” means in this context. To me the phrase connotes something that might affect comfort or aesthetics or pleasure more than safety. An airplane with different handling qualities would feel different to the pilot but could still be flown without risk of serious mishap. Is Sinnett implying something along those lines? If so—if MCAS is not critical to the safety of flight—I’m surprised that Boeing wouldn’t simply disable it temporarily, as a way of getting the fleet back in the air while they work out a permanent solution.

The Norris article also quote Sinnett as saying: “The thing you are trying to avoid is a situation where you are pulling back and all of a sudden it gets easier, and you wind up overshooting and making the nose higher than you want it to be.” That situation, with the nose higher than you want it to be, sounds to me like an airplane that might be approaching a stall.

A story by Jack Nicas, David Gelles, and James Glanz in today’s *New York Times* offers a quite different account, suggesting that “handling qualities” may have motivated the first version of MCAS, but stall risks were part of the rationale for later beefing it up.

The system was initially designed to engage only in rare circumstances, namely high-speed maneuvers, in order to make the plane handle more smoothly and predictably for pilots used to flying older 737s, according to two former Boeing employees who spoke on the condition of anonymity because of the open investigations.

For those situations, MCAS was limited to moving the stabilizer—the part of the plane that changes the vertical direction of the jet—about 0.6 degrees in about 10 seconds.

It was around that design stage that the F.A.A. reviewed the initial MCAS design. The planes hadn’t yet gone through their first test flights.

After the test flights began in early 2016, Boeing pilots found that just before a stall at various speeds, the Max handled less predictably than they wanted. So they suggested using MCAS for those scenarios, too, according to one former employee with direct knowledge of the conversations

Finally, another *Aviation Week* story by Guy Norris, published yesterday, gives a convincing account of what happened to the angle of attack sensor on Ethiopian Airlines Flight 302. According to Norris’s sources, the AoA vane was sheared off moments after takeoff, probably by a bird strike. This hypothesis is consistent with the traces extracted from the flight data recorder, including the strange-looking wiggles at the very end of the flight. I wonder if there’s hope of finding the lost vane, which shouldn’t be far from the end of the runway.

The moment I saw it, I had to stop in my tracks, grab a scratch pad, and check out the formula. The result made sense in a rough-and-ready sort of way. Since the multiplicative version of \(n!\) goes to infinity as \(n\) increases, the “divisive” version should go to zero. And \(\frac{n^2}{n!}\) does exactly that; the polynomial function \(n^2\) grows slower than the exponential function \(n!\) for large enough \(n\):

\[\frac{1}{1}, \frac{4}{2}, \frac{9}{6}, \frac{16}{24}, \frac{25}{120}, \frac{36}{720}, \frac{49}{5040}, \frac{64}{40320}, \frac{81}{362880}, \frac{100}{3628800}.\]

But why does the quotient take the particular form \(\frac{n^2}{n!}\)? Where does the \(n^2\) come from?

To answer that question, I had to revisit the long-ago trauma of learning to divide fractions, but I pushed through the pain. Proceeding from left to right through the formula in the tweet, we first get \(\frac{n}{n-1}\). Then, dividing that quantity by \(n-2\) yields

\[\cfrac{\frac{n}{n-1}}{n-2} = \frac{n}{(n-1)(n-2)}.\]

Continuing in the same way, we ultimately arrive at:

\[n \mathbin{/} (n-1) \mathbin{/} (n-2) \mathbin{/} (n-3) \mathbin{/} \cdots \mathbin{/} 1 = \frac{n}{(n-1) (n-2) (n-3) \cdots 1} = \frac{n}{(n-1)!}\]

To recover the tweet’s stated result of \(\frac{n^2}{n!}\), just multiply numerator and denominator by \(n\). (To my taste, however, \(\frac{n}{(n-1)!}\) is the more perspicuous expression.)

I am a card-carrying factorial fanboy. You can keep your fancy Fibonaccis; *this* is my favorite function. Every time I try out a new programming language, my first exercise is to write a few routines for calculating factorials. Over the years I have pondered several variations on the theme, such as replacing \(\times\) with \(+\) in the definition (which produces triangular numbers). But I don’t think I’ve ever before considered substituting \(\mathbin{/}\) for \(\times\). It’s messy. Because multiplication is commutative and associative, you can define \(n!\) simply as the product of all the integers from \(1\) through \(n\), without worrying about the order of the operations. With division, order can’t be ignored. In general, \(x \mathbin{/} y \ne y \mathbin{/}x\), and \((x \mathbin{/} y) \mathbin{/} z \ne x \mathbin{/} (y \mathbin{/} z)\).

The Fermat’s Library tweet puts the factors in descending order: \(n, n-1, n-2, \ldots, 1\). The most obvious alternative is the ascending sequence \(1, 2, 3, \ldots, n\). What happens if we define the divisive factorial as \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\)? Another visit to the schoolroom algorithm for dividing fractions yields this simple answer:

\[1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n = \frac{1}{2 \times 3 \times 4 \times \cdots \times n} = \frac{1}{n!}.\]

In other words, when we repeatedly divide while counting up from \(1\) to \(n\), the final quotient is the reciprocal of \(n!\). (I wish I could put an exclamation point at the end of that sentence!) If you’re looking for a canonical answer to the question, “What do you get if you divide instead of multiplying in \(n!\)?” I would argue that \(\frac{1}{n!}\) is a better candidate than \(\frac{n}{(n - 1)!}\). Why not embrace the symmetry between \(n!\) and its inverse?

Of course there are many other ways to arrange the *n* integers in the set \(\{1 \ldots n\}\). How many ways? As it happens, \(n!\) of them! Thus it would seem there are \(n!\) distinct ways to define the divisive \(n!\) function. However, looking at the answers for the two permutations discussed above suggests there’s a simpler pattern at work. Whatever element of the sequence happens to come first winds up in the numerator of a big fraction, and the denominator is the product of all the other elements. As a result, there are really only \(n\) different outcomes—assuming we stick to performing the division operations from left to right. For any integer \(k\) between \(1\) and \(n\), putting \(k\) at the head of the queue creates a divisive \(n!\) equal to \(k\) divided by all the other factors. We can write this out as:

\[\cfrac{k}{\frac{n!}{k}}, \text{ which can be rearranged as } \frac{k^2}{n!}.\]

And thus we also solve the minor mystery of how \(\frac{n}{(n-1)!}\) became \(\frac{n^2}{n!}\) in the tweet.

It’s worth noting that all of these functions converge to zero as \(n\) goes to infinity. Asymptotically speaking, \(\frac{1^2}{n!}, \frac{2^2}{n!}, \ldots, \frac{n^2}{n!}\) are all alike.

Ta dah! Mission accomplished. Problem solved. Done and dusted. Now we know everything there is to know about divisive factorials, right?

Well, maybe there’s one more question. What does the computer say? If you take your favorite factorial algorithm, and do as the tweet suggests, replacing any appearance of the \(\times\) (or `*`

) operator with `/`

, what happens? Which of the \(n\) variants of divisive \(n!\) does the program produce?

Here’s *my* favorite algorithm for computing factorials, in the form of a Julia program:

```
function mul!(n)
if n == 1
return 1
else
return n * mul!(n - 1)
end
end
```

This is the algorithm that has introduced generations of nerds to the concept of recursion. In narrative form it says: If \(n\) is \(1\), then \(mul!(n)\) is \(1\). Otherwise, evaluate the function \(mul!(n-1)\), then multiply the result by \(n\).

The function can be written more tersely using Julia’s one-liner style of definition:.

`mul!(n) = n == 1 ? 1 : n * mul!(n - 1)`

The right side of the assignment statement is a conditional expression, or ternary operator, which has the form `a ? b : c`

. Here `a`

is a boolean test clause, which must return a value of either `true`

or `false`

. If `a`

is `true`

, clause `b`

is evaluated, and the result becomes the value of the entire expression. Otherwise clause `c`

is evaluated.

Just to be sure I’ve got this right, here are the first 10 factorials, as calculated by this program:

```
[mul!(n) for n in 1:10]
10-element Array{Int64,1}:
1
2
6
24
120
720
5040
40320
362880
3628800
```

Now let’s edit that definition and convert the single occurence of `*`

to a `/`

, leaving everything else (except the name of the function) unchanged.

`div!(n) = n == 1 ? 1 : n / div!(n - 1)`

And here’s what comes back when we run the program for values of \(n\) from \(1\) through \(20\):

```
[div!(n) for n in 1:20]
20-element Array{Real,1}:
1
2.0
1.5
2.6666666666666665
1.875
3.2
2.1875
3.657142857142857
2.4609375
4.063492063492063
2.70703125
4.432900432900433
2.9326171875
4.773892773892774
3.14208984375
5.092152292152292
3.338470458984375
5.391690662278897
3.523941040039063
5.675463855030418
```

Huh? That sure doesn’t look like it’s converging to zero—not as \(\frac{1}{n!}\) or as \(\frac{n}{n - 1}\). As a matter of fact, it doesn’t look like it’s going to converge at all. The graph below suggests the sequence is made up of two alternating components, both of which appear to be slowly growing toward infinity as well as diverging from one another.

In trying to make sense of what we’re seeing here, it helps to change the output type of the `div!`

function. Instead of applying the division operator `/`

, which returns the quotient as a floating-point number, we can substitute the `//`

operator, which returns an exact rational quotient, reduced to lowest terms.

`div!(n) = n == 1 ? 1 : n // div!(n - 1)`

Here’s the sequence of values for `n in 1:20`

:

```
20-element Array{Real,1}:
1
2//1
3//2
8//3
15//8
16//5
35//16
128//35
315//128
256//63
693//256
1024//231
3003//1024
2048//429
6435//2048
32768//6435
109395//32768
65536//12155
230945//65536
262144//46189
```

The list is full of curious patterns. It’s a double helix, with even numbers and odd numbers zigzagging in complementary strands. The even numbers are not just even; they are all powers of \(2\). Also, they appear in pairs—first in the numerator, then in the denominator—and their sequence is nondecreasing. But there are gaps; not all powers of \(2\) are present. The odd strand looks even more complicated, with various small prime factors flitting in and out of the numbers. (The primes *have* to be small—smaller than \(n\), anyway.)

This outcome took me by surprise. I had really expected to see a much tamer sequence, like those I worked out with pencil and paper. All those jagged, jitterbuggy ups and downs made no sense. Nor did the overall trend of unbounded growth in the ratio. How could you keep dividing and dividing, and wind up with bigger and bigger numbers?

At this point you may want to pause before reading on, and try to work out your own theory of where these zigzag numbers are coming from. If you need a hint, you can get a strong one—almost a spoiler—by looking up the sequence of numerators or the sequence of denominators in the Online Encyclopedia of Integer Sequences.

Here’s another hint. A small edit to the `div!`

program completely transforms the output. Just flip the final clause, changing `n // div!(n - 1)`

into `div!(n - 1) // n`

.

`div!(n) = n == 1 ? 1 : div!(n - 1) // n`

Now the results look like this:

```
10-element Array{Real,1}:
1
1//2
1//6
1//24
1//120
1//720
1//5040
1//40320
1//362880
1//3628800
```

This is the inverse factorial function we’ve already seen, the series of quotients generated when you march left to right through an ascending sequence of divisors \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\).

It’s no surprise that flipping the final clause in the procedure alters the outcome. After all, we know that division is not commutative or associative. What’s not so easy to see is why the sequence of quotients generated by the original program takes that weird zigzag form. What mechanism is giving rise to those paired powers of 2 and the alternation of odd and even?

I have found that it’s easier to explain what’s going on in the zigzag sequence when I describe an iterative version of the procedure, rather than the recursive one. (This is an embarrassing admission for someone who has argued that recursive definitions are easier to reason about, but there you have it.) Here’s the program:

```
function div!_iter(n)
q = 1
for i in 1:n
q = i // q
end
return q
end
```

I submit that this looping procedure is operationally identical to the recursive function, in the sense that if `div!(n)`

and `div!_iter(n)`

both return a result for some positive integer `n`

, it will always be the same result. Here’s my evidence:

```
[div!(n) for n in 1:20] [div!_iter(n) for n in 1:20]
1 1//1
2//1 2//1
3//2 3//2
8//3 8//3
15//8 15//8
16//5 16//5
35//16 35//16
128//35 128//35
315//128 315//128
256//63 256//63
693//256 693//256
1024//231 1024//231
3003//1024 3003//1024
2048//429 2048//429
6435//2048 6435//2048
32768//6435 32768//6435
109395//32768 109395//32768
65536//12155 65536//12155
230945//65536 230945//65536
262144//46189 262144//46189
```

To understand the process that gives rise to these numbers, consider the successive values of the variables \(i\) and \(q\) each time the loop is executed. Initially, \(i\) and \(q\) are both set to \(1\); hence, after the first passage through the loop, the statement `q = i // q`

gives \(q\) the value \(\frac{1}{1}\). Next time around, \(i = 2\) and \(q = \frac{1}{1}\), so \(q\)’s new value is \(\frac{2}{1}\). On the third iteration, \(i = 3\) and \(q = \frac{2}{1}\), yielding \(\frac{i}{q} \rightarrow \frac{3}{2}\). If this is still confusing, try thinking of \(\frac{i}{q}\) as \(i \times \frac{1}{q}\). The crucial observation is that on every passage through the loop, \(q\) is inverted, becoming \(\frac{1}{q}\).

If you unwind these operations, and look at the multiplications and divisions that go into each element of the series, a pattern emerges:

\[\frac{1}{1}, \quad \frac{2}{1}, \quad \frac{1 \cdot 3}{2}, \quad \frac{2 \cdot 4}{1 \cdot 3}, \quad \frac{1 \cdot 3 \cdot 5}{2 \cdot 4} \quad \frac{2 \cdot 4 \cdot 6}{1 \cdot 3 \cdot 5}\]

The general form is:

\[\frac{1 \cdot 3 \cdot 5 \cdot \cdots \cdot n}{2 \cdot 4 \cdot \cdots \cdot (n-1)} \quad (\text{odd } n) \qquad \frac{2 \cdot 4 \cdot 6 \cdot \cdots \cdot n}{1 \cdot 3 \cdot 5 \cdot \cdots \cdot (n-1)} \quad (\text{even } n).

\]

The functions \(1 \cdot 3 \cdot 5 \cdot \cdots \cdot n\) for odd \(n\) and \(2 \cdot 4 \cdot 6 \cdot \cdots \cdot n\) for even \(n\) have a name! They are known as double factorials, with the notation \(n!!\). *n* is defined as the product of *n* and all smaller positive integers of the same parity. Thus our peculiar sequence of zigzag quotients is simply \(\frac{n!!}{(n-1)!!}\).

A 2012 article by Henry W. Gould and Jocelyn Quaintance (behind a paywall, regrettably) surveys the applications of double factorials. They turn up more often than you might guess. In the middle of the 17th century John Wallis came up with this identity:

\[\frac{\pi}{2} = \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots} = \lim_{n \rightarrow \infty} \frac{((2n)!!)^2}{(2n + 1)!!(2n - 1)!!}\]

An even weirder series, involving the cube of a quotient of double factorials, sums to \(\frac{2}{\pi}\). That one was discovered by (who else?) Srinivasa Ramanujan.

Gould and Quaintance also discuss the double factorial counterpart of binomial coefficients. The standard binomial coefficient is defined as:

\[\binom{n}{k} = \frac{n!}{k! (n-k)!}.\]

The double version is:

\[\left(\!\binom{n}{k}\!\right) = \frac{n!!}{k!! (n-k)!!}.\]

Note that our zigzag numbers fit this description and therefore qualify as double factorial binomial coefficients. Specifically, they are the numbers:

\[\left(\!\binom{n}{1}\!\right) = \left(\!\binom{n}{n - 1}\!\right) = \frac{n!!}{1!! (n-1)!!}.\]

The regular binomial \(\binom{n}{1}\) is not very interesting; it is simply equal to \(n\). But the doubled version \(\left(\!\binom{n}{1}\!\right)\), as we’ve seen, dances a livelier jig. And, unlike the single binomial, it is not always an integer. (The only integer values are \(1\) and \(2\).)

Seeing the zigzag numbers as ratios of double factorials explains quite a few of their properties, starting with the alternation of evens and odds. We can also see why all the even numbers in the sequence are powers of 2. Consider the case of \(n = 6\). The numerator of this fraction is \(2 \cdot 4 \cdot 6 = 48\), which acquires a factor of \(3\) from the \(6\). But the denominator is \(1 \cdot 3 \cdot 5 = 15\). The \(3\)s above and below cancel, leaving \(\frac{16}{5}\). Such cancelations will happen in every case. Whenever an odd factor \(m\) enters the even sequence, it must do so in the form \(2 \cdot m\), but at that point \(m\) itself must already be present in the odd sequence.

Is the sequence of zigzag numbers a reasonable answer to the question, “What happens when you divide instead of multiply in \(n!\)?” Or is the computer program that generates them just a buggy algorithm? My personal judgment is that \(\frac{1}{n!}\) is a more intuitive answer, but \(\frac{n!!}{(n - 1)!!}\) is more interesting.

Furthermore, the mere existence of the zigzag sequence broadens our horizons. As noted above, if you insist that the division algorithm must always chug along the list of \(n\) factors in order, at each stop dividing the number on the left by the number on the right, then there are only \(n\) possible outcomes, and they all look much alike. But the zigzag solution suggests wilder possibilities. We can formulate the task as follows. Take the set of factors \(\{1 \dots n\}\), select a subset, and invert all the elements of that subset; now multiply all the factors, both the inverted and the upright ones. If the inverted subset is empty, the result is the ordinary factorial \(n!\). If *all* of the factors are inverted, we get the inverse \(\frac{1}{n!}\). And if every second factor is inverted, starting with \(n - 1\), the result is an element of the zigzag sequence.

These are only a few among the many possible choices; in total there are \(2^n\) subsets of \(n\) items. For example, you might invert every number that is prime or a power of a prime \((2, 3, 4, 5, 7, 8, 9, 11, \dots)\). For small \(n\), the result jumps around but remains consistently less than \(1\):

If I were to continue this plot to larger \(n\), however, it would take off for the stratosphere. Prime powers get sparse farther out on the number line.

Here’s a question. We’ve seen factorial variants that go to zero as \(n\) goes to infinity, such as \(1/n!\). We’ve seen other variants grow without bound as \(n\) increases, including \(n!\) itself, and the zigzag numbers. Are there any versions of the factorial process that converge to a finite bound other than zero?

My first thought was this algorithm:

```
function greedy_balance(n)
q = 1
while n > 0
q = q > 1 ? q /= n : q *= n
n -= 1
end
return q
end
```

We loop through the integers from \(n\) down to \(1\), calculating the running product/quotient \(q\) as we go. At each step, if the current value of \(q\) is greater than \(1\), we divide by the next factor; otherwise, we multiply. This scheme implements a kind of feedback control or target-seeking behavior. If \(q\) gets too large, we reduce it; too small and we increase it. I conjectured that as \(n\) goes to infinity, \(q\) would settle into an ever-narrower range of values near \(1\).

Running the experiment gave me another surprise:

That sawtooth wave is not quite what I expected. One minor peculiarity is that the curve is not symmetric around \(1\); the excursions above have higher amplitude than those below. But this distortion is more visual than mathematical. Because \(q\) is a ratio, the distance from \(1\) to \(10\) is the same as the distance from \(1\) to \(\frac{1}{10}\), but it doesn’t look that way on a linear scale. The remedy is to plot the log of the ratio:

Now the graph is symmetric, or at least approximately so, centered on \(0\), which is the logarithm of \(1\). But a larger mystery remains. The sawtooth waveform is very regular, with a period of \(4\), and it shows no obvious signs of shrinking toward the expected limiting value of \(\log q = 0\). Numerical evidence suggests that as \(n\) goes to infinity the peaks of this curve converge on a value just above \(q = \frac{5}{3}\), and the troughs approach a value just below \(q = \frac{3}{5}\). (The corresponding base-\(10\) logarithms are roughly \(\pm0.222\). I have not worked out why this should be so. Perhaps someone will explain it to me.

The failure of this greedy algorithm doesn’t mean we can’t find a divisive factorial that converges to \(q = 1\).

I have computed the optimal partitionings up to \(n = 30\), where there are a billion possibilities to choose from.

The graph is clearly flatlining. You could use the same method to force convergence to any other value between \(0\) and \(n!\).

And thus we have yet another answer to the question in the tweet that launched this adventure. What happens when you divide instead of multiply in n!? Anything you want.

]]>On my visit to Baltimore for the Joint Mathematics Meetings a couple of weeks ago, I managed to score a hotel room with a spectacular scenic view. My seventh-floor perch overlooked the Greene Street substation of the Baltimore Gas and Electric Company, just around the corner from the Camden Yards baseball stadium.

Some years ago, writing about such technological landscapes, I argued that you can understand what you’re looking at if you’re willing to invest a little effort:

At first glance, a substation is a bewildering array of hulking steel machines whose function is far from obvious. Ponderous tanklike or boxlike objects are lined up in rows. Some of them have cooling fins or fans; many have fluted porcelain insulators poking out in all directions…. If you look closer, you will find there is a logic to this mélange of equipment. You can make sense of it. The substation has inputs and outputs, and with a little study you can trace the pathways between them.

If I were writing that passage now, I would hedge or soften my claim that an electrical substation will yield its secrets to casual observation. Each morning in Baltimore I spent a few minutes peering into the Greene Street enclosure. I was able to identify all the major pieces of equipment in the open-air part of the station, and I know their basic functions. But making sense of the circuitry, finding the logic in the arrangement of devices, tracing the pathways from inputs to outputs—I have to confess, with a generous measure of chagrin, that I failed to solve the puzzle. I think I have the answers now, but finding them took more than eyeballing the hardware.

Basics first. A substation is not a generating plant. BGE does not “make” electricity here. The substation receives electric power in bulk from distant plants and repackages it for retail delivery. At Greene Street the incoming supply is at 115,000 volts (or 115 kV). The output voltage is about a tenth of that: 13.8 kV. How do I know the voltages? Not through some ingenious calculation based on the size of the insulators or the spacing between conductors. In an enlargement of one of my photos I found an identifying plate with the blurry and partially obscured but still legible notation “115/13.8 KV.”

The biggest hunks of machinery in the yard are the transformers *(photo below)*, which do the voltage conversion. Each transformer is housed in a steel tank filled with oil, which serves as both insulator and coolant. Immersed in the oil bath are coils of wire wrapped around a massive iron core. Stacks of radiator panels, with fans mounted underneath, help cool the oil when the system is under heavy load. A bed of crushed stone under the transformer is meant to soak up any oil leaks and reduce fire hazards.

Electricity enters and leaves the transformer through the ribbed gray posts, called bushings, mounted atop the casing. A bushing is an insulator with a conducting path through the middle. It works like the rubber grommet that protects the power cord of an appliance where it passes through the steel chassis. The high-voltage inputs attach to the three tallest bushings, with red caps; the low-voltage bushings, with dark gray caps, are shorter and more closely spaced. Notice that each high-voltage input travels over a single slender wire, whereas each low-voltage output has three stout conductors. That’s because reducing the voltage to one-tenth increases the current tenfold.

What about the three slender gray posts just to the left of the high-voltage bushings? They are lightning arresters, shunting sudden voltage surges into the earth to protect the transformer from damage.

Perhaps the most distinctive feature of this particular substation is what’s *not* to be seen. There are no tall towers carrying high-voltage transmission lines to the station. Clearing a right of way for overhead lines would be difficult and destructive in an urban center, so the high-voltage “feeders” run underground. In the photo at right, near the bottom left corner, a bundle of three metal-sheathed cables emerges from the earth. Each cable, about as thick as a human forearm, has a copper or aluminum conductor running down the middle, surrounded by insulation. I suspect these cables are insulated with layers of paper impregnated with oil under pressure; some of the other feeders entering the station may be of a newer design, with solid plastic insulation. Each cable plugs into the bottom of a ceramic bushing, which carries the current to a copper wire at the top. (You can tell the wire is copper because of the green patina.)

Connecting the feeder input to the transformer is a set of three hollow aluminum conductors called bus bars, held high overhead on steel stanchions and ceramic insulators. At both ends of the bus bars are mechanical switches that open like hinged doors to break the circuit. I don’t know whether these switches can be opened when the system is under power or whether they are just used to isolate components for maintenance after a feeder has been shut down. Beyond the bus bars, and hidden behind a concrete barrier, we can glimpse the bushings atop a different kind of switch, which I’ll return to below.

At this point you might be asking, why does everything come in sets of three—the bus bars, the feeder cables, the terminals on the transformer? It’s because electric power is distributed as three-phase alternating current. Each conductor carries a voltage oscillating at 60 Hertz, with the three waves offset by one-third of a cycle. If you recorded the voltage between each of the three pairs of conductors *(AB, AC, BC)*, you’d see a waveform like the one above at left.

At the other end of the conducting pathway, connected to three more bus bars on the low-voltage side of the transformer, is an odd-looking stack of three large drums. These

are current-limiting reactors (no connection with nuclear reactors). They are coils of thick conductors wound on a stout concrete armature. Under normal operating conditions they have little effect on the transmission of power, but in the milliseconds following a short circuit, the sudden rush of current generates a strong magnetic field in the coils, absorbing the energy of the fault current and preventing damage to other equipment.

So those are the main elements of the substation I was able to spot from my hotel window. They all made sense individually, and yet I realized over the course of a few days that I didn’t really understand how it all works together. My doubts are easiest to explain with the help of a bird’s eye view of the substation layout, cribbed from Google Maps:

My window vista was from off to the right, beyond the eastern edge of the compound. In the Google Maps view, the underground 115 kV feeders enter at the bottom or southern edge, and power flows northward through the transformers and the reactor coils, finally entering the building that occupies the northeast corner of the lot. Neither Google nor I can see inside this windowless building, but I know what’s in there, in a general way. That’s where the low-voltage (13.8 kV) distribution lines go underground and fan out to their various destinations in the neighborhood.

Let’s look more closely at the outdoor equipment. There are four high-voltage feeders, four transformers, and four sets of reactor coils. Apart from minor differences in geometry (and one newer-looking, less rusty transformer), these four parallel pathways all look alike. It’s a symmetric four-lane highway. Thus my first hypothesis was that four independent 115 kV feeders supply power to the station, presumably bringing it from larger substations and higher-voltage transmission lines outside the city.

However, something about the layout continued to bother me. If we label the four lanes of the highway from left to right, then on the high-voltage side, toward the bottom of the map view, it looks like there’s something connecting lanes 1 and 2 and, and there’s a similar link between lanes 3 and 4. From my hotel window the view of this device is blocked by a concrete barricade, and unfortunately the Google Maps image does not show it clearly either. (If you zoom in for a closer view, the goofy Google compression algorithm will turn the scene into a dreamscape where all the components have been draped in Saran Wrap.) Nevertheless, I’m quite sure of what I’m looking at. The device connecting the pairs of feeders is a high-voltage three-phase switch, or circuit breaker, something like the ones seen in the image at right (photographed at another substation, in Missouri.) The function of this device is essentially the same as that of a circuit breaker in your home electrical panel. You can turn it off manually to shut down a circuit, but it may also “trip” automatically in response to an overload or a short circuit. The concrete barriers flanking the two high-voltage breakers at Greene Street hint at one of the problems with such switches. Interrupting a current of hundreds of amperes at more than 100,000 volts is like stopping a runaway truck: It requires absorbing a lot of energy. The switch does not always survive the experience.

When I first looked into the Greene Street substation, I was puzzled by the *absence* of breakers at the input end of each main circuit. I expected to see them there to protect the transformers and other components from overloads or lightning strikes. I think there are breakers on the low-voltage side, tucked in just behind the transformers and thus not clearly visible from my window. But there’s nothing on the high side. I could only guess that such protection is provided by breakers near the output of the next substation upstream, the one that sends the 115 kV feeders into Greene Street.

That leaves the question of why pairs of circuits within the substation are cross-linked by breakers. I drew a simplified diagram of how things are wired up:

Two adjacent 115 kV circuits run from bottom to top; the breaker between them connects corresponding conductors—left to left, middle to middle, right to right. But what’s the point of doing so?

I had some ideas. If one transformer were out of commission, the pathway through the breaker could allow power to be rerouted through the remaining transformer (assuming it could handle the extra load). Indeed, maybe the entire design simply reflects a high level of redundancy. There are four incoming feeders and four transformers, but perhaps only two are expected to operate at any given time. The breaker provides a means of switching between them, so that you could lose a circuit (or maybe two) and still keep all the lights on. After all, this is a substation supplying power to many large facilities—the convention center (where the math meetings were held), a major hospital, large hotels, the ball park, theaters, museums, high-rise office buildings. Reliability is important here.

After further thought, however, this scheme seemed highly implausible. There are other substation layouts that would allow any of the four feeders to power any of the four transformers, allowing much greater flexibility in handling failures and making more efficient use of all the equipment. Linking the incoming feeders in pairs made no sense.

I would love to be able to say that I solved this puzzle on my own, just by dint of analysis and deduction, but it’s not true. When I got home and began looking at the photographs, I was still baffled. The answer eventually came via Google, though it wasn’t easy to find. Before revealing where I went wrong, I’ll give a couple of hints, which might be enough for you to guess the answer.

Hint 1. I was led astray by a biased sample. I am much more familiar with substations out in the suburbs or the countryside, partly because they’re easier to see into. Most of them are surrounded by a chain-link fence rather than a brick wall. But country infrastructure differs from the urban stuff.

Hint 2. I was also fooled by geometry when I should have been thinking about topology. To understand what you’re seeing in the Greene Street compound, you have to get beyond individual components and think about how it’s all connected to the rest of the network.

The web offers marvelous resources for the student of infrastructure, but finding them can be a challenge. You might suppose that the BGE website would have a list of the company’s facilities, and maybe a basic tutorial on where Baltimore’s electricity comes from. There’s nothing of the sort (although the utility’s parent company does offer thumbnail descriptions of some of their generating plants). Baltimore City websites were a little more helpful—not that they explained any details of substation operation, but they did report various legal and regulatory filings concerned with proposals for new or updated facilities. From those reports I learned the names of several BGE installations, which I could take back to Google to use as search terms.

One avenue I pursued was figuring out where the high-voltage feeders entering Greene Street come from. I discovered a substation called Pumphrey about five miles south of the city, near BWI airport, which seemed to be a major nexus of transmission lines. In particular, four 115 kV feeders travel north from Pumphrey to a substation in the Westport neighborhood, which is about a mile south of downtown. The Pumphrey-Westport feeders are overhead lines, and I had seen them already. Their right of way parallels the light rail route I had taken into town from the airport. Beyond the Westport substation, which is next to a light rail stop of the same name, the towers disappear. An obvious hypothesis is that the four feeders dive underground at Westport and come up at Greene Street. This guess was partly correct: Power does reach Greene Street from Westport, but not exclusively.

At Westport BGE has recently built a small, gas-fired generating plant, to help meet peak demands. The substation is also near the Baltimore RESCO waste-to-energy power plant *(photo above)*, which has become a local landmark. (It’s the only garbage burner I know that turns up on postcards sold in tourist shops.) Power from both of these sources could also make its way to the Greene Street substation, via Westport.

I finally began to make sense of the city’s wiring diagram when I stumbled upon some documents published by the PJM Interconnection, the administrator and coordinator of the power “pool” in the mid-Atlantic region. PJM stands for Pennsylvania–New Jersey–Maryland, but it covers a broader territory, including Delaware, Ohio, West Virginia, most of Virginia, and parts of Kentucky, Indiana, Michigan, and Illinois. Connecting to such a pool has important advantages for a utility. If an equipment failure means you can’t meet your customers’ demands for electricity, you can import power from elsewhere in the pool to make up the shortage; conversely, if you have excess generation, you can sell the power to another utility. The PJM supervises the market for such exchanges.

The idea behind power pooling is that neighbors can prop each other up in times of trouble; however, they can also knock each other down. As a condition of membership in the pool, utilities have to maintain various standards for engineering and reliability. PJM committees review plans for changes or additions to a utility’s network. It was a set of Powerpoint slides prepared for one such committee that first alerted me to my fundamental misconception. One of the slides included the map below, tracing the routes of 115 kV feeders *(green lines)* in and around downtown Baltimore.

I had been assuming—even though I should have known better—that the distribution network is essentially treelike, with lines radiating from each node to other nodes but never coming back together. For low-voltage distribution lines in sparsely settled areas, this assumption is generally correct. If you live in the suburbs or in a small town, there is one power line that runs from the local substation to your neighborhood; if a tree falls on it, you’re in the dark until the problem is fixed. There is no alternative route of supply. But that is *not* the topology of higher-voltage circuits. The Baltimore network consists of rings, where power can reach most nodes by following either of two pathways.

In the map we can see the four 115 kV feeders linking Pumphrey to Westport. From Westport, two lines run due north to Greene Street, then make a right turn to another station named Concord Street.

This double-ring architecture calls for a total reinterpretation of how the Greene Street substation works. I had imagined the four 115 kV inputs as four lanes of one-way traffic, all pouring into the substation and dead-ending in the four transformers. In reality we have just two roadways, both of which enter the substation and then leave again, continuing on to further destinations. And they are not one-way; they can both carry traffic in either direction. The transformers are like exit ramps that siphon off a portion of the traffic while the main stream passes by.

At Greene Street, two of the underground lines entering the compound come from Westport, but the other two proceed to Concord Street, the next station around the ring. What about the breakers that sit between the incoming and outgoing branches of each circuit? They open up the ring to isolate any section that experiences a serious failure. For example, a short circuit in one of the cables running between Greene Street and Concord Street would cause breakers at both of those stations to open up, but both stations would continue to receive power coming around the other branch of the loop.

This revised interpretation was confirmed by another document made available by PJM, this one written by BGE engineers as an account of their engineering practices for transmission lines and substations. It includes a schematic diagram of a typical downtown Baltimore substation. The diagram makes no attempt to reproduce the geometric layout of the components; it rearranges them to make the topology clearer.

The two 115 kV feeders that run through the substation are shown as horizontal lines; the solid black squares in the middle are the breakers that join the pairs of feeders and thereby close the two rings that run through all the downtown substations. The transformers are the W-shaped symbols at the ends of the branch lines.

A mystery remains. The symbol represents a disconnect switch, a rather simple mechanical device that generally cannot be operated when the power line is under load. The symbol is identified in the BGE document as a *circuit switcher*, a more elaborate device capable of interrupting a heavy current. In the Greene Street photos, however, the switches at the two ends of the high-voltage bus bars appear almost identical. I’m not seeing any circuit switchers there. But, as should be obvious by now, I’m capable of misinterpreting what I see.

Before digging into the dynamics, however, let us pause for a few words about the man himself, drawn largely from the obituaries in the *New York Times* and the *Harvard Crimson*.

Glauber was a member of the first class to graduate from the Bronx High School of Science, in 1941. From there he went to Harvard, but left in his sophomore year, at age 18, to work in the theory division at Los Alamos, where he helped calculate the critical mass of fissile material needed for a bomb. After the war he finished his degree at Harvard and went on to complete a PhD under Julian Schwinger. After a few brief adventures in Princeton and Pasadena, he was back at Harvard in 1952 and never left. A poignant aspect of his life is mentioned briefly in a 2009 interview, where Glauber discusses the challenge of sustaining an academic career while raising two children as a single parent.

Here’s a glimpse of Glauber dynamics in action. Click the *Go* button, then try fiddling with the slider.

In the computer program that drives this animation, the slider controls a variable representing temperature. At high temperature (slide the control all the way to the right), you’ll see a roiling, seething mass of colored squares, switching rapidly and randomly between light and dark shades. There are no large-scale or long-lived structures.

What we’re looking at here is a simulation of a model of a ferromagnet—the kind of magnet that sticks to the refrigerator. The model was introduced almost 100 years ago by Wilhelm Lenz and his student Ernst Ising. They were trying to understand the thermal behavior of ferromagnetic materials such as iron. If you heat a block of magnetized iron above a certain temperature, called the Curie point, it loses all traces of magnetization. Slow cooling below the Curie point allows it to spontaneously magnetize again, perhaps with the poles in a different orientation. The onset of ferromagnetism at the Curie point is an abrupt phase transition.

Lenz and Ising created a stripped-down model of a ferromagnet. In the two-dimensional version shown here, each of the small squares represents the spin vector of an unpaired electron in an iron atom. The vector can point in either of two directions, conventionally called *up* and *down*, which for graphic convenience are represented by two contrasting colors. There are \(100 \times 100 = 10{,}000\) spins in the array. This would be a minute sample of a real ferromagnet. On the other hand, the system has \(2^{10{,}000}\) possible states—quite an enormous number.

The essence of ferromagnetism is that adjacent spins “prefer” to point in the same direction. To put that more formally: The energy of neighboring spins is lower when they are parallel, rather than antiparallel. For the array as a whole, the energy is minimized if all the spins point the same way, either up or down. Each spin contributes a tiny magnetic moment. When the spins are parallel, all the moments add up and the system is fully magnetized.

If energy were the only consideration, the Ising model would always settle into a magnetized configuration, but there is a countervailing influence: Heat tends to randomize the spin directions. At infinite temperature, thermal fluctuations completely overwhelm the spins’ tendency to align, and all states are equally likely. Because the vast majority of those \(2^{10{,}000}\) configurations have nearly equal numbers of *up* and *down* spins, the magnetization is negligible. At zero temperature, nothing prevents the system from condensing into the fully magnetized state. The interval between these limits is a battleground where energy and entropy contend for supremacy. Clearly, there must be a transition of some kind. For Lenz and Ising in the 1920s, the crucial question was whether the transition comes at a sharply defined critical temperature, as it does in real ferromagnets. A more gradual progression from one regime to the other would signal the model’s failure to capture important aspects of ferromagnet physics.

In his doctoral dissertation Ising investigated the one-dimensional version of the model—a chain or ring of spins, each one holding hands with its two nearest neighbors. The result was a disappointment: He found no abrupt phase transition. And he speculated that the negative result would also hold in higher dimensions. The Ising model seemed to be dead on arrival.

It was revived a decade later by Rudolf Peierls, who gave suggestive evidence for a sharp transition in the two-dimensional lattice. Then in 1944 Lars Onsager “solved” the two-dimensional model, showing that the phase transition does exist. The phase diagram looks like this:

As the system cools, the salt-and-pepper chaos of infinite temperature evolves into a structure with larger blobs of color, but the *up* and *down* spins remain balanced on average (implying zero magnetization) down to the critical temperature \(T_C\). At that point there is a sudden bifurcation, and the system will follow one branch or the other to full magnetization at zero temperature.

If a model is classified as *solved*, is there anything more to say about it? In this case, I believe the answer is yes. The solution to the two-dimensional Ising model gives us a prescription for calculating the probability of seeing any given configuration at any given temperature. That’s a major accomplishment, and yet it leaves much of the model’s behavior unspecified. The solution defines the probability distribution at equilibrium—after the system has had time to settle into a statistically stable configuration. It doesn’t tell us anything about how the lattice of spins reaches that equilibrium when it starts from an arbitrary initial state, or how the system evolves when the temperature changes rapidly.

It’s not just the solution to the model that has a few vague spots. When you look at the finer details of how spins interact, the model itself leaves much to the imagination. When a spin reacts to the influence of its nearest neighbors, and those neighbors are also reacting to one another, does everything happen all at once? Suppose two antiparallel spins both decide to flip at the same time; they will be left in a configuration that is still antiparallel. It’s hard to see how they’ll escape repeating the same dance over and over, like people who meet head-on in a corridor and keep making mirror-image evasive maneuvers. This kind of standoff can be avoided if the spins act sequentially rather than simultaneously. But if they take turns, how do they decide who goes first?

Within the intellectual traditions of physics and mathematics, these questions can be dismissed as foolish or misguided. After all, when we look at the procession of the planets orbiting the sun, or at the colliding molecules in a gas, we don’t ask who takes the first step; the bodies are all in continuous and simultaneous motion. Newton gave us a tool, calculus, for understanding such situations. If you make the steps small enough, you don’t have to worry so much about the sequence of marching orders.

However, if you want to write a computer program simulating a ferromagnet (or simulating planetary motions, for that matter), questions of sequence and synchrony cannot be swept aside. With conventional computer hardware, “let everything happen at once” is not an option. The program must consider each spin, one at a time, survey the surrounding neighborhood, apply an update rule that’s based on both the state of the neighbors and the temperature, and then decide whether or not to flip. Thus the program must choose a sequence in which to visit the lattice sites, as well as a sequence in which to visit the neighbors of each site, and those choices can make a difference in the outcome of the simulation. So can other details of implementation. Do we look at all the sites, calculate their new spin states, and then update all those that need to be flipped? Or do we update each spin as we go along, so that spins later in the sequence will see an array already modified by earlier actions? The original definition of the Ising model is silent on such matters, but the programmer must make a commitment one way or another.

This is where Glauber dynamics enters the story. Glauber presented a version of the Ising model that’s somewhat more explicit about how spins interact with one another and with the “heat bath” that represents the influence of temperature. It’s a theory of Ising *dynamics* because he describes the spin system not just at equilibrium but also during transitional stages. I don’t know if Glauber was the first to offer an account of Ising dynamics, but the notion was certainly not commonplace in 1963.

There’s no evidence Glauber was thinking of his method as an algorithm suitable for computer implementation. The subject of simulation doesn’t come up in his 1963 paper, where his primary aim is to find analytic expressions for the distribution of *up* and *down* spins as a function of time. (He did this only for the one-dimensional model.) Nevertheless, Glauber dynamics offers an elegant approach to programming an interactive version of the Ising model. Assume we have a lattice of \(N\) spins. Each spin \(\sigma\) is indexed by its coordinates \(x, y\) and takes on one of the two values \(+1\) and \(-1\). Thus flipping a spin is a matter of multiplying \(\sigma\) by \(-1\). The algorithm for a updating the lattice looks like this:

Repeat \(N\) times:

- Choose a spin \(\sigma_{x, y}\) at random.
- Sum the values of the four neighboring spins, \(S = \sigma_{x+1, y} + \sigma_{x-1, y} + \sigma_{x, y+1} + \sigma_{x, y-1}\). The possible values of \(S\) are \(\{-4, -2, 0, +2, +4\}\).
- Calculate \(\Delta E = 2 \, \sigma_{x, y} \, S\), the change in interaction energy if \(\sigma_{x, y}\) were to flip.
- If \(\Delta E \lt 0\), set \(\sigma_{x, y} = -\sigma_{x, y}\).
- Otherwise, set \(\sigma_{x, y} = -\sigma_{x, y}\) with probability \(\exp(-\Delta E/T)\), where \(T\) is the temperature.
Display the updated lattice.

Step 4 says: If flipping a spin will reduce the overall energy of the system, flip it. Step 5 says: Even if flipping a spin raises the energy, go ahead and flip it in a randomly selected fraction of the cases. The probability of such spin flips is the Boltzmann factor \(\exp(-\Delta E/T)\). This quantity goes to \(0\) as the temperature \(T\) falls to \(0\), so that energetically unfavorable flips are unlikely in a cold lattice. The probability approaches \(1\) as \(T\) goes to infinity, which is why the model is such a seething mass of fluctuations at high temperature.

(If you’d like to take a look at real code rather than pseudocode—namely the JavaScript program running the simulation above—it’s on GitHub.)

Glauber dynamics belongs to a family of methods called Markov chain Monte Carlo algorithms (MCMC). The idea of Markov chains was an innovation in probability theory in the early years of the 20th century, extending classical probability to situations where the the next event depends on the current state of the system. Monte Carlo algorithms emerged at post-war Los Alamos, not long after Glauber left there to resume his undergraduate curriculum. He clearly kept up with the work of Stanislaw Ulam and other former colleagues in the Manhattan Project.

Within the MCMC family, the distinctive feature of Glauber dynamics is choosing spins at random. The obvious alternative is to march methodically through the lattice by columns and rows, examining every spin in turn. That procedure can certainly be made to work, but it requires care in implementation. At low temperature the Ising process is very nearly deterministic, since unfavorable flips are extremely rare. When you combine a deterministic flip rule with a deterministic path through the lattice, it’s easy to get trapped in recurrent patterns. For example, a subtle bug yields the same configuration of spins on every step, shifted left by a single lattice site, so that the pattern seems to slide across the screen. Another spectacular failure gives rise to a blinking checkerboard, where every spin is surrounded by four opposite spins and flips on every time step. Avoiding these errors requires much fussy attention to algorithmic details. (My personal experience is that the first attempt is never right.)

Choosing spins by throwing random darts at the lattice turns out to be less susceptible to clumsy mistakes. Yet, at first glance, the random procedure seems to have hazards of its own. In particular, choosing 10,000 spins at random from a lattice of 10,000 sites does *not* guarantee that every site will be visited once. On the contrary, a few sites will be sampled six or seven times, and you can expect that 3,679 sites (that’s \(1/e \times 10{,}000)\) will not be visited at all. Doesn’t that bias distort the outcome of the simulation? No, it doesn’t. After many iterations, all the sites will get equal attention.

The nasty bit in all Ising simulation algorithms is updating pairs of adjacent sites, where each spin is the neighbor of the other. Which one goes first, or do you try to handle them simultaneously? The column-and-row ordering maximizes exposure to this problem: Every spin is a member of such a pair. Other sequential algorithms—for example, visiting all the black squares of a checkerboard followed by all the white squares—avoid these confrontations altogether, never considering two adjacent spins in succession. Glauber dynamics is the Goldilocks solution. Pairs of adjacent spins do turn up as successive elements in the random sequence, but they are rare events. Decisions about how to handle them have no discernible influence on the outcome.

Years ago, I had several opportunities to meet Roy Glauber. Regrettably, I failed to take advantage of them. Glauber’s office at Harvard was in the Lyman Laboratory of Physics, a small isthmus building connecting two larger halls. In the 1970s I was a frequent visitor there, pestering people to write articles for *Scientific American*. It was fertile territory; for a few years, the magazine found more authors per square meter in Lyman Lab than anywhere else in the world. But I never knocked on Glauber’s door. Perhaps it’s just as well. I was not yet equipped to appreciate what he had to say.

Now I can let him have the last word. This is from the introduction to the paper that introduced Glauber dynamics:

]]>If the mathematical problems of equilibrium statistical mechanics are great, they are at least relatively well-defined. The situation is quite otherwise in dealing with systems which undergo large-scale changes with time. The principles of nonequilibrium statistical mechanics remain in largest measure unformulated. While this lack persists, it may be useful to have in hand whatever precise statements can be made about the time-dependent hehavior of statistical systems, however simple they may be.

Last spring a pair of robins built two-and-a-half nests on a sheltered beam just outside my office door. They raised two chicks that fledged by the end of June, and then two more in August. Both clutches of eggs were incubated in the same nest *(middle photo below)*, which was pretty grimy by the end of the season. A second nest *(upper photo)* served as a hangout for the nonbrooding parent. I came to think of it as the man-cave, although I’m not at all sure about the sex of those birds. As for the half nest, I don’t know why that project was abandoned, or why it was started in the first place.

Elsewhere, a light fixture in the carport has served as a nesting platform for a phoebe each summer we’ve lived here. Is it the same bird every year? I like to think so, but if I can’t even identify a bird’s sex I have little hope of recognizing individuals. This year, after the tenant decamped, I discovered an egg that failed to hatch.

We also had house wrens in residence—noisy neighbors, constantly partying or quarreling, I can never tell the difference. It was like living next to a frat house. I have no photo of their dwelling: It fell apart in my hands.

Under the eaves above our front door we hosted several small colonies of paper wasps. All summer I watched the slow growth of these structures with their appealing symmetries and their equally interesting imperfections. (Skilled labor shortage? Experiments in noneuclidean geometry?) I waited until after the first frost to cut down the nests, thinking they were abandoned, but I discovered a dozen moribund wasps still huddling behind the largest apartment block. They were probably fertile females looking for a place to overwinter. If they survive, they’ll likely come back to the same spot next year—or so I’ve learned from Howard E. Evans, my go-to source of wasp wisdom.

Another mysterious dwelling unit clung to the side of a rafter in the carport. It was a smooth, fist-size hunk of mud with no visible entrances or exits. When I cracked it open, I found several hollow chambers, some empty, some occupied by decomposing larvae or prey. Last year in the same place we had a few delicate tubes built by mud-dauber wasps, but this one is an industrial-strength creation I can’t identify. Any ideas?

The friends I’ll miss most are not builders but squatters. All summer we have shared our back deck with a population of minifrogs—often six or eight at a time—who took up residence in tunnel-like spaces under flowerpots. In nice weather they would join us for lunch alfresco.

As of today two frogs are still hanging on, and I worry they will freeze in place. I should move the flowerpots, I think, but it seems so inhospitable.

May everyone return next year.

]]>I had believed such a catastrophe was all but impossible. The natural gas industry has many troubles, including chronic leaks that release millions of tons of methane into the atmosphere, but I had thought that pressure regulation was a solved problem. Even if someone turned the wrong valve, failsafe mechanisms would protect the public. Evidently not. (I am not an expert on natural gas. While working on my book *Infrastructure*, I did some research on the industry and the technology, toured a pipeline terminal, and spent a day with a utility crew installing new gas mains in my own neighborhood. The pages of the book that discuss natural gas are online here.)

The hazards of gas service were already well known in the 19th century, when many cities built their first gas distribution systems. Gas in those days was not “natural” gas; it was a product manufactured by roasting coal, or sometimes the tarry residue of petroleum refining, in an atmosphere depleted of oxygen. The result was a mixture of gases, including methane and other hydrocarbons but also a significant amount of carbon monoxide. Because of the CO content, leaks could be deadly even if the gas didn’t catch fire.

Every city needed its own gasworks, because there were no long-distance pipelines. The output of the plant was accumulated in a gasholder, a gigantic tank that confined the gas at low pressure—less than one pound per square inch above atmospheric pressure (a unit of measure known as pounds per square inch gauge, or psig). The gas was gently wafted through pipes laid under the street to reach homes at a pressure of 1/4 or 1/2 psig. Overpressure accidents were unlikely because the entire system worked at the same modest pressure. As a matter of fact, the greater risk was underpressure. If the flow of gas was interrupted even briefly, thousands of pilot lights would go out; then, when the flow resumed, unburned toxic gas would seep into homes. Utility companies worked hard to ensure that would never happen.

Gas technology has evolved a great deal since the gaslight era. Long-distance pipelines carry natural gas across continents at pressures of 1,000 psig or more. At the destination, the gas is stored in underground cavities or as a cryogenic liquid. It enters the distribution network at pressures in the neighborhood of 100 psig. The higher pressures allow smaller diameter pipes to serve larger territories. But the pressure must still be reduced to less than 1 psig before the gas is delivered to the customer. Having multiple pressure levels complicates the distribution system and requires new safeguards against the risk of high-pressure gas going where it doesn’t belong. Apparently those safeguards didn’t work last month in the Merrimack valley.

The gas system in that part of Massachusetts is operated by Columbia Gas, a subsidiary of a company called NiSource, with headquarters in Indiana. At the time of the conflagration, contractors for Columbia were upgrading distribution lines in the city of Lawrence and in two neighboring towns, Andover and North Andover. The two-tier system had older low-pressure mains—including some cast-iron pipes dating back to the early 1900s—fed by a network of newer lines operating at 75 psig. Fourteen regulator stations handled the transfer of gas between systems, maintaining a pressure of 1/2 psig on the low side.

The NTSB preliminary report gives this account of what happened around 4 p.m. on September 13:

The contracted crew was working on a tie-in project of a new plastic distribution main and the abandonment of a cast-iron distribution main. The distribution main that was abandoned still had the regulator sensing lines that were used to detect pressure in the distribution system and provide input to the regulators to control the system pressure. Once the contractor crews disconnected the distribution main that was going to be abandoned, the section containing the sensing lines began losing pressure.

As the pressure in the abandoned distribution main dropped about 0.25 inches of water column (about 0.01 psig), the regulators responded by opening further, increasing pressure in the distribution system. Since the regulators no longer sensed system pressure they fully opened allowing the full flow of high-pressure gas to be released into the distribution system supplying the neighborhood, exceeding the maximum allowable pressure.

When I read those words, I groaned. The cause of the accident was not a leak or an equipment failure or a design flaw or a worker turning the wrong valve. The pressure didn’t just creep up beyond safe limits while no one was paying attention; the pressure was *driven* up by the automatic control system meant to keep it in bounds. The pressure regulators were “trying” to do the right thing. Sensor readings told them the pressure was falling, and so the controllers took corrective action to keep the gas flowing to customers. But the feedback loop the regulators relied on was not in fact a loop. They were measuring pressure in one pipe and pumping gas into another.

The NTSB’s preliminary report offers no conclusions or recommendations, but it does note that the contractor in Lawrence was following a “work package” prepared by Columbia Gas, which did not mention moving or replacing the pressure sensors. Thus if you’re looking for someone to blame, there’s a hint about where to point your finger. The clue is less useful, however, if you’re hoping to understand the disaster and prevent a recurrence. “Make sure all the parts are connected” is doubtless a good idea, but better still is building a failsafe system that will not burn the city down when somebody goofs.

Suppose you’re taking a shower, and the water feels too warm. You nudge the mixing valve toward cold, but the water gets hotter still. When you twist the valve a little further in the same direction, the temperature rises again, and the room fills with steam. In this situation, you would surely not continue turning the knob until you were scalded. At some point you would get out of the shower, shut off the water, and investigate. Maybe the controls are mislabeled. Maybe the plumber transposed the pipes.

Since you do so well controlling the shower, let’s put you in charge of regulating the municipal gas service. You sit in a small, windowless room, with your eyes on a pressure gauge and your hand on a valve. The gauge has a pointer indicating the measured pressure in the system, and a red dot (called a bug) showing the desired pressure, or set point. If the pointer falls below the bug, you open the valve a little to let in more gas; if the pointer drifts up too high, you close the valve to reduce the flow. (Of course there’s more to it than just open and close. For a given deviation from the set point, *how far* should you twist the valve handle? Control theory answers this question.)

It’s worth noting that you could do this job without any knowledge of what’s going on outside the windowless room. You needn’t give a thought to the nature of the “plant,” the system under control. What you’re controlling is the position of the needle on the gauge; the whole gas distribution network is just an elaborate mechanism for linking the valve you turn with the gauge you watch. Many automatic control system operate in exactly this mindless mode. And they work fine—until they don’t.

As a sentient being, you *do* in fact have a mental model of what’s happening outside. Just as the control law tells you how to respond to changes in the state of the plant, your model of the world tells you how the plant should respond to your control actions. For example, when you open the valve to increase the inflow of gas, you expect the pressure to increase. (Or, in some circumstances, to decrease more slowly. In any event, the sign of the second derivative should be positive.) If that doesn’t happen, the control law would call for making an even stronger correction, opening the valve further and forcing still more gas into the pipeline. But you, in your wisdom, might pause to consider the possible causes of this anomaly. Perhaps pressure is falling because a backhoe just ruptured a gas main. Or, as in Lawrence last month, maybe the pressure isn’t actually falling at all; you’re looking at sensors plugged into the wrong pipes. Opening the valve further could make matters worse.

Could we build an automatic control system with this kind of situational awareness? Control theory offers many options beyond the simple feedback loop. We might add a supervisory loop that essentially controls the controller and sets the set point. And there is an extensive literature on *predictive control*, where the controller has a built-in mathematical model of the plant, and uses it to find the best trajectory from the current state to the desired state. But neither of these techniques is commonly used for the kind of last-ditch safety measures that might have saved those homes in the Merrimack Valley. More often, when events get too weird, the controller is designed to give up, bail out, and leave it to the humans. That’s what happened in Lawrence.

Minutes before the fires and explosions occurred, the Columbia Gas monitoring center in Columbus, Ohio [probably a windowless room], received two high-pressure alarms for the South Lawrence gas pressure system: one at 4:04 p.m. and the other at 4:05 p.m. The monitoring center had no control capability to close or open valves; its only capability was to monitor pressures on the distribution system and advise field technicians accordingly. Following company protocol, at 4:06 p.m., the Columbia Gas controller reported the high-pressure event to the Meters and Regulations group in Lawrence. A local resident made the first 9-1-1 call to Lawrence emergency services at 4:11 p.m.

Columbia Gas shut down the regulator at issue by about 4:30 p.m.

I admit to a morbid fascination with stories of technological disaster. I read NTSB accident reports the way some people consume murder mysteries. The narratives belong to the genre of tragedy. In using that word I don’t mean just that the loss of life and property is very sad. These are stories of people with the best intentions and with great skill and courage, who are nonetheless overcome by forces they cannot master. The special pathos of *technological* tragedies is that the engines of our destruction are machines that we ourselves design and build.

Looking on the sunnier side, I suspect that technological tragedies are more likely than *Oedipus Rex* or *Hamlet* to suggest a practical lesson that might guide our future plans. Let me add two more examples that seem to have plot elements in common with the Lawrence gas disaster.

First, the meltdown at the Three Mile Island nuclear power plant in 1979. In that event, a maintenance mishap was detected by the automatic control system, which promptly shut down the reactor, just as it was supposed to do, and started emergency pumps to keep the uranium fuel rods covered with cooling water. But in the following minutes and hours, confusion reigned in the control room. Because of misleading sensor readings, the crowd of operators and engineers believed the water level in the reactor was too high, and they struggled mightily to lower it. Later they realized the reactor had been running dry all along.

Second, the crash of Air France 447, an overnight flight from Rio de Janeiro to Paris, in 2009. In this case the trouble began when ice at high altitude clogged pitot tubes, the sensors that measure airspeed. With inconsistent and implausible speed inputs, the autopilot and flight-management systems disengaged and sounded an alarm, basically telling the pilots “You’re on your own here.” Unfortunately, the pilots also found the instrument data confusing, and formed the erroneous opinion that they needed to pull the nose up and climb steeply. The aircraft entered an aerodynamic stall and fell tail-first into the ocean with the loss of all on board.

In these events no mechanical or physical fault made an accident inevitable. In Lawrence the pipes and valves functioned normally, as far as I can tell from press reports and the NTSB report. Even the sensors were working; they were just in the wrong place. At Three Mile Island there were multiple violations of safety codes and operating protocols; nevertheless, if either the automatic or the human controllers had correctly diagnosed the problem, the reactor would have survived. And the Air France aircraft over the Atlantic was airworthy to the end. It could have flown on to Paris if only there had been the means to level the wings and point it in the right direction.

All of these events feel like unnecessary disasters—if we were just a little smarter, we could have avoided them—but the fires in Lawrence are particularly tormenting in this respect. With an aircraft 35,000 feet over the ocean, you can’t simply press *Pause* when things don’t go right. Likewise a nuclear reactor has no safe-harbor state; even after you shut down the fission chain reaction, the core of the reactor generates enough heat to destroy itself. But Columbia Gas faced no such constraints in Lawrence. Even if the pressure-regulating system is not quite as simple as I have imagined it, there is always an escape route available when parameters refuse to respond to control inputs. You can just shut it all down. Safeguards built into the automatic control system could do that a lot more quickly than phone calls from Ohio. The service interruption would be costly for the company and inconvenient for the customers, but no one would lose their home or their life.

Control theory and control engineering are now embarking on their greatest adventure ever: the design of self-driving cars and trucks. Next year we may see the first models without a steering wheel or a brake pedal—there goes the option of asking the driver (passenger?) to take over. I am rooting for this bold undertaking to succeed. I am also reminded of a term that turns up frequently in discussions of Athenian tragedy: hubris.

]]>In psychology and literature, this kind of mental rambling is called *stream of consciousness*, a metaphor we owe to William James. It’s not the metaphor I would have chosen. My own consciousness, as I experience it, does not flow smoothly from one topic to the next but seems to flit across a landscape of ideas, more like a butterfly than a river, sometimes alighting daintily on one flower and then the next, sometimes carried away by gusts of wind, sometimes revisiting favorite spots over and over.

As a way of probing the architecture of my own memory, I have tried a more deliberate experiment in free association. I began with the same herbal recipe—parsley, sage, rosemary, and thyme—but for this exercise I wasn’t strolling through the garden spots of the Berkeley hills; I was sitting at a desk taking notes. The diagram below is my best effort at reconstructing the complete train of thought.

Scrolling through the chart from top to bottom reveals the items in the order my brain presented them to me, but the linkages between nodes do not form a single linear sequence. Instead the structure is treelike, with short chains of sequential associations ending with an abrupt return to an earlier node, as if I were being snapped back by a rubber band. These interruptions are marked in the diagram by green upward arrows; the red “X” at the bottom is where I decided to end the experiment.

My apologies to the half of humanity born since 1990, who will doubtless find many of the items mentioned in the diagram antiquated or obscure. You can hover over the labels for pop-up explanations, although I doubt they will make the associations any more meaningful. Memories, after all, are personal; they live inside your head. If you want a collection of ideas that resonate with your own experience, you’ll just have to create your own free-association diagram. I highly recommend it: You may discover something you didn’t know you knew.

The destination of my daily walk down the hill in Berkeley is the Simons Institute for the Theory of Computing, where I am immersed in a semester-long program on the Brain and Computation. It’s an environment that inspires thoughts about thoughts. I begin to wonder: What would it take to build a computational model of the free-association process? Among the various challenges proposed for artificial intelligence, this one looks easy. There’s no need for deep ratiocination; what we are asked to simulate is just woolgathering or daydreaming—what the mind does when it’s out of gear and the engine is idling. It ought to be effortless, no?

For the design of such a computational model, the first idea that comes to mind (at least to *my* mind) is a random walk on a mathematical graph, or network. The nodes of the network are things stored in memory—ideas, facts, events—and the links are various kinds of associations between them. For example, a node labeled *butterfly* might have links to *moth, caterpillar, monarch,* and *frittillary,* as well as the translations mentioned in the diagram above, and perhaps some less obvious connections, such as *Australian crawl, shrimp, Muhammad Ali, pellagra, throttle valve,* and *stage fright*. The data structure for a node of the network would include a list of pointers to all of these associated nodes. The pointers could be numbered from \(1\) to \(n\); the program would generate a pseudorandom number in this range, and jump to the corresponding node, where the whole procedure would start afresh.

This algorithm captures a few basic features of free association, but it also misses quite a lot. The model assumes that all destination nodes are equally likely, which is implausible. To accommodate differences in probability, we could give each link \(i\) a weight \(w_i\), then make the probabilities proportional to the weights.

A further complication is that the weights depend on context—on one’s recent history of mental activity. If it weren’t for the combination of Mrs. Robinson and Jackie Robinson, would I have thought of Joe DiMaggio? And now, as I write this, Joltin’ Joe brings to mind Marilyn Monroe, and then Arthur Miller, and I am helpless to stop another whole train of associations. Reproducing this effect in a computer model would require some mechanism for dynamically adjusting the probabilities of entire categories of nodes, depending on which other nodes have been visited lately.

Recency effects of another kind should also be taken into account. The rubber band that repeatedly yanks me back to Simon and Garfunkel and Mrs. Robinson needs to have a place in the model. Perhaps each recently visited node should be added to the list of candidate destinations even if it is not otherwise linked to the current node. On the other hand, habituation is also a possibility: Ideas revisited too often become tiresome, and so they need to be suppressed in the model.

One final challenge: Some memories are not isolated facts or ideas but parts of a story. They have a narrative structure, with events unfolding in chronological order. Nodes for such episodic memories require a *next* link, and maybe a *previous* link, too. That chain of links holds your whole life together, to the extent you remember it.

Could a computational model like this one reproduce my mental meanderings? Gathering data for the model would be quite a chore, but that’s no surprise, since it has taken me a lifetime to fill my cranium with that jumble of herbs, Herbs, Simons, Robinsons, and Hoffmans. More worrisome than the volume of data is the fiddly nature of the graph-walking algorithm. It’s easy to say, “Pick a node according to a set of weighted probabilities,” but when I look at the gory details of how it’s done, I have a hard time imagining anything like that happening in the brain.

Here’s the simplest algorithm I know for random weighted selection.

In code—specifically in the Julia programming language—the node selection procedure looks like this:

```
function select_next(links, weights)
total = sum(weights)
cum_weights = cumsum(weights)
probabilities = cum_weights / total
x = rand()
for i in 1:length(probabilities)
if probabilities[i] >= x
return i
end
end
end
```

I have slogged through these tedious details of cumulative sums and pseudorandom numbers as a way of emphasizing that the graph-walking algorithm is not as simple as it seems on first glance. And we still haven’t dealt with the matter of adjusting the probabilities on the fly, as attention drifts from topic to topic.

Even harder to fathom is the process of learning—adding new nodes and links to the network. I ended my session of free associating when I came to a question I couldn’t answer: “What’s the Russian for butterfly?” But I *can* answer it now. The next time I play this game, I’ll add *babochka* to my list of butterfly terms. In the computational model, inserting a node for *babochka* is easy enough, but the new node also needs to be linked to all the other butterfly nodes already present. Furthermore, *babochka* would introduce additional links of its own. It’s phonetically close to *babushka* (grandmother), one of the few Russian words in my vocabulary. The *-ochka* suffix is a diminutive, so it needs to be associated with French *-ette* and Italian *-ini*. The literal meaning of *babochka* is “little soul,” which suggests still more associations. Ultimately, learning a single new word might require a full reindexing of an entire tree of knowledge.

Let’s try a different model. Forget about the random walk on a network, with its spaghetti tangle of pointers to nodes. Instead, let’s just try to keep all similar things in the same neighborhood. In the memory banks of a digital computer, that means similar things have to be stored at nearby addresses. Here’s a hypothetical segment of memory centered on the concept *dog*. The nearby slots are occupied by other words, things, and categories that are likely to be evoked by the thought of *dog*: the obvious *cat* and *puppy*, various breeds of dogs and a few individual dogs (Skippy was the family pet when I was a kid), and some quirkier possibilities. Each item has a numeric address. The address has no intrinsic meaning, but it’s important that all the memory cells are numbered sequentially.

address | content |
---|---|

19216805 | god |

19216806 | the dog that didn’t bark in the night |

19216807 | Skippy |

19216808 | Lassie |

19216809 | canine |

19216810 | cat |

19216811 | dog |

19216812 | puppy |

19216813 | wolf |

19216814 | cave canem |

19216815 | Basset Hound |

19216816 | Weimaraner |

19216817 | dogmatic |

A program for idly exploring this memory array could be quite simple. It would execute a random walk over the memory addresses, but with a bias in favor of small steps. For example, the next address to be visited might be determined by sampling from a normal distribution centered on the present location. Here’s the Julia code. (The function `randn()`

returns a random real number drawn from the normal distribution with mean \(\mu = 0\) and standard deviation \(\sigma = 1\).)

```
function gaussian_ramble(addr, 𝜎)
r = randn() * 𝜎
return addr + round(Int, r)
end
```

The scheme has some attractive features. There’s no need to tabulate all the possible destinations as a preliminary to choosing one of them. Probabilities are not stored as numbers but are encoded by position within the array, and further modulated by the parameter 𝜎, which determines how far afield the procedure is willing to reach in the array. Although the program is still doing some arithmetic in order to sample from a normal distribution, that function could probably be implemented in a simpler way.

But the procedure also has a dreadful defect. In surrounding *dog* with all of its immediate associates, we leave no room for *their* associates. The doggy terms are fine in their own context, but what about the *cat* in the list? Where do we put *kitten* and *tiger* and *nine lives* and *Felix*? In a one-dimensional array there’s no hope of embedding every memory within its own proper neighborhood.

So let’s shift to two dimensions! By splitting the addresses into two components, we can set up two orthogonal axes. The first half of each address becomes a \(y\) coordinate and the second half an \(x\) coordinate. Now *dog* and *cat* are still close neighbors, but they also have private spaces where they can play with their own friends.

However, two dimensions aren’t enough, either. If we try to fill in all the correlatives of *The Cat in the Hat*, they will inevitably collide and conflict with those of *the dog that didn’t bark in the night*. Evidently we need more dimensions—a lot more.

Now would be a good moment for me to acknowledge that I am not the first person ever to think about how memories could be organized in the brain. A list of my predecessors might start with Plato, who compared memory to an aviary; we recognize our memories by their plumage, but sometimes we have trouble retrieving them as they flutter about in the cranial cage. The 16th-century Jesuit Matteo Ricci wrote of a “memory palace,” where we stroll through various rooms and corridors in search of treasures from the past. Modern theories of memory tend to be less colorful than these but more detailed, aiming to move beyond metaphor to mechanism. My personal favorite is a mathematical model devised in the 1980s by Pentti Kanerva, who is now at the Redwood Center for Theoretical Neuroscience here in Berkeley. He calls the idea sparse distributed memory, which I’m going to abbreviate as SDM. It makes clever use of the peculiar geometry of high-dimensional spaces.

Think of a cube in three dimensions. If the side length is taken as one unit, then the eight vertices can be labeled by vectors of three binary digits, starting with \(000\) and continuing through \(111\). At any vertex, changing a single bit of the vector takes you to a nearest-neighbor vertex. Changing two bits moves you to a next-nearest-neighbor, and flipping all three bits leads to the opposite corner of the cube—the most distant vertex.

The four-dimensional cube works the same way, with \(16\) vertices labeled by vectors that include all patterns of binary digits from \(0000\) through \(1111\). And indeed the description generalizes to \(N\) dimensions, where each vertex has an \(N\)-bit vector of coordinates. If we measure distance by the Manhattan metric—always moving along the edges of the cube and never taking shortcuts across a diagonal—the distance between any two vertices is simply the number of positions where the two coordinate vectors differ (also known as the Hamming distance). *bun*. It reflects the interpretation of the XOR operation as binary addition modulo 2. Kanerva prefers ∗ or ⊗, on the grounds that the role of XOR in high-dimensional computing is more like multiplication than addition. I have decided to duck this controversy by adopting the symbol ⊻, an alternative notation for XOR common among logicians. It’s a modification of ∨, the symbol for inclusive OR. Conveniently, it’s also the XOR symbol in Julia programs.

```
0 ⊻ 0 = 0
0 ⊻ 1 = 1
1 ⊻ 0 = 1
1 ⊻ 1 = 0
```

A Julia function for measuring the distance between vertices applies the XOR function to the two coordinate vectors and counts the \(1\)s in the result.

```
function distance(u, v)
w = u ⊻ v
return count_ones(w)
end
```

As \(N\) grows large, some curious properties of the \(N\)-cube come into view. Consider the \(1{,}000\)-dimensional cube, which has \(2^{1000}\) vertices. If you choose two of those vertices at random, what is the expected distance between them? Even though this is a question about distance, we can answer it without delving into any geometric details; it’s simply a matter of tallying the positions where the two binary vectors differ. For random vectors, each bit is \(0\) or \(1\) with equal probability, and so the vectors can be expected to differ at half of the bit positions. In the case of a \(1{,}000\)-bit vector, the typical distance is \(500\) bits. This outcome is not a great surprise. What *does* seem noteworthy is the way all the vertex-to-vertex distances cluster tightly around the mean value of 500.

For \(1{,}000\)-bit vectors, almost all randomly chosen pairs lie at a distance between \(450\) and \(550\) bits. In a sample of \(100\) million random pairs *(see graph above)* none were closer than \(400\) bits or farther apart than \(600\) bits. Nothing about our life in low-dimensional space prepares us for this condensation of probability in the middle distance. Here on Earth, you might be able to find a place to stand where you’re all alone, and almost everyone else is several thousand miles away; however, there’s no way to arrange the planet’s population so that *everyone* has this experience simultaneously. But that’s the situation in \(1{,}000\)-dimensional space.

Needless to say, it’s hard to visualize a \(1{,}000\)-dimensional cube, but it’s possible to get a little intuition about the geometry from as few as five dimensions. Tabulated below are all the vertex coordinates of a five-dimensional unit cube, arranged according to their Hamming distance from the origin \(00000\). A majority of the vertices (20 out of 32) are at the middle distances of either two or three bits. The table would have the same shape if any other vertex were taken as the origin.

A serious objection to all this talk of \(1{,}000\)-dimensional cubes is that we’ll never build one; there aren’t enough atoms in the universe for a structure with \(2^{1000}\) parts. But Kanerva points out that we need storage locations only for the items that we actually want to store. We could construct hardware for a random sample of, say, \(10^8\) vertices (each with a \(1{,}000\)-bit address) and leave the rest of the cube as a ghostly, unbuilt infrastructure. Kanerva calls the subset of vertices that exist in hardware *hard locations*. A set of \(10^8\) random hard locations would still exhibit the same squeezed distribution of distances as the full cube; indeed, this is precisely what the graph above shows.

The relative isolation of each vertex in the high-dimensional cube hints at one possible advantage of sparse distributed memory: A stored item has plenty of elbow room, and can spread out over a wide area without disturbing the neighbors. This is indeed one distinguishing feature of SDM, but there’s more to it.

Conventional computer memory enforces a one-to-one mapping between addresses and stored data items. The addresses are consecutive integers in a fixed range, such as \([0, 2^{64})\). Every integer in this range refers to a single, distinct location in the memory, and every location is associated with exactly one address. Also, each location holds just one value at a time; writing a new value wipes out the old one.

SDM breaks all of these rules. It has a huge address space—at least \(2^{1000}\)—but only a tiny, random fraction of those locations exist as physical entities; this is why the memory is said to be *sparse*. A given item of information is not stored in just one memory location; multiple copies are spread throughout a region—hence *distributed*. Furthermore, each individual address can hold multiple data items simultaneously. Thus information is both smeared out over a broad area and smushed together at the same site. The architecture also blurs the distinction between memory addresses and memory content; in many cases, the pattern of bits to be stored acts as its own address. Finally, the memory can respond to a partial or approximate address and find the correct item with high probability. Where the conventional memory is an “exact match machine,” SDM is a “best match machine,” retrieving the item most similar to the requested one.

In his 1988 book Kanerva gives a detailed quantitative analysis of a sparse distributed memory with \(1{,}000\) dimensions and \(1{,}000{,}000\) hard locations. The hard locations are chosen randomly from the full space of \(2^{1000}\) possible address vectors. Each hard location has room to store multiple \(1{,}000\)-bit vectors. The memory as a whole is designed to hold at least \(10{,}000\) distinct patterns. In what follows I’m going to consider this the canonical SDM model, although it is small by mammalian standards, and in his more recent work Kanerva has emphasized vectors with at least \(10{,}000\) dimensions.

Here’s how the memory works, in a simple computer implementation. The command `store(X)`

writes the vector \(X\) into the memory, treating it as both address and content. The value \(X\) is stored in all the hard locations that lie within a certain distance of the address \(X\). For the canonical model this distance is 451 bits. It defines an “access circle” designed to encompass about \(1{,}000\) hard locations; in other words, each vector is stored in about \(1/1{,}000\)th of the million hard locations.

It’s important to note that the stored item \(X\) does not have to be chosen from among the \(1{,}000{,}000\) binary vectors that are addresses of hard locations. On the contrary, \(X\) can be any of the \(2^{1000}\) possible binary patterns.

Suppose a thousand copies of \(X\) have already been written into the SDM when a new item \(Y\) comes along, to be stored in its own set of a thousand hard locations. There might be some overlap between the two sets of locations—sites where both \(X\) and \(Y\) are stored. The later-arriving value does not overwrite or replace the earlier one; both values are retained. When the memory has been filled to its capacity of \(10{,}000\) vectors, each of them stored \(1{,}000\) times, a typical hard location will hold copies of \(10\) distinct patterns.

Now the question is: How can we make sense of this memory mélange? In particular, how can we retrieve the correct value of \(X\) without interference from \(Y\) and all the other items jumbled together in the same storage locations?

The readout algorithm makes essential use of the curious distance distribution in a high-dimensional space. Even if \(X\) and \(Y\) are nearest neighbors among the \(10{,}000\) stored patterns, they are likely to differ by 420 or 430 bits; as a result, the number of hard locations where both values are stored is quite small—typically four, five, or six. The same is true of all the other patterns overlapping \(X\). There are thousands of them, but no one interfering pattern is present in more than a handful of copies inside the access circle of \(X\).

The command `fetch(X)`

should return the value that was earlier written by `store(X)`

. The first step in reconstructing the value is to gather up all information stored within the 451-bit access circle centered on \(X\). Because \(X\) was previously written into all of these locations, we can be sure of getting back \(1{,}000\) copies of it. We’ll also receive about \(10{,}000\) copies of *other* vectors, stored in locations whose access circles overlap that of \(X\). But because the overlaps are small, each of these vectors is present in only a few copies. In the aggregate, then, each of their \(1{,}000\) bits is equally likely to be a \(0\) or a \(1\). If we apply a majority-rule function to all the data gathered at each bit position, the result will be dominated by the \(1{,}000\) copies of \(X\). The probability of getting any result other than \(X\) is about \(10^{-19}\).

The bitwise majority-rule procedure is shown in more detail below, for a toy example of five data vectors of 20 bits each. The output is another vector where each bit reflects the majority of the corresponding bits in the data vectors. (If the number of data vectors is even, ties are broken by choosing \(0\) or \(1\) at random.) An alternative writing-and-reading scheme, also illustrated below, forgoes storing all the patterns individually and instead keeps a tally of the number of \(0\) and \(1\) bits at each position. A hard location has a \(1{,}000\)-bit counter, initialized to all \(0\)s. When a pattern is written into the location, each bit counter is incremented for a \(1\) or decremented for a \(0\). The readout algorithm simply examines the sign of each bit counter, returning \(1\) for positive, \(0\) for negative, and a random value when the counter bit is \(0\).

The two storage schemes give identical results.

From a computer-engineering point of view, this version of sparse distributed memory looks like an elaborately contrived joke. To remember \(10{,}000\) items we need a million hard locations, in which we store a thousand redundant copies of every pattern. Then, in order to retrieve just one item from memory, we harvest data on \(11{,}000\) stored patterns and apply a subtle majority-rule mechanism to unscramble them. And all we accomplish through these acrobatic maneuvers is to retrieve a vector we already had. Conventional memory works with much less fuss: Both writing and reading access a single location.

But an SDM can do things the conventional memory can’t. In particular, it can retrieve information based on a partial or approximate cue. Suppose a vector \(Z\) is a corrupted version of \(X\), where \(100\) of the \(1{,}000\) bits have been altered. Because the two vectors are similar, the command `fetch(Z)`

will probe many of the same sites where \(X\) is stored. At a Hamming distance of 100, \(X\) and \(Z\) can be expected to share about 300 hard locations. Because of this extensive overlap, the vector returned by `fetch(Z)`

—call it \(Z^{\prime}\)—will be closer to \(X\) than \(Z\) is. Now we can repeat the process with the command `fetch(Z′)`

, which will return a result \(Z^{\prime\prime}\) even closer \(X\). After only a few iterations the procedure reaches \(X\) itself.

Kanerva shows that this converging sequence of recursive read operations will succeed with near certainty as long as the starting pattern is not too far from the target. In other words, there is a critical radius: Any probe of the memory starting at a location inside the critical circle will almost surely converge to the center, and do so rather quickly. An attempt to recover the stored item from outside the critical circle fails, as the recursive recall process wanders away into the middle distance. Kanerva’s analysis yields a critical radius of 209 bits for the canonical SDM. In other words, if you know roughly 80 percent of the bits, you can reconstruct the whole pattern.

The illustration below traces the evolution of recursive-recall sequences using initial cues that differ from a target \(X\) by \(0, 5, 10, 15 \dots 1{,}000\) bits. In this experiment all sequences starting at a distance of \(205\) or less converged to \(X\) in fewer than \(10\) iterations *(blue trails)*. All sequences starting at a greater initial distance wandered aimlessly through the huge open spaces of the \(1{,}000\)-dimensional cube, staying roughly 500 bits from anywhere.

The transition from convergent to divergent trajectories is not perfectly sharp, as shown in the bad-hair-day graphic below. Here we have zoomed in to look at the fate of trajectories beginning at displacements of \(175, 176, 177, \dots 225\) bits. All trails whose starting point is within 209 bits of the target are colored blue; those starting at a greater distance are red. Most of the blue trajectories converge, quickly going to zero distance, and most of the red ones don’t. Near the critical distance, however, there are lots of exceptions.

The graph below offers yet another view of how initial distance from the target affects the likelihood of eventually converging on the correct memory address. At a distance of \(170\) bits almost all trials succeed; at \(240\) bits almost none do. The crossover point (where success and failure are equally likely) seems to lie at about \(203\) bits, a little lower than Kanerva’s result of \(209\).

The ability to reconstruct memories from partial information is a familiar element of human experience. You notice an actor in a television show, and you realize you’ve seen him before, but you don’t remember where. After a few minutes it comes to you: He’s Mr. Bates from *Downton Abbey*, but without his butler suit. Then there’s the high school reunion challenge: Looking at the stout, balding gentleman across the room, can you recognize the friend you last knew as a lanky teenager in track shorts? Sometimes, filling in the blanks requires a prolonged struggle. I have written before about my own inexplicable memory blind spot for the flowering vine wisteria, which I can name only after patiently working my way through a catalogue of false scents: hydrangea, verbena, forsythia.

Could our knack for recovering memories from incomplete or noisy inputs work something like the recursive recall process with high-dimensional vectors? It’s an attractive hypothesis, but there are also reasons for caution. For one thing, the brain seems to be able to tease meaning out of much skimpier clues. I don’t need to hear four-fifths of the Fifth Symphony before I recognize it; the first four notes will do. A flash of color moving through the trees instantly brings to mind the appropriate species—cardinal, bluejay, goldfinch. A mere whiff of chalkdust transports me back to the drowsy, overheated classroom where I doodled on the desktop all afternoon. These memories are evoked by a tiny fraction of the information they represent, far less than 80 percent.

Kanerva cites another quirk of human memory that might be modeled by an SDM: the tip-of-the-tongue phenomenon, whose essence is that you know you know something, even though you can’t immediately name it. This feeling is a bit mysterious: If you can’t find what you’re looking for, how do you know it’s there? The recursive recall process of the SDM offers a possible answer. When the successive patterns retrieved from memory are getting steadily closer together, you can be reasonably sure they will converge on a target, even before they get there.

In the struggle to retrieve a stubborn fact from memory, many people find that banging on the same door repeatedly is not a wise strategy. Rather than demanding immediate answers—getting bossy with your brain—it’s often better to set the problem aside, go for a walk, maybe even take a nap; the answer may then come to you, seemingly unbidden. Can this observation be explained by the SDM model? Perhaps, at least in part. If a sequence of recalled patterns is not converging, pursuing it further is probably fruitless. Starting over from a nearby point in the memory space might lead to a better outcome. But there’s a conundrum here: How do you find a new point of departure with better prospects? You might think you could just randomly flip a few bits in the input pattern in the hope that you’ll wind up closer to the target, but this is unlikely to work. If a vector is \(250\) bits from the target, then \(750\) bits are already correct (but you don’t know *which* \(750\) bits); any random change has a \(3/4\) chance of moving farther away rather than closer. To make progress you need to know which way to turn, and that’s a tricky question in \(1{,}000\)-dimensional space.

One aspect of the SDM architecture that seems to match human experience is the effect of repetition or rehearsal on memory. If you repeatedly recite a poem or practice playing a piece of music, you expect to remember it more easily in the future. A computational model of memory ought to exhibit the same training effect. Conventional computer memory certainly does not: There’s no benefit to writing the same value multiple times at the same address. With an SDM, in contrast, each repetition of a pattern adds another copy to all the hard locations within the pattern’s access circle. As a result, there’s less interference from overlapping patterns, and the critical radius for recall is enlarged. The effect is dramatic: When a single extra copy of a pattern is written into the memory, the critical radius grows from about \(200\) bits to more than \(300\).

By the same token, increasing the representation of one pattern can make others harder to recover. This is a form of forgetting, as the heavily imprinted pattern crowds out its neighbors and takes over part of their territory. This effect is also dramatic in the SDM—unrealistically so. A vector stored eight or ten times seems to monopolize most of the memory; it becomes an obsession, the answer to all questions.

A notable advantage of sparse distributed memory is its resilience in the face of hardware failures or errors. I would be unhappy with my own brain if the loss of a single neuron could leave a hole in my memory, so that I could no longer recognize the letter *g* or remember how to tie my shoelaces. SDM does not suffer from such fragility. With a thousand copies of every stored pattern, no one site is essential. Indeed, it’s possible to wipe out all information stored in \(60\) percent of the hard locations and still get perfect recall of \(10{,}000\) stored items, assuming you supply the exact address as the cue. With partial cues, the critical radius contracts as more sites are lost. After destroying \(60\) percent of the sites, the critical radius shrinks from \(200+\) bits to about \(150\) bits. With \(80\) percent of the sites gone, memory is seriously degraded but not extinguished.

And what about woolgathering? Can we traipse idly through the meadows of sparse distributed memory, serendipitously leaping from one stored pattern to the next? I’ll return to this question.

Most of the narrative above was written several weeks ago. At the time, I was reading about various competing theories of memory, and discussing their merits with my colleagues at the Simons Institute. I wrote up my thoughts on the subject, but I held off publishing because of nagging doubts about whether I truly understood the mathematics of sparse distributed memory. I’m glad I waited.

The Brain and Computation program ended in May. The participants have scattered; I am back in New England, where sage and rosemary are small potted plants rather than burgeoning shrubs spilling over the sidewalk. My morning strolls to the Berkeley campus, a daily occasion for musing about the nature of memory and learning, have themselves become “engrams” stored somewhere in my head (though I still don’t know where to look for them).

I have not given up the quest. Since I left Berkeley I’ve continued reading on theories of memory. I’ve also been writing programs to explore Pentti Kanerva’s sparse distributed memory and his broader ideas on “hyperdimensional computing.” Even if this project fails to reveal the secrets of human memory, it is certainly teaching me something about the mathematical and computational art of navigating high-dimensional spaces.

The diagram below represents the “right” way to implement SDM, as I understand it. The central element is a crossbar matrix in which the rows correspond to the memory’s hard locations and the columns carry signals representing the individual bits of an input vector. The canonical memory has a million rows, each with a randomly assigned \(1{,}000\)-bit address, and \(1{,}000\) columns; this toy version has 20 rows and 8 columns.

The process illustrated in the diagram is the storage of a single input vector in an otherwise empty memory. The eight input bits are compared simultaneously with all \(20\) hard-location addresses. Wherever an input bit and an address bit match—\(0\) with \(0\) or \(1\) with \(1\)—we place a dot at the intersection of the column and the row. Then we count the number of dots in each row, and if the count meets or exceeds a threshold, we write the input vector into the register associated with that row *(blue boxes)*. In the example shown, the threshold is \(5\), and \(8\) of the \(20\) addresses have at least \(5\) matches. In the \(1{,}000\)-bit memory, the threshold would be \(451\), and only about a thousandth of the registers would be selected.

The magic in this design is that all of the bit comparisons—a billion of them in the canonical model—happen concurrently. As a result, the access time for both reading and writing is independent of the number of hard locations, and can be very fast. Circuitry of this general type, known as an associative memory or content-addressable memory, has a role in certain specialized computing applications, such as triggering the particle detectors at the Large Hadron Collider and steering packets through the routers of the internet backbone. And the circuit diagram might also be plausibly mapped onto certain structures in the brain. Kanerva points out that the cerebellum looks a lot like such a matrix. The rows are flat, fanlike Purkinje cells, arranged like the pages of a book; the columns are parallel fibers threaded through the whole population of Purkinje cells. (However, the cerebellum is not the region of the mammalian brain where cognitive memory is thought to reside.)

It would be wonderful to build an SDM simulation based on this crossbar design; unfortunately, I don’t know how to do that with any computer hardware I can lay my hands on. A conventional processor offers no way to compare all the input bits with all the hard-location bits simultaneously. Instead I have to scan through a million hard locations one by one, and at each location compare a thousand pairs of bits. That’s a billion bit comparisons for every item stored into or retrieved from the memory. Add to that the time needed to write or read a million bits (a thousand copies of a \(1{,}000\)-bit vector), and we’re talking about quite a lumbering process. Here’s the code for storing a vector:

```
function store(v::BitVector)
for loc in SDM
if hamming_distance(v, loc.address) <= r
write_to_register!(loc.register, v)
end
end
end
```

This implementation needs almost an hour to stock the memory with \(10{,}000\) remembered patterns. (The complete program, in the form of a Jupyter notebook, is available on GitHub.)

Is there a better algorithm for simulating the SDM on conventional hardware? One possible strategy avoids repeatedly searching for the set of hard locations within the access circle of a given vector; instead, when the vector is first written into the memory, the program keeps a pointer to each of the thousand-or-so locations where it is stored. On any future reference to the same vector, the program can just follow the \(1{,}000\) saved pointers rather than scanning the entire array of a million hard locations. The cost of this caching scheme is the need to store all those pointers—\(10\) million of them for the canonical SDM. Doing so is feasible, and it might be worthwhile if you only wanted to store and retrieve exact, known values. But think about what happens in response to an approximate memory probe, with the recursive recall of \(Z^{\prime}\) and \(Z^{\prime\prime}\) and \(Z^{\prime\prime\prime}\), and so on. None of those intermediate values will be found in the cache, and so the full scan of all hard locations is still needed.

Perhaps there’s a cleverer shortcut. A recent review article on “Approximate Nearest Neighbor Search in High Dimensions,” by Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn, mentions an intriguing technique called locality sensitive hashing, but I can’t quite see how to adapt it to the SDM problem.

The ability to reconstruct memories from partial cues is a tantalizingly lifelike trait in a computational model. Perhaps it might be extended to yield a plausible mechanism for wandering idly through the chambers of memory, letting one idea lead to the next.

At first I thought I knew how this might work. A pattern \(X\) stored in the SDM creates a basin of attraction around itself, where any recursive probe of the memory starting within a critical radius will converge to \(X\). Given \(10{,}000\) such attractors, I can imagine them partitioning the memory space into a matrix of separate compartments, like a high-dimensional foam of soap bubbles. The basin for each stored item occupies a distinct volume, surrounded on all sides by other basins and bumping up against them, with sharp boundaries between adjacent domains. In support of this notion, I would note that the average radius of a basin of attraction shrinks when more content is poured into the memory, as if the bubbles were being compressed by overcrowding.

This vision of what’s going on inside the SDM suggests a simple way to drift from one domain to the next: Randomly flip enough bits in a vector to take it outside the present basin of attraction and into an adjacent one, then apply the recursive recall algorithm. Repeating this procedure will generate a random walk through the set of topics stored in the memory.

The only trouble is, it doesn’t work. If you try it, you will indeed wander aimlessly in the \(1{,}000\)-dimensional lattice, but you will never find anything stored there. The entire plan is based on a faulty intuition about the geometry of the SDM. The stored vectors with their basins of attraction are *not* tightly packed like soap bubbles; on the contrary, they are isolated galaxies floating in a vast and vacant universe, with huge tracts of empty space between them. A few calculations show the true nature of the situation. In the canonical model the critical radius defining the basin of attraction is about \(200\). The volume of a single basin—measured as the number of vectors inside it—is

$$\sum_{k = 1}^{200} \binom{1000}{k},$$

which works out to roughly \(10^{216}\). Thus all \(10{,}000\) basins occupy a volume of \(10^{220}\). That’s a big number, but it’s still a tiny fraction of the \(1{,}000\)-dimensional cube. Among all the vertices of the cube, only \(1\) out of \(10^{80}\) lies within 200 bits of a stored pattern. You could wander forever without stumbling into one of those basins.

(Forever? Oh, all right, maybe not forever. Because the hypercube is a finite structure, any path through it must eventually become recurrent, either hitting a fixed point from which it never escapes or falling into a repeating cycle. The stored vectors are fixed points, and there are also many other fixed points that don’t correspond to any meaningful pattern. For what it’s worth, in all my experiments with SDM programs, I have yet to run into a stored pattern “by accident.”)

Hoping to salvage this failed idea, I tried a few more experiments. In one case I deliberately stored a bunch of related concepts at nearby addresses (“nearby” meaning within 200 or 300 bits). Within this cluster, perhaps I could skip blithely from point to point. But in fact the entire cluster congealed into one big basin of attraction for the central pattern, which thus became a black hole swallowing up all its companions. I also tried fiddling with the value of \(r\), the radius of the access circle for all reading and writing operations. In the canonical model \(r = 451\). I thought that writing to a slightly smaller circle or reading from a slightly larger one might allow some wiggle room for randomness in the results, but this hope was also disappointed.

All of these efforts were based on a misunderstanding of high-dimensional vector spaces. Trying to find clusters of nearby values in the hypercube is hopeless; the stored patterns are sprinkled much too sparsely throughout the volume. And deliberately creating dense clusters is pointless, because it destroys the very property that makes the system interesting—the ability to converge on a stored item from any point in the surrounding basin of attraction. If we’re going to create a daydreaming algorithm for the SDM, it will have to work some other way.

In casting about for an alternative daydreaming mechanism, we might consider smuggling some graph theory into the world of sparse distributed memory. Then we could take a step back toward the original idea of mental rambling as a random walk on a graph or network. The key to building such graphs in the SDM turns out to be a familiar tool: the exclusive OR operator.

As discussed above, the Hamming distance between two vectors is calculated by taking their bitwise XOR and then counting the \(1\)s in the result. But the XOR operation provides more information than just the distance between two vectors; it also reveals the orientation or direction of the line that joins them. Specifically, the operation \(u \veebar v\) yields a vector that lists the bits that need to be changed to transform \(u\) into \(v\) or vice versa. You might also think of the \(1\)s and \(0\)s in the XOR vector as a sequence of directions to be followed to trace a path from \(u\) to \(v\).

XOR has always been my personal favorite among the Boolean functions. It is a difference operator, but unlike subtraction, XOR is symmetric: \(u \veebar v = v \veebar u\). Furthermore, XOR is its own inverse. This concept is easy to understand with functions of a single argument: \(f(x)\) is its own inverse if \(f(f(x)) = x\), so that applying the function twice gets you back to where you started. For a two-argument function such as XOR the situation is more complicated, but it’s still true that doing the same thing twice restores the original state. Specifically, if \(u \veebar v = w\), then \(u \veebar w = v\) and \(v \veebar w = u\). The three vectors \(u\), \(v\), and \(w\) form a tiny, closed universe. You can apply the XOR operator to any pair of them and you’ll get back the third element of the set. Below is my attempt to illustrate this idea. Each square represents a \(10{,}000\)-bit vector arranged as a \(100\)-by-\(100\) tableau of light and dark pixels. The three patterns appear to be random and independent, but hovering with the mouse pointer will show that each panel is in fact the XOR of the other two. For example, in the leftmost square, each red pixel matches either a green pixel or a blue pixel, but not both.

The self-inverse property suggests a new way of organizing information in the SDM. Suppose the word *butterfly* and its French equivalent *papillon* are stored as arbitrary, random vectors. They will not be close together; the distance between them is likely to be about 500 bits. Now we compute the XOR of these vectors, *butterfly* ⊻ *papillon*; the result is another vector that can also be stored in the SDM. This new vector encodes the relation *English-French*. Now we are equipped to translate. Given the vector for *butterfly*, we XOR it with the *English-French* vector and get *papillon*. The same trick works in the other direction.

This pair of words and the relation between them forms the nucleus of a semantic network. Let’s grow it a little. We can store the word *caterpillar* at an arbitrary address, then compute *butterfly* ⊻ *caterpillar* and call this new relation *adult-juvenile*. What’s the French for *caterpillar*? It’s *chenille*. We add this fact to the network by storing *chenille* at the address *caterpillar* ⊻ *English-French*. Now some magic happens: If we take *papillon* ⊻ *chenille*, we’ll learn that these words are connected by the relation *adult-juvenile*, even though we did not explicitly state that fact. It is a constraint imposed by the geometry of the construction.

The graph could be extended further by adding more English-French cognates (*dog-chien, horse-cheval*) or more adult-juvenile pairs: (*dog-puppy, tree-sapling*). And there are plenty of other relations to be explored: synonyms, antonyms, siblings, cause-effect, predator-prey, and so on. There’s also a sweet way of linking a set of events into a chronological sequence, just by XORing the addresses of a node’s predecessor and successor.

The XOR method of linking concepts is a hybrid of geometry and graph theory. In ordinary mathematical graph theory, distances and directions are irrelevant; all that matters is the presence or absence of connecting edges between nodes. In the SDM, on the other hand, the edge representing a relation between nodes is a vector of definite length and orientation within the \(1{,}000\)-dimensional space. Given a node and a relation, the XOR operation “binds” that node to a specific position elsewhere in the hypercube. The resulting structure is completely rigid; you can’t move a node without changing all the relations it participates in. In the case of the butterflies and caterpillars, the configuration of four nodes is necessarily a parallelogram, with pairs of opposite sides that have the same length and orientation.

Another distinctive feature of the XOR-linked graph is that the nodes and the edges have exactly the same representation. In most computer implementations of graph-theoretical ideas, these two entities are quite different; a node might be a list of attributes, and an edge would be a pair of pointers to the nodes it connects. In the SDM, both nodes and edges are simply high-dimensional vectors. Both can be stored in the same format.

As a model of human memory, XOR binding offers the prospect of connecting any two concepts through any relation we can invent. But the scheme also has some deficiencies. Many real-world relations are asymmetric; they don’t share the self-inverse property of XOR. An XOR vector can declare that Edward and Victoria are parent and child, but it can’t tell you which is which. Worse, the XOR vector connects exactly two nodes, never more, so a parent of multiple children faces an awkward predicament. Another challenge is keeping all the branches of a large graph consistent with one another. You can’t just add nodes and edges willy-nilly; they must be joined to the graph in the right order. Inserting a pupal stage between the butterfly and the caterpillar would require rewiring most of the diagram, moving several nodes to new locations within the hypercube and recalculating the relation vectors that connect them, all the while taking care that each change on the English side is mirrored correctly on the French side.

Some of these issues are addressed in another XOR-based technique that Kanerva calls bundling. The idea is to create a kind of database by storing attribute-value pairs. An entry for a book might have attributes such as *author*, *title*, and *publisher*, each of which is paired with a corresponding value. The first step in bundling the data is to separately XOR each attribute-value pair. Then the vectors resulting from these operations are combined to form a single sum vector, using the same algorithm described above for storing multiple vectors in a hard location of the SDM. Taking the XOR of an attribute name with this combined vector will extract an approximation to the corresponding value, close enough to identify it by the recursive recall method. In experiments with the canonical model I found that a single \(1{,}000\)-bit vector could hold six or seven attribute-value pairs without much risk of confusion.

Binding and bundling are not mentioned in Kanerva’s 1988 book, but he discusses them in detail in several more recent papers. (See Further Reading, below.) He points out that with these two operations the set of high-dimensional vectors acquires the structure of an algebraic field—or at least an approximation to a field. The canonical example of a field is the set of real numbers together with the operations of addition and multiplication and their inverses. The reals form a closed set under these operations: Adding, subtracting, multiplying or dividing any two real numbers yields another real number (except for division by zero, which is always the joker in the pack). Likewise a set of binary vectors is closed under binding and bundling, except that sometimes the result extracted from a bundled vector has to be “cleaned up” by the recursive recall process in order to recover a member of the set.

Can binding and bundling offer any help when we try to devise a woolgathering algorithm? They provide some basic tools for navigating through a semantic graph, including the possibility of performing a random walk. Starting from any node of an XOR-linked graph, a random-walk algorithm chooses from among all the relations available at that node. Selecting a relation vector at random and XORing it with the address of the node leads to a different node, where the procedure can be repeated. Similarly, in bundled attribute-value pairs, a randomly selected attribute calls forth the corresponding value, which becomes the next node to explore.

But how does the algorithm know which relations or which attributes are available for choosing? The relations and attributes are represented as vectors and stored in the memory just like any other objects, but there is no obvious means of retrieving those vectors unless you already know what they are. You can’t say to the memory, “Show me all the relations.” You can only present a pattern and ask, “Is this vector present? Have you seen it or something like it?”

With a conventional computer memory, you can do a core dump: Step through all the addresses and print out the value found at each location. There’s no such procedure for a distributed memory. I learned this troubling fact the hard way. While building a computational model of the SDM, I got the pieces working well enough that I could store a few thousand randomly generated patterns in the memory. But I could not retrieve them, because I didn’t know what to ask for. The solution was to maintain a separate list, outside the SDM itself, keeping a record of everything I stored. But it seems farfetched to suppose that the brain would maintain both a memory and an index to that memory. Why not just use the index, which is so much simpler?

In view of this limitation, it seems that sparse distributed memory is equipped to serve the senses but not the imagination. It can recognize familiar patterns and store novel ones, which will then be recognized when next encountered, even from partial or corrupted cues. With binding or bundling, the memory can also keep track of relations between pairs of stored items. But whatever is put into the memory can be gotten out only by supplying a suitable cue.

When I look at the publicity poster for *The Graduate*, I see Dustin Hoffman, more leery than leering, regarding the stockinged leg of Anne Bancroft, who plays Mrs. Robinson. This visual stimulus excites several subsets of neurons in my cerebral cortex, corresponding to my memories of the actors, the characters, the story, the soundtrack, the year 1967. All of this brain activity might be explained by the SDM memory architecture, if we grant that subsets of neurons can be represented in some abstract way by long, random binary vectors. What’s not so readily explained is how I can summon to mind all the same sensations without having the image in front of me. How do I draw those particular long, random sequences out of the great tangle of vectors without already knowing where they are?

So ends my long ramble, on a note of doubt and disappointment. It’s hardly surprising that I have failed to get to the bottom of it all. These are deep waters.

On the very first day of the Simons brain-and-computation program, Jeff Lichtman, who is laboring to trace the wiring diagram of the mouse brain, asked whether neuroscience has yet had its Watson-Crick moment. In molecular genetics we have reached the point where we can extract a strand of DNA from a living cell and read many its messages. We can even write our own messages and put them back into an organism. The equivalent capability in neuroscience would be to examine a hunk of brain tissue and read out the information stored there—the knowledge, the memories, the world view. Maybe we could also write information directly into the brain.

Science is not even close to achieving this feat—to the great relief of many. That includes me: I don’t look forward to having my thoughts sucked out of my head through electrodes or pipettes, to be replaced with #fakenews. However, I really *do* want to know how the brain works.

The Simons program left me dazzled by recent progress in neuroscience, but it also revealed that some of the biggest questions remain wide open. The connectomics projects of Lichtmann and others are producing a detailed map of millions of neurons and their interconnections. New recording techniques allow us to listen in on the signals emitted by individual nerve cells and to follow waves of excitation across broad regions of the brain. We have a pretty comprehensive catalogue of neuron types, and we know a lot about their physiology and biochemistry. All this is impressive, but so are the mysteries. We can record neural signals, but for the most part we don’t know what they mean. We don’t know how information is encoded or stored in the brain. It’s rather like trying to understand the circuitry of a digital computer without knowing anything of binary arithmetic or Boolean logic.

Pentti Kanerva’s sparse distributed memory is an attempt to fill in some of these gaps. It is not the only such attempt. A better-known alternative is John Hopfield’s conception of a neural network as a dynamical system settling into an energy-minimizing attractor. The two ideas have some basic principles in common: Information is scattered across large numbers of neurons, and it is encoded in a way that would not be readily understood by an outside observer, even one with access to all the neurons and the signals passing between them. Schemes of this kind, essentially mathematical and computational, occupy a conceptual middle ground between high-level psychology and low-level neural engineering. It’s the layer where the meaning is.

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. *Proceedings of the National Academy of Sciences* 79(8):2554–2558.

Kanerva, Pentti. 1988. *Sparse Distributed Memory*. Cambridge, Mass.: MIT Press.

Kanerva, Pentti. 1996. Binary spatter-coding of ordered *K*-tuples. In C. von der Malsburg, W. von Seelen, J. C. Vorbruggen and B. Sendhoff, eds. *Artificial Neural Networks—ICANN 96 Proceedings*, pp. 869–873. Berlin: Springer.

Kanerva, Pentti. 2000. Large patterns make great symbols: An example of learning from example. In S. Wermter and R. Sun, eds. *Hybrid Neural Systems*, pp. 194–203. Heidelberg: Springer. PDF

Kanerva, Pentti. 2009. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. *Cognitive Computation* 1(2):139–159. PDF

Kanerva, Pentti. 2010. What we mean when we say “What’s the Dollar of Mexico?”: Prototypes and mapping in concept space. Report FS-10-08-006, AAAI Fall Symposium on Quantum Informatics for Cognitive, Social, and Semantic Processes. PDF

Kanerva, Pentti. 2014. Computing with 10,000-bit words. Fifty-second Annual Allerton Conference, University of Illinois at Urbana-Champagne, October 2014. PDF

Plate, Tony. 1995. Holographic reduced representations. IEEE Transactions on Neural Networks 6(3):623–641. PDF

Plate, Tony A. 2003. *Holographic Reduced Representation: Distributed Representation of Cognitive Structure*. Stanford, CA: CSLI Publications.

Rahimi, Abbas, Sohum Datta, Denis Kleyko, E. Paxon Frady, Bruno Olshausen, Pentti Kanerva, and Jan M. Rabaey. 2017. High-dimensional computing as a nanoscalable paradigm. *IEEE Transactions on Circuits and Systems* 64(9):2508–2521. Preprint PDF

- Donald O. Hebb’s
*The Organization of Behavior: A Neuropsychological Theory.*This is the book that introduced a fundamental hypothesis about learning and memory, captured in the slogan “Neurons that fire together get wired together.” - Norbert Wiener’s
*Cybernetics: or, Control and Communication in the Animal and the Machine*, an eccentric and wide-ranging masterpiece with a crucial chapter on “Computing Machines and the Nervous System.” - Claude Shannon’s
*The Mathematical Theory of Communication*, the foundational document of information theory. (Shannon’s part of this work had appeared a year earlier in the*Bell System Technical Journal*; the book version includes an interpretive essay by Warren Weaver.)

When I got the three volumes home, I made a surprising discovery: They were all published at roughly the same time, in 1948 and 1949. What are the odds of that? Perhaps it means nothing—just the long arm of coincidence reaching out to tap me on the shoulder. On the other hand, maybe there was something in the air circa 1950, something that made the period unusually fertile for studies of information, communication, and computation in brains and machines.

I have done a little digging in library catalogues and Wikipedia, as well as in my own files, looking for other titles that might belong on this list of distinguished midcentury milestones.

It turns out that George Kingsley Zipf’s *Human Behavior and the Principle of Least Effort* was also published in 1949. (This is the one about the curious power-law distribution seen in rankings of word frequencies, city sizes, and so on.)

Gilbert Ryle’s *The Concept of Mind* is another 1949 title, though I’ve never read it. Also from 1949: Nicholas Metropolis and Stanislaw Ulam published the first open account of the Monte Carlo method.

Drifting forward into 1950, we find another cluster of notables. There is John Nash’s one-page paper introducing what we now call the Nash equilibrium. Elsewhere in game theory, 1950 was the debut year for prisoner’s dilemma, although Merrill Flood’s paper describing it did not appear until two years later. Richard Hamming published “Error Detecting and Error Correcting Codes” in 1950. (It’s another paper from the *Bell System Technical Journal*.) Finally, there’s Alan M. Turing’s famous essay on “Computing Machinery and Intelligence.”

Does the density of high-octane publications really make 1948–50 an exceptional season of intellectual history? I can’t offer any solid statistical support for that notion. In the first place, my criteria for inclusion on the list are way too vague. (“Subjects I find interesting” may be closest to the truth.) In the second place, I can’t offer any evidence that other intervals were not equally productive. As a matter of fact, in my bibliographic rummaging I came across a nexus of brilliance five years earlier:

- Warren S. McCollough and Walter H. Pitts, “A logical calculus of the ideas immanent in nervous activity,” 1943.
- John von Neumann and Oskar Morgenstern,
*Theory of Games and Economic Behavior*, 1944. - Erwin Schrödinger,
*What Is Life? The Physical Aspect of the Living Cell,*1944. - Vannevar Bush, “As We May Think,” 1945.
- John von Neumann, “First Draft of a Report on the EDVAC,” 1945

I acknowledge a further reason for caution when I cite 1949 as a year of special distinction. It’s *my* year, the year of my birth.

In 1994 a document called the QED Manifesto made the rounds of certain mathematical mailing lists and Usenet groups.

QED is the very tentative title of a project to build a computer system that effectively represents all important mathematical knowledge and techniques. The QED system will conform to the highest standards of mathematical rigor, including the use of strict formality in the internal representation of knowledge and the use of mechanical methods to check proofs of the correctness of all entries in the system.

The ambitions of the QED project—and its eventual failure—were front and center in a talk by Thomas Hales (University of Pittsburgh) on Formal Abstracts in Mathematics. Hales is proposing another such undertaking: A comprehensive database of theorems and other mathematical propositions, along with the axioms, assumptions, and definitions on which the theorems depend, all represented in a formal notation readable by both humans and machines. Unlike QED, however, these “formal abstracts” would *not* include proofs of the theorems. Excluding proofs is a huge retreat from the aims of the QED group, but Hales argues that it’s necessary to make the project feasible with current technology.

Hales has plenty of experience in this field. In 1998 he announced a proof of the Kepler conjecture—the assertion that the grocer’s stack of oranges embodies the densest possible arrangement of equal-size spheres in three-dimensional space. Hales’s proof was long and complex, so much so that it stymied the efforts of journal referees to fully check it. Hales and 21 collaborators then spent a dozen years constructing a formal, computer-mediated verification of the proof.

What’s the use of a database of mathematical assertions if it doesn’t include proofs? Hales held out several potential benefits, two of which I found particularly appealing. First, the database could answer global questions about the mathematical literature; one could ask, “How many theorems depend on the Riemann hypothesis?” Second, the formal abstracts would capture the meaning of mathematical statements, not just their surface form. A search for all mentions of the equation \(x^m - y^n = 1\) would find instances that use symbols other than \(x, y, m, n,\) or that take slightly different forms, such as \(x^m - 1 = y^n\).

Hales’s formal abstracts sound intriguing, but I have to confess to a certain level of disappointment and bafflement. All around us, triumphant machines are conquering one domain after another—chess, go, poker, Jeopardy, the driver’s seat. But not proofs, apparently.

Am I the last person in the whole republic of numbers to learn that Sperner’s lemma is a discrete version of the Brouwer fixed-point theorem? Francis Su and John Stillwell clued me in.

The lemma—first stated in 1928 by the German mathematician Emanuel Sperner—seems rather narrow and specialized, but it turns up everywhere. It concerns a triangle whose vertices are assigned three distinct colors:

Divide the triangle into smaller triangles, constrained by two rules. First, no edge or segment of an edge can be part of more than two triangles. Second, if a vertex of a new small triangle lies on an edge of the original main triangle, the new vertex must be given one of the two colors found at the end points of that main edge. For example, a vertex along the red-green edge on the left side of the main triangle must be either red or green. Vertices strictly inside the main triangle can be given any of the three colors, without restriction.

The lemma states that at least one interior triangle must have a full complement of red, green, and blue vertices. Actually, the lemma’s claim is slightly stronger: The number of trichromatic inner triangles must be odd. In the augmented diagram below, adding a single new red vertex has created two more RGB triangles, for a total of three.

Su gave a quick proof of the lemma. Consider the set of all edge segments that have one red and one green endpoint. On the exterior boundary of the large triangle, such segments can appear only along the red-green edge, and there must be an odd number of them. Now draw a path that enters the large triangle from the outside, that crosses only red-green segments, and that crosses each such segment at most once.

One possible fate of this RG path is to enter through one red-green segment and exit through another. But since the number of red-green segments on the boundary is odd, there must be at least one path that enters the large triangle and never exits. The only way it can become trapped is to enter a red-green-blue triangle. (There’s nothing special about red-green segments, so this argument also holds for paths crossing red-blue and blue-green segments.)

So much for Sperner’s lemma. What do these nested triangles have to do with the Brouwer fixed-point theorem? That theorem operates in a continuous domain, which seems remote from the discrete network of Sperner’s triangulated triangle.

As the story goes (I can’t vouch for its provenance), L. E. J. Brouwer formulated his theorem at the breakfast table. Stirring his coffee, he noticed that there always seemed to be at least one stationary point on the surface of the moving liquid. He was able to prove this fact not just for the interior of a coffee cup but for any bounded, closed, and convex region, and not just for circular motion but for any continuous function that maps points within such a region to points in the same region. For each such function \(f\), there is a point \(p\) such that \(f(p) = p\).

Brouwer’s fixed-point theorem was a landmark in the development of topology, and yet Brouwer himself later renounced the theorem—or at least his proof of it, because the proof was nonconstructive: It gave no procedure for finding or identifying the fixed point. John Stillwell argues that a proof based on Sperner’s lemma comes as close as possible to a constructive proof, though it would still have left Brouwer unsatisfied.

The proof relies on the same kind of paths represented by yellow arrows in the diagram above. At least one such path comes to an end inside a tri-colored triangle, which Sperner’s lemma shows must exist in any properly colored triangulated network. If we continue subdividing the triangles under the Sperner rules, and proceed to the limit where the edge lengths go to zero, then the path ends at a single, stationary point. (It’s the “proceed to the limit” step that Brouwer would not have liked.)

You have five muffins to share among three students; lets call the students April, May, and June. One solution is to give each student one whole muffin, then divide the remaining two muffins into pieces of size one-third and two-thirds. Then the portions are divvied up as follows:

This allotment is quantitatively fair, in that each student receives five-thirds of a muffin, but June complains that her two small pieces are less appetizing than the others’ larger ones. She feels she’s been given leftover crumbs. Hence the division is not envy-free.

There are surely many ways of addressing this complaint. You might cut *all* the muffins into pieces of size one-third, and give each student five equal pieces. Or you might give each student a muffin and a half, then eat the leftover half yourself. These are practical and sensible strategies, but they are not what Bill Gasarch was seeking when he gave a talk on the problem Saturday afternoon. Gasarch asked a specific question: What is the maximum size of the minimum piece? Can we do better than one-third?

The answer is yes. Here is a division that cuts one muffin in half and divides each of the other four muffins into portions of size seven-twelfths and five-twelfths. April and May each get \(\frac{1}{2} + \frac{7}{12} + \frac{7}{12}\); June gets \(4 \times \frac{5}{12}\).

Five-twelfths is larger than one-third, and thus should seem less crumby. Indeed, Gasarch and his colleagues have proved five-twelfths is the best result possible: It is the maximum of the minimum. (Nevertheless, I worry that June may still be unhappy. Her portion is cut up into four pieces, whereas the others get three pieces each; furthermore, all of June’s pieces are smaller than April’s and May’s. Again, however, these concerns lie outside the scope of the mathematical problem.)

A key observation is that the smallest piece can never be larger than one-half. This is thunderously obvious once you know it, but I failed to see it when I first started thinking about the problem.

Fair-division problems have a long history (going back at least as far as the Talmud), and cake-cutting versions have been proliferating for decades. A 1961 article by L. E. Dubins and E. H. Spanier (*American Mathematical Monthly* 68:1–17) inspired much further work. There are even connections with Sperner’s lemma. Nevertheless, the genre is not exhausted yet; the muffin problem seems to be a new wrinkle. Gasarch and six co-authors (three of them high school students) have prepared a 166-page manuscript describing a year’s worth of labor on the problem, with optimal results for all instances with up to six students (and any number of muffins), as well as upper and lower bounds on solutions to larger instances, and various conjectures on open problems.

Long-time readers of bit-player may remember that Gasarch has been mentioned here before. Back in 2009 he offered (and eventually paid) \($17^2\) for a four-coloring of a 17-by-17 lattice such that no four lattice points forming a rectangle all have the same color. That problem attracted considerable attention both here and on Gasarch’s own Computational Complexity blog (conducted jointly with Lance Fortnow).

Note: In the comments Jim Propp points out that the muffin problem was invented by Alan Frank. The omission of this fact is my fault; Gasarch mentions it in his paper. The problem’s first appearance in print seems to be in a *New York Times* Numberplay column by Gary Antonick. Frank’s priority is acknowledged only in a footnote, which seems unfair. I apologize for again giving him credit only as an afterthought.

Last week I spent five days in the driver’s seat, crossing the country from east to west, mostly on Interstate 80. I’ve made the trip before, though never on this route. In particular, the 900-mile stretch from Lincoln, Nebraska, across the southern tier of Wyoming, and down to Salt Lake City was new to me.

Driving is a task that engages only a part of one’s neural network, so the rest of the mind is free to wander. On this occasion my thoughts took a political turn. After all, I was boring through the bright red heart of America. Especially in Wyoming.

Based on the party affiliations of registered voters, Wyoming is far and away the most Republican state in the union, with the party claiming the allegiance of two-thirds of the electorate. The Democrats have 18 percent. A 2013 Gallup poll identified Wyoming as the most “conservative” state, with just over half those surveyed preferring that label to “moderate” or “liberal.”

The other singular distinction of Wyoming is that it has the smallest population of all the states, estimated at 579,000. The entire state has fewer people than many U.S. cities, including Albuquerque, Milwaukee, and Baltimore. The population density is a little under six people per square mile.

I looked up these numbers while staying the night in Laramie, the state’s college town, and I was mulling them over as I continued west the next morning, climbing through miles of rolling grassland and sagebrush with scarcely any sign of human habitation. A mischievous thought came upon me. What would it take to flip Wyoming? If we could somehow induce 125,000 liberal voters to take up legal residence here, the state would change sides. We’d have two more Democrats in the Senate, and one more in the House. Berkeley, California, my destination on this road trip, has a population of about 120,000. Maybe we could persuade everyone in Berkeley to give up Chez Panisse and Moe’s Books, and build a new People’s Republic somewhere on Wyoming’s Medicine Bow River.

Let me quickly interject: This is a daydream, or maybe a nightmare, and not a serious proposal. Colonizing Wyoming for political purposes would not be a happy experience for either the immigrants or the natives. The scheme belongs in the same category as a plan announced by a former Mormon bishop to build a new city of a million people in Vermont. (Vermont has a population of about 624,000, the second smallest among U.S. states.)

Rather than trying to flip Wyoming, maybe one should try to fix it. *Why* is it the least populated state, and the most Republican? Why is so much of the landscape vacant? Why aren’t entrepreneurs with dreams of cryptocurrency fortunes flocking to Cheyenne or Casper with their plans for startup companies?

The experience of driving through the state on I-80 suggests some answers to these questions. I found myself wondering how even the existing population of a few hundred thousand manages to sustain itself. Wikipedia says there’s some agriculture in the state (beef, hay, sugar beets), but I saw little evidence of it. There’s tourism, but that’s mostly in the northwest corner, focused on Yellowstone and Grand Teton national parks and the cowboy-chic enclave of Jackson Hole. The only conspicuous economic activity along the I-80 corridor is connected with the mining and energy industries. My very first experience of Wyoming was olfactory: Coming downhill from Pine Bluffs, Nebraska, I caught of whiff of the Frontier oil refinery in Cheyenne; as I got closer to town, I watched the sun set behind a low-hanging purple haze that might also be refinery-related. The next day, halfway across the state, the Sinclair refinery announced itself in a similar way.

Still farther west, coal takes over where oil leaves off. The Jim Bridger power plant, whose stacks and cooling-tower plumes are visible from the highway, burns locally mined coal and exports the electricity.

As the author of a book celebrating industrial artifacts, I’m hardly the one to gripe about the presence of such infrastructure. On the other hand, oil and coal are not much of a foundation for a modern economy. Even with all the wells, the pipelines, the refineries, the mines, and the power plants, Wyoming employment in the “extractive” sector is only about 24,000 (or 7 percent of the state’s workforce), down sharply from a peak of 39,000 in 2008. If this is the industry that will build the state’s future, then the future looks bleak.

Economists going all the way back to Adam Smith have puzzled over the question: Why do some places prosper while others languish? Why, for example, are Denver and Boulder so much livelier than Cheyenne and Laramie? The Colorado cities and the Wyoming ones are only about 100 miles apart, and they share similar histories and physical environments. But Denver is booming, with a diverse and growing economy and a population approaching 700,000—greater than the entire state of Wyoming. Cheyenne remains a tenth the size of Denver, and in Cheyenne you don’t have to fight off hordes of hipsters to book a table for dinner. What makes the difference? I suspect the answer lies in a Yogi Berra phenomenon. Everybody wants to go to Denver because everyone is there already. Nobody wants to be in Cheyenne because it’s so lonely. If this guess is correct, maybe we’d be doing Wyoming a favor by bringing in that invasion of 125,000 sandal-and-hoodie–clad bicoastals.

One more Wyoming story. At the midpoint of my journey across the state, near milepost 205 on I-80, I passed the sign shown at left. I am an aficionado of continental divide crossings, and so I took particular note. Then, 50 miles farther along, I passed another sign, shown at right. On seeing this second crossing, I put myself on high alert for a *third* such sign. This is a matter of simple topology, or so I thought. If a line—perhaps a very wiggly one—divides an area into two regions, then if you start in one region and end up in the other, you must have crossed the line an odd number of times. Shown below are some possible configurations. In each case the red line is the path of the continental divide, and the dashed blue line is the road’s trajectory across it. At far left the situation is simple: The road intersects the divide in a single point. The middle diagram shows three crossings; it’s easy to see how further elaboration of the meandering path could yield five or seven or any odd number of crossings. An arrangement that might seem to generate just two crossings is show at right. One of the “crossings” is not a crossing at all but a point of tangency. Depending on your taste in such matters, the tangent intersection could be counted as crossing the divide twice or not at all; in either case, the total number of crossings remains odd.

In the remainder of my trip I never saw a sign marking a third crossing of the divide. The explanation has nothing to do with points of tangency. I should have known that, because I’ve actually written about this peculiarity of Wyoming topography before. Can you guess what’s happening? Wikipedia tells all.

]]>Twenty years ago, Kimberly-Clark, the Kleenex company, introduced a line of toilet paper embossed with the kite-and-dart aperiodic tiling discovered by Roger Penrose. When I first heard about this, I thought: How clever. Because the pattern never repeats, the creases in successive layers of a roll would never line up over any extended region, and so the sheets would be less likely to stick together.

Sir Roger Penrose had a different response. Apparently be believes the pattern is subject to copyright protection, and he also managed to get a patent issued in 1979, although that would have expired about the time of the toilet paper scandal. Penrose assigned his rights to a British company called Pentaplex Ltd. An article in the *Times* of London quoted a representative of Pentaplex:

So often we read of very large companies riding roughshod over small businesses or individuals, but when it comes to the population of Great Britain being invited by a multinational [company] to wipe their bottoms on what appears to be the work of a knight of the realm without his permission, then a last stand must be made.

Sir Roger sued. I haven’t been able to find a documented account of how the legal action was resolved, but it seems Kimberly-Clark quickly withdrew the product.

Some years ago I was given a small sample of the infamous Penrose toilet paper. It came to me from Phil and Phylis Morrison; a note from Phylis indicates that they acquired it from Marion Walter. Now I would like to pass this treasure on to a new custodian. The specimen is unused though not pristine, roughly a foot long, and accompanied by a photocopy of the abovementioned *Times* news item. In the photograph below I have boosted the contrast to make the raised ridges more visible; in real life the pattern is subtle.

Are you interested in artifacts with unusual symmetries? Would you like to add this object to your collection? Send a note with a U.S. mailing address to brian@bit-player.org. If I get multiple requests, I’ll figure out some Solomonic procedure for choosing the recipient(s). If there are no takers, I guess I’ll use it for its intended purpose.

I must also note that my hypothesis about the special non-nesting property of the embossed paper is totally bogus. In the first place, a roll of toilet paper is an Archimedian spiral, so that the circumference increases from one layer to the next; even a perfectly regular pattern will come into coincidence with itself only when the circumference equals an integer multiple of the pattern period. Second, the texture imprinted on the toilet paper is surely not a real aperiodic tiling. The manufacturing process would have involved passing the sheet between a pair of steel crimping cylinders bearing the incised network of kites and darts. Those cylinders are necessarily of finite diameter, and so the pattern must in fact repeat. If Kimberly-Clark had contested the law suit, they might have used that point in their defense.

]]>A year later, after the Mosaic browser came on the scene, my eyes were opened. I wrote a gushing article on the marvels of the WWW.

There have long been protocols for transferring various kinds of information over the Internet, but the Web offers the first seamless interface to the entire network . . . The Web promotes the illusion that all resources are at your fingertips; the universe of information is inside the little box that sits on your desk.

I was still missing half the story. Yes, the web (which has since lost its capital *W*) opened up an amazing portal onto humanity’s accumulated storehouse of knowledge. But it did something else as well: It empowered all of us to put our own stories and ideas before the public. Economic and technological barriers were swept away; we could all become creators as well as consumers. Perhaps for the first time since Gutenberg, public communication became a reasonably symmetrical, two-way social process.

The miracle of the web is not just that the technology exists, but that it’s accessible to much of the world’s population. The entire software infrastructure is freely available, including the HTTP protocol that started it all, the languages for markup, styling, and scripting (HTML, CSS, JavaScript), server software (Apache, Nginx), content-management systems such as WordPress, and also editors, debuggers, and other development tools. Thanks to this community effort, I get to have my own little broadcasting station, my personal media empire.

But can it last?

In the U.S., the immediate threat to the web is the repeal of net-neutrality regulations. Under the new rules (or non-rules), Internet service providers will be allowed to set up toll booths and roadblocks, fast lanes and slow lanes. They will be able to expedite content from favored sources (perhaps their own affiliates) and impede or block other kinds of traffic. They could charge consumers extra fees for access to some sites, or collect back-channel payments from publishers who want preferential treatment. For a glimpse of what might be in store, a New York *Times* article looks at some recent developments in Europe. (The European Union has its own net-neutrality law, but apparently it’s not being consistently enforced.)

The loss of net neutrality has elicited much wringing of hands and gnashing of teeth. I’m as annoyed as the next netizen. But I also think it’s important to keep in mind that the web (along with the internet more generally) has always lived at the edge of the precipice. Losing net neutrality will further erode the foundations, but it is not the only threat, and probably not the worst one.

Need I point out that the internet lost its innocence a long time ago? In the early years, when the network was entirely funded by the federal government, most commercial activity was forbidden. That began to change circa 1990, when crosslinks with private-enterprise networks were put in place, and the general public found ways to get online through dial-up links. The broadening of access did not please everyone. Internet insiders recoiled at the onslaught of clueless newbies (like me); commercial network operators such as CompuServe and AmericaOnline feared that their customers would be lured away by a heavily subsidized competitor. Both sides were right about the outcome.

As late as 1994, hucksterism on the internet was still a social trangression if not a legal one. Advertising, in particular, was punished by vigorous and vocal vigilante action. But the cause was already lost. The insular, nerdy community of internet adepts was soon overwhelmed by the dot-com boom. Advertising, of course, is now the engine that drives most of the largest websites.

Commerce also intruded at a deeper level in the stack of internet technologies. When the internet first became *inter*—a network of networks—bits moved freely from one system to another through an arrangement called peering, in which no money changed hands. By the late 1990s, however, peering was reserved for true peers—for networks of roughly the same size. Smaller carriers, such as local ISPs, had to pay to connect to the network backbone. These pay-to-play arrangements were never affected by network neutrality rules.

Express lanes and tolls are also not a novelty on the internet. Netflix, for example, pays to place disk farms full of videos at strategic internet nodes around the world, reducing both transit time and network congestion. And Google has built its own private data highways, laying thousands of miles of fiber optic cable to bypass the major backbone carriers. If you’re not Netflix or Google, and you can’t quite afford to build your own global distribution system, you can hire a content delivery network (CDN) such as Akamai or Cloudflare to do it for you. What you get for your money: speedier delivery, caching of static content near the destination, and some protection against malicious traffic. Again the network neutrality rules do not apply to CDNs, even when they are owned and run by companies that also act as telecommunications carriers and ISPs, such as AT&T.

In pointing out that there’s already a lot of money grubbing in the temple of the internet, I don’t mean to suggest that the repeal of net neutrality doesn’t matter or won’t make a difference. It’s a stupid decision. As a consumer, I dread the prospect of buying internet service the way one buys bundles of cable TV channels. As a creator of websites, I fear losing affordable access to readers. As a citizen, I denounce the reckless endangerment of a valuable civic asset. This is nothing but muddy boots trampling a cultural treasure.

Still and all, it could be worse. Most likely it *will* be. Here are three developments that make me uneasy about the future of the web.

**Dominance**. In round numbers, the web has something like a billion sites and four billion users—an extraordinarily close match of producers to consumers. For any other modern medium—television stations and their viewers, newspaper and their readers—the ratio is surely orders of magnitude larger. Yet the ratio for the web is also misleading. Three fourths of those billion web sites have no content and no audience (they are “parked” domain names), and almost all the rest are tiny. Meanwhile, Facebook gets the attention of roughly half of the four billion web users. Google and Facebook together, along with their subsidiaries such as YouTube, account for 70 percent of all internet traffic. The wealth distribution of the web is even more skewed than that of the world economy.

It’s not just the scale of the few large sites that I find intimidating. Facebook in particular seems eager not just to dominate the web but to supplant it. They make an offer to the consumer: We’ll give you a *better* internet, a curated experience; we’ll show you what you want to see and filter out the crap. And they make an offer to the publisher and advertiser: This is where the people are. If you want to reach them, buy a ticket and join the party.

If everyone follows the same trail to the same few destinations, net neutrality is meaningless.

**Fragmentation**. The web is built on open standards and a philosophy of sharing and cooperation. If I put up a public website, anyone can visit without asking my permission; they can use whatever software they please when they read my pages; they can publish links to what I’ve written, which any other web user can then follow. This crosslinked body of literature is now being shattered by the rise of *apps*. Facebook and Twitter and Google and other large internet properties would really prefer that you visit them not on the open web but via their own proprietary software. And no wonder: They can hold you captive in an environment where you can’t wander away to other sites; they can prevent you from blocking advertising or otherwise fiddling with what they feed you; and they can gather more information about you than they could from a generic web browser. The trouble is, when every website requires its own app, there’s no longer a web, just a sheaf of disconnected threads.

This battle seems to be lost already on mobile platforms.

**Suppression**. All of the challenges to the future of the web that I have mentioned so far are driven by the mere pursuit of money. Far scarier are forms of manipulation and discrimination based on noneconomic motives.

Governments have ultimate control over virtually all communications media—radio and TV, newspapers, books, movies, the telephone system, the postal service, and certainly the internet. Nations that we like to think of as enlightened have not hesitated to use that power to shape public discourse or to suppress unpopular or inconvenient opinions, particularly in times of stress. With internet technology, surveillance and censorship are far easier and more efficient than they ever were with earlier media. A number of countries (most notoriously China) have taken full advantage of those capabilities. Others could follow their example. Controls might be introduced overtly through legislation or imposed surreptitiously through hacking or by coercing service providers.

Still another avenue of suppression is inciting popular sentiment—burning down websites with tiki torches. I can’t say I’m sorry to see the Nazi site *Daily Stormer* hounded from the web by public outcry; no one, it seems, will register their domain name or host their content. Historically, however, this kind of intimidation has weighed most heavily on the other end of the political spectrum. It is the labor movement, racial and ethnic and religious minorities, socialists and communists and anarchists, feminists, and the LGBT community who have most often had their speech suppressed. Considering who wields power in Washington just now, a crackdown on “fake news” on the internet is hardly an outlandish possibility.

In spite of all these forebodings, I remain strangely optimistic about the web’s prospects for survival. The internet is a resilient structure, not just in its technological underpinnings but also in its social organization. Over the past 20 years, for many of us, the net has wormed its way into every aspect of daily life. It’s too big to fail now. Even if some basement command center in the White House had a big red switch that shuts down the whole network, no one would dare to throw it.

]]>One of the quirks of life with Dennis was that he didn’t hear well, as a result of childhood ear infections. In an unpublished memoir he lists his deafness as a major influence on his path through life. It was a hardship in school, because he missed much of what his teachers were saying. On the other hand, it kept him out of the military in World War II.

Later in life, hearing aids helped considerably, but only on one side. When we went to lunch, I learned to sit to his right, so that I could speak to the better ear. When we took someone out to lunch, the guest got the favored chair. In our monthly editorial meetings, however, he turned his deaf ear to Gerard Piel, the magazine’s co-founder and publisher. (They didn’t always get along.) In Dennis’s last years, after both of us had left the magazine, we would take long walks through Lower Manhattan, with stops in coffee shops and sojourns on park benches, and again I made sure I was the right-hand man. Dennis died in 2005. I miss him all the time.

Although I was always aware of Dennis’s hearing impairment, I never had an inkling of what his asymmetric sensory experience might feel like from inside his head. Now I have a chance to find out. A few days ago I had a sudden failure of hearing in my left ear. At the time I had no idea what was happening, so I can’t reconstruct an exact chronology, but I think the ear went from normal function to zilch in a matter of seconds or minutes. It was like somebody pulled the plug.

I have since learned that this is a rare phenomenon (5 to 20 cases per 100,000 population) but well-known to the medical community. It has a name: Sudden Sensorineural Hearing Loss. It is a malfunction of the cochlea, the inner-ear transducer between mechanical vibration and neural activity. An audiological exam confirmed that my eardrum and the delicate linkage of tiny bones in the middle ear are functioning normally, but the signal is not getting through to the brain. In most cases of SSNH, the cause is never identified. I’m under treatment, and there’s a decent chance that at least some level of hearing will be restored.

I don’t often write about matters this personal, and I’m not doing so now to whine about my fate or to elicit sympathy. I want to record what I’m going through because I find it fascinating as well as distressing. A great deal of what we know about the human brain comes from accidents and malfunctions, and now I’m learning some interesting lessons at first hand.

The obvious first-order effect of losing an ear is cutting in half the amplitude of the received acoustic signal. This is perhaps the least disruptive aspect of the impairment, and the easiest to mitigate.

The second major effect is more disturbing: trouble locating the source of a sound. Binaural hearing is key to localization. For low-pitched sounds, with wavelengths greater than the diameter of the head, the brain detects the phase difference between waves reaching the two ears. The phase measurement can yield an angular resolution of just a few degrees. At higher frequencies and shorter wavelengths, the head effectly blocks sound, and so there is a large intensity difference between the two ears, which provides another localizing cue. This mechanism is somewhat less acurate, but you can home in on a source by turning your head to null the intensity difference.

With just one ear, both kinds of directional guidance are lacking. This did not come as a surprise to me, but I had never thought about what it would be like to perceive nonlocalized sounds. You might imagine it would be like switching the audio system from stereophonic to monoaural. In that case, you lose the illusion that the strings are on the left side of the stage and the brasses on the right; the whole orchestra is all mixed up in front of you. Nevertheless, in your head you are still localizing the sounds; they are all coming from the speakers across the room. Having one ear is not like that; it’s not just life in mono.

In my present state I can’t identify the sources of many sounds, but they don’t come from nowhere. Some of them come from everywhere. The drone of the refrigerator surrounds me; I hear it radiating from all four walls and the floor and ceiling; it’s as if I’m somehow inside the sound. And one night there was a repetitive thrub-a-dub that puzzled me so much I had to get out of bed and go searching for the cause. The search was essentially a random one: I determined it was not the heating system, and nothing in the kitchen or bathroom. Finally I discovered that the noise was rain pouring off the roof into the gutters and downspouts.

The failures of localization are most disturbing when the apparent source is not vague or unknown but rather quite definite—and wrong! My phone rings, and I reach out to my right to pick it up, but in fact it’s in my shirt pocket. While driving the other day, I heard the whoosh of a car that seemed to be passing me on the right, along the shoulder of the road. I almost veered left to make room. If I had done so, I would have run into the overtaking vehicle, which was of course actually on my left. (Urgent priority: Learn to ignore deceptive directional cues.)

In the first hour or so after this whole episode began, I did not recognize it as a loss of hearing; what I noticed instead was a distracting barrage of echoes. I was chatting with three other people in a room that has always seemed acoustically normal, but words were coming at me from all directions like high-velocity ping-pong balls. The echoes have faded a little in the days since, but I still hear double in some situations. And, interestingly, the echo often seems to be coming from the nonfunctioning ear. I have a hypothesis about what’s going on. Echoes are real, after all; sounds really do bounce off walls, so that the ears receive multiple instances of a sound separated by millisecond delays. Normally, we don’t perceive those echoes. The ears must be sensing them, but some circuitry in the brain is suppressing the perception. (Telephone systems have such circuitry too.) Based on my experience, I suspect that the suppression mechanism depends on the presence of signals from both ears.

Similar to echo suppression is noise suppression. I find I have lost the benefit of the “cocktail party effect,” whereby we select a single voice to attend to and filter out the background chatter. The truth is, I was never very good at that trick, but I’m notably worse now. A possibly related development is that I have the illusion of *enhanced* hearing acuity for some kinds of noise. The sound of water running from a faucet carries all through the house now. And the sound of my own chewing can be thunderous. In the past, perhaps the binaural screening process was turning down the gain on such commonplace distractions.

Even though no sounds of the outside world are reaching me from the left side of my head, that doesn’t mean the ear is silent. It seems to emit a steady hiss, which I’m told is common in this condition. Occasionally, in a very quiet room, I also hear faint chimes of pure sine tones. Do any of these signals actually originate in the affected cochlea, or are they phantoms that the brain merely attributes to that source?

The most curious interior noise is one that I’ve taken to calling the motor. In the still of the night, if I turn my head a certain way, I hear a putt-putt-putt with the rhythm of a sputtering lawn-mower engine, though very faint and voiceless. The intriguing thing is, the sound is altered by my breathing. If I hold my breath for a few seconds, the putt-putting slows and sometimes stops entirely. Then when I take a breath, the motor revs up again. Could this response indicate sensitivity to oxygen levels in the blood reaching my head? I like to imagine that the source of the noise is a single lonely neuron in the cochlea, bravely tapping out its spike train—the last little drummer boy in my left ear. But I wouldn’t be surprised to learn it comes from somewhere higher up in the auditory pathway.

One of the first manuscripts I edited at *Scientific American* (published in October 1973) was an article by the polymath Gerald Oster.

Ordinary beat tones are elementary physics: Whenever two waves combine and interfere, they create a new wave whose frequency is equal to the difference between the two original frequencies. In the case of sound waves at frequencies a few hertz apart, we perceive the beat tone as a throbbing modulation of the sound intensity. Oster asked what happens when the waves are not allowed to combine and interfere but instead are presented separately to the two ears. In certain frequency ranges it turns out that most people still hear the beats; evidently they are generated by some interference process within the auditory networks of the brain. Oster suggested that a likely site is the superior olivary nucleus. There are two of these bodies arrayed symmetrically just to the left and right of the midline in the back of the brain. They both receive signals from both ears.

Whatever the mechanism generating the binaural beats, it has to be happening somewhere inside the head. It’s a dramatic reminder that perception is not a passive process. We don’t really see and hear the world; we fabricate a model of it based on the sensations we receive—or fail to receive.

I’m hopeful that this little experiment of nature going on inside my cranium will soon end, but if it turns out to be a permanent condition, I’ll cope. As it happens, my listening skills will be put to the test over the next several months, as I’m going to be spending a lot of time in lecture halls. There’s the annual Joint Mathematics Meeting coming up in early January, then I’m spending the rest of the spring semester at the Simons Institute for the Theory of Computing in Berkeley. Lots of talks to attend. You’ll find me in the front of the room, to the left of the speaker.

My years with Dennis Flanagan offer much comfort when I consider the prospect of being half-deaf. His deficit was more severe than mine, and he put up with it from childhood. It never held him back—not from creating one of the world’s great magazines, not from leading several organizations, not from traveling the world, not from spearing a 40-pound bass while free diving in Great South Bay.

One worry I face is music—will I ever be able to enjoy it again?—but Dennis’s example again offers encouragement. We shared a great fondness for Schubert. I can’t know exactly what Dennis was hearing when we listened to a performance of the Trout Quintet together, but he got as much pleasure out of it as I did. And in his sixties he went beyond appreciation to performance. He had wanted to learn the cello, but a musician friend advised him to take up the brass instrument of the same register. He did so, and promptly learned to play a Bach suite for unaccompanied cello on the slide trombone.

]]>Given the present state of life in America, what we really need is an Approximation to Rationality Day, but that may have to wait for 20/1/21. In the meantime, let us merrily fiddle with numbers, searching for ratios of integers that brazenly invade the personal space of famous irrationals.

When I was a teenager, somebody told me about the number 355/113, which is an exceptionally good approximation to *π*. The exact value is

3.141592920353982520964564173482358455657958984375,

correct through the first six digits after the decimal point. In other words, it differs from the true value by less than one-millionth. I was intrigued, and so I set out to find an even better approximation. My search was necessarily a pencil-and-paper affair, since I had no access to any electronic or even mechanical aids to computation. The spiral-bound notebook in which I made my calculations has not survived, and I remember nothing about the outcome of the effort.

A dozen years later I acquired some computing machinery: a Hewlett-Packard programmable calculator, called the HP-41C. Here is the main loop of an HP-41C program that searches for good rational approximations. Note the date at the top of the printout (written in middle-endian format). Apparently I was finishing up this program just before Approximation Day in 1981.

What’s that you say? You’re not fluent in the 30-year-old Hewlett-Packard dialect of reverse Polish notation? All right, here’s a program that does roughly the same thing, written in an oh-so-modern language, Julia.

```
function approximate(T, dmax)
d = 1
leastError = T
while d <= dmax && leastError > 0
n = Int(round(d * T))
err = abs(T - n/d) / T
merit = 1 / ((n + d)^2 * err)
if err < leastError
println("$n/$d = $(n/d) error = $err merit = $merit")
leastError = err
end
d += 1
end
end
```

The algorithm is a naive, sequential search for fractions \(n/d\) that approximate the target number \(T\). For each value of \(d\), you need to consider only one value of \(n\), namely the integer nearest to \(d \times T\). (What happens if \(d \times T\) falls halfway between two integers? That can’t happen if \(T\) is irrational.) Thus you can begin with \(d = 1\) and continue up to a specified largest denominator \(d = dmax\). The accuracy of the approximation is measured by the error term \(|T - n/d| / T\). Whenever a value of \(n/d\) yields a new minimum error, the program prints a line of results. (This version of the algorithm works correctly only for \(T \gt 1\), but it can readily be adapted to \(T \lt 1\).)

The HP-41C has a numerical precision of 10 decimal digits, and so the closest possible approximation to *π* is 3.141592654. Back in 1981 I ran the program until it found a fraction equal to this value—a *perfect* approximation, from the program’s point of view. According to a note on the printout, that took 13 hours. The Julia program above, running on a laptop, completes the same computation in about three milliseconds. You’re welcome to take a scroll through the results, below. (The numbers are not digit-for-digit identical to those generated by the HP-41C because Julia calculates with higher precision, about 16 decimal digits.)

3/1 = 3.0 error = 0.045070341573315915 merit = 1.3867212410256813 13/4 = 3.25 error = 0.03450712996224109 merit = 0.10027514940370374 16/5 = 3.2 error = 0.018591635655129744 merit = 0.12196741256165179 19/6 = 3.1666666666666665 error = 0.007981306117055373 merit = 0.20046844169789904 22/7 = 3.142857142857143 error = 0.0004024993041452083 merit = 2.9541930379680195 179/57 = 3.1403508771929824 error = 0.00039526983405584675 merit = 0.04542368072920613 201/64 = 3.140625 error = 0.0003080138345651019 merit = 0.04623150469956595 223/71 = 3.140845070422535 error = 0.00023796324342470652 merit = 0.04861781754719378 245/78 = 3.141025641025641 error = 0.0001804858353094197 merit = 0.053107007660473673 267/85 = 3.1411764705882352 error = 0.00013247529441315622 merit = 0.060922789404334425 289/92 = 3.141304347826087 error = 9.177070539240495e-5 merit = 0.07506646742266793 311/99 = 3.1414141414141414 error = 5.6822320879624425e-5 merit = 0.10469195703580983 333/106 = 3.141509433962264 error = 2.6489760736525772e-5 merit = 0.19588127575835135 355/113 = 3.1415929203539825 error = 8.478310581938076e-8 merit = 53.85164473263654 52518/16717 = 3.1415923909792425 error = 8.37221074104896e-8 merit = 0.00249177288308447 52873/16830 = 3.141592394533571 error = 8.259072954625822e-8 merit = 0.0024921016732136797 53228/16943 = 3.1415923980404887 error = 8.147444291923546e-8 merit = 0.0024926612882136163 53583/17056 = 3.141592401500938 error = 8.03729477091334e-8 merit = 0.0024934520351304946 53938/17169 = 3.1415924049158366 error = 7.928595172899531e-8 merit = 0.0024944743578840687 54293/17282 = 3.141592408286078 error = 7.821317056655376e-8 merit = 0.0024957288257085445 54648/17395 = 3.141592411612532 error = 7.715432730151448e-8 merit = 0.002497216134767719 55003/17508 = 3.1415924148960475 error = 7.610915194012454e-8 merit = 0.0024989371196291283 55358/17621 = 3.1415924181374497 error = 7.507738155653036e-8 merit = 0.0025008927426067996 55713/17734 = 3.1415924213375437 error = 7.405876001006156e-8 merit = 0.0025030840968725283 56068/17847 = 3.1415924244971145 error = 7.305303737979925e-8 merit = 0.002505512419906649 56423/17960 = 3.1415924276169265 error = 7.20599703886498e-8 merit = 0.002508179074048983 56778/18073 = 3.141592430697726 error = 7.107932141383905e-8 merit = 0.0025110855755419263 57133/18186 = 3.14159243374024 error = 7.01108591937022e-8 merit = 0.002514233565685482 57488/18299 = 3.1415924367451775 error = 6.915435783817789e-8 merit = 0.0025176248413626597 57843/18412 = 3.1415924397132304 error = 6.820959725288218e-8 merit = 0.0025212613363967255 58198/18525 = 3.141592442645074 error = 6.727636243231866e-8 merit = 0.002525145143834103 58553/18638 = 3.141592445541367 error = 6.635444374259433e-8 merit = 0.0025292785028112976 58908/18751 = 3.141592448402752 error = 6.544363663870371e-8 merit = 0.0025336638062423296 59263/18864 = 3.141592451229856 error = 6.454374152317083e-8 merit = 0.002538303603848205 59618/18977 = 3.1415924540232916 error = 6.365456332197522e-8 merit = 0.002543200616913158 59973/19090 = 3.1415924567836564 error = 6.277591190862598e-8 merit = 0.002548357720152209 60328/19203 = 3.1415924595115348 error = 6.190760125601375e-8 merit = 0.0025537779743748956 60683/19316 = 3.1415924622074964 error = 6.10494500018427e-8 merit = 0.0025594646031786867 61038/19429 = 3.1415924648720983 error = 6.020128088319864e-8 merit = 0.002565421015548036 61393/19542 = 3.141592467505885 error = 5.936292059519092e-8 merit = 0.0025716508123781218 61748/19655 = 3.141592470109387 error = 5.853420007366852e-8 merit = 0.0025781577749599853 62103/19768 = 3.1415924726831244 error = 5.771495407114599e-8 merit = 0.002584945883912429 62458/19881 = 3.141592475227604 error = 5.690502101544554e-8 merit = 0.002592019327133724 62813/19994 = 3.141592477743323 error = 5.6104242868339024e-8 merit = 0.0025993825084809985 63168/20107 = 3.1415924802307655 error = 5.531246526690591e-8 merit = 0.0026070400439016164 63523/20220 = 3.1415924826904056 error = 5.4529537523533324e-8 merit = 0.0026149967637792084 63878/20333 = 3.141592485122707 error = 5.375531191912607e-8 merit = 0.002623257749852838 64233/20446 = 3.141592487528123 error = 5.2989644268538606e-8 merit = 0.0026318283126966317 64588/20559 = 3.141592489907097 error = 5.22323933551431e-8 merit = 0.0026407140236596287 64943/20672 = 3.1415924922600618 error = 5.148342135490336e-8 merit = 0.002649920699086574 65298/20785 = 3.1415924945874427 error = 5.0742592988226976e-8 merit = 0.002659454449139831 65653/20898 = 3.1415924968896545 error = 5.0009776226755164e-8 merit = 0.0026693216486930156 66008/21011 = 3.141592499167103 error = 4.928484186928889e-8 merit = 0.002679528965991537 66363/21124 = 3.1415925014201855 error = 4.8567663400430846e-8 merit = 0.0026900833784673454 66718/21237 = 3.1415925036492913 error = 4.7858116990585446e-8 merit = 0.0027009921818650063 67073/21350 = 3.141592505854801 error = 4.715608149595883e-8 merit = 0.0027122629998437182 67428/21463 = 3.1415925080370872 error = 4.6461438175842924e-8 merit = 0.002723903810648984 67783/21576 = 3.1415925101965145 error = 4.577407111668933e-8 merit = 0.002735922933992634 68138/21689 = 3.1415925123334407 error = 4.5093866383961494e-8 merit = 0.0027483290931549346 68493/21802 = 3.1415925144482157 error = 4.442071258756658e-8 merit = 0.002761131395876878 68848/21915 = 3.141592516541182 error = 4.375450074049751e-8 merit = 0.002774339356802981 69203/22028 = 3.1415925186126747 error = 4.309512411747499e-8 merit = 0.0027879629217230834 69558/22141 = 3.1415925206630235 error = 4.244247783087354e-8 merit = 0.002802012512429091 69913/22254 = 3.14159252269255 error = 4.179645953751142e-8 merit = 0.0028164989998024 70268/22367 = 3.1415925247015695 error = 4.115696873186072e-8 merit = 0.0028314337694556623 70623/22480 = 3.1415925266903915 error = 4.0523907028763286e-8 merit = 0.002846828724926181 70978/22593 = 3.141592528659319 error = 3.989717788071482e-8 merit = 0.00286269633032941 71333/22706 = 3.1415925306086496 error = 3.9276686719222797e-8 merit = 0.0028790496258831624 71688/22819 = 3.1415925325386738 error = 3.86623409548065e-8 merit = 0.0028959022542887716 72043/22932 = 3.141592534449677 error = 3.805404969428105e-8 merit = 0.0029132685103826087 72398/23045 = 3.1415925363419395 error = 3.7451723882115376e-8 merit = 0.0029311633622333107 72753/23158 = 3.1415925382157353 error = 3.685527615907423e-8 merit = 0.002949602495467867 73108/23271 = 3.1415925400713336 error = 3.626462086221821e-8 merit = 0.002968602349703417 73463/23384 = 3.1415925419089974 error = 3.567967430761971e-8 merit = 0.002988180133716996 73818/23497 = 3.141592543728987 error = 3.510035365949903e-8 merit = 0.003008353961046636 74173/23610 = 3.1415925455315543 error = 3.452657862652023e-8 merit = 0.003029142753805288 74528/23723 = 3.1415925473169497 error = 3.395826962413729e-8 merit = 0.0030505664465106676 74883/23836 = 3.141592549085417 error = 3.339534904681598e-8 merit = 0.0030726459300795604 75238/23949 = 3.1415925508371956 error = 3.283774056124397e-8 merit = 0.003095403169820992 75593/24062 = 3.141592552572521 error = 3.228536938904675e-8 merit = 0.0031188612412389144 75948/24175 = 3.1415925542916234 error = 3.173816202407169e-8 merit = 0.0031430444223940115 76303/24288 = 3.14159255599473 error = 3.1196046373746034e-8 merit = 0.0031679782521683033 76658/24401 = 3.141592557682062 error = 3.065895190043484e-8 merit = 0.0031936895918127546 77013/24514 = 3.1415925593538385 error = 3.01268089146511e-8 merit = 0.0032202067806171002 77368/24627 = 3.141592561010273 error = 2.9599549423203633e-8 merit = 0.003247559639023363 77723/24740 = 3.1415925626515766 error = 2.9077106281049175e-8 merit = 0.0032757796556622983 78078/24853 = 3.1415925642779543 error = 2.8559414180798277e-8 merit = 0.0033048999843237645 78433/24966 = 3.14159256588961 error = 2.804640823913544e-8 merit = 0.003334955716987436 78788/25079 = 3.1415925674867418 error = 2.753802541039899e-8 merit = 0.0033659838476231357 79143/25192 = 3.141592569069546 error = 2.703420321435919e-8 merit = 0.003398023556100075 79498/25305 = 3.1415925706382137 error = 2.6534880725724155e-8 merit = 0.0034311162371627422 79853/25418 = 3.141592572192934 error = 2.6039997867349902e-8 merit = 0.0034653057466538235 80208/25531 = 3.141592573733892 error = 2.554949569295635e-8 merit = 0.0035006385417218717 80563/25644 = 3.14159257526127 error = 2.5063316245769302e-8 merit = 0.0035371638899188347 80918/25757 = 3.1415925767752455 error = 2.4581402841236452e-8 merit = 0.0035749340371894456 81273/25870 = 3.1415925782759953 error = 2.410369936023742e-8 merit = 0.003614004535709633 81628/25983 = 3.1415925797636914 error = 2.3630150955873712e-8 merit = 0.00365443439340209 81983/26096 = 3.141592581238504 error = 2.3160703488036753e-8 merit = 0.00369628643041249 82338/26209 = 3.141592582700599 error = 2.2695304230197833e-8 merit = 0.003739627468693587 82693/26322 = 3.1415925841501404 error = 2.2233900879902193e-8 merit = 0.0037845288174018898 83048/26435 = 3.1415925855872895 error = 2.1776442124200985e-8 merit = 0.0038310665494126084 83403/26548 = 3.1415925870122043 error = 2.1322877639651253e-8 merit = 0.00387932189896066 83758/26661 = 3.1415925884250404 error = 2.087315795095796e-8 merit = 0.003929381726572982 84113/26774 = 3.1415925898259505 error = 2.0427234430973973e-8 merit = 0.003981339007706688 84468/26887 = 3.1415925912150855 error = 1.9985059017984126e-8 merit = 0.004035293430477111 84823/27000 = 3.1415925925925925 error = 1.9546584922495102e-8 merit = 0.004091351857390988 85178/27113 = 3.1415925939586176 error = 1.9111765637729565e-8 merit = 0.004149629190123568 85533/27226 = 3.1415925953133033 error = 1.868055578777407e-8 merit = 0.004210248941258058 85888/27339 = 3.141592596656791 error = 1.825291042078912e-8 merit = 0.004273344214343279 86243/27452 = 3.1415925979892174 error = 1.78287858571571e-8 merit = 0.004339058439193095 86598/27565 = 3.1415925993107203 error = 1.7408138417260385e-8 merit = 0.004407546707464268 86953/27678 = 3.1415926006214323 error = 1.6990925835061217e-8 merit = 0.004478976601684539 87308/27791 = 3.1415926019214853 error = 1.6577106127237806e-8 merit = 0.004553529781140699 87663/27904 = 3.1415926032110093 error = 1.6166637875900305e-8 merit = 0.004631403402447433 88018/28017 = 3.141592604490131 error = 1.5759480794022753e-8 merit = 0.0047128116308472546 88373/28130 = 3.141592605758976 error = 1.5355594877295166e-8 merit = 0.004797987771392931 88728/28243 = 3.1415926070176683 error = 1.4954940686839493e-8 merit = 0.0048871863549194705 89083/28356 = 3.141592608266328 error = 1.4557479914641577e-8 merit = 0.004980685405908598 89438/28469 = 3.141592609505076 error = 1.4163174252687263e-8 merit = 0.005078789613658918 89793/28582 = 3.1415926107340284 error = 1.3771986523826276e-8 merit = 0.005181833172630217 90148/28695 = 3.141592611953302 error = 1.338387969226633e-8 merit = 0.005290183824183623 90503/28808 = 3.1415926131630103 error = 1.2998817570363058e-8 merit = 0.005404246870669908 90858/28921 = 3.1415926143632653 error = 1.2616764535904027e-8 merit = 0.005524470210563737 91213/29034 = 3.141592615554178 error = 1.2237685249392783e-8 merit = 0.005651350205744754 91568/29147 = 3.1415926167358563 error = 1.1861545360838771e-8 merit = 0.005785438063205309 91923/29260 = 3.1415926179084073 error = 1.1488310802967408e-8 merit = 0.005927347979056494 92278/29373 = 3.141592619071937 error = 1.111794779122008e-8 merit = 0.006077766389438445 92633/29486 = 3.141592620226548 error = 1.0750423671902066e-8 merit = 0.006237462409303776 92988/29599 = 3.1415926213723435 error = 1.0385705649960649e-8 merit = 0.006407301439430316 93343/29712 = 3.1415926225094237 error = 1.0023761778491034e-8 merit = 0.00658826005755035 93698/29825 = 3.1415926236378877 error = 9.664560534662385e-9 merit = 0.006781444748602359 94053/29938 = 3.1415926247578327 error = 9.308070961075804e-9 merit = 0.006988114128701429 94408/30051 = 3.1415926258693556 error = 8.954262241690382e-9 merit = 0.007209706348604964 94763/30164 = 3.14159262697255 error = 8.603104549971112e-9 merit = 0.007447871540046976 95118/30277 = 3.14159262806751 error = 8.254567918024995e-9 merit = 0.007704513406469473 95473/30390 = 3.141592629154327 error = 7.90862336746494e-9 merit = 0.007981838717667477 95828/30503 = 3.1415926302330917 error = 7.565241919903853e-9 merit = 0.008282421184374838 96183/30616 = 3.1415926313038933 error = 7.224395162386583e-9 merit = 0.008609280341750632 96538/30729 = 3.1415926323668195 error = 6.8860552473899216e-9 merit = 0.008965982432171553 96893/30842 = 3.1415926334219573 error = 6.550194468748648e-9 merit = 0.009356770586561815 97248/30955 = 3.141592634469391 error = 6.216785968445456e-9 merit = 0.009786732283709331 97603/31068 = 3.1415926355092054 error = 5.885802747105052e-9 merit = 0.010262022067809991 97958/31181 = 3.1415926365414837 error = 5.557218370784088e-9 merit = 0.010790155391967196 98313/31294 = 3.1415926375663066 error = 5.231007112329143e-9 merit = 0.011380406450991833 98668/31407 = 3.1415926385837554 error = 4.907143103228812e-9 merit = 0.012044356667029002 99023/31520 = 3.1415926395939087 error = 4.585601323119603e-9 merit = 0.01279665696194468 99378/31633 = 3.141592640596845 error = 4.266356751638026e-9 merit = 0.013656119502875172 99733/31746 = 3.1415926415926414 error = 3.9493849338525334e-9 merit = 0.014647305857352692 100088/31859 = 3.141592642581374 error = 3.6346615561895634e-9 merit = 0.015802906908552822 100443/31972 = 3.1415926435631176 error = 3.322162870507497e-9 merit = 0.017167407267272748 100798/32085 = 3.1415926445379463 error = 3.0118652700227016e-9 merit = 0.018802933529623964 101153/32198 = 3.141592645505932 error = 2.703745854741474e-9 merit = 0.020798958087527405 101508/32311 = 3.1415926464671475 error = 2.397781441954139e-9 merit = 0.02328921472604781 101863/32424 = 3.141592647421663 error = 2.0939496970989362e-9 merit = 0.026482916558483883 102218/32537 = 3.1415926483695484 error = 1.7922284269720909e-9 merit = 0.03072676661447583 102573/32650 = 3.141592649310873 error = 1.492595579727815e-9 merit = 0.03664010548445531 102928/32763 = 3.141592650245704 error = 1.195029527594277e-9 merit = 0.04544847105306477 103283/32876 = 3.1415926511741086 error = 8.995092082315892e-10 merit = 0.05996553050516452 103638/32989 = 3.1415926520961532 error = 6.060132765838922e-10 merit = 0.0883984797913258 103993/33102 = 3.1415926530119025 error = 3.1452123574324146e-10 merit = 0.16916355170353897 104348/33215 = 3.141592653921421 error = 2.5012447443706518e-11 merit = 2.1127131430431656

The error values in the middle column of the table above shrink steadily as you read from the top of the list to the bottom. Each successive approximation is more accurate than all those above it. Does that also mean each successive approximation is *better* than those above it? I would say no. Any reasonable notion of “better” in this context has to take into account the size of the numerator and the denominator.

If you want an approximation of \(\pi\) accurate to seven digits, I can give you one off the top of my head: \(3141593/1000000\). But the numbers making up that ratio are themselves seven digits long. What makes \(355/113\) impressive is that it achieves seven-digit accuracy with only three digits in the numerator and the denominator. Accordingly, I would argue that a “better” approximation is one that minimizes both error and size. The rightmost column of the table, filled with numbers labeled “merit” is meant to quantify this intuition.

When I wrote that program in 1981, I chose a strange formula for merit, one that now baffles me:

\[\frac{1}{(n + d)^2 * err}.\]

Adding the numerator and denominator and then squaring the sum is an operation that makes no sense, although the formula as a whole does have the correct qualitative behavior, favoring both smaller errors and smaller values of \(n\) and \(d\). In trying to reconstruct what I had in mind 26 years ago, my best guess is that I was trying to capture a geometric insight, and I flubbed it when translating math into code. On this assumption, the correct figure of merit would be:

\[\frac{1}{\sqrt{n^2 + d^2} * err}.\]

To see where this formula comes from, consider a two-dimensional lattice of integers, with a ray of slope \(\pi\) drawn from the origin and going on to infinite distance.

Because the line’s slope is irrational, it will never pass through any point of the integer lattice, but it will have many near misses. The near-miss points, with coordinates interpreted as numerator and denominator, are the accurate approximations to \(\pi\). The diagram suggests a measure of the merit based on distances. An approximation gets better when we minimize the distance of the lattice point from the origin as well as the vertical distance from the point to the \(\pi\) line. That’s the meaning of the formula with \(\sqrt{n^2 + d^2}\) in the denominator.

Another approach to defining merit simply counts digits. The merit is the ratio of the number of correctly predicted digits in the irrational target \(T\) to the number of digits in the denominator. A problem with this scheme is that it’s rather coarse. For example, \(13/4\) and \(16/5\) both have single-digit denominators and they each get one digit of \(\pi\) correct, but

\(16/5\) actually has a smaller error.

To smooth out the digit-counting criterion, and distinguish between values that differ in magnitude but have the same number of digits, we can take logarithms of the numbers. Let merit equal: \(-log(err) / log(d)\). (The \(log(err)\) term is negated because the error is always less than \(1\) and so its logarithm is negative.)

Here’s a comparison of the three merit criteria for some selected approximations to \(\pi\):

n/d 1981 merit distance merit log merit 3/1 1.3867212448620723 7.016316181613145 -- 13/4 0.10027514901117529 2.1306165422053285 2.4284808488226544 16/5 0.12196741168912356 3.208700907602539 2.4760467349663537 19/6 0.20046843839209055 6.288264070960828 2.6960388788612515 22/7 2.954192079226498 107.61458138965322 4.017563128080901 179/57 0.04542369572848121 13.467303354323912 1.9381258641568968 201/64 0.04623152429195394 15.390920494844842 1.9441196398907357 223/71 0.04861784421796857 17.956388291625093 1.9573120958787444 245/78 0.05310704607396699 21.548988850935377 1.9785253787278367 267/85 0.06092284944437125 26.93965209372642 2.0098618723780515 289/92 0.07506657421887829 35.92841360228601 2.055872071177696 311/99 0.10469219759604646 53.921550739835986 2.1273838230139175 333/106 0.1958822412726219 108.02438852795403 2.259868093766371 355/113 53.76883630752973 31610.90993685001 3.444107245852723 52163/16604 0.002495514149618044 215.57611105028013 1.6757260012234105 • • • • • • • • • • • • 103993/33102 0.2892417579456485 49813.04849576935 2.1538978293241056 104348/33215 0.5006051667655171 86508.24042805366 2.2065386096084607 208341/66317 0.3403602724772912 117433.39822796892 2.1589243556399245 312689/99532 0.6343809166515098 328504.0552596196 2.207421489352196

All three measures agree that \(22/7\) and \(355/113\) are quite special. In other respects they give quite different views of the data. My weird 1981 formula compares \((n + d)^{-2}\) with \(err^{-1}\); the asymmetry in the exponents suggests the merit will tend to zero as \(n\) and \(d\) increase, at least in the average case. The maximum of the distance-based measure, on the other hand, appears to grow without bound. And the logarithmic merit function seems to be settling on a value near 2.0. This implies that we shouldn’t expect to see many \(n/d \) approximations where the number of correct digits is greater than twice the number of digits in \(d\). The late Tom Apostol and Mamikon A. Mnatsakanian proved a closely related proposition (“Surprisingly accurate rational approximations,” *Mathematics Magazine*, Vol. 75, No. 4 (Oct. 2002), pp. 307-310).

The final joke on my 1981 self is that all this searching for better approximants can be neatly sidestepped by a bit of algorithmic sophistication. The magic phrase is “continued fractions.” The continued fraction for \(\pi\) begins:

\[ \pi = 3+\cfrac{1}{7+\cfrac{1}{15+\cfrac{1}{1+\cfrac{1}{292+\cfrac{1}{1 + \cdots}}}}}\]

Evaluating the successive levels of this expression yields a sequence of “convergents” that should look familiar:

\[3/1, 22/7, 333/106, 355/113, 103993/33102, 104348/33215.\]

It is a series of “best” approximations to \(\pi\), generated without bothering with all the intervening non-“best” values. I produced this list in CoCalc (a.k.a. SageMathCloud), following the excellent tutorial in William Stein’s *Elementary Number Theory*. Even much larger approximants gush forth from the algorithm in milliseconds. Here’s the 100th element of the series:

\[\frac{4170888101980193551139105407396069754167439670144501}{1327634917026642108692848192776111345311909093498260}\]

A question remains: In what sense are these approximations “best”? It’s guaranteed that every element of the series is more *accurate* than all those that came before, but it’s not clear to me that they also satisfy any sort of compactness criterion. But that’s a question to be taken up another day. Perhaps on Continued Fraction Day.