Babylonian accountants and land surveyors did their arithmetic in base 60, presumably because sexagesimal numbers help with wrangling fractions. When you organize things in groups of 60, you can divide them into halves, thirds, fourths, fifths, sixths, tenths, twelfths, fifteenths, twentieths, thirtieths, and sixtieths. No smaller number has as many divisors, a fact that puts 60 in an elite class of “highly composite numbers.” (The term and the definition were introduced by Srinivasa Ramanujan in 1915.)

There’s something else about 60 that I never noticed until a few weeks ago—although the Babylonians might well have known it, and Ramanujan surely did. The number 60, with its extravagant wealth of divisors, sits wedged between two other numbers that have no divisors at all except for 1 and themselves. Both 59 and 61 are primes. Such pairs of primes, separated by a single intervening integer, are known as twin primes. Other examples are (5, 7), (29, 31), and (1949, 1951). Over the years twin primes have gotten a great deal of attention from number theorists. Less has been said about the number in the middle—the interloper that keeps the twins apart. At the risk of sounding slightly twee, I’m going to call such middle numbers *twin tweens*, or more briefly just *tweens*.

Is it just a fluke that a number lying between two primes is outstandingly *un*prime? Is 60 unusual in this respect, or is there a pattern here, common to all twin primes and their twin tweens? One can imagine some sort of fairness principle at work: If \(n\) is flanked by divisor-poor neighbors, it must have lots of divisors to compensate, to balance things out. Perhaps every pair of twin primes forms a chicken salad sandwich, with two solid slabs of prime bread surrounding a squishy filling that gets chopped up into many small pieces.

As a quick check on this hypothesis, let’s plot the number of divisors \(d(n)\) for each integer in the range from \(n = 1\) to \(n = 75\):

In Figure 1 twin primes are marked by light blue dots, and their associated tweens are dark blue. Highly composite numbers—those \(n\) that have more divisors than any smaller \(n\)—are distinguished by golden haloes.
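For anyone who wants to recreate the raw numbers behind the plot, here is a minimal Python sketch (the function names are my own, not taken from any figure-generating code):

```python
def divisor_count(n):
    """Count the divisors of n by trial division up to sqrt(n)."""
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            count += 2          # counts both d and n // d
            if d * d == n:
                count -= 1      # perfect square: don't double-count sqrt(n)
        d += 1
    return count

def is_prime(n):
    return n > 1 and divisor_count(n) == 2

# Classify each n in 1..75 the way the plot does
for n in range(1, 76):
    twin = is_prime(n) and (is_prime(n - 2) or is_prime(n + 2))
    tween = is_prime(n - 1) and is_prime(n + 1)
    # e.g. n = 60 is a tween with divisor_count(60) == 12
```

In this range the tweens are 4, 6, 12, 18, 30, 42, 60, and 72.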

The interval 1–75 is a very small sample of the natural numbers, and an unusual one, since twin primes are abundant among small integers but become quite rare farther out along the number line. For a broader view of the matter, consider the number of divisors for all positive integers up to \(n = 10^8\).

Figure 2 plots \(d(n)\) over the same range, breaking up the sequence of 100 million numbers into 500 blocks of size 200,000, and taking the average value of \(d(n)\) within each block.
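Block averaging is a one-liner once the divisor counts have been sieved. Here is a scaled-down sketch of the computation (\(10^4\) numbers in 50 blocks of 200, rather than \(10^8\) in 500 blocks, so it runs in a blink; all names are my own):

```python
def divisor_counts_up_to(limit):
    """Tally d(n) for all n <= limit with a sieve: each k marks its multiples."""
    d = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for multiple in range(k, limit + 1, k):
            d[multiple] += 1
    return d

def block_averages(values, block_size):
    """Mean of each consecutive block of block_size values."""
    return [sum(values[i:i + block_size]) / block_size
            for i in range(0, len(values), block_size)]

d = divisor_counts_up_to(10_000)
means = block_averages(d[1:], 200)   # 50 block means for n = 1 .. 10^4
```

At the full \(10^8\) scale the same idea works, but the sieve array is better held in a compact numeric type.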

A glance at the graph leaves no doubt that numbers living next door to primes have many more divisors, on average, than numbers without prime neighbors. It’s as if the primes were heaving all their divisors over the fence into the neighbor’s yard. Or maybe it’s the twin tweens who are the offenders here, vampirishly sucking all the divisors out of nearby primes.

Allow me to suggest a less-fanciful explanation. All primes (with one notorious exception) are odd numbers, which means that all nearest neighbors of primes (again with one exception) are even. In other words, the neighbors of primes have 2 as a divisor, which gives them an immediate head start in the race to accumulate divisors. Twin tweens have a further advantage: All of them (with one exception) are divisible by 3 as well as by 2. Why? Among any three consecutive integers, one of them must be a multiple of 3, and it can’t be either of the primes, so it must be the tween.

Being divisible by both 2 and 3, a twin tween is also divisible by 6. Any other prime factors of the tween combine with 2 and 3 to produce still more divisors. For example, a number divisible by 2, 3, and 5 is also divisible by 10, 15, and 30.

However, a closer look at Figure 3 gives reason for caution. In the graph the mean \(d(n)\) for numbers divisible by 6 is about 43, but we already know that for tweens—for the subset of numbers divisible by 6 that happen to live between twin primes—\(d(n)\) is greater than 51. That further enhancement argues that nearby primes do, after all, have some influence on divisor abundance.

Further evidence comes from another plot of \(d(n)\) for numbers with and without prime neighbors, but this time limited to integers divisible by 6. Thus all members of the sample population have the same “head start.” Figure 4 presents the results of this experiment.

If prime neighbors had no effect (other than ensuring divisibility by 6), the blue, green, and red curves would all trace the same trajectory, but they do not. Although the tweens’ lead in the divisor race is narrowed somewhat, it is certainly not abolished. Numbers with two prime neighbors have about 20 percent more divisors than the overall average for numbers divisible by 6. Numbers with one prime neighbor are also slightly above average. Thus factors of 2 and 3 can’t be the whole story.

Here’s a hand-wavy attempt to explain what might be going on. In the same way that any three consecutive integers must include one that’s a multiple of 3, any five consecutive integers must include a multiple of 5. If you choose an integer \(n\) at random, you can be certain that exactly one member of the set \(\{n - 2, n - 1, n, n + 1, n + 2\}\) is divisible by 5. Since \(n\) was chosen randomly, all members of the set are equally likely to assume this role, and so you can plausibly say that 5 divides \(n\) with probability 1/5.

But suppose \(n\) is a twin tween. Then \(n - 1\) and \(n + 1\) are known to be prime, and neither of them can be a multiple of 5. You must therefore redistribute the probability over the remaining three members of the set. Now it seems that \(n\) is divisible by 5 with probability 1/3. You can make a similar argument about divisibility by 7, or 11, or any other prime. In each case the probability is enhanced by the presence of nearby primes.

The same argument works just as well if you turn it upside down. Knowing that \(n\) is even tells you that \(n - 1\) and \(n + 1\) are odd. If \(n\) is also divisible by 3, you know that \(n - 1\) and \(n + 1\) do not have 3 as a factor. Likewise with 5, 7, and so on. Thus finding an abundance of divisors in \(n\) raises the probability that \(n\)’s neighbors are prime.

Does this scheme make sense? There’s room for more than a sliver of doubt. Probability has nothing to do with the distribution of divisors among the integers. The process that determines divisibility is as simple as counting, and there’s nothing random about it. Imagine you are dealing cards to players seated at a very long table, their chairs numbered in sequence from 1 to infinity. First you deal a 1 card to every player. Then, starting with player 2, you deal a 2 card to every second player. Then a 3 card to player 3 and to every third player thereafter, and so on. When you finish (*if* you finish!), each player holds cards for all the divisors of his or her chair number, and no other cards.

This card-dealing routine sounds to me like a reasonable description of how integers are constructed. Adding a probabilistic element modifies the algorithm in a crucial way. As you are dealing out the divisors, every now and then a player refuses to accept a card, saying “Sorry, I’m prime; please give it to one of my neighbors.” You then select a recipient at random from the set of neighbors who lie within a suitable range.

Building a number system by randomly handing out divisors like raffle tickets might make for an amusing exercise, but it will not produce the numbers we all know and love. Primes do not wave off the dealer of divisors; on the contrary, a prime is prime because none of the cards between 1 and \(n\) land at its position. And integers divisible by 5 are not scattered along the number line according to some local probability distribution; they occur with absolute regularity at every fifth position. Introducing probability in this context seems misleading and unhelpful.

And yet… And yet…! It works.

Figure 5 shows the proportion of all \(n \le 10^8\) that are divisible by 5, classified according to the number of primes adjacent to \(n\). The overall average is 1/5, as it must be. But among twin tweens, with two prime neighbors, the proportion is close to 1/3, as the probabilistic model predicts. And about 1/4 of the numbers with a single prime neighbor are multiples of 5, which again is in line with predictions of the probabilistic model. And note that values of \(n\) with no prime neighbors have a below-average fraction of multiples of 5. In one sense this fact is unsurprising and indeed inescapable: If the average is fixed and one subgroup has an excess, then the complement of that subgroup must have a deficit. Nevertheless, it seems strange. How can an absence of nearby primes depress the density of multiples of 5 below the global average? After all, we know that multiples of 5 invariably come along at every fifth integer.
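The tallies behind this figure can be reproduced at smaller scale. The sketch below classifies every \(n \le 10^5\) (rather than \(10^8\)) by its number of prime neighbors and measures the fraction divisible by 5; the helper names are mine:

```python
def sieve_primes(limit):
    """Boolean list: is_prime[n] for 0 <= n <= limit (simple Eratosthenes)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return is_prime

limit = 100_000
isp = sieve_primes(limit + 1)
counts = {0: 0, 1: 0, 2: 0}   # numbers with 0, 1, or 2 prime neighbors
mult5 = {0: 0, 1: 0, 2: 0}    # ...that are also multiples of 5
for n in range(2, limit + 1):
    k = isp[n - 1] + isp[n + 1]
    counts[k] += 1
    if n % 5 == 0:
        mult5[k] += 1
fractions = {k: mult5[k] / counts[k] for k in counts}
```

Even at this modest scale the ordering is unmistakable: tweens are richest in multiples of 5, numbers with one prime neighbor come next, and numbers with no prime neighbors fall below the 1/5 average.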

In presenting these notes on the quirks of tweens, I don’t mean to suggest there is some deep flaw or paradox in the theory of numbers. The foundations of arithmetic will not crumble because I’ve encountered more 5s than I expected in prime-rich segments of the number line. No numbers have wandered away from their proper places in the sequence of integers; we don’t have to track them down and put them back where they belong. What needs adjustment is my understanding of their distribution. In other words, the question is not so much “What’s going on?” but “What’s the right way to think about this?”

I know several wrong ways. The idea that twin primes repel divisors and tweens attract them is a just-so story, like the one about the crocodile tugging on the elephant’s nose. It might be acceptable as a metaphor, but not as a mechanism. There are no force fields acting between integers. Numbers cannot sense the properties of their neighbors. Nor do they have personalities; they are not acquisitive or abstemious, gregarious or reclusive.

The probabilistic formulation seems better than the just-so story in that it avoids explicit mention of causal links between numbers. But that idea still lurks beneath the surface. What does it mean to say “The presence of prime neighbors increases a number’s chance of being divisible by 5”? Surely not that the prime is somehow influencing the outcome of a coin flip or the spin of a roulette wheel. The statement makes sense only as an empirical, statistical observation: In a survey of many integers, more of those near primes are found to have 5 as a divisor than those far from primes. This is a true statement, but it doesn’t tell us why it’s true. (And there’s no assurance the observation holds for *all* numbers.)

Probabilistic reasoning is not a novelty in number theory. In 1936 Harald Cramér wrote:

> With respect to the ordinary prime numbers, it is well known that, roughly speaking, we may say that the chance that a given integer \(n\) should be a prime is approximately \(1 / \log n\).

Cramér went on to build a whole probabilistic model of the primes, ignoring all questions of divisibility and simply declaring each integer to be prime or composite based on the flip of a coin (biased according to the \(1 / \log n\) probability). In some respects the model works amazingly well. As Figure 6 shows, it not only matches the overall trend in the distribution of primes, but it also gives a pretty good estimate of the prevalence of twin primes.
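Cramér's model is simple to simulate. A sketch, assuming a fixed RNG seed and a range up to \(10^5\) (exact counts will vary with the seed):

```python
import math
import random

def cramer_primes(limit, seed=1):
    """Declare each n >= 3 'prime' with probability 1/log n, per Cramér's model."""
    rng = random.Random(seed)
    return [n for n in range(3, limit + 1) if rng.random() < 1 / math.log(n)]

def count_twins(primes):
    """Number of pairs (p, p+2) with both members 'prime'."""
    s = set(primes)
    return sum(1 for p in primes if p + 2 in s)

ps = cramer_primes(100_000)
# For comparison, the true counts below 10^5: 9592 primes and 1224 twin pairs
```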

However, much else is missing from Cramér’s random model. In particular, it totally misses the unusual properties of twin tweens. In the region of the number line shown in Figure 6, from 1 to \(10^7\), true tweens have an average of 44 divisors. The Cramér tweens are just a random sampling of ordinary integers, with about 16 divisors on average.

Fundamentally, the distribution of twin tweens is inextricably tangled up with the distribution of twin primes. You can’t have one without the other. And that’s not a helpful development, because the distribution of primes (twin and otherwise) is one of the deepest enigmas in modern mathematics.

Throughout these musings I have characterized the distinctive properties of tweens by counting their divisors. There are other ways of approaching the problem that yield similar results. For example, you can calculate the sum of the divisors, \(\sigma(n)\), rather than the count \(d(n)\), a technique that leads to the classical notions of abundant, perfect, and deficient numbers. A number \(n\) is abundant if \(\sigma(n) > 2n\), perfect if \(\sigma(n) = 2n\), and deficient if \(\sigma(n) < 2n\). When I wrote a program to sort tweens into these three categories, I was surprised to discover that apart from 4 and 6, every twin tween is an abundant number. Such a definitive result seemed remarkable and important. But then I learned that *every* number divisible by 6, other than 6 itself, is abundant.
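The classification program amounts to a few lines (a sketch, with \(\sigma\) computed by trial division; names are my own):

```python
def sigma(n):
    """Sum of the divisors of n, by trial division up to sqrt(n)."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

def classify(n):
    s = sigma(n)
    return "abundant" if s > 2 * n else "perfect" if s == 2 * n else "deficient"

# The first few tweens: 4 is deficient, 6 is perfect, and from 12 on they
# are all abundant -- as is every other multiple of 6 greater than 6
```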

Another approach is to count the prime factors of \(n\), rather than the divisors. These two quantities are highly correlated, although the number of divisors is not simply a function of the number of factors; it also depends on the diversity of the factors.

As Figure 7 shows, counting primes tells a story that’s much the same as counting divisors. A typical integer in the range up to \(10^8\) has about four factors, whereas a typical tween in the same range has more than six.

We could also look at the size of \(n\)’s largest prime factor, \(f_{\max}(n)\), which is connected to the concept of a smooth number. A number is smooth if all of its prime factors are smaller than some stated bound, which might be a fixed constant or a function of \(n\), such as \(\sqrt n\). One measure of smoothness is \(\log n\, / \log f_{\max}(n)\). Computations show that by this definition tweens are smoother than average: The ratio of logs is about 2.0 for the tweens and about 1.7 for all numbers.
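Both quantities — the list of prime factors and the largest one — fall out of the same trial-division loop. A sketch with names of my own choosing:

```python
import math

def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # whatever remains is prime
    return factors

def smoothness(n):
    """log n / log f_max(n); larger values mean smoother numbers."""
    return math.log(n) / math.log(max(prime_factors(n)))

# e.g. prime_factors(60) == [2, 2, 3, 5], and smoothness(60) is
# log 60 / log 5, roughly 2.54
```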

One more miscellaneous fact: No tween except 4 is a perfect square. Proof: Suppose \(n = m^2\) is a tween. Then \(n - 1 = m^2 - 1\), which has the factors \(m - 1\) and \(m + 1\), and so it cannot be prime. An extension of this argument rules out cubes and all higher perfect powers as twin tweens.

When I first began to ponder the tweens, I went looking to see what other people might have said on the subject. I didn’t find much. Although the literature on twin primes is immense, it focuses on the primes themselves, and especially on the question of whether there are infinitely many twins—a conjecture that’s been pending for 170 years. The numbers sandwiched between the primes are seldom mentioned.

The many varieties of highly composite numbers also have an enthusiastic fan club, but I have found little discussion of their frequent occurrence as neighbors of primes.

Could it be that I’m the first person ever to notice the curious properties of twin tweens? No. I am past the age of entertaining such droll thoughts, even transiently. If I have not found any references, it’s doubtless because I’m not looking in the right places. (Pointers welcome.)

I did eventually find a handful of interesting articles and letters. The key to tracking them down, unsurprisingly, was the On-Line Encyclopedia of Integer Sequences, which more and more seems to function as the Master Index to Mathematics. I had turned there first, but the entry on the tween sequence, titled “average of twin prime pairs,” had only one reference, and it was not exactly a gold mine of enlightenment. It took me to a 1967 volume of *Eureka*, the journal of the Cambridge Archimedeans. All I found there (on page 16) was a very brief problem, asking for the continuation of the sequence 4, 6, 12, 18, 30, 42,…

There the matter rested for a few weeks, but eventually I came back to the OEIS to follow cross-links to some related sequences. Under “highly composite numbers” I found a reference to “Prime Numbers Generated From Highly Composite Numbers,” by Benny Lim (*Parabola*, 2018). Lim looked at the neighbors of the first 1,000 highly composite numbers. At the upper end of this range the numbers are very large (on the order of \(10^{76}\)), and primes are very rare—but they are not so rare among the nearest neighbors of highly composite numbers, Lim found.

Another cross reference took me off to sequence A002822, labeled “Numbers \(m\) such that \(6m-1\), \(6m+1\) are twin primes.” In other words, this is the set of numbers that, when multiplied by 6, yield the twin tweens. The first few terms are: 1, 2, 3, 5, 7, 10, 12, 17, 18, 23, 25, 30, 32, 33, 38. The OEIS entry includes a link to a 2011 paper by Francesca Balestrieri, which introduces an intriguing idea I have not yet fully absorbed. Balestrieri shows that \(6m + 1\) is composite if \(m\) can be expressed as \(6xy + x - y\) for some integers \(x\) and \(y\), and otherwise is prime. There’s a similar but slightly more complicated rule for \(6m - 1\). She then proceeds to prove the following theorem:

> The Twin Prime Conjecture is true if, and only if, there exist infinitely many \(m \in N\) such that \(m \ne 6xy + x - y\) and \(m \ne 6xy + x + y\) and \(m \ne 6xy - x - y\), for all \(x, y \in N\).

Other citations took me to three papers by Antonie Dinculescu, dated 2012 to 2018, which explore related themes. But the most impressive documents were two letters written to Neil J. A. Sloane, the founder and prime mover of the OEIS. In 1984 the redoubtable Solomon W. Golomb wrote to point out several publications from the 1950s and 60s that mention the connection between \(6xy \pm x \pm y\) and twin primes. The earliest of these appearances was a problem in the *American Mathematical Monthly*, proposed and solved by Golomb himself. He was 17 when he made the discovery, and this was his first mathematical publication. To support his claim of priority, he offered a $100 reward to anyone who could find an earlier reference.

The second letter, from Matthew A. Myers of Spruce Pine, North Carolina, supplies two such prior references. One is a well-known history of number theory by L. E. Dickson, published in 1919. The other is an “Essai sur les nombres premiers” by Wolfgang Ludwig Krafft, a colleague of Euler’s at the St. Petersburg Academy of Sciences. The essay was read to the academy in 1798 and published in Volume 12 of the *Nova Acta Academiae Scientiarum Imperialis Petropolitanae*. It deals at length with the \(6xy \pm x \pm y\) matter. This was 50 years before the concept of twin primes, and their conjectured infinite supply, was introduced by Alphonse de Polignac.

Myers reported these archeological findings in 2018. Sadly, Golomb had died two years earlier.

Peaks and troughs, lumps and slumps, wave after wave of surge and retreat: I have been following the ups and downs of this curve, day by day, for a year and a half. The graph records the number of newly reported cases of Covid-19 in the United States for each day from 21 January 2020 through 20 July 2021. That’s 547 days, and also exactly 18 months. The faint, slender vertical bars in the background give the raw daily numbers; the bold blue line is a seven-day trailing average. (In other words, the case count for each day is averaged with the counts for the six preceding days.)
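For the record, the seven-day trailing average is nothing more than this (a sketch; the *Times*’ exact handling of the first six days may differ — here they are averaged over however many days are available):

```python
def trailing_average(series, window=7):
    """Trailing mean: each value averaged with the window-1 preceding values.
    Early entries, with fewer predecessors, average over what's available."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i + 1 - lo))
    return out
```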

I struggle to understand the large-scale undulations of that graph. If you had asked me a few years ago what a major epidemic might look like, I would have mumbled something about exponential growth and decay, and I might have sketched a curve like this one:

My imaginary epidemic is so much simpler than the real thing! The number of daily infections goes up, and then it comes down again. It doesn’t bounce around like a nervous stock market. It doesn’t have seasonal booms and busts.

The graph tracing the actual incidence of the disease makes at least a dozen reversals of direction, along with various small-scale twitches and glitches. The big mountain in the middle has foothills on both sides, as well as some high alpine valleys between craggy peaks. I’m puzzled by all this structural embellishment. Is it mere noise—a product of random fluctuations—or is there some driving mechanism we ought to know about, some switch or dial that’s turning the infection process on and off every few months?

I have a few ideas about possible explanations, but I’m not so keen on any of them that I would try to persuade you they’re correct. However, I *do* hope to persuade you there’s something here that needs explaining.

Before going further, I want to acknowledge my sources. The data files I’m working with are curated by *The New York Times*, based on information collected from state and local health departments. Compiling the data is a big job; the *Times* lists more than 150 workers on the project. They need to reconcile the differing and continually shifting policies of the reporting agencies, and then figure out what to do when the incoming numbers look fishy. (Back in June, Florida had a day with –40,000 new cases.) The entire data archive, now about 2.3 gigabytes, is freely available on GitHub. Figure 1 in this article is modeled on a graph updated daily in the *Times*.

I must also make a few disclaimers. In noodling around with this data set I am not trying to forecast the course of the epidemic, or even to retrocast it—to develop a model accurate enough to reproduce details of timing and magnitude observed over the past year and a half. I’m certainly not offering medical or public-health advice. I’m just a puzzled person looking for simple mechanisms that might explain the overall shape of the incidence curve, and in particular the roller coaster pattern of recurring hills and valleys.

So far, four main waves of infection have washed over the U.S., with a fifth wave now beginning to look like a tsunami. Although the waves differ greatly in height, they seem to be coming at us with some regularity. Eyeballing Figure 1, I get the impression that the period from peak to peak is pretty consistent, at roughly four months.

Periodic oscillations in epidemic diseases have been noticed many times before. The classical example is measles in Great Britain, for which there are weekly records going back to the early 18th century. In 1917 John Brownlee studied the measles data with a form of Fourier analysis called the periodogram. He found that the strongest peak in the frequency spectrum came at a period of 97 weeks, reasonably consistent with the widespread observation that the disease reappears every second year. But Brownlee’s periodograms bristle with many lesser peaks, indicating that the measles rhythm is not a simple, uniform drumbeat.

The mechanism behind the oscillatory pattern in measles is easy to understand. The disease strikes children in the early school years, and the virus is so contagious that it can run through an entire community in a few weeks. Afterwards, another outbreak can’t take hold until a new cohort of children has reached the appropriate age. No such age dependence exists in Covid-19, and the much shorter period of recurrence suggests that some other mechanism must be driving the oscillations. Nevertheless, it seems worthwhile to try applying Fourier methods to the data in Figure 1.

The Fourier transform decomposes any curve representing a function of time into a sum of simple sine and cosine waves of various frequencies. In textbook examples, the algorithm works like magic. Take a wiggly curve like this one:

Feed it into the Fourier transformer, turn the crank, and out comes a graph that looks like this, showing the coefficients of various frequency components:

It would be illuminating to have such a succinct encoding of the Covid curve—a couple of numbers that explain its comings and goings. Alas, that’s not so easy. When I poured the Covid data into the Fourier machine, this is what came out:

More than a dozen coefficients have significant magnitude; some are positive and some are negative; no obvious pattern leaps off the page. This spectrum, like the simpler one in Figure 4, holds all the information needed to reconstruct its input. I confirmed that fact with a quick computational experiment. But looking at the jumble of coefficients doesn’t help me to understand the structure of the Covid curve. The Fourier-transformed version is even more baffling than the original.
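For anyone who wants to crank the transformer themselves: with NumPy’s FFT routines, the textbook case looks like this. The two-frequency signal below is a toy of my own invention, standing in for the wiggly curve of the earlier figures:

```python
import numpy as np

# A toy "wiggly curve": two sinusoids at wavenumbers 5 and 12
t = np.arange(512)
signal = (3.0 * np.sin(2 * np.pi * 5 * t / 512)
          + 1.5 * np.cos(2 * np.pi * 12 * t / 512))

spectrum = np.fft.rfft(signal)                  # complex Fourier coefficients
magnitudes = np.abs(spectrum) / (len(t) / 2)    # scaled so peaks match amplitudes
# magnitudes[5] is about 3.0 and magnitudes[12] about 1.5; all else near 0
```

The spectrum holds all the information in the input: `np.fft.irfft(spectrum)` reconstructs the original signal to machine precision.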

One lesson to be drawn from this exercise is that the Fourier transform is indeed magic: If you want to make it work, you need to master the dark arts. I am no wizard in this department; as a matter of fact, most of my encounters with Fourier analysis have ended in tears and trauma. No doubt someone with higher skills could tease more insight from the numbers than I can. But I doubt that any degree of Fourier finesse will lead to some clear and concise description of the Covid curve. Even with \(200\) years of measles records, Brownlee wasn’t able to isolate a clear signal; with just a year and a half of Covid data, success is unlikely.

Yet my foray into the Fourier realm was not a complete waste of time. Applying the inverse Fourier transform to the first 13 coefficients (for wavenumbers 0 through 6) yields this set of curves:

It looks a mess, but the sum of these 13 sinusoidal waves yields quite a handsome, smoothed version of the Covid curve. In Figure 7 below, the pink area in the background shows the *Times* data, smoothed with the seven-day rolling average. The blue curve, much smoother still, is the waveform reconstructed from the 13 Fourier coefficients.
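Zeroing out everything above wavenumber 6 and inverting the transform is a two-line low-pass filter. A sketch, demonstrated on a made-up stand-in for the Covid curve (a slow trend plus weekly jitter; all the particulars are my own assumptions):

```python
import numpy as np

def lowpass_reconstruct(series, max_wavenumber=6):
    """Keep only wavenumbers 0..max_wavenumber (13 real coefficients when
    max_wavenumber is 6) and invert, giving a heavily smoothed curve."""
    coeffs = np.fft.rfft(series)
    coeffs[max_wavenumber + 1:] = 0
    return np.fft.irfft(coeffs, n=len(series))

# Toy stand-in: a slow three-cycle trend plus a seven-day sawtooth of noise
t = np.arange(546)
slow = 50 + 40 * np.sin(2 * np.pi * 3 * t / 546)
noisy = slow + 10 * np.sin(2 * np.pi * t / 7)
smooth = lowpass_reconstruct(noisy)   # recovers the slow trend
```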

The reconstruction traces the outlines of all the large-scale features of the Covid curve, with serious errors only at the end points (which are always problematic in Fourier analysis). The Fourier curve also fails to reproduce the spiky triple peak atop the big surge from last winter, but I’m not sure that’s a defect.

Let’s take a closer look at that triple peak. The graph below is an expanded view of the two-month interval from 20 November 2020 through 20 January 2021. The light-colored bars in the background are raw data on new cases for each day; the dark blue line is the seven-day rolling average computed by the *Times*.

The peaks and valleys in this view are just as high and low as those in Figure 1; they look less dramatic only because the horizontal axis has been stretched ninefold. My focus is not on the peaks but on the troughs between them. (After all, there wouldn’t be three peaks if there weren’t two troughs to separate them.) Three data points marked by pink bars have case counts far lower than the surrounding days. Note the dates of those events. November 26 was Thanksgiving Day in the U.S. in 2020; December 25 is Christmas Day, and January 1 is New Year’s Day. It looks like the virus went on holiday, but of course it was actually the medical workers and public health officials who took a day off, so that many cases did not get recorded on those days.

There may be more to this story. Although the holidays show up on the chart as low points in the progress of the epidemic, they were very likely occasions of higher-than-normal contagion, because of family gatherings, religious services, revelry, and so on. (I commented on the Thanksgiving risk last fall.) Those “extra” infections would not show up in the statistics until several days later, along with the cases that went undiagnosed or unreported on the holidays themselves. Thus each dip appears deeper because it is followed by a surge.

All in all, it seems likely that the troughs creating the triple peak are a reporting anomaly, rather than a reflection of genuine changes in the viral transmission rate. Thus a curve that smooths them away may give a better account of what’s really going on in the population.

There’s another transformation—quite different from Fourier analysis—that might tell us something about the data. The time derivative of the Covid curve gives the rate of change in the infection rate—positive when the epidemic is surging, negative when it’s retreating. Because we’re working with a series of discrete values, computing the derivative is trivially easy: It’s just the series of differences between successive values.

The derivative of the raw data *(blue)* looks like a seismograph recording from a jumpy day along the San Andreas. The three big holiday anomalies—where case counts change by 100,000 per day—produce dramatic excursions. The smaller jagged waves that extend over most of the 18-month interval are probably connected with the seven-day cycle of data collection, which typically shows case counts increasing through the work week and then falling off on the weekend.

The seven-day trailing average is designed to suppress that weekly cycle, and it also smooths over some larger fluctuations. The resulting curve *(red)* is not only less jittery but also has much lower amplitude. (I have expanded the vertical scale by a factor of two for clarity.)

Finally, the reconstituted curve built by summing 13 Fourier components yields a derivative curve *(green)* whose oscillations are mere ripples, even when stretched vertically by a factor of four.

The points where the derivative curves cross the zero line—going from positive to negative or vice versa—correspond to peaks or troughs in the underlying case-count curve. Each zero crossing marks a moment when the epidemic’s trend reversed direction, when a growing daily case load began to decline, or a falling one turned around and started gaining again. The blue raw-data curve has 255 zero crossings, and the red averaged curve has 122. Even the lesser figure implies that the infection trend is reversing course every four or five days, which is not plausible; most of those sign changes must result from noise in the data.
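Both the derivative and the zero-crossing count are a few lines of code (helper names are my own; exact zeros are dropped before counting sign changes):

```python
def differences(series):
    """Discrete time derivative: successive differences of the series."""
    return [b - a for a, b in zip(series, series[1:])]

def zero_crossings(deriv):
    """Count sign changes in the derivative; each one marks a peak or trough
    in the underlying series."""
    signs = [1 if x > 0 else -1 for x in deriv if x != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# A series with one peak and one trough has two zero crossings:
# differences([0, 1, 2, 1, 0, 1]) == [1, 1, -1, -1, 1]
```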

The silky smooth green curve has nine zero crossings, most of which seem to signal real changes in the course of the epidemic. I would like to understand what’s causing those events.

You catch a virus. (Sorry about that.) Some days later you infect a few others, who after a similar delay pass the gift on to still more people. This is the mechanism of exponential (or geometric) growth. With each link in the chain of transmission the number of new cases is multiplied by a factor of \(R\), which is the natural growth ratio of the epidemic—the average number of cases spawned by each infected individual. Starting with a single case at time \(t = 0\), and measuring time in generations of transmission, the number of new infections at time \(t\) is \(R^t\). If \(R\) is greater than 1, even very slightly, the number of cases increases without limit; if \(R\) is less than 1, the epidemic fades away.

The average delay between when you become infected and when you infect others is known as the serial passage time, which I am going to abbreviate \(T_{SP}\) and take as the basic unit for measuring the duration of events in the epidemic. For Covid-19, one \(T_{SP}\) is probably about five days.

Exponential growth is famously unmanageable. If \(R = 2\), the case count doubles with every iteration: \(1, 2, 4, 8, 16, 32\dots\). It increases roughly a thousandfold after \(10\,T_{SP}\), and a millionfold after \(20\,T_{SP}\). The rate of increase becomes so steep that I can’t even graph it except on a logarithmic scale, where an exponential trajectory becomes a straight line.

What is the value of \(R\) for the SARS-CoV-2 virus? No one knows for sure. The number is difficult to measure, and it varies with time and place. Another number, \(R_0\), is often regarded as an intrinsic property of the virus itself, an indicator of how easily it passes from person to person. The Centers for Disease Control and Prevention (CDC) suggests that \(R_0\) for SARS-CoV-2 probably lies between 2.0 and 4.0, with a best guess of 2.5. That would make it catchier than influenza but less so than measles. However, the CDC has also published a report arguing that \(R_0\) is “easily misrepresented, misinterpreted, and misapplied.” I’ve certainly been confused by much of what I’ve read on the subject.

Whatever numerical value we assign to \(R\), if it’s greater than 1, it cannot possibly describe the complete course of an epidemic. As \(t\) increases, \(R^t\) will grow at an accelerating pace, and before you know it the predicted number of cases will exceed the global human population. For \(R = 2\), this absurdity arrives after about 33 \(T_{SP}\), which is less than six months.
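A few lines of Python confirm this arithmetic (assuming a world population of roughly 7.8 billion):

```python
# Exponential growth with R = 2, one generation of cases per T_SP.
R = 2
world_population = 7.8e9  # rough 2021 figure (an assumption for this sketch)

print(R ** 10)   # roughly a thousandfold growth after 10 T_SP
print(R ** 20)   # roughly a millionfold after 20 T_SP

# How many T_SP until the case count exceeds the global population?
t = 0
cases = 1
while cases <= world_population:
    cases *= R
    t += 1
print(t)  # 33 T_SP -- less than six months at 5 days per T_SP
```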

What we need is a mathematical model with a built-in limit to growth. As it happens, the best-known model in epidemiology features just such a mechanism. Introduced almost 100 years ago by W. O. Kermack and A. G. McKendrick of the Royal College of Physicians in Edinburgh, it is known as the SIR model, after the three subsets of the population it tracks: *susceptible* \((\mathcal{S})\), *infective* \((\mathcal{I})\), and *recovered* \((\mathcal{R})\). (Some authors prefer *removed*, acknowledging that recovery is not the only way an infection can end. But I don’t want to be grim today. Also note that I’m using a calligraphic font for \(\mathcal{S},\mathcal{I}\), and \(\mathcal{R}\) to avoid confusion between the growth rate \(R\) and the recovered group \(\mathcal{R}\).) Initially (before a pathogen enters the population), everyone is of type \(\mathcal{S}\). Susceptibles who contract the virus become infectives—capable of transmitting the disease to other susceptibles. Then, after each infective’s illness has run its course, that person joins the recovered class. Having acquired immunity through infection, the recovereds will never be susceptible again.

A SIR epidemic can’t keep growing indefinitely for the same reason that a forest fire can’t keep burning after all the trees are reduced to ashes. At the beginning of an epidemic, when the entire population is susceptible, the case count can grow exponentially. But growth slows later, when each infective has a harder time finding susceptibles to infect. Kermack and McKendrick made the interesting discovery that the epidemic dies out before it has reached the entire population. That is, the last infective recovers before the last susceptible is infected, leaving a residual \(\mathcal{S}\) population that has never experienced the disease.

The SIR model itself has gone viral in the past few years. There are tutorials everywhere on the web, as well as scholarly articles and books. (I recommend *Epidemic Modelling: An Introduction*, by Daryl J. Daley and Joseph Gani. Or try *Mathematical Modelling of Zombies* if you’re feeling brave.) Most accounts of the SIR model, including the original by Kermack and McKendrick, are presented in terms of differential equations. I’m instead going to give a version with discrete time steps—\(\Delta t\) rather than \(dt\)—because I find it easier to explain and because it translates line for line into computer code. In the equations that follow, \(\mathcal{S}\), \(\mathcal{I}\), and \(\mathcal{R}\) are real numbers in the range \([0, 1]\), representing proportions of some fixed-size population.

\[\begin{align}
\Delta\mathcal{I} & = \beta \mathcal{I}\mathcal{S}\\[0.8ex]
\Delta\mathcal{R} & = \gamma \mathcal{I}\\[1.5ex]
\mathcal{S}_{t+\Delta t} & = \mathcal{S}_{t} - \Delta\mathcal{I}\\[0.8ex]
\mathcal{I}_{t+\Delta t} & = \mathcal{I}_{t} + \Delta\mathcal{I} - \Delta\mathcal{R}\\[0.8ex]
\mathcal{R}_{t+\Delta t} & = \mathcal{R}_{t} + \Delta\mathcal{R}
\end{align}\]

The first equation, with \(\Delta\mathcal{I}\) on the left hand side, describes the actual contagion process—the recruitment of new infectives from the susceptible population. The number of new cases is proportional to the product of \(\mathcal{I}\) and \(\mathcal{S}\), since the only way to propagate the disease is to bring together someone who already has it with someone who can catch it. The constant of proportionality, \(\beta\), is a basic parameter of the model. It measures how often (per \(T_{SP}\)) an infective person encounters others closely enough to communicate the virus.

The second equation, for \(\Delta\mathcal{R}\), similarly describes recovery. For epidemiological purposes, you don’t have to be feeling tiptop again to be done with the disease; recovery is defined as the moment when you are no longer capable of infecting other people. The model takes a simple approach to this idea, withdrawing a fixed fraction of the infectives in every time step. The fraction is given by the parameter \(\gamma\).

After the first two equations calculate the number of people who are changing their status in a given time step, the last three equations update the population segments accordingly. The susceptibles lose \(\Delta\mathcal{I}\) members; the infectives gain \(\Delta\mathcal{I}\) and lose \(\Delta\mathcal{R}\); the recovereds gain \(\Delta\mathcal{R}\). The total population \(\mathcal{S} + \mathcal{I} + \mathcal{R}\) remains constant throughout.


Here’s what happens when you put the model in motion. For this run I set \(\beta = 0.6\) and \(\gamma = 0.2\), which implies that \(\rho = \beta/\gamma = 3.0\). Another number that needs to be specified is the initial proportion of infectives; I chose \(10^{-6}\), or in other words one in a million. The model ran for \(100\) \(T_{SP}\), with a time step of \(\Delta t = 0.1\) \(T_{SP}\); thus there were \(1{,}000\) iterations overall.
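The article’s own code is in Julia, but the update rules translate almost line for line into any language. Here is a Python sketch, under my reading that each rate is multiplied by the step \(\Delta t\) (the parameter values are those quoted above):

```python
# Minimal discrete-time SIR model: a Python sketch of the update rules,
# with each rate scaled by the time step dt.

def sir(beta=0.6, gamma=0.2, i0=1e-6, dt=0.1, t_max=100):
    """Return a list of (t, S, I, R) tuples sampled at every time step."""
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(0.0, s, i, r)]
    for n in range(1, int(t_max / dt) + 1):
        d_i = beta * i * s * dt   # new infections this step
        d_r = gamma * i * dt      # recoveries this step
        s -= d_i
        i += d_i - d_r
        r += d_r
        history.append((n * dt, s, i, r))
    return history

run = sir()
t_end, s_end, i_end, r_end = run[-1]
print(f"residual susceptibles: {s_end:.3f}")  # roughly 6 percent escape
```

The total \(\mathcal{S} + \mathcal{I} + \mathcal{R}\) is conserved at every step, and the residual susceptible fraction lands near the 6 percent figure discussed below.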

Let me call your attention to a few features of this graph. At the outset, nothing seems to happen for weeks and weeks, and then all of a sudden a huge blue wave rises up out of the calm waters. Starting from one case in a population of a million, it takes \(18\) \(T_{SP}\) to reach one case in a thousand, but just \(12\) more \(T_{SP}\) to reach one in \(10\).

Note that the population of infectives reaches a peak near where the susceptible and recovered curves cross—that is, where \(\mathcal{S} = \mathcal{R}\). This relationship holds true over a wide range of parameter values. That’s not surprising, because the whole epidemic process acts as a mechanism for converting susceptibles into recovereds, via a brief transit through the infective stage. But, as Kermack and McKendrick predicted, the conversion doesn’t quite go to completion. At the end, about \(6\) percent of the population remains in the susceptible category, and there are no infectives left to convert them. This is the condition called herd immunity, where the population of susceptibles is so diluted that most infectives recover before they can find someone to infect. It’s the end of the epidemic, though it comes only after \(90+\) percent of the people have gotten sick. (That’s not what I would call a victory over the virus.)

The \(\mathcal{I}\) class in the SIR model can be taken as a proxy for the tally of new cases tracked in the *Times* data. The two variables are not quite the same—infectives remain in the class \(\mathcal{I}\) until they recover, whereas new cases are counted only on the day they are reported—but they are very similar and roughly proportional to one another. And that brings me to the main point I want to make about the SIR model: In Figure 11 the blue curve for infectives looks nothing like the corresponding new-case tally in Figure 1. In the SIR model, the number of infectives starts near zero, rises steeply to a peak, and thereafter tapers gradually back to zero, never to rise again. It’s a one-hump camel. The roller coaster Covid curve is utterly different.

The detailed geometry of the \(\mathcal{I}\) curve depends on the values assigned to the parameters \(\beta\) and \(\gamma\). Changing those variables can make the curve longer or shorter, taller or flatter. But no choice of parameters will give the curve multiple ups and downs. There are no oscillatory solutions to these equations.

The SIR model strikes me as so plausible that it—or some variant of it—really *must* be a correct description of the natural course of an epidemic. But that doesn’t mean it can explain what’s going on right now with Covid-19. A key element of the model is saturation: the spread of the disease stalls when there are too few susceptibles left to catch it. That can’t be what caused the steep downturn in Covid incidence that began in January of this year, or the earlier slumps that began in April and July of 2020. We were nowhere near saturation during any of those events, and we still aren’t now. (For the moment I’m ignoring the effects of vaccination. I’ll take up that subject below.)

In Figure 11 there comes a dramatic triple point where each of the three categories constitutes about a third of the total population. If we projected that situation onto the U.S., we would have (in very round numbers) \(100\) million active infections, another \(100\) million people who have recovered from an earlier bout with the virus, and a third \(100\) million who have so far escaped (but most of whom will catch it in the coming weeks). That’s orders of magnitude beyond anything seen so far. The cumulative case count, which combines the \(\mathcal{I}\) and \(\mathcal{R}\) categories, is approaching \(37\) million, or \(11\) percent of the U.S. population. (If you replot that count on a graph whose *y* axis spans the full U.S. population of 330 million, you get the flatline graph at right.)

If we are still on the early part of the curve, in the regime of rampant exponential growth, it’s easy to understand the surging accelerations we’ve seen in the worst moments of the epidemic. The hard part is explaining the repeated slowdowns in viral transmission that punctuate the Covid curve. In the SIR model, the turnaround comes when the virus begins to run out of victims, but that’s a one-time phenomenon, and we haven’t gotten there yet. What can account for the deep valleys in the *Times* Covid curve?

Among the ideas that immediately come to mind, one strong contender is feedback. We all have access to real-time information on the status of the epidemic. It comes from governmental agencies, from the news media, from idiots on Facebook, and from the personal communications of family, friends, and neighbors. Most of us, I think, respond appropriately to those messages, modulating our anti-contagion precautions according to the perceived severity of the threat. When it’s scary outside, we hunker down and mask up. When the risk abates, it’s party time again! I can easily imagine a scenario where such on-again, off-again measures would trigger oscillations in the incidence of the disease.

If this hypothesis turns out to be true, it is cause for both hope and frustration. Hope because the interludes of viral retreat suggest that our tools for fighting the epidemic must be reasonably effective. Frustration because the rebounds indicate we’re not deploying those tools as well as we could. Look again at the Covid curve of Figure 1, specifically at the steep downturn following the winter peak. In early February, the new case rate was dropping by 30,000 per week. Over the first three weeks of that month, the rate was cut in half. Whatever we were doing then, it was working brilliantly. If we had just continued on the same trajectory, the case count would have hit zero in early March. Instead, the downward slope flattened, and then turned upward again.

We had another chance in June. All through April and May, new cases had been falling steadily, from 65,000 to 17,000, a pace of about –800 cases a day. If we’d been able to sustain that rate for just three more weeks, we’d have crossed the finish line in late June. But again the trend reversed course, and by now we’re back up well above 100,000 cases a day.

Are these pointless ups and downs truly caused by feedback effects? I don’t know. I am particularly unsure about the “if only” part of the story—the idea that if only we’d kept the clamps on for just a few more weeks, the virus would have been eradicated, or nearly so. But it’s an idea to keep in mind.

Perhaps we could learn more by creating a feedback loop in the SIR model, and looking for oscillatory dynamics. Negative feedback is anything that acts to slow the infection rate when that rate is high, and to boost it when it’s low. Such a contrarian mechanism could be added to the model in several ways. Perhaps the simplest is a lockdown threshold: Whenever the number of infectives rises above some fixed limit, everyone goes into isolation; when the \(\mathcal{I}\) level falls below the threshold again, all cautions and restrictions are lifted. It’s an all-or-nothing rule, which makes it simple to implement. We need a constant to represent the threshold level, and a new factor (which I am naming \(\varphi\), for fear) in the equation for \(\Delta \mathcal{I}\):

\[\Delta\mathcal{I} = \beta \varphi \mathcal{I} \mathcal{S}\]

The \(\varphi\) factor is 1 whenever \(\mathcal{I}\) is below the threshold, and \(0\) when it rises above. The effect is to shut down all new infections as soon as the threshold is reached, and start them up again when the rate falls.

Does this scheme produce oscillations in the \(\mathcal{I}\) curve? Strictly speaking, the answer is yes, but you’d never guess it by looking at the graph.

The feedback loop serves as a control system, like a thermostat that switches the furnace off and on to maintain a set temperature. In this case, the feedback loop holds the infective population steady at the threshold level, which is set at 0.05. On close examination, it turns out that \(\mathcal{I}\) is oscillating around the threshold level, but with such a short period and tiny amplitude that the waves are invisible. The value bounces back and forth between 0.049 and 0.051.
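The threshold rule is a one-line change to the model. Here is a Python sketch (the threshold value, 0.05, is the one quoted above), which also makes it easy to confirm that the invisible micro-oscillations are really there:

```python
# SIR with an all-or-nothing lockdown: the fear factor phi drops to 0
# whenever the infective fraction exceeds a fixed threshold.

def sir_threshold(beta=0.6, gamma=0.2, i0=1e-6, dt=0.1,
                  t_max=100, threshold=0.05):
    s, i, r = 1.0 - i0, i0, 0.0
    trace = []
    for n in range(int(t_max / dt)):
        phi = 0.0 if i > threshold else 1.0   # instantaneous feedback
        d_i = beta * phi * i * s * dt
        d_r = gamma * i * dt
        s, i, r = s - d_i, i + d_i - d_r, r + d_r
        trace.append(i)
    return trace

trace = sir_threshold()
print(max(trace))  # I never strays far above the 0.05 threshold

# Count how many times I crosses the threshold: many tiny oscillations.
crossings = sum(1 for a, b in zip(trace, trace[1:])
                if (a - 0.05) * (b - 0.05) < 0)
print(crossings)
```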

To get macroscopic oscillations, we need more than feedback. The SIR output shown below comes from a model that combines feedback with a delay between measuring the state of the epidemic and acting on that information. Introducing such a delay is not the only way to make the model swing, but it’s certainly a plausible one. As a matter of fact, a model *without* any delay, in which a society responds instantly to every tiny twitch in the case count, seems wholly unrealistic.

The model of Figure 14 adopts the same parameters, \(\beta = 0.6\) and \(\gamma = 0.2\), as the version of Figure 13, as well as the same lockdown threshold \((0.05)\). It differs only in the timing of events. If the infective count climbs above the threshold at time \(t\), control measures do not take effect until \(t + 3\); in the meantime, infections continue to spread through the population. The delay and overshoot on the way up are matched by a delay and undershoot at the other end of the cycle, when lockdown continues for three \(T_{SP}\) after the threshold is crossed on the way down.
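Adding the delay amounts to basing the lockdown decision on the infective level measured some time ago. A Python sketch, with the delay stored in a fixed-length queue of past \(\mathcal{I}\) values:

```python
from collections import deque

# SIR with threshold feedback plus a reporting delay: the lockdown
# decision at time t is based on the I level measured `delay` T_SP earlier.

def sir_delayed(beta=0.6, gamma=0.2, i0=1e-6, dt=0.1, t_max=100,
                threshold=0.05, delay=3.0):
    s, i, r = 1.0 - i0, i0, 0.0
    lag = int(delay / dt)
    memory = deque([i0] * lag, maxlen=lag)  # I values from `delay` ago
    trace = []
    for n in range(int(t_max / dt)):
        phi = 0.0 if memory[0] > threshold else 1.0  # act on stale data
        memory.append(i)                             # record today's level
        d_i = beta * phi * i * s * dt
        d_r = gamma * i * dt
        s, i, r = s - d_i, i + d_i - d_r, r + d_r
        trace.append(i)
    return trace

trace = sir_delayed()
print(max(trace))  # overshoots well past the 0.05 threshold
```

With the delay in place, the infective count sails past the threshold before the brakes engage, producing the sawtooth cycles of overshoot and undershoot.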

Given these specific parameters and settings, the model produces four cycles of diminishing amplitude and increasing wavelength. (No further cycles are possible because \(\mathcal{I}\) remains below the threshold.) Admittedly, those four spiky sawtooth peaks don’t look much like the humps in the Covid curve. If we’re going to seriously consider the feedback hypothesis, we’ll need stronger evidence than this. But the model is very crude; it could be refined and improved.

The fact is, I really want to believe that feedback could be a major component in the oscillatory dynamics of Covid-19. It would be comforting to know that our measures to combat the epidemic have had a powerful effect, and that we therefore have some degree of control over our fate. But I’m having a hard time keeping the faith. For one thing, I would note that our countermeasures have not always been on target. In the epidemic’s first wave, when the characteristics of the virus were largely unknown, the use of facemasks was discouraged (except by medical personnel), and there was a lot of emphasis on hand washing, gloves, and sanitizing surfaces. Not to mention drinking bleach. Those measures were probably not very effective in stopping the virus, but the wave receded anyway.

Another source of doubt is that wavelike fluctuations are not unique to Covid-19.

**Update:** Alina Glaubitz and Feng Fu of Dartmouth have applied a game-theoretical approach to generating oscillatory dynamics in a SIR model. Their work was published last November but I have only just learned about it from an article by Lina Sorg in *SIAM News*.

One detail of the SIR model troubles me. As formulated by Kermack and McKendrick, the model treats infection and recovery as symmetrical, mirror-image processes, both of them described by exponential functions. The exponential rule for infections makes biological sense. You can only get the virus via transmission from someone who already has it, so the number of new infections is proportional to the number of existing infections. But recovery is different; it’s not contagious. Although the duration of the illness may vary to some extent, there’s no reason to suppose it would depend on the number of other people who are sick at the same time.

In the model, a fixed fraction of the infectives, \(\gamma \mathcal{I}\), recover at every time step. This rule implies that the duration of infection follows an exponential distribution with a long tail: with \(\gamma = 0.2\), after \(10\) \(T_{SP}\), more than \(10\) percent of the original group remain infectious.
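Under the fixed-fraction rule, the probability of still being infectious after \(t\) serial passage times decays (in the continuous limit) as \(e^{-\gamma t}\). A quick check with \(\gamma = 0.2\):

```python
import math

# Survival of infectiousness under the fixed-fraction recovery rule.
# In the continuous limit the infectious period is exponentially
# distributed: P(still infectious after t) = exp(-gamma * t).
gamma = 0.2
for t in (5, 10, 20):
    print(t, round(math.exp(-gamma * t), 3))
# After 10 T_SP, about 13.5 percent of a cohort is still infectious.
```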

The long tail of this distribution corresponds to illnesses that persist for many weeks. Such cases exist, but they are rare. According to the CDC, most Covid patients have no detectable “replication-competent virus” \(10\) days after the onset of symptoms. Even in the most severe cases, with immunocompromised patients, \(20\) days of infectivity seems to be the outer limit. (On the importance of the distribution of infectious periods in epidemic models, see *Theoretical Population Biology* Vol. 60, No. 1, pp. 59–71.)

Modifying the model for a fixed period of infectivity is not difficult. We can keep track of the infectives with a data structure called a queue. Each batch of newly recruited infectives goes into the tail of the queue, then advances one place with each time step. After \(m\) steps (where \(m\) is the duration of the illness), the batch reaches the head of the queue and joins the company of the recovered. Here is what happens when \(m = 3\) \(T_{SP}\):

I chose \(3\) \(T_{SP}\) for this example because it is close to the median duration in the exponential distribution in Figure 11, and therefore ought to resemble the earlier result. And so it does, approximately. As in Figure 11, the peak in the infectives curve lies near where the susceptible and recovered curves cross. But the peak never grows quite as tall; and, for obvious reasons, it decays much faster. As a result, the epidemic ends with many more susceptibles untouched by the disease—more than 25 percent.
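The queue mechanism can be sketched in Python like this (a fixed-length deque plays the role of the queue, with each slot holding one time step’s batch of infectives):

```python
from collections import deque

# SIR with a fixed infection duration: each batch of new infectives
# enters the tail of a queue and recovers exactly m T_SP later.

def sir_queue(beta=0.6, i0=1e-6, dt=0.1, t_max=100, m=3.0):
    slots = int(m / dt)
    queue = deque([0.0] * slots, maxlen=slots)
    queue[-1] = i0                      # the seed batch of infectives
    s, r = 1.0 - i0, 0.0
    trace = []
    for n in range(int(t_max / dt)):
        i = sum(queue)                  # everyone in the queue is infective
        recovered_batch = queue[0]      # the head batch recovers this step
        d_i = beta * i * s * dt         # a new batch is infected
        queue.append(d_i)               # appending drops the head batch
        s -= d_i
        r += recovered_batch
        trace.append(i)
    return s, trace

s_end, trace = sir_queue(m=3.0)
print(f"susceptibles never infected: {s_end:.2f}")  # more than 25 percent
```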

A disease duration of \(3\) \(T_{SP}\), or about \(15\) days, is still well over the CDC estimates of the typical length. Shortening the queue to \(2\) \(T_{SP}\), or about \(10\) days, transforms the outcome even more dramatically. Now the susceptible and recovered curves never cross, and almost \(70\) percent of the susceptible population remains uninfected when the epidemic peters out.

Figure 18 comes a little closer to describing the current Covid situation in the U.S. than the other models considered above. It’s not that the curves’ shape resembles that of the data, but the overall magnitude or intensity of the epidemic is closer to observed levels. Of the models presented so far, this is the first that reaches a natural limit without burning through most of the population. Maybe we’re on to something.

On the other hand, there are a couple of reasons for caution. First, with these parameters, the initial growth of the epidemic is extremely slow; it takes \(40\) or \(50\) T_{SP} before infections have a noticeable effect on the population. That’s well over six months. Second, we’re still dealing with a one-hump camel. Even though most of the population is untouched, the epidemic has run its course, and there will not be a second wave. Something important is still missing.

Before leaving this topic behind, I want to point out that the finite time span of a viral infection gives us a special point of leverage for controlling the spread of the disease. The viruses that proliferate in your body must find a new host within a week or two, or else they face extinction. Therefore, if we could completely isolate every individual in the country for just two or three weeks, the epidemic would be over. Admittedly, putting each and every one of us into solitary confinement is not feasible (or morally acceptable), but we could strive to come as close as possible, strongly discouraging all forms of person-to-person contact. Testing, tracing, and quarantines would deal with straggler cases. My point is that a very strict but brief lockdown could be both more effective and less disruptive than a loose one that goes on for months. Where other strategies aim to flatten the curve, this one attempts to break the chain.

When Covid emerged late in 2019, it was soon labeled a *pandemic*, signifying that it’s bigger than a mere epidemic, that it’s everywhere. But it’s not everywhere at once. Flareups have skipped around from region to region and country to country. Perhaps we should view the pandemic not as a single global event but as an ensemble of more localized outbreaks.

Suppose small clusters of infections erupt at random times, then run their course and subside. By chance, several geographically isolated clusters might be active over the same range of dates and add up to a big bump in the national case tally. Random fluctuations could also produce interludes of widespread calm, which would cause a dip in the national curve.

We can test this notion with a simple computational experiment, modeling a population divided into \(N\) clusters or communities. For each cluster a SIR model generates a curve giving the proportion of infectives as a function of time. The initiation time for each of these mini-epidemics is chosen randomly and independently. Summing the \(N\) curves gives the total case count for the country as a whole, again as a function of time.
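A Python sketch of this experiment (for simplicity, every cluster runs an identical SIR curve; only the start times differ, drawn uniformly at random):

```python
import random

# Cluster experiment: N copies of one SIR epidemic curve, each launched
# at a random time, summed into a single "national" curve.

def sir_curve(beta=0.6, gamma=0.2, i0=1e-6, dt=0.1, t_max=100):
    """Infective fraction as a function of time for one cluster."""
    s, i, r = 1.0 - i0, i0, 0.0
    curve = []
    for _ in range(int(t_max / dt)):
        d_i = beta * i * s * dt
        d_r = gamma * i * dt
        s, i, r = s - d_i, i + d_i - d_r, r + d_r
        curve.append(i)
    return curve

def national_curve(n_clusters, t_max=100, dt=0.1):
    steps = int(t_max / dt)
    total = [0.0] * steps
    base = sir_curve(dt=dt, t_max=t_max)
    for _ in range(n_clusters):
        start = random.randrange(steps)     # random initiation time
        for k, v in enumerate(base):
            if start + k < steps:           # clip at the end of the run
                total[start + k] += v
    return total

random.seed(1)  # fixed seed so the experiment is reproducible
curve = national_curve(50)
```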

Before scrolling down to look at the graphs generated by this process, you might make a guess about how the experiment will turn out. In particular, how will the shape of the national curve change as the number of local clusters increases?

If there’s just one cluster, then the national curve is obviously identical to the trajectory of the disease in that one place. With two clusters, there’s a good chance they will not overlap much, and so the national curve will probably have two humps, with a deep valley between them. With \(N = 3\) or \(4\), overlap becomes more of an issue, but the sum curve still seems likely to have \(N\) humps, perhaps with shallower depressions separating them. Before I saw the results, I made the following guess about the behavior of the sum as \(N\) continues increasing: The sum curve will always have approximately \(N\) peaks, I thought, but the height difference between peaks and troughs should get steadily smaller. Thus at large \(N\) the sum curve would have many tiny ripples, small enough that the overall curve would appear to be one broad, flat-topped hummock.

So much for my intuition. Here are two examples of sum curves generated by clusters of \(N\) mini-epidemics, one curve for \(N = 6\) and one for \(N = 50\). The histories for individual clusters are traced by fine red lines; the sums are blue. All the curves have been scaled so that the highest peak of the sum curve touches \(1.0\).

My guess was wrong: the number of peaks does *not* increase in proportion to \(N\). As a matter of fact, both of the sum curves in Figure 19 have four distinct peaks (possibly five in the example at right), even though the number of component curves contributing to the sum is only six in one case and is \(50\) in the other.

I have to confess that the two examples in Figure 19 were not chosen at random. I picked them because they looked good, and because they illustrated a point I wanted to make—namely that the number of peaks in the sum curve remains nearly constant, regardless of the value of \(N\). Figure 20 assembles a more representative sample, selected without deliberate bias but again showing that the number of peaks is not sensitive to \(N\), although the valleys separating those peaks get shallower as \(N\) grows.


The question is: Why \(4 \pm 1\)? Why do we keep seeing those particular numbers? And if \(N\), the number of components being summed, has little influence on this property of the sum curve, then what *does* govern it? I puzzled over these questions for some time before a helpful analogy occurred to me.

Suppose you have a bunch of sine waves, all at the same frequency \(f\) but with randomly assigned phases; that is, the waves all have the same shape, but they are shifted left or right along the \(x\) axis by random amounts. What would the sum of those waves look like? The answer is: another sine wave of frequency \(f\). This is a little fact that’s been known for ages (at least since Euler) and is not hard to prove, but it still comes as a bit of a shock every time I run into it. I believe the same kind of argument can explain the behavior of a sum of SIR curves, even though those curves are not sinusoidal. The component SIR curves have a period of \(20\) to \(30\) \(T_{SP}\). In a model run that spans \(100\) \(T_{SP}\), these curves can be considered to have a frequency of between three and five cycles per epidemic period. Their sum should be a wave with the same frequency—something like the Covid curve, with its four (or four and a half) prominent humps. In support of this thesis, when I let the model run to \(200\) \(T_{SP}\), I get a sum curve with seven or eight peaks.
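The sine-wave fact is easy to verify numerically. Summing the unit phasors \(e^{ip_k}\) gives the amplitude \(A\) and phase \(p\) of the resultant, and the identity \(\sum_k \sin(x + p_k) = A\sin(x + p)\) then holds at every \(x\):

```python
import math
import random

# Sum of 50 equal-frequency sinusoids with random phases.
random.seed(0)
phases = [random.uniform(0, 2 * math.pi) for _ in range(50)]

# The resultant comes from summing the unit phasors e^{i p_k}:
A = abs(sum(complex(math.cos(pk), math.sin(pk)) for pk in phases))
p = math.atan2(sum(math.sin(pk) for pk in phases),
               sum(math.cos(pk) for pk in phases))

# Verify pointwise that the sum really is a single sinusoid:
for x in (0.0, 1.0, 2.5):
    direct = sum(math.sin(x + pk) for pk in phases)
    assert abs(direct - A * math.sin(x + p)) < 1e-9
print("sum of 50 random-phase sines = one sine of the same frequency")
```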

I am intrigued by the idea that an epidemic might arrive in cyclic waves not because of anything special about viral or human behavior but because of a mathematical process akin to wave interference. It’s such a cute idea, dressing up an obscure bit of counterintuitive mathematics and bringing it to bear on a matter of great importance to all of us. And yet, alas, a closer look at the Covid data suggests that nature doesn’t share my fondness for summing waves with random phases.

Figure 22, again based on data extracted from the *Times* archive, plots \(49\) curves, representing the time course of case counts in the Lower \(48\) states and the District of Columbia. I have separated them by region, and in each group I’ve labeled the trace with the highest peak. We already know that these curves yield a sum with four tall peaks; that’s where this whole investigation began. But the \(49\) curves do not support the notion that those peaks might be produced by summing randomly timed mini-epidemics. The oscillations in the \(49\) curves are *not* randomly timed; there are strong correlations between them. And many of the curves have multiple humps, which isn’t possible if each mini-epidemic is supposed to act like a SIR model that runs to completion.

Although these curves spoil a hypothesis I had found alluring, they also reveal some interesting facts about the Covid epidemic. I knew that the first wave was concentrated in New York City and surrounding areas, but I had not realized how much the second wave, in the summer of 2020, was confined to the country’s southern flank, from Florida all the way to California. The summer wave this year is also most intense in Florida and along the Gulf Coast. Coincidence? When I showed the graphs to a friend, she responded: “Air conditioning.”

Searching for the key to Covid, I’ve tried out three slightly whimsical notions: the possibility of a periodic signal, like the sunspot cycle, bringing us waves of infection on a regular schedule; feedback loops producing yo-yo dynamics in the case count; and randomly timed mini-epidemics that add up to a predictable, slow variation in the infection rate. In retrospect they still seem like ideas worth looking into, but none of them does a convincing job of explaining the data.

In my mind the big questions remain unanswered. In November of 2020 the daily tally of new Covid cases was above \(100{,}000\) and rising at a fearful rate. Three months later the infection rate was falling just as steeply. What changed between those dates? What action or circumstance or accident of fate blunted the momentum of the onrushing epidemic and forced it into retreat? And now, just a few months after the case count bottomed out, we are again above \(100{,}000\) cases per day and still climbing. What has changed again to bring the epidemic roaring back?

There are a couple of obvious answers to these questions. As a matter of fact, those answers are sitting in the back of the room, frantically waving their hands, begging me to call on them. First is the vaccination campaign, which has now reached half the U.S. population. The incredibly swift development, manufacture, and distribution of those vaccines is a wonder. In the coming months and years they are what will save us, if anything can. But it’s not so clear that vaccination is what stopped the big wave last winter. The sharp downturn in infection rates began in the first week of January, when vaccination was just getting under way in the U.S. On January 9 (the date when the decline began) only about \(2\) percent of the population had received even one dose. The vaccination effort reached a peak in April, when more than three million doses a day were being administered. By then, however, the dropoff in case numbers had stopped and reversed. If you want to argue that the vaccine ended the big winter surge, it’s hard to align causation with chronology.

On the other hand, the level of vaccination that has now been achieved should exert a powerful damping effect on any future waves. Removing half the people from the susceptible list may not be enough to reach herd immunity and eliminate the virus from the population, but it ought to be enough to turn a growing epidemic into a wilting one.

The SIR model of Figure 23 has the same parameters as the model of Figure 3 \((\beta = 0.6, \gamma = 0.2,\) implying \(\rho = 3.0)\), but \(50\) percent of the people are vaccinated at the start of the simulation. With this diluted population of susceptibles, essentially nothing happens for almost a year. The epidemic is dormant, if not quite defunct.
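In the model, vaccination is simply an initial condition: half the population starts in the immune class. A Python sketch (doubling \(\beta\) stands in for doubling \(\rho\), since \(\gamma\) is held fixed):

```python
# SIR with pre-vaccination: a fraction `vax` of the population starts in
# the recovered/immune class. Returns the peak infective fraction.

def sir_vax(beta=0.6, gamma=0.2, i0=1e-6, vax=0.5, dt=0.1, t_max=100):
    s, i, r = 1.0 - vax - i0, i0, vax
    peak = 0.0
    for _ in range(int(t_max / dt)):
        d_i = beta * i * s * dt
        d_r = gamma * i * dt
        s, i, r = s - d_i, i + d_i - d_r, r + d_r
        peak = max(peak, i)
    return peak

print(sir_vax())          # rho = 3: the epidemic stays small for the run
print(sir_vax(beta=1.2))  # rho = 6: the epidemic takes off again
```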

That’s the world we should be living in right now, according to the SIR model. Instead, today’s new case count is \(141{,}365\); \(81{,}556\) people are hospitalized with Covid infections; and 704 people have died. What gives? How can this be happening?

At this point I must acknowledge the other hand waving in the back of the room: the Delta variant, otherwise known as B.1.617.2. Half a dozen mutations in the viral spike protein, which binds to a human cell-surface receptor, have apparently made this new strain at least twice as contagious as the original one.

In Figure 24 contagiousness is doubled by increasing \(\rho\) from \(3.0\) to \(6.0\). That boost brings the epidemic back to life, although there is still quite a long delay before the virus becomes widespread in the unvaccinated half of the population.

The situation may well be worse than the model suggests. All the models I have reported on here pretend that the human population is homogeneous, or thoroughly mixed. If an infected person is about to spread the virus, everyone in the country has the same probability of being the recipient. This assumption greatly simplifies the construction of the model, but of course it’s far from the truth. In daily life you most often cross paths with people like yourself—people from your own neighborhood, your own age group, your own workplace or school. Those frequent contacts are also people who share your vaccination status. If you are unvaccinated, you are not only more vulnerable to the virus but also more likely to meet people who carry it. This somewhat subtle birds-of-a-feather effect is what allows us to have “a pandemic of the unvaccinated.”

Recent reports have brought still more unsettling and unwelcome news, with evidence that even fully vaccinated people may sometimes spread the virus. I’m waiting for confirmation of that before I panic. (But I’m waiting with my face mask on.)

Having demonstrated that I understand nothing about the history of the epidemic in the U.S.—why it went up and down and up and down and up and down and up and down—I can hardly expect to understand the present upward trend. About the future I have no clue at all. Will this new wave tower over all the previous ones, or is it Covid’s last gasp? I can believe anything.

But let us not despair. This is not the zombie apocalypse. The survival of humanity is not in question. It’s been a difficult ordeal for the past 18 months, and it’s not over yet, but we can get through this. Perhaps, at some point in the not-too-distant future, we’ll even understand what’s going on.

Today *The New York Times* has published two articles raising questions similar to those asked here. David Leonhardt and Ashley Wu write a “morning newsletter” titled “Has Delta Peaked?” Apoorva Mandavilli, Benjamin Mueller, and Shalini Venugopal Bhagat ask “When Will the Delta Surge End?” I think it’s fair to say that the questions in the headlines are not answered in the articles, but that’s not a complaint. I certainly haven’t answered them either.

I’m going to take this opportunity to update two of the figures to include data through the end of August.

In *Figure 1r* the surge in case numbers that was just beginning back in late July has become a formidable sugarloaf peak. The open question is what happens next. Leonhardt and Wu make the optimistic observation that “The number of new daily U.S. cases has risen less over the past week than at any point since June.” In other words, we can celebrate a negative second derivative: The number of cases is still high and is still growing, but it’s growing slower than it was. And they cite the periodicity observed in past U.S. peaks and in those elsewhere as a further reason for hope that we may be near the turning point.

*Figure 22r* tracks where the new cases are coming from. As in earlier peaks, California, Texas, and Florida stand out.

*The New York Times* data archive for Covid-19 cases and deaths in the United States is available in this GitHub repository. The version I used in preparing this article, cloned on 21 July 2021, is identified as “commit c3ab8c1beba1f4728d284c7b1e58d7074254aff8”. You should be able to access the identical set of files through this link.

Source code for the SIR models and for generating the illustrations in this article is also available on GitHub. The code is written in the Julia programming language and organized in Pluto notebooks.

Bartlett, M. S. 1956. Deterministic and stochastic models for recurrent epidemics. In *Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 4: Contributions to Biology and Problems of Health*, pp. 81–109. Berkeley: University of California Press.

Bartlett, M. S. 1957. Measles periodicity and community size. *Journal of the Royal Statistical Society, Series A (General)* 120(1):48–70.

Brownlee, John. 1917. An investigation into the periodicity of measles epidemics in London from 1703 to the present day by the method of the periodogram. *Philosophical Transactions of the Royal Society of London, Series B*, 208:225–250.

Daley, D. J., and J. Gani. 1999. *Epidemic Modelling: An Introduction*. Cambridge: Cambridge University Press.

Glaubitz, Alina, and Feng Fu. 2020. Oscillatory dynamics in the dilemma of social distancing. *Proceedings of the Royal Society A* 476:20200686.

Kendall, David G. 1956. Deterministic and stochastic epidemics in closed populations. In *Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 4: Contributions to Biology and Problems of Health*, pp. 149–165. Berkeley: University of California Press.

Kermack, W. O., and A. G. McKendrick. 1927. A contribution to the mathematical theory of epidemics. *Proceedings of the Royal Society of London, Series A* 115:700–721.

Lloyd, Alun L. 2001. Realistic distributions of infectious periods in epidemic models: changing patterns of persistence and dynamics. *Theoretical Population Biology* 60:59–71.

Smith?, Robert (editor). 2014. *Mathematical Modelling of Zombies*. Ottawa: University of Ottawa Press.

Over the years I’ve had several opportunities to play with Ising models and Monte Carlo methods, and I thought I had a pretty good grasp of the basic principles. But, you know, the more you know, the more you know you don’t know.

In 2019 I wrote a brief article on Glauber dynamics, a technique for analyzing the Ising model introduced by Roy J. Glauber, a Harvard physicist. In my article I presented an Ising simulation written in JavaScript, and I explained the algorithm behind it. Then, this past March, I learned that I had made a serious blunder. The program I’d offered as an illustration of Glauber dynamics actually implemented a different procedure, known as the Metropolis algorithm. Oops. (The mistake was brought to my attention by a comment signed “L. Y.,” with no other identifying information. Whoever you are, L. Y., thank you!)

A few days after L. Y.’s comment appeared, I tracked down the source of my error: I had reused some old code and neglected to modify it for its new setting. I corrected the program—only one line needed changing—and I was about to publish an update when I paused for thought. Maybe I could dismiss my goof as mere carelessness, but I realized there were other aspects of the Ising model and the Monte Carlo method where my understanding was vague or superficial. For example, I was not entirely sure where to draw the line between the Glauber and Metropolis procedures. (I’m even less sure now.) I didn’t know which features of the two algorithms are most essential to their nature, or how those features affect the outcome of a simulation. I had homework to do.

Since then, Monte Carlo Ising models have consumed most of my waking hours (and some of the sleeping ones). Sifting through the literature, I’ve found sources I never looked at before, and I’ve reread some familiar works with new understanding and appreciation. I’ve written a bunch of computer programs to clarify just which details matter most. I’ve dug into the early history of the field, trying to figure out what the inventors of these techniques had in mind when they made their design choices. Three months later, there are still soft spots in my knowledge, but it’s time to tell the story as best I can.

This is a long article—nearly 6,000 words. If you can’t read it all, I recommend playing with the simulation programs. There are five of them: 1, 2, 3, 4, 5. On the other hand, if you just can’t get enough of this stuff, you might want to have a look at the source code for those programs on GitHub. The repo also includes data and scripts for the graphs in this article.

Let’s jump right in with an Ising simulation. Below this paragraph is a grid of randomly colored squares, and beneath that a control panel. Feel free to play. Press the *Run* button, adjust the temperature slider, and click the radio buttons to switch back and forth between the Metropolis and the Glauber algorithms. The *Step* button slows down the action, showing one frame at a time. Above the grid are numerical readouts labeled *Magnetization* and *Local Correlation*; I’ll explain below what those instruments are monitoring.

The model consists of 10,000 sites, arranged in a \(100 \times 100\) square lattice, and colored either dark or light, indigo or mauve. In the initial condition (or after pressing the *Reset* button) the cells are assigned colors at random. Once the model is running, more organized patterns emerge. Adjacent cells “want” to have the same color, but thermal agitation disrupts their efforts to reach accord.

The lattice is constructed with “wraparound” boundaries: *periodic* boundary conditions. Imagine infinitely many copies of the lattice laid down like square tiles on an infinite plane.

When the model is running, changing the temperature can have a dramatic effect. At the upper end of the scale, the grid seethes with activity, like a boiling cauldron, and no feature survives for more than a few milliseconds. In the middle of the temperature range, large clusters of like-colored cells begin to appear, and their lifetimes are somewhat longer. When the system is cooled still further, the clusters evolve into blobby islands and isthmuses, coves and straits, all of them bounded by strangely writhing coastlines. Often, the land masses eventually erode away, or else the seas evaporate, leaving a featureless monochromatic expanse. In other cases broad stripes span the width or height of the array.

Whereas nudging the temperature control utterly transforms the appearance of the grid, the effect of switching between the two algorithms is subtler.

- At high temperature (5.0, say), both programs exhibit frenetic activity, but the turmoil in Metropolis mode is more intense.
- At temperatures near 3.0, I perceive something curious in the Metropolis program: Blobs of color seem to migrate across the grid. If I stare at the screen for a while, I see dense flocks of crows rippling upward or leftward; sometimes there are groups going both ways at once, with wings flapping. In the Glauber algorithm, blobs of color wiggle and jiggle like agitated amoebas, but they don’t go anywhere.
- At still lower temperatures (below about 1.5), the Ising world calms down. Both programs converge to the same monochrome or striped patterns, but Metropolis gets there faster.

I have been noticing these visual curiosities—the fluttering wings, the pulsating amoebas—for some time, but I have never seen them mentioned in the literature. Perhaps that’s because graphic approaches to the Ising model are of more interest to amateurs like me than to serious students of the underlying physics and mathematics. Nevertheless, I would like to understand where the patterns come from. (Some partial answers will emerge toward the end of this article.)

For those who want numbers rather than pictures, I offer the magnetization and local-correlation meters at the top of the program display. Magnetization is a global measure of the extent to which one color or the other dominates the lattice. Specifically, it is the number of dark cells minus the number of light cells, divided by the total number of cells:

\[M = \frac{\blacksquare - \square }{\blacksquare + \square}.\]

\(M\) ranges from \(-1\) (all light cells) through \(0\) (equal numbers of light and dark cells) to \(+1\) (all dark).

Local correlation examines all pairs of nearest-neighbor cells and tallies the number of like pairs minus the number of unlike pairs, divided by the total number of pairs:

\[R = \frac{(\square\square + \blacksquare\blacksquare) - (\square\blacksquare + \blacksquare\square) }{\square\square + \square\blacksquare + \blacksquare\square + \blacksquare\blacksquare}.\]

Again the range is from \(-1\) to \(+1\). These two quantities are both measures of order in the Ising system, but they focus on different spatial scales, global *vs.* local. All three of the patterns in Figure 1 have magnetization \(M = 0\), but they have very different values of local correlation \(R\).
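These two meters are easy to compute directly from the grid. Here is a sketch in the same JavaScript idiom as the simulation programs (the function names are mine, not taken from the programs’ source code). Spins are stored as \(+1\) and \(-1\), so summing them gives dark minus light, and multiplying neighbors gives \(+1\) for a like pair and \(-1\) for an unlike pair:

```javascript
// Spins are +1 (dark) or -1 (light) in an n x n array with periodic boundaries.

function magnetization(lattice) {
  const n = lattice.length;
  let sum = 0;
  for (let x = 0; x < n; x++) {
    for (let y = 0; y < n; y++) {
      sum += lattice[x][y];          // +1s and -1s cancel: sum = dark - light
    }
  }
  return sum / (n * n);              // M ranges from -1 to +1
}

function localCorrelation(lattice) {
  const n = lattice.length;
  let sum = 0;
  for (let x = 0; x < n; x++) {
    for (let y = 0; y < n; y++) {
      // Count each nearest-neighbor pair once, via the east and south
      // neighbors, with wraparound at the lattice edges.
      sum += lattice[x][y] * lattice[(x + 1) % n][y];   // like pair -> +1
      sum += lattice[x][y] * lattice[x][(y + 1) % n];   // unlike pair -> -1
    }
  }
  return sum / (2 * n * n);          // R ranges from -1 to +1; 2*n*n pairs
}
```

A uniform lattice scores \(+1\) on both meters; a checkerboard scores \(M = 0\) and \(R = -1\).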

The Ising model was invented 100 years ago by Wilhelm Lenz of the University of Hamburg, who suggested it as a thesis project for his student Ernst Ising. It was introduced as a model of a permanent magnet.

A real ferromagnet is a quantum-mechanical device. Inside, electrons in neighboring atoms come so close together that their wave functions overlap. Under these circumstances the electrons can reduce their energy slightly by aligning their spin vectors. According to the rules of quantum mechanics, an electron’s spin must point in one of two directions; by convention, the directions are labeled *up* and *down*. The ferromagnetic interaction favors pairings with both spins up or both down. Each spin generates a small magnetic dipole moment. Zillions of them acting together hold your grocery list to the refrigerator door.

In the Ising version of this structure, the basic elements are still called spins, but there is nothing twirly about them, and nothing quantum mechanical either. They are just abstract variables constrained to take on exactly two values. It really doesn’t matter whether we name the values up and down, mauve and indigo, or plus and minus. (Within the computer programs, the two values are \(+1\) and \(-1\), which means that flipping a spin is just a matter of multiplying by \(-1\).) In this article I’m going to refer to up/down spins and dark/light cells interchangeably, adopting whichever term is more convenient at the moment.

As in a ferromagnet, nearby Ising spins want to line up in parallel; they reduce their energy when they do so. This urge to match spin directions (or cell colors) extends only to nearest neighbors; more distant sites in the lattice have no influence on one another. In the two-dimensional square lattice—the setting for all my simulations—each spin’s four nearest neighbors are the lattice sites to the north, east, south, and west (including “wraparound” neighbors for cells on the boundary lines).

If neighboring spins want to point the same way, why don’t they just go ahead and do so? The whole system could immediately collapse into the lowest-energy configuration, with all spins up or all down. That does happen, but there are complicating factors and countervailing forces. Neighborhood conflicts are the principal complication: Flipping your spin to please one neighbor may alienate another. The countervailing influence is heat. Thermal fluctuations can flip a spin even when the change is energetically unfavorable.

The behavior of the Ising model is easiest to understand at the two extremities of the temperature scale. As the temperature \(T\) climbs toward infinity, thermal agitation completely overwhelms the cooperative tendencies of adjacent spins, and all possible states of the system are on an equal footing. The lattice becomes a random array of up and down spins, each of which is rapidly changing its orientation. At the opposite end of the scale, where \(T\) approaches zero, the system freezes. As thermal fluctuations subside, the spins sooner or later sink into the orderly, low-energy, fully magnetized state—although “sooner or later” can stretch out toward the age of the universe.

Things get more complicated between these extremes. Experiments with real magnets show that the transition from a hot random state to a cold magnetized state is not gradual. As the material is cooled, spontaneous magnetization appears suddenly at a critical temperature called the Curie point (about 840 K in iron). Lenz and Ising wondered whether this abrupt onset of magnetization could be seen in their simple, abstract model. Ising was able to analyze only a one-dimensional version of the system—a line or ring of spins—and he was disappointed to see no sharp phase transition. He thought this result would hold in higher dimensions as well, but on that point he was later proved wrong.

The idealized phase diagram in Figure 2 (borrowed with amendments from my 2019 article) outlines the course of events for a two-dimensional model. To the right, above the critical temperature \(T_c\), there is just one phase, in which up and down spins are equally abundant on average, although they may form transient clusters of various sizes. Below the critical point the diagram has two branches, leading to all-up and all-down states at zero temperature. As the system is cooled through \(T_c\) it must follow one branch or the other, but which one is chosen is a matter of chance.

The immediate vicinity of \(T_c\) is the most interesting region of the phase diagram. If you scroll back up to Program 1 and set the temperature near 2.27, you’ll see filigreed patterns of all possible sizes, from single pixels up to the diameter of the lattice. The time scale of fluctuations also spans orders of magnitude, with some structures winking in and out of existence in milliseconds and others lasting long enough to test your patience.

All of this complexity comes from a remarkably simple mechanism. The model makes no attempt to capture all the details of ferromagnet physics. But with minimal resources—binary variables on a plain grid with short-range interactions—we see the spontaneous emergence of cooperative, collective phenomena, as self-organizing patterns spread across the lattice. The model is not just a toy. Ten years ago Barry M. McCoy and Jean-Marie Maillard wrote:

It may be rightly said that the two dimensional Ising model… is one of the most important systems studied in theoretical physics. It is the first statistical mechanical system which can be exactly solved which exhibits a phase transition.

As I see it, the main question raised by the Ising model is this: At a specified temperature \(T\), what does the lattice of spins look like? Of course “look like” is a vague notion; even if you know the answer, you’ll have a hard time communicating it except by showing pictures. But the question can be reformulated in more concrete ways. We might ask: Which configurations of the spins are most likely to be seen at temperature \(T\)? Or, conversely: Given a spin configuration \(S\), what is the probability that \(S\) will turn up when the lattice is at temperature \(T\)?

Intuition offers some guidance on these points. Low-energy configurations should always be more likely than high-energy ones, at any finite temperature. Differences in energy should have a stronger influence at low temperature; as the system gets warmer, thermal fluctuations can mask the tendency of spins to align. These rules of thumb are embodied in a little fragment of mathematics at the very heart of the Ising model:

\[W_B = \exp\left(\frac{-E}{k_B T}\right).\]

Here \(E\) is the energy of a given spin configuration, found by scanning through the entire lattice and tallying the number of nearest-neighbor pairs that have parallel *vs.* antiparallel spins. In the denominator, \(T\) is the absolute temperature and \(k_B\) is Boltzmann’s constant, named for Ludwig Boltzmann, the Austrian maestro of statistical mechanics. The entire expression is known as the Boltzmann weight, and it determines the probability of observing any given configuration.

In standard physical units the constant \(k_B\) is about \(10^{-23}\) joules per kelvin, but the Ising model doesn’t really live in the world of joules and kelvins. It’s a mathematical abstraction, and we can measure its energy and temperature in any units we choose. The convention among theorists is to set \(k_B = 1\), and thereby eliminate it from the formula altogether. Then we can treat both energy and temperature as if they were pure numbers, without units.
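With \(k_B\) set to \(1\), the Boltzmann weight is a one-liner in code, and a companion function makes a point that matters for everything that follows: the *ratio* of two weights depends only on the energy difference, so relative probabilities can be computed without ever touching the normalizing sum. (Both function names are mine; this is a sketch, not code from the simulations.)

```javascript
// Boltzmann weight with k_B = 1, so energy and temperature are pure numbers.
function boltzmannWeight(E, T) {
  return Math.exp(-E / T);
}

// The ratio of two weights depends only on the energy difference E1 - E2,
// so comparing configurations never requires the normalizing sum Z.
function relativeProbability(E1, E2, T) {
  return boltzmannWeight(E1, T) / boltzmannWeight(E2, T);  // = exp(-(E1 - E2)/T)
}
```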

Figure 3 confirms that the equation for the Boltzmann weight yields curves with an appropriate general shape. Lower energies correspond to higher weights, and lower temperatures yield steeper slopes. These features make the curves plausible candidates for describing a physical system such as a ferromagnet. Proving that they are not only good candidates but the unique, true description of a ferromagnet is a mathematical and philosophical challenge that I decline to take on. Fortunately, I don’t have to. The model, unlike the magnet, is a human invention, and we can make it obey whatever laws we choose. In this case let’s simply decree that the Boltzmann distribution gives the correct relation between energy, temperature, and probability.

Note that the Boltzmann weight is said to *determine* a probability, not that it *is* a probability. It can’t be. \(W_B\) can range from zero to infinity, but a probability must lie between zero and one. To get the probability of a given configuration, we need to calculate its Boltzmann weight and then divide by \(Z\), the sum of the weights of all possible configurations—a process called normalization. For a model with \(10{,}000\) spins there are \(2^{10{,}000}\) configurations, so normalization is not a task to be attempted by direct, brute-force arithmetic.
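For a truly tiny lattice, though, the brute-force sum is feasible, and computing it makes the idea of normalization concrete. Here is a sketch (my own illustration, not anything from the article’s programs), with periodic boundaries and \(k_B = 1\); the energy counts each east and south bond once:

```javascript
// Exact partition function Z for a tiny n x n Ising lattice
// (periodic boundaries, k_B = 1). Feasible only for very small n:
// the loop visits all 2^(n*n) spin configurations.
function partitionFunction(n, T) {
  const nSpins = n * n;
  let Z = 0;
  for (let bits = 0; bits < (1 << nSpins); bits++) {
    // Decode the bit pattern into an array of +1/-1 spins.
    const s = [];
    for (let i = 0; i < nSpins; i++) s.push((bits >> i) & 1 ? 1 : -1);
    // Energy: -1 for each parallel east/south bond, +1 for each antiparallel one.
    let E = 0;
    for (let x = 0; x < n; x++) {
      for (let y = 0; y < n; y++) {
        const spin = s[y * n + x];
        E -= spin * s[y * n + ((x + 1) % n)];   // east neighbor, with wraparound
        E -= spin * s[((y + 1) % n) * n + x];   // south neighbor
      }
    }
    Z += Math.exp(-E / T);
  }
  return Z;
}
```

At very high temperature every weight approaches \(1\), so \(Z\) approaches the number of configurations; at low temperature the two uniform ground states dominate the sum.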

It’s a tribute to the ingenuity of mathematicians that the impossible-sounding problem of calculating \(Z\) has in fact been conquered. In 1944 Lars Onsager published a complete solution of the two-dimensional Ising model—complete in the sense that it allows you to calculate the magnetization, the energy per spin, and a variety of other properties, all as a function of temperature. I would like to say more about Onsager’s solution, but I can’t. I’ve tried more than once to work my way through his paper, but it defeats me every time. I would understand nothing at all about this result if it weren’t for a little help from my friends. Barry Cipra, in a 1987 article, and Cristopher Moore and Stephan Mertens, in their magisterial tome *The Nature of Computation*, rederive the solution by other means. They relate the Ising model to more tractable problems in graph theory, where I am able to follow most of the steps in the argument. Even in these lucid expositions, however, I find the ultimate result unilluminating. I’ll cite just one fact emerging from Onsager’s difficult algebraic exercise. The exact location of the critical temperature, separating the magnetic from the nonmagnetic phases, is:

\[\frac{2}{\log{(1 + \sqrt{2})}} \approx 2.269185314.\]

For those intimidated by the icy crags of Mt. Onsager, I can recommend the warm blue waters of Monte Carlo. The math is easier. There’s a clear, mechanistic connection between microscopic events and macroscopic properties. And there are the visualizations—that lively dance of the mauve and the indigo—which offer revealing glimpses of what’s going on behind the mathematical curtains. All that’s missing is exactness. Monte Carlo studies can pin down \(T_c\) to several decimal places, but they will never give the algebraic expression found by Onsager.

The Monte Carlo method was devised in the years immediately after World War II by mathematicians and physicists working at the Los Alamos Laboratory in New Mexico.

The second application was the design of nuclear weapons. The problem at hand was to understand the diffusion of neutrons through uranium and other materials. When a wandering neutron collided with an atomic nucleus, the neutron might be scattered in a new direction, or it might be absorbed by the nucleus and effectively disappear, or it might induce fission in the nucleus and thereby give rise to several more neutrons. Experiments had provided reasonable estimates of the probability of each of these events, but it was still hard to answer the crucial question: In a lump of fissionable matter with a particular shape, size, and composition, would the nuclear chain reaction fizzle or boom? The Monte Carlo method offered an answer by simulating the paths of thousands of neutrons, using random numbers to generate events with the appropriate probabilities. The first such calculations were done with the ENIAC, the vacuum-tube computer built at the University of Pennsylvania. Later the work shifted to the MANIAC, built at Los Alamos.

This early version of the Monte Carlo method is now sometimes called simple or naive Monte Carlo; I have also seen the term hit-or-miss Monte Carlo. The scheme served well enough for card games and for weapons of mass destruction, but the Los Alamos group never attempted to apply it to a problem anything like the Ising model. It would not have worked if they had tried. I know that because textbooks say so, but I had never seen any discussion of exactly *how* the model would fail. So I decided to try it for myself.

My plan was indeed simple and naive and hit-or-miss. First I generated a random sample of \(10{,}000\) spin configurations, drawn independently with uniform probability from the set of all possible states of the lattice. This was easy to do: I constructed the samples by the computational equivalent of tossing a fair coin to assign a value to each spin. Then I calculated the energy of each configuration and, assuming some definite temperature \(T\), assigned a Boltzmann weight. I still couldn’t convert the Boltzmann weights into true probabilities without knowing the sum of all \(2^{10{,}000}\) weights, but I could sum up the weights of the \(10{,}000\) configurations in the sample. Dividing each weight by this sum yields a *relative* probability: It estimates how frequently (at temperature \(T\)) we can expect to see a member of the sample relative to all the other members.

At extremely high temperatures—say \(T \gt 1{,}000\)—this procedure works pretty well. That’s because all configurations are nearly equivalent at those temperatures; they all have about the same relative probability. On cooling the system, I hoped to see a gradual skewing of the relative probabilities, as configurations with lower energy are given greater weight. What happens, however, is not a gradual skewing but a spectacular collapse. At \(T = 2\) the lowest-energy state in my sample had a relative probability of \(0.9999999979388337\), leaving just \(0.00000000206117\) to be shared among the other \(9{,}999\) members of the set.
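The collapse is easy to reproduce at a smaller scale. The sketch below is my reconstruction of the experiment (not the code I actually ran): it samples random configurations of an \(n \times n\) lattice and returns the largest relative probability in the sample. Subtracting the minimum energy before exponentiating avoids floating-point overflow, and the rescaling cancels out of the relative probabilities.

```javascript
// Naive (hit-or-miss) Monte Carlo: draw configurations uniformly at random,
// weight them by exp(-E/T), and see how the relative probability is shared.
// Returns the largest relative probability in the sample.
function naiveMonteCarlo(n, nSamples, T) {
  const energies = [];
  for (let k = 0; k < nSamples; k++) {
    // Random configuration: a fair coin toss for every spin.
    const s = [];
    for (let i = 0; i < n * n; i++) s.push(Math.random() < 0.5 ? 1 : -1);
    // Energy summed over east/south bonds, with wraparound.
    let E = 0;
    for (let x = 0; x < n; x++) {
      for (let y = 0; y < n; y++) {
        E -= s[y * n + x] * (s[y * n + ((x + 1) % n)] + s[((y + 1) % n) * n + x]);
      }
    }
    energies.push(E);
  }
  // Rescale by the minimum energy to keep the exponentials finite;
  // the shift cancels when weights are divided by their sum.
  const eMin = Math.min(...energies);
  const weights = energies.map(E => Math.exp(-(E - eMin) / T));
  const total = weights.reduce((a, b) => a + b, 0);
  return Math.max(...weights) / total;
}
```

At very high temperature the result hovers near \(1/\mathit{nSamples}\); at low temperature it is usually a whisker short of \(1\), with one configuration hogging nearly all the probability.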

The fundamental problem is that a small sample of randomly generated lattice configurations will almost never include any states that are commonly seen at low temperature. The histograms of Figure 4 show Boltzmann distributions at various temperatures *(blue)* compared with the distribution of randomly generated states *(red)*. The random distribution is a slender peak centered at zero energy. There is slight overlap with the Boltzmann distribution at \(T = 50\), but none whatever for lower temperatures.

There’s actually some good news in this fiasco. The failure of random sampling indicates that the interesting states of the Ising system—those which give the model its distinctive behavior—form a tiny subset buried within the enormous space of \(2^{10{,}000}\) configurations. If we can find a way to focus on that subset and ignore the rest, the job will be much easier.

The means to focus more narrowly came with a second wave of Monte Carlo methods, also emanating from Los Alamos. The foundational document was a paper titled “Equation of State Calculations by Fast Computing Machines,” published in 1953. Among the five authors, Nicholas Metropolis was listed first (presumably for alphabetical reasons), and his name remains firmly attached to the algorithm presented in the paper.

With admirable clarity, Metropolis *et al.* explain the distinction between the old and the new Monte Carlo: “[I]nstead of choosing configurations randomly, then weighting them with \(\exp(-E/kT)\), we choose configurations with a probability \(\exp(-E/kT)\) and weight them evenly.” Starting from an arbitrary initial state, the scheme makes small, random modifications, with a bias favoring configurations with a lower energy (and thus higher Boltzmann weight), but not altogether excluding moves to higher-energy states. After many moves of this kind, the system is almost certain to be meandering through a neighborhood that includes the most probable configurations. Methods based on this principle have come to be known as MCMC, for Markov chain Monte Carlo. The Metropolis algorithm and Glauber dynamics are the best-known exemplars.

Roy Glauber also had Los Alamos connections. He worked there during World War II, in the same theory division that was home to Ulam, John von Neumann, Hans Bethe, Richard Feynman, and many other notables of physics and mathematics. But Glauber was a very junior member of the group; he was 18 when he arrived, and a sophomore at Harvard. His one paper on the Ising model was published two decades later, in 1963, and makes no mention of his former Los Alamos colleagues. It also makes no mention of Monte Carlo methods; nevertheless, Glauber dynamics has been taken up enthusiastically by the Monte Carlo community.

When applied to the Ising model, both the Metropolis algorithm and Glauber dynamics work by focusing attention on a single spin at each step, and either flipping the selected spin or leaving it unchanged. Thus the system passes through a sequence of states that differ by at most one spin flip. Statistically speaking, this procedure sounds a little dodgy. Unlike the naive Monte Carlo approach, where successive states are completely independent, MCMC generates configurations that are closely correlated. It’s a biased sample. To overcome the bias, the MCMC process has to run long enough for the correlations to fade away. With a lattice of \(N\) sites, a common protocol retains only every \(N\)th sample, discarding all those in between.

The mathematical justification for the use of correlated samples is the theory of Markov chains, devised by the Russian mathematician A. A. Markov circa 1900. It is a tool for calculating probabilities when each event depends on the previous event. And, in the Monte Carlo method, it allows one to work with those probabilities without getting bogged down in the morass of normalization.

The Metropolis and the Glauber algorithms are built on the same armature. They both rely on two main components: a visitation sequence and an acceptance function. The visitation sequence determines which lattice site to visit next; in effect, it shines a spotlight on one selected spin, proposing to flip it to the opposite orientation. The acceptance function determines whether to accept this proposal (and flip the spin) or reject it (and leave the existing spin direction unchanged). Each iteration of this two-phase process constitutes one “microstep” of the Monte Carlo procedure. Repeating the procedure \(N\) times constitutes a “macrostep.” Thus one macrostep amounts to one microstep per spin.

In the Metropolis algorithm, the visitation order is deterministic. The program sweeps through the lattice methodically, repeating the same sequence of visits in every macrostep. The original 1953 presentation of the algorithm did not prescribe any specific sequence, but the procedure was clearly designed to visit each site exactly once during a sweep. The version of the Metropolis algorithm in Program 1 adopts the most obvious deterministic option: “typewriter order.” The program chugs through the first row of the lattice from left to right, then goes through the second row in the same way, and so on down to the bottom.

Glauber dynamics takes a different approach: At each microstep the algorithm selects a single spin at random, with uniform probability, from the entire set of \(N\) spins. In other words, every spin has a \(1 / N\) chance of being chosen at each microstep, whether or not it has been chosen before. A macrostep lasts for \(N\) microsteps, but the procedure does *not* guarantee that every spin will get a turn during every sweep. Some sites will be passed over, while others are visited more than once. Still, as the number of steps goes to infinity, all the sites eventually get equal attention.

So much for the visitation sequence; now on to the acceptance function. It has three parts:

- Calculate \(\Delta E\), the change in energy that would result from flipping the selected spin \(s\). To determine this value, we need to examine \(s\) itself and its four nearest neighbors.
- Based on \(\Delta E\) and the temperature \(T\), calculate the probability \(p\) of flipping spin \(s\).
- Generate a random number \(r\) in the interval \([0, 1)\). If \(r \lt p\), flip the selected spin; otherwise leave it as is.
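In code, the three parts look something like the sketch below (my own reconstruction; the simulation programs’ `calcDeltaE` may differ in detail). Flipping spin \(s\) negates its interaction with each of its four neighbors, so \(\Delta E = 2s \times (\text{sum of the four neighbors})\):

```javascript
// A small all-up lattice for demonstration; in the simulation programs,
// lattice and gridSize are globals.
const gridSize = 4;
const lattice = [];
for (let x = 0; x < gridSize; x++) {
  lattice.push([]);
  for (let y = 0; y < gridSize; y++) lattice[x].push(1);
}

// Part 1: the energy change from flipping the spin at (x, y).
// Each aligned neighbor contributes -1 to the current energy, so flipping
// spin s changes the energy by 2 * s * (sum of the four wraparound neighbors).
function calcDeltaE(x, y) {
  const n = gridSize;
  const neighbors =
    lattice[(x + 1) % n][y] + lattice[(x + n - 1) % n][y] +
    lattice[x][(y + 1) % n] + lattice[x][(y + n - 1) % n];
  return 2 * lattice[x][y] * neighbors;
}

// Parts 2 and 3: compute the flip probability with a supplied acceptance
// function, then flip the spin with that probability.
function microstep(x, y, T, acceptance) {
  const p = acceptance(calcDeltaE(x, y), T);
  if (Math.random() < p) lattice[x][y] *= -1;
}
```

On the all-up lattice every flip costs \(\Delta E = +8\); a spin surrounded by three allies and one dissenter has \(\Delta E = +4\); a lone dissenter flips back with \(\Delta E = -8\).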

Part 2 of the acceptance rule calls for a mathematical function that maps values of \(\Delta E\) and \(T\) to a probability \(p\). To be a valid probability, \(p\) must be confined to the interval \([0, 1]\). To make sense in the context of the Monte Carlo method, the function should assign a higher probability to spin flips that reduce the system’s energy, without totally excluding those that bring an energy increase. And this preference for negative \(\Delta E\) should grow sharper as \(T\) gets lower. The specific functions chosen by the Metropolis and the Glauber algorithms satisfy both of these criteria.

Let’s begin with the Glauber acceptance function, which I’m going to call the *G*-rule:

\[p = \frac{e^{-\Delta E/T}}{1 + e^{-\Delta E/T}}.\]

Parts of this equation should look familiar. The expression for the Boltzmann weight, \(e^{-\Delta E/T}\), appears twice, except that the configuration energy \(E\) is replaced by \(\Delta E\), the change in energy when a specific spin is flipped. The *G*-rule stays within the bounds of \(0\) to \(1\). The curve at right shows the probability distribution for \(T = 2.7\), near the critical point for the onset of magnetization. To get a qualitative understanding of the form of this curve, consider what happens when \(\Delta E\) grows without bound toward positive infinity: The numerator of the fraction goes to \(0\) while the denominator goes to \(1\), leaving a quotient that approaches \(0.0\). At the other end of the curve, as \(\Delta E\) goes to negative infinity, both numerator and denominator increase without limit, and the probability approaches (but never quite reaches) \(1.0\). Between these extremes, the curve is symmetrical and smooth. It looks like it would make a pleasant ski run.

The Metropolis acceptance criterion also includes the expression \(e^{-\Delta E/T}\), but the function and the curve are quite different. The acceptance probability is defined in a piecewise fashion:

\[p = \left\{\begin{array}{cl}
1 & \text{if } \Delta E \leq 0 \\
e^{-\Delta E/T} & \text{if } \Delta E > 0
\end{array}\right.\]

In words, the rule says: If flipping a spin would reduce the energy of the system or leave it unchanged, always do it; otherwise, flip the spin with probability \(e^{-\Delta E/T}\). The probability curve *(left)* has a steep escarpment; if this one is a ski slope, it rates a black diamond. Unlike the smooth and symmetrical Glauber curve, this one has a sharp corner, as well as a strong bias. Consider a spin with \(\Delta E = 0\). Glauber flips such a spin with probability \(1/2\), but Metropolis *always* flips it.

The graphs in Figure 7 compare the two acceptance functions over a range of temperatures. The curves differ most at the highest temperatures, and they become almost indistinguishable at the lowest temperatures, where both curves approximate a step function. Although both functions are defined over the entire real number line, the two-dimensional Ising model allows \(\Delta E\) to take on only five distinct values: \(–8, –4, 0, +4,\) and \(+8\). Thus the Ising probability functions are never evaluated anywhere other than the positions marked by colored dots.

Here are JavaScript functions implementing one macrostep of each algorithm, showing their differences in both visitation sequence and acceptance function:

```
function runMetro() {
  // Deterministic "typewriter" visitation order, M-rule acceptance.
  for (let y = 0; y < gridSize; y++) {
    for (let x = 0; x < gridSize; x++) {
      let deltaE = calcDeltaE(x, y);
      let boltzmann = Math.exp(-deltaE / temperature);
      if ((deltaE <= 0) || (Math.random() < boltzmann)) {
        lattice[x][y] *= -1;
      }
    }
  }
  drawLattice();
}

function runGlauber() {
  // Random visitation order, G-rule acceptance.
  // N is the number of lattice sites, gridSize * gridSize.
  for (let i = 0; i < N; i++) {
    let x = Math.floor(Math.random() * gridSize);
    let y = Math.floor(Math.random() * gridSize);
    let deltaE = calcDeltaE(x, y);
    let boltzmann = Math.exp(-deltaE / temperature);
    if (Math.random() < (boltzmann / (1 + boltzmann))) {
      lattice[x][y] *= -1;
    }
  }
  drawLattice();
}
```

(As I mentioned above, the rest of the source code for the simulations is available on GitHub.)

We’ve seen that the Metropolis and the Glauber algorithms differ in their choice of both visitation sequence and acceptance function. They also produce different patterns or textures when you watch them in action on the computer screen. But what about the numbers? Do they predict different properties for the Ising ferromagnet?

A theorem mentioned throughout the MCMC literature says that these two algorithms (and others like them) should give identical results when properties of the model are measured at thermal equilibrium. I have encountered this statement many times in my reading, but until a few weeks ago I had never tested it for myself. Here are some magnetization data that look fairly convincing:

| | T = 1.0 | T = 2.0 | T = 2.7 | T = 3.0 | T = 5.0 | T = 10.0 |
|---|---|---|---|---|---|---|
| Metropolis | 0.9993 | 0.9114 | 0.0409 | 0.0269 | 0.0134 | 0.0099 |
| Glauber | 0.9993 | 0.9118 | 0.0378 | 0.0274 | 0.0136 | 0.0100 |

The table records the absolute value of the magnetization in Metropolis and Glauber simulations at various temperatures. Five of the six measurements differ by less than \(0.001\); the exception comes at \(T = 2.7\), near the critical point, where the difference rises to about \(0.003\). Note that the results are consistent with the presence of a phase transition: Magnetization remains close to \(0\) down to the critical point and then approaches \(1\) at lower temperatures. (By reporting the magnitude, or absolute value, of the magnetization, we treat all-up and all-down states as equivalent.)

I made the measurements by first setting the temperature and then letting the simulation run for at least 1,000 macrosteps in order to reach an equilibrium condition.
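The measurement itself is just an average over the lattice. Here is a minimal sketch (the helper name `magnetization` is my own, separate from the simulation programs), which would be invoked only after the equilibration period and then averaged over many further macrosteps:

```
// Absolute value of the mean spin. Treating all-up and all-down
// states as equivalent, a fully magnetized lattice scores 1 and a
// perfectly balanced one scores 0.
function magnetization(lattice) {
  let sum = 0, count = 0;
  for (const column of lattice) {
    for (const spin of column) {
      sum += spin;
      count += 1;
    }
  }
  return Math.abs(sum / count);
}

// A 3x3 toy lattice with seven up spins and two down spins:
const demo = [[1, 1, 1], [1, -1, 1], [1, 1, -1]];
console.log(magnetization(demo));   // (7 - 2) / 9 = 5/9 ≈ 0.556
```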

When I first looked at these results and saw the close match between Metropolis and Glauber, I felt a twinge of paradoxical surprise. I call it paradoxical because I knew before I started what I would see, and that’s exactly what I *did* see, so obviously I should not have been surprised at all. But some part of my mind didn’t get that memo, and as I watched the two algorithms converge to the same values all across the temperature scale, it seemed remarkable.

The theory behind this convergence was apparently understood by the pioneers of MCMC in the 1950s. The theorem states that any MCMC algorithm will produce the same distribution of states at equilibrium, as long as the algorithm satisfies two conditions, called ergodicity and detailed balance.

The term *ergodic* was coined by Boltzmann, and is usually said to have the Greek roots εργον οδος, meaning something like “energy path.” Giovanni Gallavotti disputes this etymology, suggesting a derivation from εργον ειδoς, which he translates as “monode with a given energy.” Take your pick. Whatever the word’s origin, ergodicity requires that every state of the system be reachable from every other state: There are no *cul de sac* states you might wander into and never be able to escape, or border walls that divide the space into isolated regions. The Metropolis and Glauber algorithms satisfy this condition because every transition between states has a nonzero probability. (In both algorithms the acceptance probability comes arbitrarily close to zero but never touches it.) In the specific case of the \(100 \times 100\) lattice I’ve been playing with, any two states are connected by a path of no more than \(10{,}000\) steps.

Both algorithms also exhibit detailed balance, which is essentially a requirement of reversibility. Suppose that while watching a model run, you observe a transition from state \(A\) to state \(B\). Detailed balance says that if you continue observing long enough, you will see the inverse transition \(B \rightarrow A\) with the same frequency as \(A \rightarrow B\). Given the shapes of the acceptance curves, this assertion may seem implausible. If \(A \rightarrow B\) is energetically favorable, then \(B \rightarrow A\) must be unfavorable, and it will have a lower probability. But there’s another factor at work here. Remember we are assuming the system is in equilibrium, which implies that the occupancy of each state—or the amount of time the system spends in that state—is proportional to the state’s Boltzmann weight. Because the system is more often found in state \(B\), the transition \(B \rightarrow A\) has more chances to be chosen, counterbalancing the lower intrinsic probability.
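For a single-spin flip, the occupancy argument boils down to a simple requirement: the forward and reverse flip probabilities must stand in the ratio of the Boltzmann weights, \(p(\Delta E)/p(-\Delta E) = e^{-\Delta E/T}\). A quick numerical check (a sketch of my own, with function names that are not part of the simulation code) confirms that both acceptance rules pass:

```
function metropolisP(dE, T) { return dE <= 0 ? 1 : Math.exp(-dE / T); }
function glauberP(dE, T) { const w = Math.exp(-dE / T); return w / (1 + w); }

// Detailed balance for a single-spin flip: the ratio of forward and
// reverse acceptance probabilities must equal exp(-dE/T).
const T = 2.7;
for (const dE of [-8, -4, 0, 4, 8]) {
  const target = Math.exp(-dE / T);
  const mRatio = metropolisP(dE, T) / metropolisP(-dE, T);
  const gRatio = glauberP(dE, T) / glauberP(-dE, T);
  console.log(dE, mRatio.toFixed(6), gRatio.toFixed(6), target.toFixed(6));
}
```

Both columns of ratios match the Boltzmann factor exactly, which is why two algorithms with such different-looking acceptance curves can converge on the same equilibrium distribution.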

The claim that Metropolis and Glauber yield identical results applies only when the Ising system is in equilibrium—poised at the eternal noon where the sun stands still and nothing ever changes. For Metropolis and his colleagues at Los Alamos in the early 1950s, understanding the equilibrium behavior of a computational model was challenge enough. They were coaxing answers from a computer with about four kilobytes of memory. Ten years later, however, Glauber wanted to look beyond equilibrium. For example, he wanted to know what happens when the temperature suddenly changes. How do the spins reorganize themselves during the transient period between one equilibrium state and another? He designed his version of the Ising model specifically to deal with such dynamic situations. He wrote in his 1963 paper:

If the mathematical problems of equilibrium statistical mechanics are great, they are at least relatively well-defined. The situation is quite otherwise in dealing with systems which undergo large-scale changes with time…. We have attempted, therefore, to devise a form of the Ising model whose behavior can be followed exactly, in statistical terms, as a function of time.

Clearly, in this dynamic situation, the algorithms are not identical or interchangeable. The Metropolis program adapts more quickly to the cooler environment; Glauber produces a slower but steadier rise in magnetization. The curves differ in shape, with Metropolis exhibiting a distinctive “knee” where the slope flattens. I want to know what causes these differences, but before digging into that question it seems important to understand why *both* algorithms are so agonizingly slow. At the right edge of the graph the blue Metropolis curve is approaching the equilibrium value of magnetization (which is about 0.91), but it has taken 7,500 Monte Carlo macrosteps (or 75 million microsteps) to get there. The red Glauber curve will require many more. What’s the holdup?

To put this sluggishness in perspective, let’s look at the behavior of local spin correlations measured under the same circumstances. Graphing the average nearest-neighbor correlation following a sudden temperature drop produces these hockey-stick curves:

The response is dramatically faster; both algorithms reach quite high levels of local correlation within just a few macrosteps.

For a hint of why local correlations grow so much faster than global magnetization, it’s enough to spend a few minutes watching the Ising simulation evolve on the computer screen. When the temperature plunges from warm \(T = 5\) to frigid \(T = 2\), nearby spins have a strong incentive to line up in parallel, but magnetization does not spread uniformly across the entire lattice. Small clusters of aligned spins start expanding, and they merge with other clusters of the same polarity, thereby growing larger still. It doesn’t take long, however, before clusters of opposite polarity run into one another, blocking further growth for both. From then on, magnetization is a zero-sum game: The up team can win only if the down team loses.

Figure 10 shows the first few Monte Carlo macrosteps following a flash freeze. The initial configuration at the upper left reflects the high-temperature state, with a nearly random, salt-and-pepper mix of up and down spins. The rest of the snapshots (reading left to right and top to bottom) show the emergence of large-scale order. Prominent clusters appear after the very first macrostep, and by the second or third step some of these blobs have grown to include hundreds of lattice sites. But the rate of change becomes sluggish thereafter. The balance of power may tilt one way and then the other, but it’s hard for either side to gain a permanent advantage. The mottled, camouflage texture will persist for hundreds or thousands of steps.

If you choose a single spin at random from such a mottled lattice, you’ll almost surely find that it lives in an area where most of the neighbors have the same orientation. Hence the high levels of local correlation. But that fact does *not* imply that the entire array is approaching unanimity. On the contrary, the lattice can be evenly divided between up and down domains, leaving a net magnetization near zero. (Yes, it’s like political polarization, where homogeneous states add up to a deadlocked nation.)

The images in Figure 11 show three views of the same state of an Ising lattice. At left is the conventional representation, with sinuous, interlaced territories of nearly pure up and down spins. The middle panel shows the same configuration recolored according to the local level of spin correlation. The vast majority of sites *(lightest hue)* are surrounded by four neighbors of the same orientation; they correspond to *both* the mauve and the indigo regions of the leftmost image. Only along the boundaries between domains is there any substantial conflict, where darker colors mark cells whose neighbors include spins of the opposite orientation. The panel at right highlights a special category of sites—those with exactly two parallel and two antiparallel neighbors. They are special because they are tiny neutral territories wedged between the contending factions. Flipping such a spin does not alter its correlation status; both before and after it has two like and two unlike neighbors. Flipping a neutral spin also does not alter the total energy of the system. But it *can* shift the magnetization. Indeed, flipping such “neutral” spins is the main agent of evolution in the Ising system at low temperature.

The struggle to reach full magnetization in an Ising lattice looks like trench warfare. Contending armies, almost evenly matched, face off over the boundary lines between up and down territories. All the action is along the borders; nothing that happens behind the lines makes much difference. Even along the boundaries, some sections of the front are static. If a domain margin is a straight line parallel to the \(x\) or \(y\) axis, the sites on each side of the border have three friendly neighbors and only one enemy; they are unlikely to flip. The volatile neutral sites that make movement possible appear only at corners and along diagonals, where neighborhoods are evenly split.

Some Ising-like update rules flip *only* neutral spins. Such rules have the pleasant property of conserving energy, which is not true of the Metropolis and Glauber algorithms.

From these observations and ruminations I feel I’ve acquired some intuition about why my Monte Carlo simulations bog down during the transition from a chaotic to an ordered state. But why is the Glauber algorithm even slower than the Metropolis?

Since the schemes differ in two features—the visitation sequence and the acceptance function—it makes sense to investigate which of those features has the greater effect on the convergence rate. That calls for another computational experiment.

The tableau below is a mix-and-match version of the MCMC Ising simulation. In the control panel you can choose the visitation order and the acceptance function independently. If you select a deterministic visitation order and the *M*-rule acceptance function, you have the classical Metropolis algorithm. Likewise random order and the *G*-rule correspond to Glauber dynamics. But you can also pair deterministic order with the *G*-rule or random order with the *M*-rule. (The latter mixed-breed choice is what I unthinkingly implemented in my 2019 program.)

I have also included an acceptance rule labeled *M**, which I’ll explain below.

Watching the screen while switching among these alternative components reveals that all the combinations yield different visual textures, at least at some temperatures. Also, it appears there’s something special about the pairing of deterministic visitation order with the *M*-rule acceptance function (*i.e.*, the standard Metropolis algorithm).

Try setting the temperature to 2.5 or 3.0. I find that the distinctive sensation of fluttery motion—bird flocks migrating across the screen—appears *only* with the deterministic/*M*-rule combination. With all other pairings, I see amoeba-like blobs that grow and shrink, fuse and divide, but there’s not much coordinated motion.

Now lower the temperature to about 1.5, and alternately click *Run* and *Reset* until you get a persistent bold stripe that crosses the entire grid either horizontally or vertically. Here again, the deterministic/*M*-rule combination is different from all the others. With this mode, the stripe appears to wiggle across the screen like a millipede, either right to left or bottom to top. Changing either the visitation order or the acceptance function suppresses this peristaltic motion; the stripe may still have pulsating bulges and constrictions, but they’re not going anywhere.

These observations suggest some curious interactions between the visitation order and the acceptance function, but they do not reveal which factor gives the Metropolis algorithm its speed advantage. Using the same program, however, we can gather some statistical data that might help answer the question.

These curves were a surprise to me. From my earlier experiments I already knew that the Metropolis algorithm—the combination of elements in the blue curve—would outperform the Glauber version, corresponding to the red curve. But I expected the acceptance function to account for most of the difference. The data do not support that supposition. On the contrary, they suggest that both elements matter, and the visitation sequence may even be the more important one. A deterministic visitation order beats a random order no matter which acceptance function it is paired with.

My expectations were based mainly on discussions of the “mixing time” for various Monte Carlo algorithms. Mixing time is the number of steps needed for a simulation to reach equilibrium from an arbitrary initial state, or in other words the time needed for the system to lose all memory of how it began. If you care only about equilibrium properties, then an algorithm that offers the shortest mixing time is likely to be preferred, since it also minimizes the number of CPU cycles you have to waste before you can start taking data. Discussions of mixing time tend to focus on the acceptance function, not the visitation sequence. In particular, the *M*-rule acceptance function of the Metropolis algorithm was explicitly designed to minimize mixing time.

What I am measuring in my experiments is not exactly mixing time, but it’s closely related. Going from an arbitrary initial state to equilibrium at a specified temperature is much like a transition from one temperature to another. What’s going on inside the model is similar. Thus if the acceptance function determines the mixing time, I would expect it also to be the major factor in adapting to a new temperature regime.

On the other hand, I can offer a plausible-sounding theory of why visitation order might matter. The deterministic model scans through all \(10{,}000\) lattice sites during every Monte Carlo macrostep; each such sweep is guaranteed to visit every site exactly once. The random order makes no such promise. In that algorithm, each microstep selects a site at random, whether or not it has been visited before. A macrostep concludes after \(10{,}000\) such random choices. Under this protocol some sites are passed over without being selected even once, while others are chosen two or more times. How many sites are likely to be missed? During each microstep, every site has the same probability of being chosen, namely \(1 / 10{,}000\). Thus the probability of *not* being selected on any given turn is \(9{,}999 / 10{,}000\). For a site to remain unvisited throughout an entire macrostep, it must be passed over \(10{,}000\) times in a row. The probability of that event is \((9{,}999 / 10{,}000)^{10{,}000}\), which works out to about \(0.368\).
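This is the familiar limit \((1 - 1/n)^n \rightarrow 1/e\). A two-line check:

```
// The chance that a given site is never chosen during one macrostep
// of 10,000 random selections, compared with the limiting value 1/e.
const N = 10000;
const pMissed = Math.pow(1 - 1 / N, N);
console.log(pMissed);      // ≈ 0.36786
console.log(1 / Math.E);   // ≈ 0.36788
```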

Excluding more than a third of the sites on every pass through the lattice seems certain to have *some* effect on the outcome of an experiment. In the long run the random selection process is fair, in the sense that every spin is sampled at the same frequency. But the rate of convergence to the equilibrium state may well be lower.

There are also compelling arguments for the importance of the acceptance function. A key fact mentioned by several authors is that the *M* acceptance rule leads to more spin flips per Monte Carlo step. If the energy change of a proposed flip is favorable or neutral, the *M*-rule *always* approves the flip, whereas the *G*-rule rejects some proposed flips even when they lower the energy. Indeed, for all values of \(T\) and \(\Delta E\) the *M*-rule gives a higher probability of acceptance than the *G*-rule does. This liberal policy—if in doubt, flip—allows the *M*-rule to explore the space of all possible spin configurations more rapidly.
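The dominance of the *M*-rule is easy to confirm numerically. Here is a brute-force sketch (the function names are mine) that compares the two rules over a grid of temperatures and over the five energy changes the model allows:

```
function metropolisP(dE, T) { return dE <= 0 ? 1 : Math.exp(-dE / T); }
function glauberP(dE, T) { const w = Math.exp(-dE / T); return w / (1 + w); }

// Check that the M-rule never assigns a lower acceptance probability
// than the G-rule, for any temperature or energy change sampled.
let dominates = true;
for (let T = 0.1; T <= 10; T += 0.1) {
  for (const dE of [-8, -4, 0, 4, 8]) {
    if (metropolisP(dE, T) < glauberP(dE, T)) dominates = false;
  }
}
console.log(dominates);   // true
```

For \(\Delta E \leq 0\) the comparison is \(1\) versus something less than \(1\); for \(\Delta E > 0\) it is \(e^{-\Delta E/T}\) versus the same quantity divided by \(1 + e^{-\Delta E/T}\), so the *M*-rule wins at every point.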

The discrete nature of the Ising model, with just five possible values of \(\Delta E\), introduces a further consideration. At \(\Delta E = \pm 4\) and at \(\Delta E = \pm 8\), the *M*-rule and the *G*-rule don’t actually differ very much when the temperature is below the critical point *(see Figure 7)*. The two curves diverge only at \(\Delta E = 0\): The *M*-rule invariably flips a spin in this circumstance, whereas the *G*-rule does so only half the time, assigning a probability of \(0.5\). This difference is important because the lattice sites where \(\Delta E = 0\) are the ones that account for almost all of the spin flips at low temperature. These are the neutral sites highlighted in the right panel of Figure 11, the ones with two like and two unlike neighbors.

This line of thought leads to another hypothesis. Maybe the big difference between the Metropolis and the Glauber algorithms has to do with the handling of this single point on the acceptance curve. And there’s an obvious way to test the hypothesis: Simply change the *M*-rule at this one point, having it toss a coin whenever \(\Delta E = 0\). The definition becomes:

\[p = \left\{\begin{array}{cl}
1 & \text{if } \Delta E < 0 \\
\frac{1}{2} & \text{if } \Delta E = 0 \\
e^{-\Delta E/T} & \text{if } \Delta E > 0
\end{array}\right.\]

This modified acceptance function is the *M** rule offered as an option in Program 2. Watching it in action, I find that switching the Metropolis algorithm from *M* to *M** robs it of its most distinctive traits: At high temperature the fluttering birds are banished, and at low temperature the wiggling worms are immobilized. The effects on convergence rates are also intriguing. In the Metropolis algorithm, replacing *M* with *M** greatly diminishes convergence speed, from a standout level to just a little better than average. At the same time, in the Glauber algorithm replacing *G* with *M** brings a considerable performance improvement; when combined with random visitation order, *M** is superior not only to *G* but also to *M*.
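In code, the *M**-rule is a one-line change to the acceptance function. Here is a standalone sketch of my own; Program 2’s actual source may differ in detail:

```
// The M* acceptance rule: identical to the M-rule except that a
// zero-energy flip is accepted with probability 1/2 instead of 1.
function mStarP(dE, T) {
  if (dE < 0) return 1;
  if (dE === 0) return 0.5;
  return Math.exp(-dE / T);
}

// Only the dE = 0 point differs from the M-rule:
console.log(mStarP(0, 2.7));    // 0.5 (the M-rule would give 1)
console.log(mStarP(-4, 2.7));   // 1, same as the M-rule
console.log(mStarP(4, 2.7));    // exp(-4/2.7), same as the M-rule
```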

I don’t know how to make sense of all these results except to suggest that both the visitation order and the acceptance function have important roles, and non-additive interactions between them may also be at work. Here’s one further puzzle. In all the experiments described above, the Glauber algorithm and its variations respond to a disturbance more slowly than Metropolis. But before dismissing Glauber as the perennial laggard, take a look at Figure 14.

Here we’re observing a transition from low to high temperature, the opposite of the situation discussed above. When going in this direction—from an orderly phase to a chaotic one, melting rather than freezing—both algorithms are quite zippy, but Glauber is a little faster than Metropolis. Randomness, it appears, is good for randomization. That sounds sensible enough, but I can’t explain in any detail how it comes about.

Up to this point, a deterministic visitation order has always meant the typewriter scan of the lattice—left to right and top to bottom. Of course this is not the only deterministic route through the grid. In Program 3 you can play with a few of the others.

Why should visitation order matter at all? As long as you touch every site exactly once, you might imagine that all sequences would produce the same result at the end of a macrostep. But it’s not so, and it’s not hard to see why. Whenever two sites are neighbors, the outcome of applying the Monte Carlo process can depend on which neighbor you visit first.

Consider the cruciform configuration at right. At first glance, you might assume that the dark central square will be unlikely to change its state. After all, the central square has four like-colored neighbors; if it were to flip, it would have four opposite-colored neighbors, and the energy associated with those spin-spin interactions would rise from \(-4\) to \(+4\). Any visitation sequence that went first to the central square would almost surely leave it unflipped. However, when the Metropolis algorithm comes tap-tap-tapping along in typewriter mode, the central cell does in fact change color, and so do all four of its neighbors. The entire structure is annihilated in a single sweep of the algorithm. (The erased pattern *does* leave behind a ghost—one of the diagonal neighbor sites flips from light to dark. But then that solitary witness disappears on the next sweep.)

To understand what’s going on here, just follow along as the algorithm marches from left to right and top to bottom through the lattice. When it reaches the central square of the cross, it has already visited (and flipped) the neighbors to the north and to the west. Hence the central square has two neighbors of each color, so that \(\Delta E = 0\). According to the *M*-rule, that square must be flipped from dark to light. The remaining two dark squares are now isolated, with only light neighbors, so they too flip when their time comes.
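This little drama is easy to re-enact. The sketch below (my own code, not taken from the programs accompanying this article) runs the typewriter sweep in the zero-temperature limit of the *M*-rule, where a flip is accepted if and only if \(\Delta E \leq 0\):

```
// A 7x7 lattice, all up (+1) except a cruciform cluster of down (-1)
// spins at the center. Periodic boundaries; zero-temperature limit of
// the M-rule: flip if and only if deltaE <= 0.
const n = 7;
const grid = Array.from({ length: n }, () => Array(n).fill(1));
for (const [r, c] of [[3, 3], [2, 3], [4, 3], [3, 2], [3, 4]]) grid[r][c] = -1;

function deltaE(r, c) {
  const s = grid[r][c];
  const sum = grid[(r + n - 1) % n][c] + grid[(r + 1) % n][c] +
              grid[r][(c + n - 1) % n] + grid[r][(c + 1) % n];
  return 2 * s * sum;
}

function typewriterSweep() {          // left to right, top to bottom
  for (let r = 0; r < n; r++) {
    for (let c = 0; c < n; c++) {
      if (deltaE(r, c) <= 0) grid[r][c] *= -1;
    }
  }
}

const downCount = () => grid.flat().filter(s => s === -1).length;

typewriterSweep();
const afterOne = downCount();   // 1: the cross is gone, one ghost remains
const ghost = grid[2][2];       // -1: a diagonal neighbor of the old center
typewriterSweep();
const afterTwo = downCount();   // 0: the ghost vanishes on the next sweep
console.log(afterOne, ghost, afterTwo);
```

In this version the ghost happens to land on the northwest diagonal: the site flips dark because, at its moment in the sweep, it sees two already-flipped neighbors in its past and two unflipped ones in its future, giving \(\Delta E = 0\).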

The underlying issue here is one of chronology—of past, present, and future. Each site has its moment in the present, when it surveys its surroundings and decides (based on the results of the survey) whether or not to change its state. But in that present moment, half of the site’s neighbors are living in the past—the typewriter algorithm has already visited them—and the other half are in the future, still waiting their turn.

A well-known alternative to the typewriter sequence might seem at first to avoid this temporal split decision. Superimposing a checkerboard pattern on the lattice creates two sublattices that do not communicate for purposes of the Ising model. Each black square has only white neighbors, and vice versa. Thus you can run through all the black sites (in any order; it really doesn’t matter), flipping spins as needed. Afterwards you turn to the white sites. These two half-scans make up one macrostep. Throughout the process, every site sees all of its neighbors in the same generation. And yet time has not been abolished. The black cells, in the first half of the sweep, see four neighboring sites that have not yet been visited. The white cells see neighbors that have already had their chance to flip. Again half the neighbors are in the past and half in the future, but they are distributed differently.

There are plenty of other deterministic sequences. You can trace successive diagonals; in Program 3 they run from southwest to northeast. There’s the ever-popular boustrophedonic order, following in the footsteps of the ox in the plowed field. More generally, if we number the sites consecutively from \(1\) to \(10{,}000\), any permutation of this sequence represents a valid visitation order, touching each site exactly once. There are \(10{,}000!\) such permutations, a number that dwarfs even the \(2^{10{,}000}\) configurations of the binary-valued lattice. The *permuted* choice in Program 3 selects one of those permutations at random; it is then used repeatedly for every macrostep until the program is reset. The *re-permuted* option is similar but selects a new permutation for each macrostep. The *random* selection is here for comparison with all the deterministic variations.
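To appreciate how thoroughly \(10{,}000!\) dwarfs \(2^{10{,}000}\), compare their natural logarithms; both numbers overflow floating-point arithmetic if computed directly:

```
// ln(10000!) as a sum of logs, versus ln(2^10000).
let logFactorial = 0;
for (let i = 1; i <= 10000; i++) logFactorial += Math.log(i);
const logPower = 10000 * Math.LN2;

console.log(logFactorial.toFixed(0));  // ≈ 82109
console.log(logPower.toFixed(0));      // ≈ 6931
```

So \(10{,}000!\) exceeds \(2^{10{,}000}\) by a factor of roughly \(e^{75{,}000}\).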

(There’s one final button labeled *simultaneous*, which I’ll explain below. If you just can’t wait, go ahead and press it, but I won’t be held responsible for what happens.)

The variations add some further novelties to the collection of curious visual effects seen in earlier simulations. The fluttering wings are back, in the diagonal as well as the typewriter sequences. Checkerboard has a different rhythm; I am reminded of a crowd of frantic commuters in the concourse of Grand Central Terminal. Boustrophedon is bidirectional: The millipede’s legs carry it both up and down or both left and right at the same time. Permuted is similar to checkerboard, but re-permuted is quite different.

The next question is whether these variant algorithms have any quantitative effect on the model’s dynamics. Figure 15 shows the response to a sudden freeze for seven visitation sequences. Five of them follow roughly the same arcing trajectory. Typewriter remains at the top of the heap, but checkerboard, diagonal, boustrophedon, and permuted are all close by, forming something like a comet tail. The random algorithm is much slower, which is to be expected given the results of earlier experiments.

The intriguing case is the re-permuted order, which seems to lie in the no man’s land between the random and the deterministic algorithms. Perhaps it belongs there. In earlier comparisons of the Metropolis and Glauber algorithms, I speculated that random visitation is slower to converge because many sites are passed over in each macrostep, while others are visited more than once. That’s not true of the re-permuted visitation sequence, which calls on every site exactly once, though in random order. The only difference between the permuted algorithm and the re-permuted one is that the former reuses the same permutation over and over, whereas re-permuted creates a new sequence for every macrostep. The faster convergence of the static permuted algorithm suggests there is some advantage to revisiting all the same sites in the same order, no matter what that order may be. Most likely this has something to do with sites that get switched back and forth repeatedly, on every sweep.

Now for the mysterious *simultaneous* visitation sequence. If you have not played with it yet in Program 3, I suggest running the following experiment. Select the typewriter sequence, press the *Run* button, reduce the temperature to 1.10 or 1.15, and wait until the lattice is all mauve or all indigo, with just a peppering of opposite-color dots. (If you get a persistent wide stripe instead of a clear field, raise the temperature and try again.) Now select the *simultaneous* visitation order.

This behavior is truly weird but not inexplicable. The algorithm behind it is one that I have always thought should be the best approach to a Monte Carlo Ising simulation. In fact it seems to be just about the worst.

All of the other visitation sequences are—as the term suggests they should be—*sequential*. They visit one site at a time, and differ only in how they decide where to go next. If you think about the Ising model as if it were a real physical process, this kind of serialization seems pretty implausible. I can’t bring myself to believe that atoms in a ferromagnet politely take turns in flipping their spins. And surely there’s no central planner of the sequence, no orchestra conductor on a podium, pointing a baton at each site when its turn comes.

Natural systems have an all-at-onceness to them. They are made up of many independent agents that are all carrying out the same kinds of activities at the same time. If we could somehow build an Ising model out of real atoms, then each cell or site would be watching the state of its four neighbors all the time, and also sensing thermal agitation in the lattice; it would decide to flip whenever circumstances favored that choice, although there might be some randomness to the timing. If we imagine a computer model of this process (yes, a model of a model), the most natural implementation would require a highly parallel machine with one processor per site.

Lacking such fancy hardware, I make do with fake parallelism. The *simultaneous* algorithm makes two passes through the lattice on every macrostep. On the first pass, it looks at the neighborhood of each site and decides whether or not to flip the spin, but it doesn’t actually make any changes to the lattice. Instead, it uses an auxiliary array to keep track of which spins are scheduled to flip. Then, after all sites have been surveyed in the first pass, the second pass goes through the lattice again, flipping all the spins that were designated in the first pass. The great advantage of this scheme is that it avoids the temporal oddities of working within a lattice where some spins have already been updated and others have not. In the *simultaneous* algorithm, all the spins make the transition from one generation to the next at the same instant.

When I first wrote a program to implement this scheme, almost 40 years ago, I didn’t really know what I was doing, and I was utterly baffled by the outcome. The information mechanics group at MIT (Ed Fredkin, Tommaso Toffoli, Norman Margolus, and Gérard Vichniac) soon came to my rescue and explained what was going on, but all these years later I still haven’t quite made my peace with it.

Under the simultaneous rule, every site that wants to flip *does* flip, all at the same instant, with the result that the new configuration is a mirror image of the previous one, with every up spin becoming a down and vice versa. When the next round begins, every site wants to flip again.

What continues to disturb me about this phenomenon is that I still think the simultaneous update rule is in some sense more natural or realistic than many of the alternatives. It is closer to how the world works—or how I imagine that it works—than any serial ordering of updates. Yet nature does not create magnets that continually swing between states that have the highest possible energy. (A 2002 paper by Gabriel Pérez, Francisco Sastre, and Rubén Medina attempts to rehabilitate the simultaneous-update scheme, but the blinking catastrophe remains pretty catastrophic.)

This is not the only bizarre behavior to be found in the dark corners of Monte Carlo Ising models. In the Metropolis algorithm, the probability of accepting an energetically unfavorable flip is \(e^{-\Delta E/T}\), a quantity that approaches \(1\) as \(T\) grows very large. At extreme temperatures, then, nearly every visited spin flips on every sweep, and the supposedly random process starts to behave almost deterministically.

I have not seen this high-temperature anomaly mentioned in published works on the Metropolis algorithm, although it must have been noticed many times over the years. Perhaps it’s not mentioned because this kind of failure will never be seen in physical systems. \(T = 1{,}000\) in the Ising model is \(370\) times the critical temperature; the corresponding temperature in iron would be over \(300{,}000\) kelvins. Iron boils at \(3{,}000\) kelvins.

The curves in Figure 15 and most of the other graphs above are averages taken over hundreds of repetitions of the Monte Carlo process. The averaging operation is meant to act like sandpaper, smoothing out noise in the curves, but it can also grind away interesting features, replacing a population of diverse individuals with a single homogenized exemplar. Figure 17 shows six examples of the lumpy and jumpy trajectories recorded during single runs of the program:

In these squiggles, magnetization does not grow smoothly or steadily with time. Instead we see spurts of growth followed by plateaus and even episodes of retreat. One of the Metropolis runs is slower than the three Glauber examples, and indeed makes no progress toward a magnetized state. Looking at these plots, it’s tempting to explain them away by saying that the magnetization measurements exhibit high variance. That’s certainly true, but it’s not the whole story.

Figure 18 shows the distribution of times needed for a Metropolis Ising model to reach a magnetization of \(0.85\) in response to a sudden shift from \(T = 10\) to \(T= 2\). The histogram records data from \(10{,}000\) program runs, expressing convergence time in Monte Carlo macrosteps.

The median of this distribution is \(451\) macrosteps; in other words, half of the runs concluded in \(451\) steps or fewer. But the other half of the population is spread out over quite a wide range. Runs of \(10\) times the median length are not great rarities, and the blip at the far right end of the \(x\) axis represents the \(59\) runs that had still not reached the threshold after \(10{,}000\) macrosteps (where I stopped counting). This is a heavy-tailed distribution, which appears to be made up of two subpopulations. In one group, forming the sharp peak at the left, magnetization is quick and easy, but members of the other group are recalcitrant, holding out for thousands of steps. I have a hypothesis about what distinguishes those two sets. The short-lived ones are ponds; the stubborn ones that overstay their welcome are rivers.

When an Ising system cools and becomes fully magnetized, it goes from a salt-and-pepper array of tiny clusters to a monochromatic expanse of one color or the other. At some point during this process, there must be a moment when the lattice is divided into exactly two regions, one light and one dark.

I believe the correct answer has to do with the concepts of inside and outside, connected and disconnected, open sets and closed sets—but I can’t articulate these ideas in a way that would pass mathematical muster. I want to say that the indigo pond is a bounded region, entirely enclosed by the unbounded mauve continent. But the wraparound lattice makes it hard to wrap your head around this notion. The two images in Figure 20 show exactly the same object as Figure 19, the only difference being that the origin of the coordinate system has moved, so that the center of the disk seems to lie on an edge or in a corner of the lattice. The indigo pond is still surrounded by the mauve continent, but it sure doesn’t look that way. In any case, why should boundedness determine which area survives the Monte Carlo process?

For me, the distinction between inside and outside began to make sense when I tried taking a more “local” view of the boundaries between regions, and the curvature of those boundaries. As noted in connection with Figure 11, boundaries are places where you can expect to find neutral lattice sites (i.e., \(\Delta E = 0\)), which are the only sites where a spin is likely to change orientation at low temperature.

I’ll spare you the trouble of counting the dots in Figure 21: There are 34 orange ones inside the pond but only 30 green ones outside. That margin could be significant. Because the dotted cells are likely to change state, the greater abundance of orange dots means there are more indigo cells ready to turn mauve than mauve cells that might become indigo. If the bias continues as the system evolves, the indigo region will steadily lose area and eventually be swallowed up.
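Rather than counting dots by eye, we can let the computer tally them. The sketch below (tested on a simple square pond of my own construction, not the lattice of Figure 21) counts the neutral sites on each side of the boundary:

```python
import numpy as np

def dot_counts(s):
    # Neutral ("dotted") sites have Delta E == 0: two friendly and two
    # enemy neighbors. Orange dots are neutral indigo (+1) sites inside
    # the pond; green dots are neutral mauve (-1) sites outside it.
    nbrs = (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
            np.roll(s, 1, 1) + np.roll(s, -1, 1))
    neutral = (2 * s * nbrs == 0)
    return int((neutral & (s == 1)).sum()), int((neutral & (s == -1)).sum())
```

For a plain square pond the tally is four orange dots (at the pond’s corners) and no green ones.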

But is there any reason to think the interior of the pond will *always* have a surplus of neutral sites susceptible to flipping?

What if the shape becomes a little more complicated? Perhaps the square pond grows a protuberance on one side, and an invagination on another, as in Figure 23. Counting the dots in that figure gave me an *Aha!* moment. There’s a conservation law, I thought. No matter how you alter the outline of the pond, the neutral sites inside will outnumber those outside by four.

This notion is not utterly crazy. If you walk clockwise around the boundary of the simple square pond in Figure 22, you will have made four right turns by the time you get back to your starting point. Each of those right turns creates a neutral cell in the interior of the pond—we’ll call them *innie turns*—where you can place an orange dot. A clockwise circuit of the modified pond in Figure 23, with its excrescences and increscences, requires some left turns as well as right turns. Each left turn creates a neutral cell in the exterior, an *outie turn*, earning a green dot. But for every outie turn added to the perimeter, you’ll have to make an additional innie turn in order to get back to your starting point. Thus, except for the four original corners, innie and outie turns are like particles and antiparticles, always created and annihilated in pairs. A closed path, no matter how convoluted, always has exactly four more innies than outies. The four-turn differential is again on exhibit in the more elaborate example of Figure 24, where the orange dots prevail 17 to 13. Indeed, I assert that there are *always* four more innie turns than outie turns on the perimeter of any simple (*i.e.*, non-self-intersecting) closed curve on the square lattice. (I think this is one of those statements that is obviously true but not so simple to prove, like the claim that every simple closed curve on the plane has an inside and an outside.)
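The innie/outie bookkeeping can be checked mechanically. In this sketch (my own, under the stated conventions) a boundary is a clockwise sequence of unit steps, and each turn is classified by the sign of the cross product of successive steps: right turn = innie, left turn = outie.

```python
def innie_outie(path):
    # path: unit steps (dx, dy) tracing a simple rectilinear closed loop
    # clockwise, with y increasing upward.
    innies = outies = 0
    for (dx1, dy1), (dx2, dy2) in zip(path, path[1:] + path[:1]):
        cross = dx1 * dy2 - dy1 * dx2
        if cross < 0:
            innies += 1          # right turn
        elif cross > 0:
            outies += 1          # left turn
    return innies, outies

R, D, L, U = (1, 0), (0, -1), (-1, 0), (0, 1)
square = [R] * 3 + [D] * 3 + [L] * 3 + [U] * 3       # a plain square pond
ell = [R] * 2 + [D] + [R] + [D] + [L] * 3 + [U] * 2  # an L-shape with one outie
```

Both sample loops exhibit the four-turn differential: the square has four innies and no outies, the L-shape five innies and one outie.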

Unfortunately, even if the statement about counting right and left turns is true, the corresponding statement about orange and green dots is not. A counterexample shows that the neutral cells do *not* always come in matched innie/outie pairs. In this case there are more green dots than orange ones, which might be taken to suggest that the indigo area will grow rather than shrink.

In my effort to explain why ponds always evaporate, I seem to have reached a dead end. I should have known from the outset that the effort was doomed. I can’t prove that ponds *always* shrink because they don’t. The system is ergodic: Any state can be reached from any other state in a finite number of steps. In particular, a single indigo cell (a very small pond) can grow to cover the whole lattice. The sequence of steps needed to make that happen is improbable, but it certainly exists.

If proof is out of reach, maybe we can at least persuade ourselves that the pond shrinks with high probability. And we have a tool for doing just that: It’s called the Monte Carlo method. Figure 26 follows the fate of a \(25 \times 25\) square pond embedded in an otherwise blank lattice of \(100 \times 100\) sites, evolving under Glauber dynamics at a very low temperature \((T = 0.1)\). The uppermost curve, in indigo, shows the steady evaporation of the pond, dropping from the original \(625\) sites to \(0\) after about \(320\) macrosteps. The middle traces record the abundance of Swiss sites, orange for those inside the pond and green for those outside. Because of the low temperature, these are the only sites that have any appreciable likelihood of flipping. The black trace at the bottom is the difference between orange and green. For the most part it hovers at \(+4\), never exceeding that value but seldom falling much below it, and making only one brief foray into negative territory. Statistically speaking, the system appears to vindicate the innie/outie hypothesis. The pond shrinks because there are almost always more flippable spins inside than outside.
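A scaled-down version of this experiment is easy to reproduce. The sketch below applies Glauber dynamics (G-rule acceptance, random visitation) to a square pond; the lattice and pond are smaller than in Figure 26, but the pond evaporates just the same:

```python
import numpy as np

def glauber_macrostep(s, T, rng):
    # One macrostep = N single-site updates at randomly chosen sites
    # on the wraparound lattice.
    n, m = s.shape
    for _ in range(n * m):
        i, j = rng.integers(n), rng.integers(m)
        nbrs = (s[(i - 1) % n, j] + s[(i + 1) % n, j] +
                s[i, (j - 1) % m] + s[i, (j + 1) % m])
        dE = 2 * s[i, j] * nbrs
        # G-rule acceptance: e^{-dE/T} / (1 + e^{-dE/T})
        if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
            s[i, j] = -s[i, j]
    return s
```

Starting from a \(9 \times 9\) pond on a \(30 \times 30\) lattice at \(T = 0.1\), the pond reliably vanishes within a few dozen macrosteps.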

Figure 26 is based on a single run of the Monte Carlo algorithm. Figure 27 presents essentially the same data averaged over \(1{,}000\) Monte Carlo runs under the same conditions—starting with a \(25 \times 25\) square pond and applying Glauber dynamics at \(T = 0.1\).

The pond’s loss of area follows a remarkably linear path, with a steady rate very close to two lattice sites per Monte Carlo macrostep. And it’s clear that virtually all of these pondlike blocks of indigo cells disappear within a little more than \(300\) macrosteps, putting them in the tall peak at the short end of the lifetime distribution in Figure 18. None of them contribute to the long tail that extends out past \(10{,}000\) steps.

So much for the quick-drying ponds. What about the long-flowing rivers?

When the two sides join, everything changes. It’s not just a matter of adjusting the shape and size of the rectangle. There is no more rectangle! By definition, a rectangle has four sides and four right-angle corners. The object now occupying the lattice has only two sides and no corners. It may appear to have corners at the far left and right, but that’s an artifact of drawing the figure on a flat plane. It really lives on a torus, and the band of indigo cells is like a ring of chocolate icing that goes all the way around the doughnut. Or it’s a river—an endless river. You can walk along either bank as far as you wish, and you’ll never find a place to cross.

The topological difference between a pond and a river has dire consequences for Monte Carlo simulations of the Ising model. When the rectangle’s four corners disappeared, so did the four orange dots marking interior Swiss cells. Indeed, the river has no neutral cells at all, neither inside nor outside. At temperatures near zero, where neutral cells are the only ones that ever change state, the river becomes an all-but-eternal feature. The Monte Carlo process has no effect on it. The system is stuck in a metastable state, with no practical way to reach the lower-energy state of full magnetization.
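The claim that a straight river has no neutral cells at all, inside or outside, is easy to confirm computationally; compare a square pond, which always has its four. A sketch:

```python
import numpy as np

def count_neutral(s):
    # Neutral ("Swiss") sites have Delta E == 0 on the wraparound lattice:
    # exactly two friendly and two enemy neighbors.
    nbrs = (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
            np.roll(s, 1, 1) + np.roll(s, -1, 1))
    return int((2 * s * nbrs == 0).sum())
```

A band of up spins stretched across the whole wraparound lattice yields a count of zero; a square pond on the same lattice yields four.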

When I first noticed how a river can block magnetization, I went looking to see what others might have written about the phenomenon. I found nothing. There was lots of talk about metastability in general, but none of the sources I consulted mentioned this particular topological impediment. I began to worry that I had made some blunder in programming or had misinterpreted what I was seeing. Finally I stumbled on a 2002 paper by Spirin, Krapivsky, and Redner that reports essentially the same observations and extends the discussion to three dimensions, where the problem is even worse.

A river with perfectly straight banks looks rather unnatural—more like a canal. To make the scene more lifelike, I gave the banks a sinuous profile, half expecting the Monte Carlo process to amplify the meanders, as erosion does in a real river.

But that’s not what happens. The lower part of Figure 29 shows the same stretch of river after \(1{,}000\) Monte Carlo macrosteps at \(T = 0.1\). The algorithm has not amplified the sinuosity; on the contrary, it has shaved off the peaks and filled in the troughs, generally flattening the terrain. After \(5{,}000\) steps the river has returned to a perfectly straight course. No neutral cells remain, so no further change can be expected in any human time frame.

The presence or absence of four corners makes all the difference between ponds and rivers. Ponds shrink because the corners create a consistent bias: Sites subject to flipping are more numerous inside than outside, which means, over the long run, that the outside steadily encroaches on the inside’s territory. That bias does not exist for rivers, where the number of interior and exterior neutral sites is equal on average. Figure 30 records the inside-minus-outside difference for the first \(1{,}000\) steps of a simulation beginning with a sinusoidal river.

The difference hovers near zero, though with short-lived excursions both above and below; the mean value is \(+0.062\).

Even at somewhat higher temperatures, any pattern that crosses from one side of the grid to the other will stubbornly resist extinction. Figure 31 shows snapshots every \(1{,}000\) macrosteps in the evolution of a lattice at \(T = 1.0\), which is well below the critical temperature but high enough to allow a few energetically unfavorable spin flips. The starting configuration was a sinusoidal river, but by \(1{,}000\) steps it has already become a thick, lumpy band. In subsequent snapshots the ribbon grows thicker and thinner, migrates up and down—and then abruptly disappears, sometime between the \(8{,}000\)th and the \(9{,}000\)th macrostep.

Swiss cells, with equal numbers of friends and enemies among their neighbors, appear wherever a boundary line takes a turn. All the rest of the sites along a boundary—on both sides—have three friendly neighbors and one enemy neighbor. At a site of this kind, flipping a spin carries an energy penalty of \(\Delta E = +4\). At \(T = 1.0\) the probability of such an event is roughly \(1/50\). In a \(10{,}000\)-site lattice crossed by a river there can be as many as \(200\) of these three-against-one sites, so we can expect to see a few of them flip during every macrostep. Thus at \(T = 1.0\) the river is not a completely static formation, as it is at temperatures closer to zero. The channel can shift or twist, grow wider or narrower. But these motions are glacially slow, not only because they depend on somewhat rare events but also because the probabilities are unbiased. At every step the river is equally likely to grow wider or narrower; on average, it goes nowhere.

In one last gesture to support my claim that short-lived patterns are ponds and long-lived patterns are rivers I offer Figure 32:

A troubling question is whether these uncrossable rivers that block full magnetization in Ising models also exist in physical ferromagnets. It seems unlikely. The rivers I describe above owe their existence to the models’ wraparound boundary conditions. The crystal lattices of real magnetic materials do not share that topology. Thus it seems that metastability may be an artifact or a mere incidental feature of the model, not something present in nature.

Statistical mechanics is generally formulated in terms of systems without boundaries. You construct a theory of \(N\) particles, but it’s truly valid only in the “thermodynamic limit,” where \(N \to \infty\). Under this regime the two-dimensional Ising model would be studied on a lattice extending over an infinite plane. Computer models can’t do that, and so we wind up with tricks like wraparound boundary conditions, which can be considered a hack for faking infinity.

It’s a pretty good hack. As in an infinite lattice, every site has the same local environment, with exactly four neighbors, who also have four neighbors, and so on. There are no edges or corners that require special treatment. For these reasons wraparound or periodic boundary conditions have always been the most popular choice for computational models in the sciences, going back at least as far as 1953. Still, there are glitches. If you were standing on a wraparound lattice, you could walk due north forever, but you’d keep passing your starting point again and again. If you looked into the distance, you’d see the back of your own head. For the Ising model, perhaps the most salient fact is this: On a genuinely infinite plane, every simple, finite, closed curve is a pond; no finite structure can behave like a river, transecting the entire surface so that you can’t get around it. Thus the wraparound model differs from the infinite one in ways that may well alter important conclusions.

These defects are a little worrisome. On the other hand, physical ferromagnets are also mere finite approximations to the unbounded spaces of thermodynamic theory. A single magnetic domain might have \(10^{20}\) atoms, which is large compared with the \(10^4\) sites in the models presented here, but it’s a long way short of infinity. The domains have boundaries, which can have a major influence on their properties. All in all, it seems like a good idea to explore the space of possible boundary conditions, including some alternatives to the wraparound convention. Hence Program 4:

An extra row of cells around the perimeter of the lattice serves to make the boundary conditions visible in this simulation. The cells in this halo layer are not active participants in the Ising process; they serve as neighbors to the cells on the periphery of the lattice, but their own states are not updated by the Monte Carlo algorithm. To mark their special role, their up and down states are indicated by red and pink instead of indigo and mauve.

The behavior of wraparound boundaries is already familiar. If you examine the red/pink stripe along the right edge of the grid, you will see that it matches the leftmost indigo/mauve column. Similar relations determine the patterns along the other edges.

The two simplest alternatives to the wraparound scheme are static borders made up of cells that are always up or always down. You can probably guess how they will affect the outcome of the simulation. Try setting the temperature around 1.5 or 2.0, then click back and forth between *all up* and *all down* as the program runs. The border color quickly invades the interior space, encircling a pond of the opposite color and eventually squeezing it down to nothing. Switching to the opposite border color brings an immediate re-enactment of the same scene with all colors reversed. The biases are blatant.

Another idea is to assign the border cells random values, chosen independently and with equal probability. A new assignment is made after every macrostep. Randomness is akin to high temperature, so this choice of boundary condition amounts to an Ising lattice surrounded by a ring of fire. There is no bias in favor of up or down, but the stimulation from the sizzling periphery creates recurrent disturbances even at temperatures near zero, so the system never attains a stable state of full magnetization.

Before I launched this project, my leading candidate for a better boundary condition was a zero border. This choice is equivalent to an “open” or “free” boundary, or to no boundary at all—a universe that just ends in blankness. Implementing open boundaries is slightly irksome because cells on the verge of nothingness require special handling: Those along the edges have only three neighbors, and those in the corners only two. A zero boundary produces the same effect as a free boundary without altering the neighbor-counting rules. The cells of the outer ring all have a numerical value of \(0\), indicated by gray. For the interior cells with numerical values of \(+1\) and \(-1\), the zero cells act as neighbors without actually contributing to the \(\Delta E\) calculations that determine whether or not a spin flips.
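One way to implement the zero border (a sketch of my own, not the actual code of Program 4) is to surround the lattice with an explicit halo of zero-valued cells; edge and corner sites then need no special-case logic, because a zero neighbor simply contributes nothing to \(\Delta E\):

```python
import numpy as np

def delta_E_zero_border(s):
    # Pad the lattice with a ring of 0-valued cells, then sum the four
    # neighbors of every site; the zeros drop out of the sum.
    padded = np.pad(s, 1, mode="constant", constant_values=0)
    nbrs = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
            padded[1:-1, :-2] + padded[1:-1, 2:])
    return 2 * s * nbrs
```

On a tiny all-up lattice the gradations show up immediately: \(\Delta E = 8\) in the interior, \(6\) on an edge, \(4\) in a corner.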

The zero boundary introduces no bias favoring up or down, it doesn’t heat or cool the system, and it doesn’t tamper with the topology, which remains a simple square embedded in a flat plane. Sounds ideal, no? However, it turns out the zero boundary has a lot in common with wraparound borders. In particular, it allows persistent rivers to form—or maybe I should call them lakes. I didn’t see this coming before I tried the experiment, but it’s not hard to understand what’s happening. On the wraparound lattice, elongating a rectangle until two opposite edges meet eliminates the Swiss cells at the four corners. The same thing happens when a rectangle extends all the way across a lattice with zero borders. The corner cells, now up against the border, no longer have two friendly and two enemy neighbors; instead they have two friends, one enemy, and one cell of spin zero, for a net \(\Delta E\) of \(+2\).

A pleasant surprise of these experiments was the boundary type I have labeled *sampled*. The idea is to make the boundary match the statistics of the interior of the lattice, but without regard to the geometry of any patterns there. For each border cell \(b\) we select an interior cell \(s\) at random, and assign the color of \(s\) to \(b\). The procedure is repeated after each macrostep. The border therefore maintains the same up/down proportion as the interior lattice, and always favors the majority. If the spins are evenly split between mauve and indigo, the border region shows no bias; as soon as the balance begins to tip, however, the border shifts in the same direction, supporting and hastening the trend.
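Here is a sketch of the sampled border, assuming the lattice is stored with an extra ring of border cells around the active interior (the halo layout and the helper name are my own, not the program’s):

```python
import numpy as np

def refresh_sampled_border(halo, rng):
    # halo is (n+2) x (m+2); halo[1:-1, 1:-1] is the active lattice and the
    # outer ring is the border. Each border cell copies the spin of a
    # randomly chosen interior cell, so the border mirrors the interior's
    # up/down proportions without copying its geometry.
    interior = halo[1:-1, 1:-1].ravel()   # ravel() of this slice makes a copy
    n, m = halo.shape
    for i in range(n):
        for j in range(m):
            if i in (0, n - 1) or j in (0, m - 1):
                halo[i, j] = interior[rng.integers(interior.size)]
    return halo
```

Calling this once after every macrostep keeps the border statistics in step with the interior, always favoring whichever color currently holds the majority.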

If you tend to root for the underdog, this rule is not for you—but we can turn it upside down, assigning a color opposite that of a randomly chosen interior cell. The result is interesting. Magnetization is held near \(0\), but at low temperature the local correlation coefficient approaches \(1\). The lattice devolves into two large blobs of no particular shape that circle the ring like wary wrestlers, then eventually reach a stable truce in which they split the territory either vertically or horizontally. This behavior has no obvious bearing on ferromagnetism, but maybe there’s an apt analogy somewhere in the social or political sciences.

The curves in Figure 33 record the response to a sudden temperature step in systems using each of six boundary conditions. The all-up and all-down boundaries converge the fastest—which is no surprise, since they put a thumb on the scale. The response of the sampled boundary is also quick, reflecting its weathervane policy of supporting the majority. The random and zero boundaries are the slowest; they follow identical trajectories, and I don’t know why. Wraparound is right in the middle of the pack. All of these results are for Glauber dynamics, but the curves for the Metropolis algorithm are very similar.

The menu in Program 4 has one more choice, labeled *twisted*. I wrote the code for this one in response to the question, “I wonder what would happen if…?” Twisted is the same as wraparound, except that one side is given a half-twist before it is mated with the opposite edge. Thus if you stand on the right edge near the top of the lattice and walk off to the right, you will re-enter on the left near the bottom. The object formed in this way is not a torus but a Klein bottle—a “nonorientable surface without boundary.” All I’m going to say about running an Ising model on this surface is that the results are not nearly as weird as I expected. See for yourself.

I have one more toy to present for your amusement: the MCMC microscope. It was the last program I wrote, but it should have been the first.

All of the programs above produce movies with one frame per macrostep. In that high-speed, high-altitude view it can be hard to see how individual lattice sites are treated by the algorithm. The MCMC microscope provides a slo-mo close-up, showing the evolution of a Monte Carlo Ising system one microstep at a time. The algorithm proceeds from site to site (in an order determined by the visitation sequence) and either flips the spin or not (according to the acceptance function).

As the algorithm proceeds, the site currently under examination is marked by a hot-pink outline. Sites that have yet to be visited are rendered in the usual indigo or mauve; those that have already had their turn are shown in shades of gray. The *Microstep* button advances the algorithm to the next site (determined by the visitation sequence) and either flips the spin or leaves it as-is (according to the acceptance function). The *Macrostep* button performs a full sweep of the lattice and then pauses; the *Run* button invokes a continuing series of microsteps at a somewhat faster pace. Some adjustments to this protocol are needed for the simultaneous update option. In this mode no spins are changed during the scan of the lattice, but those that *will* change are marked with a small square of contrasting gray. At the end of the macrostep, all the changes are made at once.

The *Dotted Swiss* checkbox paints orange and green dots on neutral cells (those with equal numbers of friendly and enemy neighbors). Doodle mode allows you to draw on the lattice via mouse clicks and thereby set up a specific initial pattern.

I’ve found it illuminating to draw simple geometric figures in doodle mode, then watch as they are transformed and ultimately destroyed by the various algorithms. These experiments are particularly interesting with the Metropolis algorithm at very low temperature. Under these conditions the Monte Carlo process—despite its roots in randomness—becomes very nearly deterministic. Cells with \(\Delta E \le 0\) always flip; other cells never do. (What, never? Well, hardly ever.) Thus we can speak of what happens when a program is run, rather than just describing the probability distribution of possible outcomes.

Here’s a recipe to try: Set the temperature to its lower limit, choose doodle mode, *down* initialization, typewriter visitation, and the *M*-rule acceptance function. Now draw some straight lines on the grid in four orientations: vertical, horizontal, and along both diagonals. Each line can be six or seven cells long, but don’t let them touch. Lines in three of the four orientations are immediately erased when the program runs; they disappear after the first macrostep. The one survivor is the diagonal line oriented from lower left to upper right, or southwest to northeast. With each macrostep the line migrates one cell to the left, and also loses one site at the bottom. This combination of changes gives the subjective impression that the pattern is moving not only left but also upward. I’m pretty sure that this phenomenon is responsible for the fluttering wings illusion seen at much higher temperatures (and higher animation speeds).
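At the lower temperature limit the M-rule is effectively deterministic (a spin flips if and only if \(\Delta E \le 0\)), so the fate of the doodled lines can be checked in code. This sketch (mine, not the MCMC microscope’s) runs the typewriter scan on a southwest-to-northeast diagonal:

```python
import numpy as np

def macrostep_typewriter_T0(s):
    # Typewriter visitation with the M-rule at T -> 0: flip iff dE <= 0.
    # Updates are made in place, so later sites see earlier flips.
    n, m = s.shape
    for i in range(n):
        for j in range(m):
            nbrs = (s[(i - 1) % n, j] + s[(i + 1) % n, j] +
                    s[i, (j - 1) % m] + s[i, (j + 1) % m])
            if 2 * s[i, j] * nbrs <= 0:
                s[i, j] = -s[i, j]
    return s
```

One macrostep shifts a six-cell SW-NE diagonal one column to the left and clips one site off its bottom end; after half a dozen macrosteps the line is gone.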

If you perform the same experiment with the diagonal visitation order, you’ll see exactly the same outcomes. A question I can’t answer is whether there is any pattern that serves to discriminate between the typewriter and diagonal orders. What I’m seeking is some arrangement of indigo cells on a mauve background that I could draw on the grid and then look away while you ran one algorithm or the other for some fixed number of macrosteps (which I get to specify). Afterwards, I win if I can tell which visitation sequence you chose.

The checkerboard algorithm is also worth trying with the four line orientations. The eventual outcome is the same, but the intermediate stages are quite different.

Finally I offer a few historical questions that seem hard to settle, and some philosophical musings on what it all means.

The name, of course, is an allusion to the famous casino, a prodigious producer and consumer of randomness. Nicholas Metropolis claimed credit for coming up with the term. In a 1987 retrospective he wrote:

It was at that time [spring of 1947] that I suggested an obvious name for the statistical method—a suggestion not unrelated to the fact that Stan [Ulam] had an uncle who would borrow money from relatives because he “just had to go to Monte Carlo.”

An oddity of this story is that Metropolis was not at Los Alamos in 1947. He left after the war and didn’t return until 1948.

Ulam’s account of the matter does not contradict the Metropolis version, but neither does it endorse it. Marshall Rosenbluth, on the other hand, *does* contradict Metropolis, writing: “The basic idea, as well as the name was due to Stan Ulam originally.” But Rosenbluth wasn’t at Los Alamos in 1947 either.

Ulam, for his part, wrote:

It seems to me that the name Monte Carlo contributed very much to the popularization of this procedure. It was named Monte Carlo because of the element of chance, the production of random numbers with which to play the suitable games.

Note the anonymous passive voice: “It was named…,” with no hint of by whom. If Ulam was so carefully noncommittal, who am I to insist on a definite answer?

As far as I know, the phrase “Monte Carlo method” first appeared in public print in 1949, in an article co-authored by Metropolis and Ulam. Presumably the term was in use earlier among the denizens of the Los Alamos laboratory. Daniel McCracken, in a 1955 *Scientific American* article, said it was a code word invented for security reasons. This is not implausible. Code words were definitely a thing at Los Alamos (the place itself was designated “Project Y”), but I’ve never seen the code word status of “Monte Carlo” corroborated by anyone with first-hand knowledge.

To raise the question, of course, is to hint that it was not Metropolis.

The 1953 paper that introduced Markov chain Monte Carlo, “Equation of State Calculations by Fast Computing Machines,” had five authors, who were listed in alphabetical order: Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. The two Rosenbluths were wife and husband, as were the two Tellers. Who did what in this complicated collaboration? Apparently no one thought to ask that question until 2003, when J. E. Gubernatis of Los Alamos was planning a symposium to mark the 50th anniversary of MCMC. He got in touch with Marshall Rosenbluth, who was then in poor health. Nevertheless, Rosenbluth attended the gathering, gave a talk, and sat for an interview. (He died a few months later.)

According to Rosenbluth, the basic idea behind MCMC—sampling the states of a system according to their Boltzmann weight, while following a Markov chain from one state to the next—came from Edward Teller. Augusta Teller wrote a first draft of a computer program to implement the idea. Then the Rosenbluths took over. In particular, it was Arianna Rosenbluth who wrote the program that produced all the results reported in the 1953 paper. Gubernatis adds:

Marshall’s recounting of the development of the Metropolis algorithm first of all made it very clear that Metropolis played no role in its development other than providing computer time.

In his interview, Rosenbluth was even blunter: “Metropolis was boss of the computer laboratory. We never had a single scientific discussion with him.”

These comments paint a rather unsavory portrait of Metropolis as a credit mooch. I don’t know to what extent that harsh verdict might be justified. In his own writings, Metropolis makes no overt claims about his contributions to the work. On the other hand, he also makes no *dis*claimers; he never suggests that someone else’s name might be more appropriately attached to the algorithm.

An interesting further question is who actually wrote the 1953 paper—who put the words together on the page. Internal textual evidence suggests there were at least two writers. Halfway through the article there’s a sudden change of tone, from gentle exposition to merciless technicality.

In recent years the algorithm has acquired the hyphenated moniker Metropolis-Hastings, acknowledging the contributions of W. Keith Hastings, a Canadian mathematician and statistician. Hastings wrote a 1970 paper that generalized the method, showing it could be applied to a wider class of problems, with probability distributions other than Boltzmann’s. Hastings is also given credit for rescuing the technique from captivity among the physicists and bringing it home to statistics, although it was another 20 years before the statistics community took much notice.

I don’t know who started the movement to name the generalized algorithm “Metropolis-Hastings.” The hyphenated term was already fairly well established by 1995, when Siddhartha Chib and Edward Greenberg put it in the title of a review article.

With Glauber dynamics, by contrast, there is no doubt or controversy about authorship. Glauber wrote the 1963 paper, and he did the work reported in it. On the other hand, Glauber did *not* invent the Monte Carlo algorithm that now goes by the name “Glauber dynamics.” His aim in tackling the Ising model was to find exact, mathematical solutions, in the tradition of Ising and Onsager. (Those two authors are the only ones cited in Glauber’s paper.) He never mentions Monte Carlo methods or any other computational schemes.

So who *did* devise the algorithm? The two main ingredients—the *G*-rule and the random visitation sequence—were already on the table in the 1950s. A form of the *G*-rule acceptance function \(e^{-\Delta E/T} / (1 + e^{-\Delta E/T})\) was proposed in 1954 by John G. Kirkwood of Yale University, a major figure in statistical mechanics at midcentury. He suggested it to the Los Alamos group as an alternative to the *M*-rule. Although the suggestion was not taken, the group did acknowledge that it would produce valid simulations. The random visitation sequence was used in a followup study by the Los Alamos group in 1957. (By then the group was led by William W. Wood, who had been a student of Kirkwood.)

Those two ingredients first came together a few years later in work by P. A. Flinn and G. M. McManus, who were then at Westinghouse Research in Pittsburgh. Their 1961 paper describes a computer simulation of an Ising model with both random visitation order and the \(e^{-\Delta E/T} / (1 + e^{-\Delta E/T})\) acceptance function, two years before Glauber’s article appeared. On grounds of publication priority, shouldn’t the Monte Carlo variation be named for Flinn and McManus rather than Glauber?

For a while, it was. There were dozens of references to Flinn and McManus throughout the 1960s and 70s. For example, an article by G. W. Cunningham and P. H. E. Meijer compared and evaluated the two main MCMC methods, identifying them as algorithms introduced by “Metropolis *et al*.” and by “Flinn and McManus.” A year later another compare-and-contrast article by John P. Valleau and Stuart G. Whittington adopted the same terminology. Neither of these articles mentions Glauber.

According to Semantic Scholar, the phrase “Glauber dynamics” first appeared in the physics literature in 1977, in an article by Ph. A. Martin. But this paper is a theoretical work, with no computational component, along the same lines as Glauber’s own investigation. Among the Semantic Scholar listings, “Glauber dynamics” was first mentioned in the context of Monte Carlo studies by A. Sadiq and Kurt Binder, in 1984. After that, the balance shifted strongly toward Glauber.

In bringing up the disappearance of Flinn and McManus from the Ising and Monte Carlo literature, I don’t mean to suggest that Glauber doesn’t deserve his recognition. His main contribution to studies of the Ising model—showing that it could give useful results away from equilibrium—is of the first importance. On the other hand, attaching his name to a Monte Carlo algorithm is unhelpful. If you turn to his 1963 paper to learn about the origin of the algorithm, you’ll be disappointed.

One more oddity. I have been writing the *G*-rule as

\[\frac{e^{-\Delta E/T}}{1 + e^{-\Delta E/T}},\]

which is the way it appeared in Flinn and McManus, as well as in many recent accounts of the algorithm. However, nothing resembling this expression is to be found in Glauber’s paper. Instead he defined the rule in terms of the hyperbolic tangent. Reconstructing various bits of his mathematics in a form that could serve as a Monte Carlo acceptance function, I come up with:

\[\frac{1}{2}\left(1 -\tanh \frac{\Delta E}{2 T}\right).\]

The two expressions are mathematically synonymous, but the prevalence of the first form suggests that some authors who cite Glauber rather than Flinn and McManus are not getting their notation from the paper they cite.
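For the record, the two forms are connected by elementary algebra. Writing \(x = \Delta E / T\) and expanding the hyperbolic tangent:

\[\frac{1}{2}\left(1 - \tanh\frac{x}{2}\right) = \frac{1}{2}\cdot\frac{\left(e^{x/2} + e^{-x/2}\right) - \left(e^{x/2} - e^{-x/2}\right)}{e^{x/2} + e^{-x/2}} = \frac{e^{-x/2}}{e^{x/2} + e^{-x/2}} = \frac{e^{-x}}{1 + e^{-x}}.\]

The final step multiplies numerator and denominator by \(e^{-x/2}\).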

When I first heard of the Ising model, sometime in the 1970s, I would read statements along the lines of “as the system cools to the critical temperature, fluctuations grow in scale until they span the entire lattice.” I wanted to *see* what that looked like. What kinds of patterns or textures would appear, and how would they evolve over time? In those days, live motion graphics were too much to ask for, but it seemed reasonable to expect at least a still image, or perhaps a series of them covering a range of temperatures.

In my reading, however, I found no pictures. Part of the reason was surely technological. Turning computations into graphics wasn’t so easy in those days. But I suspect another motive as well. A computational scientist who wanted to be taken seriously was well advised to focus on quantitative results. A graph of magnetization as a function of temperature was worth publishing; a snapshot of a single lattice configuration might seem frivolous—not real physics but a plaything like the Game of Life. Nevertheless, I still yearned to see what it would look like.

In 1979 I had an opportunity to force the issue. I was working with Kenneth G. Wilson, a physicist then at Cornell University, on a *Scientific American* article about “Problems in Physics with Many Scales of Length.” The problems in question included the Ising model, and I asked Wilson if he could produce pictures showing spin configurations at various temperatures. He resisted; I persisted; a few weeks later I received a fat envelope of fanfold paper, covered in arrays of \(1\)s and \(0\)s. With help from the *Scientific American* art department the numbers were transformed into black and white squares:

This particular image, one of three we published, shows the lattice at the critical temperature. Wilson credited his colleagues Stephen Shenker and Jan Tobochnik for writing the program that produced it.

The lattice pictures made by Wilson, Shenker, and Tobochnik were the first I had ever seen of an Ising model at work, but they were not the first to be published. In recent weeks I’ve discovered a 1974 paper by P. A. Flinn in which black-and-white spin tableaux form the very centerpiece of the presentation. Flinn discusses aspects of the appearance of these grids that would be very hard to reduce to simple numerical facts:

> Phase separation may be seen to occur by the formation and growth of clusters, but they look rather more like “seaweed” than like the roughly round clusters of traditional theory. The structures look somewhat like those observed in phase-separated glass.

I also found one even earlier instance of lattice diagrams, in a 1963 paper by J. R. Beeler, Jr., and J. A. Delaney. Are they the first?

Modeling calls for a curious mix of verisimilitude and fakery. A miniature steam locomotive chugging along the tracks of a model railroad reproduces in meticulous detail the pistons and linkage rods that drive the wheels of the real locomotive. But in the model it’s the wheels that impart motion to the links and pistons, not the other way around. The model’s true power source is hidden—an electric motor tucked away inside, where the boiler ought to be.

Scientific models also rely on shortcuts and simplifications. In a physics textbook you will meet the ideal gas, the frictionless pendulum, the perfectly elastic spring, the falling body that encounters no air resistance, the planet whose entire mass is concentrated at a dimensionless point. Such idealizations are not necessarily defects. By brushing aside irrelevant details, a good model allows a deeper truth to shine through. The problem, of course, is that some details are not irrelevant.

The Ising model is a fascinating case study in this process. Lenz and Ising set out to explain ferromagnetism, and almost all later discussions of the model (including the one you are reading right now) put some emphasis on that connection. The original aim was to find the simplest framework that would exhibit important properties of real ferromagnets, most notably the sudden onset of magnetization at the Curie temperature. As far as I can tell, the Ising model has failed in this respect. Some of the omitted details were of the essence; quantum mechanics just won’t go away, no matter how much we might like it to. These days, serious students of magnetism seem to have little interest in simple grids of flipping spins. A 2006 review of “modeling, analysis, and numerics of ferromagnetism,” by Martin Kružík and Andreas Prohl, doesn’t even mention the Ising model.

Yet the model remains wildly popular, the subject of hundreds of papers every year. Way back in 1967, Stephen G. Brush wrote that the Ising model had become “the preferred basic theory of all cooperative phenomena.” I’d go even further. I think it’s fair to say the Ising model has become an object of study for its own sake. The quest is to understand the phase diagram of the Ising system itself, whether or not it tells us anything about magnets or other physical phenomena.

Uprooting the Ising system from its ancestral home in physics leaves us with a model that is not a model *of* anything. It’s like a map of an imaginary territory; there is no ground truth. You can’t check the model’s accuracy by comparing its predictions with the results of experiments.

Seeing the Ising model as a free-floating abstraction, untethered from the material world, is a prospect I find exhilarating. We get to make our own universe—and we’ll do it right this time, won’t we! However, losing touch with physics is also unsettling. On what basis are we to choose between versions of the model, if not through fidelity to nature? Are we to be guided only by taste or convenience? A frequent argument in support of Glauber dynamics is that it seems more “natural” than the Metropolis algorithm. I would go along with that judgment: The random visitation sequence and the smooth, symmetrical curve of the *G*-rule both seem more like something found in nature than the corresponding Metropolis apparatus. But does naturalness matter if the model is solely a product of human imagination?

Bauer, W. F. 1958. The Monte Carlo method. *Journal of the Society for Industrial and Applied Mathematics* 6(4):438–451.

http://www.cs.fsu.edu/~mascagni/Bauer_1959_Journal_SIAM.pdf

Beeler, J. R. Jr., and J. A. Delaney. 1963. Order-disorder events produced by single vacancy migration. *Physical Review* 130(3):962–971.

Binder, Kurt. 1985. The Monte Carlo method for the study of phase transitions: a review of some recent progress. *Journal of Computational Physics* 59:1–55.

Binder, Kurt, and Dieter W. Heermann. 2002. *Monte Carlo Simulation in Statistical Physics: An Introduction*. Fourth edition. Berlin: Springer-Verlag.

Brush, Stephen G. 1967. History of the Lenz-Ising model. *Reviews of Modern Physics* 39:883–893.

Chib, Siddhartha, and Edward Greenberg. 1995. Understanding the Metropolis-Hastings Algorithm. *The American Statistician* 49(4): 327–335.

Cipra, Barry A. 1987. An introduction to the Ising model. *American Mathematical Monthly* 94:937–959.

Cunningham, G. W., and P. H. E. Meijer. 1976. A comparison of two Monte Carlo methods for computations in statistical mechanics. *Journal of Computational Physics* 20:50–63.

Davies, E. B. 1982. Metastability and the Ising model. *Journal of Statistical Physics* 27(4):657–675.

Diaconis, Persi. 2009. The Markov chain Monte Carlo revolution. *Bulletin of the American Mathematical Society* 46(2):179–205.

Eckhardt, R. 1987. Stan Ulam, John von Neumann, and the Monte Carlo method. *Los Alamos Science* 15:131–137.

Flinn, P. A., and G. M. McManus. 1961. Monte Carlo calculation of the order-disorder transformation in the body-centered cubic lattice. *Physical Review* 124(1):54–59.

Flinn, P. A. 1974. Monte Carlo calculation of phase separation in a two-dimensional Ising system. *Journal of Statistical Physics* 10(1):89–97.

Fosdick, L. D. 1963. Monte Carlo computations on the Ising lattice. *Methods in Computational Physics* 1:245–280.

Geyer, Charles J. 2011. Introduction to Markov chain Monte Carlo. In *Handbook of Markov Chain Monte Carlo*, edited by Steve Brooks, Andrew Gelman, Galin Jones and Xiao-Li Meng, pp. 3–48. Boca Raton: Taylor & Francis.

Glauber, R. J., 1963. Time-dependent statistics of the Ising model. *Journal of Mathematical Physics* 4:294–307.

Gubernatis, James E. (editor). 2003. *The Monte Carlo Method in the Physical Sciences: Celebrating the 50th Anniversary of the Metropolis Algorithm*. Melville, N.Y.: American Institute of Physics.

Halton, J. H. 1970. A retrospective and prospective survey of the Monte Carlo method. *SIAM Review* 12(1):1–63.

Hammersley, J. M., and D. C. Handscomb. 1964. *Monte Carlo Methods*. London: Chapman and Hall.

Hastings, W. K. 1970. Monte Carlo sampling methods using Markov chains and their applications. *Biometrika* 57(1):97–109.

Hayes, Brian. 2000. Computing science: The world in a spin. *American Scientist* 88(5):384–388. http://bit-player.org/bph-publications/AmSci-2000-09-Hayes-Ising.pdf

Hitchcock, David B. 2003. A history of the Metropolis–Hastings algorithm. *The American Statistician* 57(4):254–257. https://doi.org/10.1198/0003130032413

Hurd, Cuthbert C. 1985. A note on early Monte Carlo computations and scientific meetings. *Annals of the History of Computing* 7(2):141–155.

Ising, Ernst. 1925. Beitrag zur Theorie des Ferromagnetismus. *Zeitschrift für Physik* 31:253–258.

Janke, Wolfhard, Henrik Christiansen, and Suman Majumder. 2019. Coarsening in the long-range Ising model: Metropolis versus Glauber criterion. *Journal of Physics: Conference Series*, Volume 1163, International Conference on Computer Simulation in Physics and Beyond 24–27 September 2018, Moscow, Russian Federation. https://iopscience.iop.org/article/10.1088/1742-6596/1163/1/012002

Kobe, S. 1995. Ernst Ising—physicist and teacher. Actas: Noveno Taller Sur de Fisica del Solido, Misión Boroa, Chile, 26–29 April 1995. http://arXiv.org/cond-mat/9605174

Kružík, Martin, and Andreas Prohl. 2006. Recent developments in the modeling, analysis, and numerics of ferromagnetism. *SIAM Review* 48(3):439–483.

Liu, Jun S. 2004. *Monte Carlo Strategies in Scientific Computing*. New York: Springer Verlag.

Lu, Wentao T., and F. Y. Wu. 2000. Ising model on nonorientable surfaces: Exact solution for the Möbius strip and the Klein bottle. arXiv: cond-mat/0007325

Martin, Ph. A. 1977. On the stochastic dynamics of Ising models. *Journal of Statistical Physics* 16(2):149–168.

McCoy, Barry M., and Jean-Marie Maillard. 2012. The importance of the Ising model. *Progress in Theoretical Physics* 127:791–817. https://arxiv.org/abs/1203.1456v1

McCracken, Daniel D. 1955. The Monte Carlo method. *Scientific American* 192(5):90–96.

Metropolis, Nicholas, and S. Ulam. 1949. The Monte Carlo method. *Journal of the American Statistical Association* 247:335–341.

Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equation of state calculations by fast computing machines. *The Journal of Chemical Physics* 21(6):1087–1092.

Metropolis, N. 1987. The beginning of the Monte Carlo method. *Los Alamos Science* 15:125–130.

Moore, Cristopher, and Stephan Mertens. 2011. *The Nature of Computation*. Oxford: Oxford University Press.

Onsager, Lars. 1944. Crystal Statistics. I. A two-dimensional model with an order-disorder transition. *Physical Review* 65:117–149.

Pérez, Gabriel, Francisco Sastre, and Rubén Medina. 2002. Critical exponents for extended dynamical systems with simultaneous updating: the case of the Ising model. *Physica D* 168–169:318–324.

Peierls, R. 1936. On Ising’s model of ferromagnetism. *Proceedings of the Cambridge Philosophical Society, Mathematical and Physical Sciences* 32:477–481.

Peskun, P. H. 1973. Optimum Monte-Carlo sampling using Markov chains. *Biometrika* 60(3):607–612.

Richey, Matthew. 2010. The evolution of Markov chain Monte Carlo methods. *American Mathematical Monthly* 117:383–413.

Robert, Christian, and George Casella. 2011. A short history of Markov chain Monte Carlo: Subjective recollections from incomplete data. *Statistical Science* 26(1):102–115. Also appears as a chapter in *Handbook of Markov Chain Monte Carlo*. Also available as arXiv preprint 0808.2902v7.

Rosenbluth, Marshall N. 2003a. Genesis of the Monte Carlo algorithm for statistical mechanics. AIP Conference Proceedings 690, 22. https://doi.org/10.1063/1.1632112.

Rosenbluth, Marshall N. 2003b. Interview with Kai-Henrik Barth, La Jolla, California, August 11, 2003. Niels Bohr Library & Archives, American Institute of Physics, College Park, MD. https://www.aip.org/history-programs/niels-bohr-library/oral-histories/28636-1

Sadiq, A., and K. Binder. 1984. Dynamics of the formation of two-dimensional ordered structures. *Journal of Statistical Physics* 35(5/6):517–585.

Spirin, V., P. L. Krapivsky, and S. Redner. 2002. Freezing in Ising ferromagnets. *Physical Review E* 65(1):016119. https://arxiv.org/abs/cond-mat/0105037.

Stigler, Stephen M. 1991. Stochastic simulation in the nineteenth century. *Statistical Science* 6:89–97.

Stoll, E., K. Binder, and T. Schneider. 1973. Monte Carlo investigation of dynamic critical phenomena in the two-dimensional kinetic Ising model. *Physical Review B* 8(7):3266–3289.

Tierney, Luke. 1994. Markov chains for exploring posterior distributions. *The Annals of Statistics* 22(4):1701–1762.

Ulam, Stanislaw M. 1976, 1991. *Adventures of a Mathematician*. Berkeley: University of California Press.

Valleau, John P., and Stuart G. Whittington. 1977. Monte Carlo in statistical mechanics: choosing between alternative transition matrices. *Journal of Computational Physics* 24:150–157.

Wilson, Kenneth G. 1979. Problems in physics with many scales of length. *Scientific American* 241(2):158–179.

Wood, W. W. 1986. Early history of computer simulations in statistical mechanics. In *Molecular-Dynamics Simulation of Statistical-Mechanics Systems*, edited by G. Ciccotti and W. G. Hoover (North-Holland, New York): pp. 3–14. https://digital.library.unt.edu/ark:/67531/metadc1063911/m1/1/

Wood, William W. 2003. A brief history of the use of the Metropolis method at LANL in the 1950s. AIP Conference Proceedings 690, 39. https://doi.org/10.1063/1.1632115

This flimsy slip of paper seems like an odd scrap to preserve for the ages, but when I pulled it out of the envelope, I knew instantly where it came from and why I had saved it.

The year was 1967. I was 17 then; I’m 71 now. Transposing those two digits takes just a flick of the fingertips. I can blithely skip back and forth from one prime number to the other. But the span of lived time between 1967 and 2021 is a chasm I cannot so easily leap across. At 17 I was in a great hurry to grow up, but I couldn’t see as far as 71; I didn’t even try. Going the other way—revisiting the mental and emotional life of an adolescent boy—is also a journey deep into alien territory. But the straw wrapper helps—it’s a Proustian *aide memoire*.

In the spring of 1967 I had a girlfriend, Lynn. After school we would meet at the Maple Diner, where the booths had red leatherette upholstery and formica tabletops with a boomerang motif. We’d order two Cokes and a plate of french fries to share. The waitress liked us; she’d make sure we had a full bottle of ketchup. I mention the ketchup because it was a token of our progress toward intimacy. On our first dates Lynn had put only a dainty dab on her fries, but by April we were comfortable enough to reveal our true appetites.

One afternoon I noticed she was fiddling intently with the wrapper from her straw, folding and refolding. I had no idea what she was up to. A teeny paper airplane she would sail over my head? When she finished, she pushed her creation across the table:

What a wallop there was in that little wad of paper. At that point in our romance, the words had not yet been spoken aloud.

How did I respond to Lynn’s folded declaration? I can’t remember; the words are lost. But evidently I got through that awkward moment without doing any permanent damage. A year later Lynn and I were married.

Today, at 71, with the preserved artifact in front of me, my chief regret is that I failed to take up the challenge implicit in the word game Lynn had invented. Why didn’t I craft a reply by folding my own straw wrapper? There are quite a few messages I could have extracted by strategic deletions from “It’s a pleasure to serve you.”

itsapleasuretoserveyou ==> I love you.

itsapleasuretoserveyou ==> I please you.

itsapleasuretoserveyou ==> I tease you.

itsapleasuretoserveyou ==> I pleasure you.

itsapleasuretoserveyou ==> I pester you.

itsapleasuretoserveyou ==> I peeve you.

itsapleasuretoserveyou ==> I salute you.

itsapleasuretoserveyou ==> I leave you.

Not all of those statements would have been suited to the occasion of our rendezvous at the Maple Diner, but over the course of our years together—17 years, as it turned out—there came a moment for each of them.

How many words can we form by making folds in the straw-paper slogan? I could not have answered that question in 1967. I couldn’t have even asked it. But times change. Enumerating all the foldable messages now strikes me as an obvious thing to do when presented with the straw wrapper. Furthermore, I have the computational means to do it—although the project was not quite as easy as I expected.

A first step is to be explicit about the rules of the game. We are given a source text, in this case “It’s a pleasure to serve you.” Let us ignore the spaces between words as well as all punctuation and capitalization; in this way we arrive at the normalized text “itsapleasuretoserveyou”. A word is *foldable* if all of its letters appear in the normalized text in the correct order (though not necessarily consecutively). The folding operation amounts to an editing process in which our only permitted act is deletion of letters; we are not allowed to insert, substitute, or permute. If two or more foldable words are to be combined to make a phrase or sentence, they must follow one another in the correct order without overlaps.
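The normalization step is easy to make concrete. The `normalize` function called in the code later in this article isn't shown; a minimal, plausible stand-in (assuming we keep only letters and fold everything to lowercase) might look like this:

```python
def normalize(text):
    # Drop spaces, punctuation, and capitalization; keep only letters, lowercased.
    return "".join(ch.lower() for ch in text if ch.isalpha())

normalize("It's a pleasure to serve you.")   # 'itsapleasuretoserveyou'
```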

So much for foldability. Next comes the fraught question: What is a word? Linguists and lexicographers offer many subtly divergent opinions on this point, but for present purposes a very simple definition will suffice: A finite sequence of characters drawn from the 26-letter English alphabet is a word if it can legally be played in a game of Scrabble. I have been working with a word list from the 2015 edition of Collins Scrabble Words, which has about 270,000 entries. (There are a number of alternative lists, which I discuss in an appendix at the end of this article.)

Scrabble words range in length from 2 to 15 letters. The upper limit—determined by the size of the game board—is not much of a concern. You’re unlikely to meet a straw-paper text that folds to yield words longer than *sesquipedalian*. The absence of 1-letter words is more troubling, but the remedy is easy: I simply added the words *a*, *I*, and *O* to my copy of the Scrabble list.

My first computational experiments with foldable words searched for examples at random. Writing a program for random sampling is often easier than taking an exact census of a population, and the sample offers a quick glimpse of typical results. The following Python procedure generates random foldable sequences of letters drawn from a given source text, then returns those sequences that are found in the Scrabble word list. (The parameter *k* is the length of the words to be generated, and *reps* specifies the number of random trials.)

```
import random

def randomFoldableWords(text, lexicon, k, reps):
    normtext = normalize(text)
    n = len(normtext)
    findings = []
    for i in range(reps):
        indices = random.sample(range(n), k)  # k distinct positions in the text
        indices.sort()                        # restore left-to-right order
        letters = ""
        for idx in indices:
            letters += normtext[idx]
        if letters in lexicon:
            findings.append(letters)
    return findings
```

Here are the six-letter foldable words found by invoking the program as `randomFoldableWords("It's a pleasure to serve you.", scrabblewords, 6, 10000)`:

please, plater, searer, saeter, parter, sleety, sleeve, parser, purvey, laster, islets, taster, tester, slarts, paseos, tapers, saeter, eatery, salute, tsetse, setose, salues, sparer

Note that the word *saeter* (you could look it up—I had to) appears twice in this list. The frequency of such repetitions can yield an estimate of the total population size. A variant of the mark-and-recapture method, well-known in wildlife ecology, led me to an estimate of 92 six-letter foldable Scrabble words in the straw-wrapper slogan. The actual number turns out to be 106.

Samples and estimates are helpful, but they leave me wondering, What am I missing? What strange and beautiful word has failed to turn up in any of the samples, like the big fish that never takes the bait? I had to have an exhaustive list.

In many word games, the tool of choice for computer-aided playing (or cheating) is the regular expression, or regex. A regex is a pattern defining a set of strings, or character sequences; from a collection of strings, a regex search will pick out those that match the pattern. For example, the regular expression `^.*love.*$` selects from the Scrabble word list all words that have the letter sequence *love* somewhere within them. There are 137 such words, including some that I would not have thought of, such as *rollover* and *slovenly*. The regex `^.*l.*o.*v.*e.*$` finds all words in which *l, o, v,* and *e* appear in sequence, whether or not they are adjacent. The set has 267 members, including such secret-lover gems as *bloviate*, *electropositive*, and *leftovers*.
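In Python, the two searches can be sketched with the `re` module. (The tiny word list here is a stand-in, since the full Scrabble list isn't reproduced in this article.)

```python
import re

words = ["love", "rollover", "slovenly", "bloviate", "leftovers", "taste"]

# Words containing the substring "love":
substring_hits = [w for w in words if re.fullmatch(r".*love.*", w)]

# Words containing l, o, v, e as a subsequence, adjacent or not:
subsequence_hits = [w for w in words if re.fullmatch(r".*l.*o.*v.*e.*", w)]
```

Run on this sample, the substring pattern matches *love*, *rollover*, and *slovenly*; the subsequence pattern also picks up *bloviate* and *leftovers*.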

A solution to the foldable words problem could surely be crafted with regular expressions, but I am not a regex wizard. In search of a more muggles-friendly strategy, my first thought was to extend the idea behind the random-sampling procedure. Instead of selecting foldable sequences at random, I’d generate all of them, and check each one against the word list.

The procedure below generates all three-letter strings that can be folded from the given text, and returns the subset of those strings that appear in the Scrabble word list:

```
def foldableStrings3(lexicon, text):
    normtext = normalize(text)
    n = len(normtext)
    words = []
    for i in range(0, n-2):
        for j in range(i+1, n-1):
            for k in range(j+1, n):
                s = normtext[i] + normtext[j] + normtext[k]
                if s in lexicon:
                    words.append(s)
    return(words)
```

At the heart of the procedure are three nested loops that methodically step through all the foldable combinations: for any initial letter `text[i]` we can choose any following letter `text[j]` with `j > i`; likewise `text[j]` can be followed by any `text[k]` with `k > j`. This scheme works perfectly well, finding 348 instances of three-letter words. I speak of “instances” because some words appear in the list more than once; for example, *pee* can be formed in three ways. If we count only unique words, there are 137.

Following this model, we could write a separate routine for each word length from 1 to 15 letters, but that looks like a dreary and repetitious task. Nobody wants to write a procedure with loops nested 15 deep. An alternative is to write a meta-procedure, which would generate the appropriate procedure for each word length. I made a start on that exercise in advanced loopology, but before I got very far I realized there’s an easier way. I was wondering: In a text of *n* letters, how many foldable substrings exist—whether or not they are recognizable words? There are several ways of answering this question, but to me the most illuminating argument comes from an inclusion/exclusion principle. Consider the first letter of the text, which in our case is the letter *I*. In the set of all foldable strings, half include this letter and half exclude it. The same is true of the second letter, and the third, and so on. Thus each letter added to the text doubles the number of foldable strings, which means the total number of strings is simply \(2^n\). (Included in this count is the empty string, made up of no letters.)
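The \(2^n\) count is easy to confirm by brute force on a short text. A quick check, using `itertools.combinations` to pick every ordered subset of letter positions:

```python
from itertools import combinations

def all_foldable_strings(text):
    # One foldable string per subset of letter positions, kept in text order.
    strings = []
    for k in range(len(text) + 1):
        for positions in combinations(range(len(text)), k):
            strings.append("".join(text[p] for p in positions))
    return strings

len(all_foldable_strings("its"))    # 8, which is 2**3 (empty string included)
```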

This observation suggests a simple algorithm for generating all the foldable strings in any *n*-letter text. Just count from \(0\) to \(2^{n} - 1\), and for each value along the way line up the binary representation of the number with the letters of the text. Then select those letters that correspond to a `1` bit, like so:

```
itsapleasuretoserveyou
0000100000110011111000
```

And so we see that the word `preserve` corresponds to the binary representation of the number `134392`.
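That correspondence can be checked directly; the mask value 134392 is taken from the example above:

```python
text = "itsapleasuretoserveyou"
bits = format(134392, "022b")    # 22-bit binary representation of the mask
word = "".join(c for c, b in zip(text, bits) if b == "1")
# word == "preserve"
```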

Counting is something that computers are good at, so a word-search procedure based on this principle is straightforward:

```
def foldablesByCounting(lexicon, text):
    normtext = normalize(text)
    n = len(normtext)
    words = []
    for i in range(2**n):    # count from 0 to 2**n - 1 inclusive
        charSeq = ''
        positions = positionsOf1Bits(i, n)
        for p in positions:
            charSeq += normtext[p]
        if charSeq in lexicon:
            words.append(charSeq)
    return(words)
```
```

The outer loop (variable `i`) counts from \(0\) to \(2^{n} - 1\); for each of these numbers the inner loop (variable `p`) picks out the letters corresponding to 1 bits. The program produces the expected output. Unfortunately, it does so very slowly. For every character added to the text, the running time roughly doubles. I haven’t the patience to plod through the \(2^{22}\) patterns in “itsapleasuretoserveyou”; estimates based on shorter phrases suggest the running time would be more than three hours.

In the middle of the night I realized my approach to this problem was totally backwards. Instead of blindly generating all possible character strings and filtering out the few genuine words, I could march through the list of Scrabble words and test each of them to see if it’s foldable. At worst I would have to try some 270,000 words. I could speed things up even more by making a preliminary pass through the Scrabble list, discarding all words that include characters not present in the normalized text. For the text “It’s a pleasure to serve you,” the character set has just 12 members: `aeiloprstuvy`. Allowing only words formed from these letters slashes the Scrabble list down to a length of 12,816.

To make this algorithm work, we need a procedure to report whether or not a word can be formed by folding the given text. The simplest approach is to slide the candidate word along the text, looking for a match for each character in turn:

```
taste
itsapleasuretoserveyou

t aste
itsapleasuretoserveyou

t a ste
itsapleasuretoserveyou

t a s te
itsapleasuretoserveyou

t a s t e
itsapleasuretoserveyou
```

If every letter of the word finds a mate in the text, the word is foldable, as in the case of `taste`, shown above. But an attempt to match `tastes` would fall off the end of the text looking for a second `s`, which does not exist.

The following code implements this idea:

```
def wordIsFoldable(word, text):
    normtext = normalize(text)
    t = 0  # pointer to positions in normtext
    w = 0  # pointer to positions in word
    while t < len(normtext):
        if word[w] == normtext[t]:  # matching chars in word and text
            w += 1                  # move to next char in word
            if w == len(word):      # matched all chars in word
                return(True)        # so: thumbs up
        t += 1                      # move to next char in text
    return(False)                   # fell off the end: thumbs down
```

All we need to do now is embed this procedure in a loop that steps through all the candidate Scrabble words, collecting those for which `wordIsFoldable` returns `True`.
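Put together, the whole search fits in a few lines. This sketch restates the matching test in snake_case and folds in the character-set prefilter described above; the four-word lexicon is a stand-in, since the real program reads the 270,000-word Scrabble list:

```python
def word_is_foldable(word, text):
    # Greedy left-to-right subsequence test: find each letter of word in turn.
    t = 0
    for ch in word:
        while t < len(text) and text[t] != ch:
            t += 1
        if t == len(text):
            return False     # ran out of text before matching all of word
        t += 1
    return True

def foldable_words(lexicon, text):
    letters = set(text)
    # Prefilter: a word using any letter absent from the text can't fold.
    candidates = (w for w in lexicon if set(w) <= letters)
    return sorted(w for w in candidates if word_is_foldable(w, text))

foldable_words({"taste", "tastes", "please", "zebra"}, "itsapleasuretoserveyou")
# ['please', 'taste']
```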

There’s still some waste motion here, since we are searching letter-by-letter through the same text, and repeating the same searches thousands of times. The source code (available on GitHub as a Jupyter notebook) explains some further speedups. But even the simple version shown here runs in less than two tenths of a second, so there’s not much point in optimizing.

I can now report that there are 778 unique foldable Scrabble words in “It’s a pleasure to serve you” (including the three one-letter words I added to the list). Words that can be formed in multiple ways bring the total count to 899.

And so we come to the tah-dah! moment—the unveiling of the complete list. I have organized the words into groups based on each word’s starting position within the text. (By Python convention, the positions are numbered from 0 through \(n-1\).) Within each group, the words are sorted according to the position of their last character; that position is given in the subscript following the word. For example, *tapestry* is in Group 1 because it begins at position 1 in the text (the *t* in *It’s*), and it carries the subscript 19 because it ends at position 19 (the *y* in *you*).

This arrangement of the words is meant to aid in constructing multiword phrases. If a word ends at position \(m\), the next word in the phrase must come from a group numbered \(m+1\) or greater.
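To build this tableau, the folding procedure needs to report positions as well as success. Here is a sketch of one way to do it (my own variant, not the notebook’s code), returning the first and last positions of the leftmost folding of a word:

```python
def fold_positions(word, normtext, start=0):
    """Return (first, last) positions of the leftmost folding of word
    into normtext, searching from `start`; None if no folding exists."""
    positions = []
    t = start
    for ch in word:
        t = normtext.find(ch, t)       # next occurrence of this letter
        if t == -1:
            return None                # ran off the end: unfoldable
        positions.append(t)
        t += 1
    return positions[0], positions[-1]

fold_positions('tapestry', 'itsapleasuretoserveyou')
# → (1, 19): Group 1, subscript 19
```

Calling the function again with `start` moved past a previous starting position discovers alternative foldings, which is how a single word can land in more than one group.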

**Group 0:** i_{0} it_{1} is_{2} its_{2} ita_{3} isle_{6} ilea_{7} isles_{8} itas_{8} ire_{11} issue_{11} iure_{11} islet_{12} io_{13} iso_{13} ileus_{14} ios_{14} ires_{14} islets_{14} isos_{14} issues_{14} issuer_{16} ivy_{19}

**Group 1:** ta_{3} tap_{4} tae_{6} tale_{6} tape_{6} te_{6} tala_{7} talea_{7} tapa_{7} tea_{7} taes_{8} talas_{8} tales_{8} tapas_{8} tapes_{8} taps_{8} tas_{8} teas_{8} tes_{8} tapu_{9} tau_{9} talar_{10} taler_{10} taper_{10} tar_{10} tear_{10} tsar_{10} taleae_{11} tare_{11} tease_{11} tee_{11} tapet_{12} tart_{12} tat_{12} taut_{12} teat_{12} test_{12} tet_{12} tret_{12} tut_{12} tao_{13} taro_{13} to_{13} talars_{14} talers_{14} talus_{14} taos_{14} tapers_{14} tapets_{14} tapus_{14} tares_{14} taros_{14} tars_{14} tarts_{14} tass_{14} tats_{14} taus_{14} tauts_{14} tears_{14} teases_{14} teats_{14} tees_{14} teres_{14} terts_{14} tests_{14} tets_{14} tres_{14} trets_{14} tsars_{14} tuts_{14} tasse_{15} taste_{15} tate_{15} terete_{15} terse_{15} teste_{15} tete_{15} toe_{15} tose_{15} tree_{15} tsetse_{15} taperer_{16} tapster_{16} tarter_{16} taser_{16} taster_{16} tater_{16} tauter_{16} tearer_{16} teaser_{16} teer_{16} teeter_{16} terser_{16} tester_{16} tor_{16} tutor_{16} tav_{17} tarre_{18} testee_{18} tore_{18} trove_{18} tutee_{18} tapestry_{19} tapstry_{19} tarry_{19} tarty_{19} tasty_{19} tay_{19} teary_{19} terry_{19} testy_{19} toey_{19} tory_{19} toy_{19} trey_{19} troy_{19} try_{19} too_{20} toro_{20} toyo_{20} tatou_{21} tatu_{21} tutu_{21}

**Group 2:** sap_{4} sal_{5} sae_{6} sale_{6} sea_{7} spa_{7} sales_{8} sals_{8} saps_{8} seas_{8} spas_{8} sau_{9} sar_{10} sear_{10} ser_{10} slur_{10} spar_{10} spear_{10} spur_{10} sur_{10} salse_{11} salue_{11} seare_{11} sease_{11} seasure_{11} see_{11} sere_{11} sese_{11} slae_{11} slee_{11} slue_{11} spae_{11} spare_{11} spue_{11} sue_{11} sure_{11} salet_{12} salt_{12} sat_{12} saut_{12} seat_{12} set_{12} slart_{12} slat_{12} sleet_{12} slut_{12} spart_{12} spat_{12} speat_{12} spet_{12} splat_{12} spurt_{12} st_{12} suet_{12} salto_{13} so_{13} salets_{14} salses_{14} saltos_{14} salts_{14} salues_{14} sapless_{14} saros_{14} sars_{14} sass_{14} sauts_{14} sears_{14} seases_{14} seasures_{14} seats_{14} sees_{14} seres_{14} sers_{14} sess_{14} sets_{14} slaes_{14} slarts_{14} slats_{14} sleets_{14} slues_{14} slurs_{14} sluts_{14} sos_{14} spaes_{14} spares_{14} spars_{14} sparts_{14} spats_{14} spears_{14} speats_{14} speos_{14} spets_{14} splats_{14} spues_{14} spurs_{14} spurts_{14} sues_{14} suets_{14} sures_{14} sus_{14} salute_{15} saree_{15} sasse_{15} sate_{15} saute_{15} setose_{15} slate_{15} sloe_{15} sluse_{15} sparse_{15} spate_{15} sperse_{15} spree_{15} saeter_{16} salter_{16} saluter_{16} sapor_{16} sartor_{16} saser_{16} searer_{16} seater_{16} seer_{16} serer_{16} serr_{16} slater_{16} sleer_{16} spaer_{16} sparer_{16} sparser_{16} spearer_{16} speer_{16} spuer_{16} spurter_{16} suer_{16} surer_{16} sutor_{16} sav_{17} sov_{17} salve_{18} save_{18} serre_{18} serve_{18} slave_{18} sleave_{18} sleeve_{18} slove_{18} sore_{18} sparre_{18} sperre_{18} splore_{18} spore_{18} stere_{18} sterve_{18} store_{18} stove_{18} salary_{19} salty_{19} sassy_{19} saury_{19} savey_{19} say_{19} serry_{19} sesey_{19} sey_{19} slatey_{19} slaty_{19} slavey_{19} slay_{19} sleety_{19} sley_{19} slurry_{19} sly_{19} soy_{19} sparry_{19} spay_{19} speary_{19} splay_{19} spry_{19} spurrey_{19} spurry_{19} spy_{19} stey_{19} storey_{19} story_{19} sty_{19} 
suety_{19} surety_{19} surrey_{19} survey_{19} salvo_{20} servo_{20} stereo_{20} sou_{21} susu_{21}

**Group 3:** a_{3} al_{5} ae_{6} ale_{6} ape_{6} aa_{7} ala_{7} aas_{8} alas_{8} ales_{8} als_{8} apes_{8} as_{8} alu_{9} alar_{10} aper_{10} ar_{10} alae_{11} alee_{11} alure_{11} apse_{11} are_{11} aue_{11} alert_{12} alt_{12} apart_{12} apert_{12} apt_{12} aret_{12} art_{12} at_{12} aero_{13} also_{13} alto_{13} apo_{13} apso_{13} auto_{13} aeros_{14} alerts_{14} altos_{14} alts_{14} alures_{14} alus_{14} apers_{14} apos_{14} apres_{14} apses_{14} apsos_{14} apts_{14} ares_{14} arets_{14} ars_{14} arts_{14} ass_{14} ats_{14} aures_{14} autos_{14} alate_{15} aloe_{15} arete_{15} arose_{15} arse_{15} ate_{15} alastor_{16} alerter_{16} alter_{16} apter_{16} aster_{16} arere_{18} ave_{18} aery_{19} alary_{19} alay_{19} aleatory_{19} apay_{19} apery_{19} arsey_{19} arsy_{19} artery_{19} artsy_{19} arty_{19} ary_{19} ay_{19} aloo_{20} arvo_{20} avo_{20} ayu_{21}

**Group 4:** pe_{6} pa_{7} pea_{7} plea_{7} pas_{8} peas_{8} pes_{8} pleas_{8} plu_{9} par_{10} pear_{10} per_{10} pur_{10} pare_{11} pase_{11} peare_{11} pease_{11} pee_{11} pere_{11} please_{11} pleasure_{11} plue_{11} pre_{11} pure_{11} part_{12} past_{12} pat_{12} peart_{12} peat_{12} pert_{12} pest_{12} pet_{12} plast_{12} plat_{12} pleat_{12} pst_{12} put_{12} pareo_{13} paseo_{13} peso_{13} pesto_{13} po_{13} pro_{13} pareos_{14} pares_{14} pars_{14} parts_{14} paseos_{14} pases_{14} pass_{14} pasts_{14} pats_{14} peares_{14} pears_{14} peases_{14} peats_{14} pees_{14} peres_{14} perts_{14} pesos_{14} pestos_{14} pests_{14} pets_{14} plats_{14} pleases_{14} pleasures_{14} pleats_{14} plues_{14} plus_{14} pos_{14} pros_{14} pures_{14} purs_{14} pus_{14} puts_{14} parse_{15} passe_{15} paste_{15} pate_{15} pause_{15} perse_{15} plaste_{15} plate_{15} pose_{15} pree_{15} prese_{15} prose_{15} puree_{15} purse_{15} parer_{16} parr_{16} parser_{16} parter_{16} passer_{16} paster_{16} pastor_{16} pater_{16} pauser_{16} pearter_{16} peer_{16} perter_{16} pester_{16} peter_{16} plaster_{16} plater_{16} pleaser_{16} pleasurer_{16} pleater_{16} poser_{16} pretor_{16} proser_{16} puer_{16} purer_{16} purr_{16} purser_{16} parev_{17} pav_{17} perv_{17} pareve_{18} parore_{18} parve_{18} passee_{18} pave_{18} peeve_{18} perve_{18} petre_{18} pore_{18} preeve_{18} preserve_{18} preve_{18} prore_{18} prove_{18} parry_{19} party_{19} pastry_{19} pasty_{19} patsy_{19} paty_{19} pay_{19} peatery_{19} peaty_{19} peavey_{19} peavy_{19} peeoy_{19} peery_{19} perry_{19} pervy_{19} pesty_{19} plastery_{19} platy_{19} play_{19} ploy_{19} plurry_{19} ply_{19} pory_{19} posey_{19} posy_{19} prey_{19} prosy_{19} pry_{19} pursy_{19} purty_{19} purvey_{19} puy_{19} parvo_{20} poo_{20} proo_{20} proso_{20} pareu_{21} patu_{21} poyou_{21}

**Group 5:** la_{7} lea_{7} las_{8} leas_{8} les_{8} leu_{9} lar_{10} lear_{10} lur_{10} lare_{11} lase_{11} leare_{11} lease_{11} leasure_{11} lee_{11} lere_{11} lure_{11} last_{12} lat_{12} least_{12} leat_{12} leet_{12} lest_{12} let_{12} lo_{13} lares_{14} lars_{14} lases_{14} lass_{14} lasts_{14} lats_{14} leares_{14} lears_{14} leases_{14} leasts_{14} leasures_{14} leats_{14} lees_{14} leets_{14} leres_{14} leses_{14} less_{14} lests_{14} lets_{14} los_{14} lues_{14} lures_{14} lurs_{14} laree_{15} late_{15} leese_{15} lose_{15} lute_{15} laer_{16} laser_{16} laster_{16} later_{16} leaser_{16} leer_{16} lesser_{16} lor_{16} loser_{16} lurer_{16} luser_{16} luter_{16} lav_{17} lev_{17} luv_{17} lave_{18} leave_{18} lessee_{18} leve_{18} lore_{18} love_{18} lurve_{18} lay_{19} leary_{19} leavy_{19} leery_{19} levy_{19} ley_{19} lory_{19} lovey_{19} loy_{19} lurry_{19} laevo_{20} lasso_{20} levo_{20} loo_{20} lassu_{21} latu_{21} lou_{21}

**Group 6:** ea_{7} eas_{8} es_{8} eau_{9} ear_{10} er_{10} ease_{11} ee_{11} ere_{11} east_{12} eat_{12} est_{12} et_{12} euro_{13} ears_{14} eases_{14} easts_{14} eats_{14} eaus_{14} eres_{14} eros_{14} ers_{14} eses_{14} ess_{14} ests_{14} euros_{14} erose_{15} esse_{15} easer_{16} easter_{16} eater_{16} err_{16} ester_{16} erev_{17} eave_{18} eve_{18} easy_{19} eatery_{19} eery_{19} estro_{20} evo_{20}

**Group 7:** a_{7} as_{8} ar_{10} ae_{11} are_{11} aue_{11} aret_{12} art_{12} at_{12} auto_{13} ares_{14} arets_{14} ars_{14} arts_{14} ass_{14} ats_{14} aures_{14} autos_{14} arete_{15} arose_{15} arse_{15} ate_{15} aster_{16} arere_{18} ave_{18} aery_{19} arsey_{19} arsy_{19} artery_{19} artsy_{19} arty_{19} ary_{19} ay_{19} aero_{20} arvo_{20} avo_{20} ayu_{21}

**Group 8:** sur_{10} sue_{11} sure_{11} set_{12} st_{12} suet_{12} so_{13} sets_{14} sos_{14} sues_{14} suets_{14} sures_{14} sus_{14} see_{15} sese_{15} setose_{15} seer_{16} ser_{16} suer_{16} surer_{16} sutor_{16} sov_{17} sere_{18} serve_{18} sore_{18} stere_{18} sterve_{18} store_{18} stove_{18} sesey_{19} sey_{19} soy_{19} stey_{19} storey_{19} story_{19} sty_{19} suety_{19} surety_{19} surrey_{19} survey_{19} servo_{20} stereo_{20} sou_{21} susu_{21}

**Group 9:** ur_{10} ure_{11} ut_{12} ures_{14} us_{14} uts_{14} use_{15} ute_{15} ureter_{16} user_{16} uey_{19} utu_{21}

**Group 10:** re_{11} ret_{12} reo_{13} reos_{14} res_{14} rets_{14} ree_{15} rete_{15} roe_{15} rose_{15} rev_{17} reeve_{18} resee_{18} reserve_{18} retore_{18} rore_{18} rove_{18} retry_{19} rory_{19} rosery_{19} rosy_{19} retro_{20} roo_{20}

**Group 11:** et_{12} es_{14} ee_{15} er_{16} ere_{18} eve_{18} eery_{19} evo_{20}

**Group 12:** to_{13} te_{15} toe_{15} tose_{15} tor_{16} tee_{18} tore_{18} toey_{19} tory_{19} toy_{19} trey_{19} try_{19} too_{20} toro_{20} toyo_{20}

**Group 13:** o_{13} os_{14} oe_{15} ose_{15} or_{16} ore_{18} oy_{19} oo_{20} ou_{21}

**Group 14:** ser_{16} see_{18} sere_{18} serve_{18} sey_{19} servo_{20} so_{20} sou_{21}

**Group 15:** er_{16} ee_{18} ere_{18} eve_{18} evo_{20}

**Group 16:** re_{18} reo_{20}

**Group 17:**

**Group 18:**

**Group 19:** yo_{20} you_{21} yu_{21}

**Group 20:** o_{20} ou_{21}

**Group 21:**

Naturally, I’ve tried out the code on a few other well-known phrases.

If Lynn and I had met at a different dining establishment, she might have found a straw with the statement, “It takes two hands to handle a Whopper.” There’s quite a diverse assortment of possible messages lurking in this text, with 1,154 unique foldable words and almost 2,000 word instances. Perhaps she would have chosen the upbeat “Inhale hope.” Or, in a darker mood, “I taste woe.”

If we had been folding dollar bills instead of straw wrappers, “In God We Trust” might have become the forward-looking proclamation, “I go west!” Horace Greeley’s marching order on the same theme, “Go west, young man,” gives us the enigmatic “O, wet yoga!” or, perhaps more aptly, “Gunman.”

Jumping forward from 1967 to 2021—from the Summer of Love to the Winter of COVID—I can turn “Wear a mask. Wash your hands.” into the plaintive, “We ask: Why us?” With “Maintain social distance,” the best I can do is “A nasal dance” or “A sad stance.”

And then there’s “Make America Great Again.” It yields “Meme rage.” Also “Make me ragtag.”

In a project like this one, you might think that getting a suitable list of English words would be the easy part. In fact it seems to be the main trouble spot.

The Scrabble lexicon I’ve been relying on derives from a word list known as SOWPODS, compiled by two associations of Scrabble players starting in the 1980s. Current editions of the list are distributed by a commercial publisher, Collins Dictionaries. If I understand correctly, all versions of the list are subject to copyright (see discussion on Stack Exchange) and cannot legally be distributed without permission. But no one seems to be much bothered by that fact. Copies of the lists in plain-text format, with one word per line, are easy to find on the internet—and not just on dodgy sites that specialize in pirated material.

There are alternative lists without legal encumbrances. Indeed, there’s a good chance you already have one such list pre-installed on your computer. A file called `words` is included in most distributions of the Unix operating system, including MacOS; my copy of the file lives in `/usr/share/dict/words`. If you don’t have or can’t find the Unix `words` file, I suggest downloading the Natural Language Toolkit, a suite of data files and Python programs that includes a lexicon almost identical to Unix `words`, as well as many other linguistic resources.

The Scrabble list has one big advantage over `words`: It includes plurals and inflected forms of verbs—not just *test* but also *tests*, *tested*, and *testing*. [Bad example; see comments below.] The `words` file is more like a list of dictionary head words, with only the stem form explicitly included. On the other hand, `words` has an abundance of names and other proper nouns, as well as abbreviations, which are excluded from the Scrabble list since they are not legal plays in the board game.

How about combining the two word lists? Their union has just under 400,000 entries—quite a large lexicon. Using this augmented list for the analysis of “It’s a pleasure to serve you,” my program finds an additional 219 foldable words, beyond the 778 found with the Scrabble list alone. Here they are:

aaru aer aerose aes alares alaster alea alerse aleut alo alose alur aly ao apa apar aperu apus aro arry aru ase asor asse ast astor atry aueto aurore aus ausu aute e eastre eer erse esere estre eu ey iao ie ila islay ist isuret itala itea iter ito iyo l laet lao larry larve lastre lasty latro laur leo ler lester lete leto loro lu lue luo lut luteo lutose ly oer ory ovey p parsee parto passo pastose pato pau paut pavo pavy peasy perty peru pess peste pete peto petr plass platery pluto poe poy presee pretry pu purre purry puru r reve ro roer roey roy s sa saa salar salat salay saltee saltery salvy sao sapa saple sapo sare sart saur sauty sauve se seary seave seavy seesee sero sert sesuto sla slare slav slete sloo sluer soe sory soso spary spass spave spleet splet splurt spor spret sprose sput ssu stero steve stre strey stu sueve suto sutu suu t taa taar tal talao talose taluto tapeats tapete taplet tapuyo tarr tarse tartro tarve tasser tasu taur tave tavy teaer teaey teart teasy teaty teave teet teety tereu tess testor toru torve tosy tou treey tsere tst tu tue tur turr turse tute tutory u uro urs uru usee v vu y

Many of the proper nouns in this list are present in the vocabulary of most English speakers: *Aleut, Peru, Pluto, Slav*; the same is true of personal names such as *Larry, Leo, Stu, Tess*. But the rest of the words are very unlikely to turn up in the small talk of teenage sweethearts. Indeed, the list is full of letter sequences I simply don’t recognize as English words. Please define *isuret, ovey, spleet,* or *sput*.

There are even bigger word lists out there. In 2006 Google extracted 13.5 million unique English words from public web pages. (The sheer number implies a very liberal definition of *English* and *word*.) A good place to start exploring this archive is Peter Norvig’s website, which offers a file with the 333,333 most frequent words from the corpus. The list begins as you might expect: *the, of, and, to, a, in, for*…; but the weirdness creeps in early. The single letters *c, e, s,* and *x* are all listed among the 100 most common “words,” and the rest of the alphabet turns up soon after. By the time we get to the end of the file, it’s mostly typos *(mepquest, halloweeb, scholarhips)*, run-together words *(dietsdontwork, weightlossdrugs)*, and hundreds of letter strings that have some phonetic or orthographic resemblance to *Google* or *Yahoo!* or both *(hoogol, googgl, yahhol, gofool, yogol)*. (I suspect that much of this rubbish was scraped not from the visible text of web pages but from metadata stuffed into headers for purposes of search-engine optimization.)

Applying the Google list to the search for foldable words more than doubles the volume of results, but it contributes almost nothing to the stock of words that might form interesting messages. I found 1,543 new words, beyond those that are also present in the union of the Scrabble and Unix lists. In alphabetical order, the additions begin: *aae, aao, aaos, aar, aare, aaro, aars, aart, aarts, aase, aass, aast, aasu, aat, aats, aatsr, aau, aaus, aav, aave, aay, aea, aeae….* I’m not going to be folding up any straw wrappers with those words for my sweetheart.

What we really need, I begin to think, is not a longer word list but a shorter and more discriminating one.

The tableau presented below is a product of my amateur efforts to address these questions. It’s a simple exercise in the mechanics of probability. I take a sample of the U.S. population, roughly 10,000 people, and randomly assign them to clusters of size \(n\), where \(n\) can range from 1 to 32. (In any single run of the model, \(n\) is fixed; all the groups are the same size.) Each cluster represents a Thanksgiving gathering. If a cluster includes someone infected with SARS-CoV-2, the disease may spread to the uninfected and susceptible members of the same group.

With the model’s default settings, \(n = 12\). The population sample consists of 9,900 people, represented as tiny colored dots arranged in 825 clusters of 12 dots each. Most of the dots are green, indicating susceptible individuals. Red dots are the infectious spreaders. Purple dots represent the unfortunates who are newly infected as a result of mingling with spreaders in these holiday get-togethers. I count the purple dots and estimate the rate of new infections per 100,000 population.
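The heart of the simulation can be sketched in a few lines of Python. This is a minimal reconstruction of my own, *not* the code behind the interactive model: it ignores the incubating, recovered, and quarantined categories and treats every non-spreader as susceptible. The spreader probability of 0.00125 corresponds to the 125 infectious individuals per 100,000 derived later in the article.

```python
import random

def holiday_round(pop=9900, group_size=12, p_spreader=0.00125,
                  p_transmit=0.25, seed=None):
    """One round of gatherings; returns new infections per 100,000."""
    rng = random.Random(seed)
    new = 0
    for _ in range(pop // group_size):               # 825 groups of 12
        spreaders = sum(rng.random() < p_spreader
                        for _ in range(group_size))  # red dots in this group
        # each companion is infected unless it escapes every spreader
        for _ in range(group_size - spreaders):
            if rng.random() < 1 - (1 - p_transmit) ** spreaders:
                new += 1
    return new / pop * 100_000
```

Averaged over many runs, this stripped-down version lands in the same neighborhood as the figures discussed later in the article, though any single run can stray far from the mean.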

You can explore the model on your own. Twiddle with the sliders in the control panel, then press the “Go” button to generate a new sample population and a new cycle of infections. For example, by moving the group-size slider you can get a thousand clusters of 10 persons each, or 400 clusters of 25 each.

Before going any further with this discussion, I should make clear that the simulation is *not* offered as a prediction of how Covid-19 will spread during tomorrow’s Thanksgiving festivities. This is not a guide to personal risk assessment. If you play around with the controls, you’ll soon discover you can make the model say anything you wish. Depending on the settings you choose, the result can lie anywhere along the entire spectrum of possible outcomes, from nobody-gets-sick to everybody’s-got-it. There are settings that lead to impossible states, such as infection rates beyond 100 percent. Even so, I’m not totally convinced that the model is useless. It might point to combinations of parameters that would limit the damage.

The crucial input that drives the model is the daily tally of Covid cases for the entire country, expressed as a rate of new infections per 100,000 population. The official version of this statistic is published by the CDC; a few other organizations, including Johns Hopkins and the New York Times, maintain their own daily counts. The CDC report for November 24 cites a seven-day rolling average of 52.3 new cases per 100,000 people. For the model I set the default rate at 50, but the slider marked “daily new cases per 100,000 population” will accommodate any value between 0 and 500.

From the daily case rate we can estimate the prevalence of the disease: the total number of active cases at a given moment. In the model, the prevalence is simply 14 times the daily case rate. In effect, I am assuming (or pretending) that the daily rate is unchanging and that everyone’s illness lasts 14 days from the moment of infection to full recovery. Neither of these assumptions is true. In a model of ongoing disease propagation, where today’s events determine what happens next week, the steady-state approximation would be unacceptable. But this model produces only a snapshot on one particular day of the year, and so dynamics are not very important.

What we *do* need to consider in more detail is the sequence of stages in a case of Covid-19. The archetypal model in epidemiology has three stages: susceptible *(S)*, infected *(I)*, and recovered *(R)*; *R* stands for “removed,” acknowledging that recovery isn’t the only possible end of an illness. But I am going to look away from the grimmer aspects of this story. For this model I split the infected stage into three phases—incubating *(U)*, infectious *(I)*, and symptomatic *(Q)*—which gives us a SUIQR model. An incubating patient has been infected but is not yet producing enough virus particles to infect others. The infectious stage is the most dangerous period: Patients have no conspicuous symptoms and are still unaware of their own infection, but nonetheless they are spewing virus particles with every breath.

During the symptomatic phase, patients know they are sick and should be in quarantine; hence the letter *Q*. For the purposes of the model I assume that everyone in category *Q* will decline the invitation to Thanksgiving dinner. In the graphic they are marked with a red *x*, which I think of as an empty chair at the dinner table. The purple dots for newly acquired infections add a sixth category to the model, although they really belong to the incubating *U* class.

A parameter of some importance is the duration of the presymptomatic infectious stage, since the red-dot people in that category are the only ones actually spreading the disease in my model of Thanksgiving gatherings. I made a foray into the medical literature to pin down this number, but what I learned is that after a year of intense scrutiny there’s still a lot we don’t know about Covid-19. The typical period from infection to the onset of symptoms (encompassing both the *U* and *I* stages of my model) is four or five days, but apparently it can range from less than two days to three weeks. The graph below is based on a paper by Conor McAloon and colleagues that aggregates results of eight studies carried out early in the pandemic (when it’s easier to determine the date of infection, since cases are rare and geographically isolated).

Ultimately I decided, for the sake of simplicity (or lazy convenience) to collapse this distribution to its median, which is about five days. Then there’s the question of when within this period an infected person becomes dangerous to those nearby. Various sources [Harvard, MIT, Fox News] suggest that infected individuals begin spreading the virus two or three days before they show symptoms, and that the moment of maximum infectiousness comes shortly before symptom onset. I chose to interpret “two or three days” as 2.5 days.

What all this boils down to is the following relation: If the national new-case rate is 50 per 100,000, then among Thanksgiving celebrants in the model, 125 per 100,000 are Covid spreaders. That’s 0.125 percent. Turn to the person on your left. Turn to your right. Are you feeling lucky?

The model’s default settings assume a new-case rate of \(50\) per \(100{,}000\), a Thanksgiving group size of \(12\), and a \(0.25\) probability of transmitting the virus from an infectious person to a susceptible person. Let’s do some back-of-the-envelope calculating. As noted above, the \(50/100{,}000\) new-case rate translates into \(125/100{,}000\) infectious individuals. Among the \(\approx 10{,}000\) members of the model population, we should expect to see \(12\) or \(13\) red-dot *I*s. Because the number of *I*s is much smaller than the number of groups \((825)\), it’s unlikely that more than one red dot will turn up in any single group of \(12\). In each group with a single spreader, we can expect the virus to jump to \(0.25 \times 11 = 2.75\) of the spreader’s companions. This assumes that all the companions are green-dot susceptibles, which isn’t quite true. There are also yellow-dot incubating and blue-dot recovered people, as well as the red-*x* empty chairs of those in quarantine. But these are small corrections. The envelope estimate gives \(344/100{,}000\) new infections on Thanksgiving Day; the computer model yields 325 per 100,000, when averaged over many runs.
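The envelope arithmetic, spelled out in Python with the numbers used in the text:

```python
# Back-of-the-envelope estimate, following the paragraph above.
daily_rate      = 50 / 100_000      # new cases per person per day
infectious_days = 2.5               # presymptomatic spreading window
p_infectious    = daily_rate * infectious_days   # 125 per 100,000

population = 9_900                  # model sample
group_size = 12
p_transmit = 0.25

spreaders = p_infectious * population            # ≈ 12.4 red dots
new_cases = spreaders * p_transmit * (group_size - 1)
rate = new_cases / population * 100_000          # ≈ 344 per 100,000
```

Note that the population size cancels out of `rate`; the envelope estimate depends only on the prevalence, the transmission probability, and the group size.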

But the average doesn’t tell the whole story. The variance of these outcomes is quite high, as you’ll see if you press the “Go” button repeatedly. Counting the number of new infections in each of a million runs of the model, the distribution looks like this:

The peak of the curve is at 30 new infections per model run, which corresponds to about 300 cases for 100,000 population, but you shouldn’t be surprised to see a result of 150 or 500.

If the effect of Thanksgiving gatherings in the real world matches the results of this model, we’re in serious trouble. A rate of 300 cases per 100,000 people corresponds to just under a million new cases in the U.S. population. All of those infections would arise on a single day (although few of them would be detected until about a week later). That’s an outburst of contagion more than five times bigger than the worst daily toll recorded so far.

But there are plenty of reasons to be skeptical of this result.

Even in a “normal” year, not everyone in America sits down at a table for 12 to exchange gossip and germs, and surely many more will be sitting out this year’s events. According to a survey attributed

Another potential mitigating factor is that people invited to your holiday celebration are probably not selected at random from the whole population, as they are in the model. Guests tend to come in groups, often family units. If your aunt and uncle and their three kids all live together, they probably get sick together, too. Thus a gathering of 12 individuals might better be treated as an assembly of three or four “pods.” One way to introduce this idea into the computer model is to enforce nonzero correlations between the people selected for each group. If one attendee is infectious, that raises the probability that others will also be infectious, and vice versa. As the correlation coefficient increases, groups are increasingly homogeneous. If lots of spreaders are crowded in one group, they can’t infect the vulnerable people in other groups. In the model, a correlation coefficient of 0.5 reduces the average number of new cases from 32.5 to 23.5. (Complete or perfect correlation eliminates contagion altogether, but this is highly unrealistic.)

Geography should also be considered. The national average case rate of 50 per 100,000 conceals huge local and regional variations. In Hawaii the rate is about 5 cases, so if you and all your guests are Hawaiians, you’ll have to be quite unlucky to pick up a Covid case at the Thanksgiving luau. At the other end of the scale, there are counties in the Great Plains states that have approached 500 cases per 100,000 in recent weeks. A meal with a dozen attendees in one of those hotspots looks fairly calamitous: The model shows 3,000 new cases for 100,000, or 3 percent of the population.

If you are determined to have a big family meal tomorrow and you want to minimize the risks, there are two obvious strategies. You can reduce the chance that your gathering includes someone infectious, or you can reduce the likelihood that any infectious person who happens to be present will transmit the virus to others. Most of the recommendations I’ve read in the newspaper and on health-care websites focus on the latter approach. They urge us to wear masks, to keep everyone at arm’s length, to wash our hands, to open all the windows (or better yet to hold the whole affair outdoors). Making it a briefer event should also help.

In the model, any such measures are implemented by nudging the slider for transmission probability toward smaller values. The effect is essentially linear over a broad range of group sizes. Reducing the transmission probability by half reduces the number of new infections proportionally.

The trouble is, I have no firm idea of what the actual transmission probability might be, or how effective those practices would be in reducing it. A recent study by a group at Vanderbilt University found a transmission rate within households of greater than 50 percent. I chose 25 percent as the default value in the model on the grounds that spending a single day together should be less risky than living permanently under the same roof. But the range of plausible values remains quite wide. Perhaps studies done in the aftermath of this Thanksgiving will yield better data.

As for reducing the chance of having an infectious guest, one approach is simply to reduce the size of the group. In this case the effect is better than linear, but only slightly so. Splitting that 12-person meal into two separate 6-seat gatherings cuts the infection rate by a little more than half, from 32.5 to 15.2. And, predictably, larger groups have worse outcomes. Pack in 24 people per group and you can expect 70 infections. Neither of these strategies seems likely to cut the infection rate by a factor of 10 or more. Unless, of course, everyone eats alone. Set the group-size slider to 1, and no one gets sick.

Another factor to keep in mind is that this model counts only infections passed from person to person during a holiday get-together. Leaving all those cases aside, the country has quite a fierce rate of “background” transmission happening on days with no special events. If the Thanksgiving cases are to be added to the background cases, we’re even worse off than the model would suggest. But the effect could be just the opposite. A family holiday is an occasion when most people skip some ordinary activities that can also be risky. Most of us have the day off from work. We are less likely to go out to a bar or a restaurant. It’s even possible that the holiday will actually suppress the overall case rate. But don’t bet your life on it.

There’s one more wild card to be taken into account. A tacit assumption in the structure of the model is that the reported Covid case count accurately reflects the prevalence of the disease in the population. This is surely not quite true. There are persistent reports of asymptomatic cases—people who are infected and infectious, but who never feel unwell. Those cases are unlikely to be recorded. Others may be ill and suspect the cause is Covid but avoid getting medical care for one reason or another. (For example, they may fear losing their job.) All in all, it seems likely the CDC is under-reporting the number of infections.

Early in the course of the epidemic, a group at Georgia Tech led by Aroon Chande built a risk-estimating web tool based on case rates for individual U.S. counties. They included an adjustment for “ascertainment bias” to compensate for cases omitted from official public health estimates. Their model multiplies the reported case counts by a factor of either 5 or 10. This adjustment may well have been appropriate last spring, when Covid testing was hard to come by even for those with reasonable access to medical services. It seems harder to justify such a large multiplier now, but the model, which is still being maintained, continues to insert a fivefold or tenfold adjustment. Out of curiosity, I have included a slider that can be set to make a similar adjustment.

Is it possible that we are still counting only a tenth of all the cases? If so, the cumulative total of infections since the virus first came ashore in the U.S. is 10 times higher than official estimates. Instead of 12.5 million total cases, we’ve experienced 125 million; more than a third of the population has already been through the ordeal and (mostly) come out the other side. We’ll know the answer soon. At the present infection rate (multiplied by 10), we will have burned through another third of the population in just a few weeks, and infection rates should fall dramatically through herd immunity. (I’m not betting my life on this one either.)

One other element of the Covid story that ought to be in the model is testing, which provides another tool for improving the chances that we socialize only with safe companions. If tests were completely reliable, their effect would merely be to move some fraction of the dangerous red-dot category into the less-dangerous red-*x* quarantined camp. But false-positive and false-negative testing results complicate the situation. (If the actual infection rate is low, false positives may outnumber true positives.)
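The false-positive arithmetic follows from Bayes's rule. In this sketch all three input numbers are illustrative assumptions, not the characteristics of any particular Covid test; the point is only that at low prevalence, false positives can swamp true ones:

```python
sensitivity = 0.95    # P(positive test | infected)      -- assumed
specificity = 0.98    # P(negative test | not infected)  -- assumed
prevalence = 0.005    # fraction currently infected      -- assumed

true_pos = prevalence * sensitivity               # per person tested
false_pos = (1 - prevalence) * (1 - specificity)  # per person tested
ppv = true_pos / (true_pos + false_pos)           # P(infected | positive)

print(f"true positives:  {true_pos:.5f}")
print(f"false positives: {false_pos:.5f}")
print(f"chance a positive result is real: {ppv:.0%}")
```

With these numbers a positive result is real less than one time in five.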

I offer no conclusions or advice as a result of my little adventure in computational epidemiology. You should not make life-or-death decisions based on the writings of some doofus at a website called bit-player. (Nor based on a tweet from @realDonaldTrump.)

I *do* have some stray thoughts about the nature of holidays in Covid times. In the U.S. most of our holidays, both religious and secular, are intensely social, convivial occasions. Thanksgiving is a feast, New Year’s Eve is a party, Mardi Gras is a parade, St. Patrick’s Day is a pub crawl, July Fourth is a picnic. I’m not asking to abolish these traditions, some of which I enjoy myself. But they are not helping matters in the midst of a raging epidemic. Every one of these occasions can be expected to produce a spike in that curve we’re supposed to be flattening.

I wish we could find a spot on the calendar for a new kind of holiday—a day or a weekend for silent and solitary contemplative respite. Close the door, or go off by yourself. Put a dent in the curve.

In that essay I also mentioned three other questions about trees that have long been bothering me. In this sequel I want to poke at those other questions a bit more deeply.

Botanists have an elaborate vocabulary for describing leaf shapes: *cordate* (like a Valentine heart), *cuneate* (wedgelike), *ensiform* (sword shaped), *hastate* (like an arrowhead, with barbs), *lanceolate* (like a spearhead), *oblanceolate* (a backwards spearhead), *palmate* (leaflets radiating like fingers), *pandurate* (violin shaped), *reniform* (kidney shaped), *runcinate* (saw-toothed), *spatulate* (spoonlike). That’s not, by any means, a complete list.

Steven Vogel, in his 2012 book *The Life of a Leaf*, enumerates many factors and forces that might have an influence on leaf shape. For example, leaves can’t be too heavy, or they would break the limbs that hold them aloft. On the other hand, they can’t be too delicate and wispy, or they’ll be torn to shreds by the wind. Leaves also must not generate too much aerodynamic drag, or the whole tree might topple in a storm.

Job One for a leaf is photosynthesis: gathering sunlight, bringing together molecules of carbon dioxide and water, synthesizing carbohydrates. Doing that efficiently puts further constraints on the design. As much as possible, the leaf should turn its face to the sun, maximizing the flux of photons absorbed. But temperature control is also important; the biosynthetic apparatus shuts down if the leaf is too hot or too cold.

Vogel points out that subtle features of leaf shape can have a measurable impact on thermal and aerodynamic performance. For example, convective cooling is most effective near the margins of a leaf; temperature rises with distance from the nearest edge. In environments where overheating is a risk, shapes that minimize this distance—such as the *lobate* forms of oak leaves—would seem to have an advantage over simpler, disklike shapes. But the choice between frilly and compact forms depends on other factors as well. Broad leaves with convex shapes intercept the most sunlight, but that may not always be a good thing. Leaves with a lacy design let dappled sunlight pass through, allowing multiple layers of leaves to share the work of photosynthesis.

Natural selection is a superb tool for negotiating a compromise among such interacting criteria. If there is some single combination of traits that works best for leaves growing in a particular habitat, I would expect evolution to find it. But I see no evidence of convergence on an optimal solution. On the contrary, even closely related species put out quite distinctive leaves.

Take a look at the three oak leaves in the upper-left quadrant of the image above. They are clearly variations on a theme. What the leaves have in common is a sequence of peninsular protrusions springing alternately to the left and the right of the center line. The variations on the theme have to do with the number of peninsulas (three to five per side in these specimens), their shape (rounded or pointy), and the depth of the coves between peninsulas. Those variations could be attributed to genetic differences at just a few loci. But *why* have the leaves acquired these different characteristics? What evolutionary force makes rounded lobes better for white oak trees and pointy ones better for red oak and pin oak?

Much has been learned about the developmental mechanisms that generate leaf shapes. Biochemically, the main actors are the plant hormones known as auxins; their spatial distribution and their transport through plant tissues regulate local growth rates and hence the pattern of development. (A 2014 review article by Jeremy Dkhar and Ashwani Pareek covers these aspects of leaf form in great detail.) On the mathematical and theoretical side, Adam Runions, Miltos Tsiantis, and Przemyslaw Prusinkiewicz have devised an algorithm that can generate a wide spectrum of leaf shapes with impressive verisimilitude. (Their 2017 paper, along with source code and videos, is at algorithmicbotany.org/papers/leaves2017.html.) With different parameter values the same program yields shapes that are recognizable as oaks, maples, sycamores, and so on. Again, however, all this work addresses questions of *how*, not *why*.

Another property of tree leaves—their size—*does* seem to respond in a simple way to evolutionary pressures. Across all land plants (not just trees), leaf area varies by a factor of a million—from about 1 square millimeter per leaf to 1 square meter. A 2017 paper by Ian J. Wright and colleagues reports that this variation is strongly correlated with climate. Warm, moist regions favor large leaves; think of the banana. Cold, dry environments, such as alpine ridges, host mainly tiny plants with even tinier leaves. So natural selection is alive and well in the realm of tree leaves; it just appears to have no clear preferences when it comes to shape.

Or am I missing something important? Elsewhere in nature we find flamboyant variations that seem gratuitous if you view them strictly in the grim context of survival-of-the-fittest. I’m thinking of the fancy-dress feathers of birds, for example. Cardinals and bluejays both frequent my back yard, but I don’t spend much time wondering whether red or blue is the optimal color for survival in that habitat. Nor do I expect the two species to converge on some shade of purple. Their gaudy plumes are not adaptations to the physical environment but elements of a communication system; they send signals to rivals or potential mates. Could something similar be going on with leaf shape? Do the various oak species maintain distinctive leaves to identify themselves to animals that help with pollination or seed dispersal? I rate this idea unlikely, but I don’t have a better one.

Surely this question is too easy! We know why trees grow tall. They reach for the sky. It’s their only hope of escaping the gloomy depths of the forest’s lower stories and getting a share of the sunshine. In other words, if you are a forest tree, you need to grow tall because your neighbors are tall; they overshadow you. And the neighbors grow tall because you’re tall. It’s a classic arms race. Vogel has an acute commentary on this point:

In every lineage that has independently come up with treelike plants, a variety of species achieve great height. That appears to me to be the height of stupidity…. We’re looking at, almost surely, an object lesson in the limitations of evolutionary design….

A trunk limitation treaty would permit all individuals to produce more seeds and to start producing seeds at earlier ages. But evolution, stupid process that it is, hasn’t figured that out—foresight isn’t exactly its strong suit.

Vogel’s trash-talking of Darwinian evolution is meant playfully, of course. But I think the question of height-limitation treaties (*tree*ties?) deserves more serious attention.

Forest trees in the eastern U.S. often grow to a height of 25 or 30 meters, approaching 100 feet. It takes a huge investment of material and energy to erect a structure that tall. To ensure sufficient strength and stiffness, the girth of the trunk must increase as the \(\frac{3}{2}\) power of the height, and so the cross-sectional area \((\pi r^2)\) grows as the cube of the height. It follows that doubling the height of a tree trunk multiplies its mass by a factor of 16.
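The scaling argument is easy to verify numerically. Under the stated assumptions (radius growing as the \(\frac{3}{2}\) power of height), mass scales as the fourth power of height:

```python
import math

def trunk_mass(height):
    """Relative trunk mass: radius ~ h^(3/2), so the cross-section
    pi*r^2 grows as h^3, and mass = area * height grows as h^4."""
    radius = height ** 1.5              # girth rule from the text
    area = math.pi * radius ** 2        # grows as the cube of height
    return area * height                # grows as the fourth power

print(round(trunk_mass(20.0) / trunk_mass(10.0), 6))   # 16.0
```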

Great height imposes another, ongoing, metabolic cost. Every day, a living tree must lift 500 liters of water—weighing 500 kilograms—from the root zone at ground level up to the leaves in the crown. It’s like carrying enough water to fill four or five bathtubs from the basement of a building to the 10th floor.
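For a sense of scale, the minimum mechanical work needed just to lift that mass (ignoring friction and the energetics of transpiration, which cost far more) is \(mgh\); the 30-meter height comes from the figures above:

```python
mass_kg = 500.0      # daily water load, from the text
g = 9.81             # gravitational acceleration, m/s^2
height_m = 30.0      # typical height of an eastern forest tree

work_joules = mass_kg * g * height_m
print(f"{work_joules / 1000:.0f} kJ per day")   # 147 kJ per day
```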

Height also exacerbates certain hazards to the life and health of the tree. A taller trunk forms a longer lever arm for any force that might tend to overturn the tree. Compounding the risk, average wind speed increases with distance above the ground.

Standing on the forest floor, I tilt my head back and stare dizzily upward toward the leafy crowns, perched atop great pillars of wood. I can’t help seeing these plants on stilts as a colossal waste of resources. It’s even sillier than the needlelike towers of apartments for billionaires that now punctuate the Manhattan skyline. In those buildings, all the floors are put to *some* use. In the forest, the tree trunks are denuded of leaves and sometimes of branches over 90 percent of their length; only the penthouses are occupied.

If the trees could somehow get together and negotiate a deal—a zoning ordinance or a building code—they would *all* benefit. Perhaps they could decree a maximum height of 10 meters. Nothing would change about the crowns of the trees; the rule would simply chop off the bottom 20 meters of the trunk.

If every tree would gain from the accord, why don’t we see such amputated forests evolving in nature? The usual response to this why-can’t-everybody-get-along question is that evolution just doesn’t work that way. Natural selection is commonly taken to be utterly selfish and individualist, even when it hurts. A tree reaching the 10-meter limit would say to itself: “Yes, this is good; I’m getting plenty of light without having to stand on tiptoe. But it could be even better. If I stretched my trunk another meter or two, I’d collect an even bigger share of solar energy.” Of course the other trees reason with themselves in exactly the same way, and so the futile arms race resumes. As Vogel said, foresight is not evolution’s strong suit.

I am willing to accept this dour view of evolution, but I am not at all sure it actually explains what we see in the forest. If evolution has no place for cooperative action in a situation like this one, how does it happen that all the trees do in fact stop growing at about the same height? Specifically, if an agreement to limit height to 10 meters would be spoiled by rampant cheating, why doesn’t the same thing happen at 30 meters?

One might conjecture that 30 meters is a physiological limit, that the trees would grow taller if they could, but some physical constraint prevents it. Perhaps they just can’t lift the water any higher. I would consider this a very promising hypothesis if it weren’t for the sequoias and the coast redwoods in the western U.S. Those trees have not heard about any such physical barriers. They routinely grow to 70 or 80 meters, and a few specimens have exceeded 100 meters. Thus the question for the East Coast trees is not just “Why are you so tall?” but also “Why aren’t you taller?”

I can think of at least one good reason for forest trees to grow to a uniform height. If a tree is shorter than average, it will suffer for being left in the shade. But standing head and shoulders above the crowd also has disadvantages: Such a standout tree is exposed to stronger winds, a heavier load of ice and snow, and perhaps higher odds of lightning strikes. Thus straying too far either below or above the mean height may be punished by lower reproductive success. But the big question remains: How do all the trees reach consensus on what height is best?

Another possibility: Perhaps the height of forest trees is not a result of an arms race after all but instead is a response to predation. The trees are holding their leaves on high to keep them away from herbivores. I can’t say this is wrong, but it strikes me as unlikely. No giraffes roam the woods of North America (and if they did, 10 meters would be more than enough to put the leaves out of reach). Most of the animals that nibble on tree leaves are arthropods, which can either fly (adult insects) or crawl up the trunk (caterpillars and other larvae). Thus height cannot fully protect the leaves; at best it might provide a deterrent. Tree leaves are not a nutritious diet; perhaps some small herbivores consider them worth a climb of 10 meters, but not 30.

To a biologist, a tree is a woody plant of substantial height. To a mathematician, a tree is a graph without loops. It turns out that math-trees and bio-trees have some important properties in common.

The diagram below shows two mathematical graphs. They are collections of dots (known more formally as vertices), linked by line segments (called edges). A graph is said to be *connected* if you can travel from any vertex to any other vertex by following some sequence of edges. Both of the graphs shown here are connected. Trees form a subspecies of connected graphs. They are *minimally* connected: Between any two vertices there is exactly one path. In the graph at left, which is a tree, there is just one route from *a* to *b*. The graph at right is not a tree; there are two routes from *a* to *b* (red and yellow lines).
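Minimal connectivity has a handy operational characterization: a connected graph on \(n\) vertices is a tree exactly when it has \(n - 1\) edges. Here is a sketch of that test in Python (the representation, a dict of adjacency sets, is my choice for illustration):

```python
from collections import deque

def is_tree(adjacency):
    """A graph is a tree iff it is connected and has n-1 edges."""
    vertices = list(adjacency)
    n = len(vertices)
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    if edge_count != n - 1:
        return False
    # breadth-first search to check connectivity
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}            # a-b-c: a tree
loop = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}  # triangle: not
print(is_tree(path), is_tree(loop))   # True False
```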

Here’s another way to describe a math-tree. It’s a graph that obeys the antimatrimonial rule: What branching puts asunder, let no one join together again. Bio-trees generally work the same way: Two limbs that branch away from the trunk will not later return to the trunk or fuse with each other. In other words, there are no cycles, or closed loops. The pattern of radiating branches that never reconverge is evident in the highly regular structure of the bio-tree pictured below. (The tree is a Norfolk Island pine, native to the South Pacific, but this specimen was photographed on Sardinia.)

Trees have achieved great success without loops in their branches. Why would a plant ever want to have its structural elements growing in circles?

I can think of two reasons. The first is mechanical strength and stability. Engineers know the value of triangles (the smallest closed loops) in building rigid structures. Also arches, where two vertical elements that could not stand alone lean on each other. Trees can’t take advantage of these tricks; their limbs are cantilevers, supported only at the point of juncture with the trunk or the parent limb. Loopy structures would allow for various kinds of bracing and buttressing.

The second reason is reliability. Providing multiple channels from the roots to the leaves would improve the robustness of the tree’s circulatory system. An injury near the base of a limb would no longer doom all the structures beyond the point of damage.

Networks with multiple paths between nodes are exploited elsewhere in nature, and even in other aspects of the anatomy of trees. The reticulated channels in the image below are veins distributing fluids and nutrients within a leaf from a red oak tree. The very largest veins (or ribs) have a treelike arrangement, but the smaller channels form a nested hierarchy of loops within loops. (The pattern reminds me of a map of an ancient city.) Because of the many redundant pathways, an insect taking a chomp out of the middle of this network will not block communication with the rest of the leaf.

The absence of loops in the larger-scale structure of trunk and branches may be a natural consequence of the developmental program that guides the growth of a tree. Aristid Lindenmayer, a Hungarian-Dutch biologist, invented a family of formal languages (now called L-systems) for describing such growth. The languages are rewriting systems: You start with a single symbol (the *axiom*) and replace it with a string of symbols specified by the rules of a grammar. Then the string resulting from this substitution becomes a new input to the same rewriting process, with each of its symbols being replaced by another string formed according to the grammar rules. In the end, the symbols are interpreted as commands for constructing a geometric figure.

Here’s an L-system grammar for drawing cartoonish two-dimensional trees:

```
f ⟶ f [r f] [l f]
l ⟶ l
r ⟶ r
```

The symbols `f`, `l`, and `r` are the basic elements of the language; when interpreted as drawing commands, they stand for *forward*, *left*, and *right*. The first rule of the grammar replaces any occurrence of `f` with the string `f [r f] [l f]`; the second and third rules change nothing, replacing `l` and `r` with themselves. Square brackets enclose a subprogram. On reaching a left bracket, the system makes note of its current position and orientation in the drawing. Then it executes the instructions inside the brackets, and finally, on reaching the right bracket, backtracks to the saved position and orientation.

Starting with the axiom `f`, the grammar yields a succession of ever-more-elaborate command sequences:

```
Stage 0: f
Stage 1: f [r f] [l f]
Stage 2: f [r f] [l f] [r f [r f] [l f]] [l f [r f] [l f]]
```

When this rewriting process is continued for a few further stages and then converted to graphic output, we see a sapling growing into a young tree, with a shape reminiscent of an elm. At each successive stage the length of a *forward* step is reduced by a factor of 0.6, and all turns, both *left* and *right*, are through an angle of 20 degrees.
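The graphic interpretation can be sketched with a turtle that keeps a stack of saved states for the brackets. One simple way to realize the shrinking steps is to scale the step length each time the turtle descends into a bracket, since nesting depth corresponds to generation in this grammar; the 0.6 shrink factor and 20-degree turn are the values given in the text, while the rest of the setup is my own illustration:

```python
import math

def segments(program, step=1.0, angle=20.0, shrink=0.6):
    """Interpret an L-system string turtle-style; return the list of
    line segments drawn.  '[' saves the state and shrinks the step;
    ']' restores the saved state."""
    x, y, heading = 0.0, 0.0, 90.0      # start at the origin, pointing up
    stack, segs = [], []
    for ch in program:
        if ch == "f":                   # move forward, drawing a segment
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            segs.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif ch == "l":                 # turn left
            heading += angle
        elif ch == "r":                 # turn right
            heading -= angle
        elif ch == "[":                 # branch: save state, shrink step
            stack.append((x, y, heading, step))
            step *= shrink
        elif ch == "]":                 # end of branch: restore state
            x, y, heading, step = stack.pop()
    return segs

stage2 = "f [r f] [l f] [r f [r f] [l f]] [l f [r f] [l f]]"
print(len(segments(stage2)))            # 9 segments, one per 'f'
```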

L-systems like this one can produce a rich variety of branching structures. More elaborate versions of the same program can create realistic images of biological trees. (The Algorithmic Botany website at the University of Calgary has an abundance of examples.) What the L-systems *can’t* do is create closed loops. That would require a fundamentally different kind of grammar, such as a transformation rule that takes two symbols or strings as input and produces a conjoined result. (Note that in the stage 5 diagram above, two branches of the tree appear to overlap, but they are not joined. The graph has no vertex at the intersection point.)

If the biochemical mechanisms governing the growth and development of trees operate with the same constraints as L-systems, we have a tidy explanation for the absence of loops in the branching of bio-trees. But perhaps the explanation is a little too tidy. I’ve been saying that trees don’t do loops, and it’s generally true. But what about the tree pictured below—a crepe myrtle I photographed some years ago on a street in Raleigh, North Carolina? (It reminds me of a sinewy Henry Moore sculpture.)

This plant is a tree in the botanical sense, but it’s certainly not a mathematical tree. A single trunk comes out of the ground and immediately divides. At waist height there are four branches, then three of them recombine. At chest height, there’s another split and yet another merger. This rogue tree is flouting all the canons and customs of treedom.

And the crepe myrtle is not the only offender. Banyan trees, native to India, have their horizontal branches propped up by numerous outrigger supports that drop to the ground. The banyan shown below, in Hilo, Hawaii, has a hollowed-out cavity where the trunk ought to be, surrounded by dozens or hundreds of supporting shoots, with cross-braces overhead. The L-system described above could never create such a network. But if the banyan can do this, why don’t other trees adopt the same trick?

In biology, the question “Why *x*?” is shorthand for “What is the evolutionary advantage of *x*?” or “How does *x* contribute to the survival and reproductive success of the organism?” Answering such questions often calls for a leap of imagination. We look at the mottled brown moth clinging to tree bark and propose that its coloration is camouflage, concealing the insect from predators. We look at a showy butterfly and conclude that its costume is aposematic—a warning that says, “I am toxic; you’ll be sorry if you eat me.”

These explanations risk turning into just-so stories, *les contes des pourquoi*.

And if we have a hard time imagining the experiences of animals, the lives of plants are even further beyond our ken. Does the flower lust for the pollen-laden bee? Does the oak tree grieve when its acorns are eaten by squirrels? How do trees feel about woodpeckers? Confronted with these questions, I can only shrug. I have no idea what plants desire or dread.

Others claim to know much more about vegetable sensibilities. Peter Wohlleben, a German forester, has published a book titled *The Hidden Life of Trees: What They Feel, How They Communicate*. He reports that trees suckle their young, maintain friendships with their neighbors, and protect sick or wounded members of their community. To the extent these ideas have a scientific basis, they draw heavily on work done in the laboratory of Suzanne Simard at the University of British Columbia. Simard, leader of the Mother Tree project, studies communication networks formed by tree roots and their associated soil fungi.

I find Simard’s work interesting. I find the anthropomorphic rhetoric unhelpful and offensive. The aim, I gather, is to make us care more about trees and forests by suggesting they are a lot like us; they have families and communities, friendships, alliances. In my view that’s exactly wrong. What’s most intriguing about trees is that they are aliens among us, living beings whose long, immobile, mute lives bear no resemblance to our own frenetic toing-and-froing. Trees are deeply mysterious all on their own, without any overlay of humanizing sentiment.

Dkhar, Jeremy, and Ashwani Pareek. 2014. What determines a leaf’s shape? *EvoDevo* 5:47.

McMahon, Thomas A. 1975. The mechanical design of trees. *Scientific American* 233(1):93–102.

Osnas, Jeanne L. D., Jeremy W. Lichstein, Peter B. Reich, and Stephen W. Pacala. 2013. Global leaf trait relationships: mass, area, and the leaf economics spectrum. *Science* 340:741–744.

Prusinkiewicz, Przemyslaw, and Aristid Lindenmayer, with James S. Hanan, F. David Fracchia, Deborah Fowler, Martin J. M. de Boer, and Lynn Mercer. 1990. *The Algorithmic Beauty of Plants*. New York: Springer-Verlag. PDF edition available at http://algorithmicbotany.org/papers/.

Runions, Adam, Martin Fuhrer, Brendan Lane, Pavol Federl, Anne-Gaëlle Rolland-Lagan, and Przemyslaw Prusinkiewicz. 2005. Modeling and visualization of leaf venation patterns. *ACM Transactions on Graphics* 24(3):702–711.

Runions, Adam, Miltos Tsiantis, and Przemyslaw Prusinkiewicz. 2017. A common developmental program can produce diverse leaf shapes. *New Phytologist* 216:401–418. Preprint and source code.

Tadrist, Loïc, and Baptiste Darbois Texier. 2016. Are leaves optimally designed for self-support? An investigation on giant monocots. arXiv:1602.03353.

Vogel, Steven. 2012. *The Life of a Leaf*. University of Chicago Press.

Wright, Ian J., Ning Dong, Vincent Maire, I. Colin Prentice, Mark Westoby, Sandra Díaz, Rachael V. Gallagher, Bonnie F. Jacobs, Robert Kooyman, Elizabeth A. Law, Michelle R. Leishman, Ülo Niinemets, Peter B. Reich, Lawren Sack, Rafael Villar, Han Wang, and Peter Wilf. 2017. Global climatic drivers of leaf size. *Science* 357:917–921.

Yamazaki, Kazuo. 2011. Gone with the wind: trembling leaves may deter herbivory. *Biological Journal of the Linnean Society* 104:738–747.

Young, David A. 2010 preprint. Growth-algorithm model of leaf shape. arXiv:1004.4388.

When I follow Frost’s trail, it leads me into an unremarkable patch of Northeastern woodland, wedged between highways and houses and the town dump. It’s nowhere dark and deep enough to escape the sense of human proximity. This is not the forest primeval. Still, it is woodsy enough to bring to mind not only the rhymes of overpopular poets but also some tricky questions about trees and forests—questions I’ve been poking at for years, and that keep poking back. Why are trees so tall? Why aren’t they taller? Why do their leaves come in so many different shapes and sizes? Why are the trees trees (in the graph theoretical sense of that word) rather than some other kind of structure? And then there’s the question I want to discuss today:

Taking a quick census along the Frost trail, I catalog hemlock, sugar maple, at least three kinds of oak (red, white, and pin), beech and birch, shagbark hickory, white pine, and two other trees I can’t identify with certainty, even with the help of a Peterson guidebook and iNaturalist. The stand of woods closest to my home is dominated by hemlock, but on hillsides a few miles down the trail, broadleaf species are more common. The photograph below shows a saddle point (known locally as the Notch) between two peaks of the Holyoke Range, south of Amherst. I took the picture on October 15 last year—in a season when fall colors make it easier to detect the species diversity.

Forests like this one cover much of the eastern half of the United States. The assortment of trees varies with latitude and altitude, but at any one place the forest canopy is likely to include eight or ten species. A few isolated sites are even richer; certain valleys in the southern Appalachians, known as cove forests, have as many as 25 canopy species. And tropical rain forests are populated by 100 or even 200 tall tree species.

From the standpoint of ecological theory, all this diversity is puzzling. You’d think that in any given environment, one species would be slightly better adapted and would therefore outcompete all the others, coming to dominate the landscape. All of the trees are competing for the same resources—sunlight, CO\(_2\), water, various mineral nutrients—so the persistence of mixed-species woodlands begs for explanation.

Here’s a little demo of competitive exclusion. Two tree species—let’s call them olive and orange—share the same patch of forest, a square plot that holds 625 trees.

Initially, each site is randomly assigned a tree of one species or the other. When you click the *Start* button (or just tap on the array of trees), you launch a cycle of death and renewal. At each time step, one tree is chosen—entirely at random and without regard to species—to get the axe. Then another tree is chosen as the parent of the replacement, thereby determining its species. This latter choice is not purely random, however; there’s a bias. One of the species is better adapted to its environment, exploiting the available resources more efficiently, and so it has an elevated chance of reproducing and putting its offspring into the vacant site. In the control panel below the array of trees is a slider labeled “fitness bias”; nudging it left favors the orange species, right the olives.

The outcome of this experiment should not come as a surprise. The two species are playing a zero-sum game: Whatever territory olive wins, orange must lose, and vice versa. One site at a time, the fitter species conquers all. If the advantage is very slight, the process may take a while, but in the end the less-efficient organism is always banished. (What if the two species are exactly equal? I’ll return to that question in a moment, but for now let’s just pretend it never happens. And I have deviously jiggered the simulation so that you can’t set the bias to zero.)
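The demo’s update rule is essentially a Moran birth-death process. Here is a condensed sketch of the two-species case in Python; the fitness weighting and the 0.52 bias value are my own simplifications for illustration, not the demo’s actual code:

```python
import random

def biased_moran(n_sites=625, bias=0.52, seed=1):
    """Two-species birth-death process on a plot of n_sites trees.
    `bias` is the per-draw advantage of the olive species; 0.5 means
    no advantage.  Returns (final olive count, steps to fixation)."""
    random.seed(seed)
    olive = n_sites // 2                  # the rest are orange
    steps = 0
    while 0 < olive < n_sites:
        # one tree, chosen uniformly at random, dies
        dies_olive = random.random() < olive / n_sites
        # the parent of the replacement is chosen with fitness weighting
        w_olive = bias * olive
        w_orange = (1 - bias) * (n_sites - olive)
        parent_olive = random.random() < w_olive / (w_olive + w_orange)
        olive += int(parent_olive) - int(dies_olive)
        steps += 1
    return olive, steps

winner, steps = biased_moran()
print("olive wins" if winner else "orange wins", "after", steps, "steps")
```

Even with a bias as small as 0.52, one species reliably sweeps the plot; only the waiting time changes.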

Competitive exclusion does not forbid *all* cohabitation. Suppose olive and orange rely on two mineral nutrients in the soil—say, iron and calcium. Assume both of these elements are in short supply, and their availability is what limits growth in the populations of the trees. If olive trees are better at taking up iron and oranges assimilate calcium more effectively, then the two species may be able to reach an accommodation where both survive.

In this model, neither species is driven to extinction. At the default setting of the slider control, where iron and calcium are equally abundant in the environment, olive and orange trees also maintain roughly equal numbers on average. Random fluctuations carry them away from this balance point, but not very far or for very long. The populations are stabilized by a negative feedback loop. If a random perturbation increases the proportion of olive trees, each one of those trees gets a smaller share of the available iron, thereby reducing the species’ potential for further population growth. The orange trees are less affected by an iron deficiency, and so their population rebounds. But if the oranges then overshoot, they will be restrained by overuse of the limited calcium supply.

Moving the slider to the left or right alters the balance of iron and calcium in the environment. A 60:40 proportion favoring iron will shift the equilibrium between the two tree species, allowing the olives to occupy more of the territory. But, as long as the resource ratio is not too extreme, the minority species is in no danger of extinction. The two kinds of trees have a live-and-let-live arrangement.
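The stabilizing feedback shows up even in a deterministic caricature of the model. In this sketch (my own construction, not the code behind the demo), each species’ fitness is its preferred resource divided by the total demand on that resource, with a 25 percent overlap in resource use:

```python
def shares(iron=0.5, calcium=0.5, overlap=0.25, steps=200):
    """Deterministic two-resource model.  Olives draw mainly on iron,
    oranges on calcium; each also uses `overlap` times as much of the
    other's resource.  Returns the equilibrium olive fraction."""
    olive = 0.5
    for _ in range(steps):
        orange = 1.0 - olive
        fit_olive = iron / (olive + overlap * orange)      # crowding on iron
        fit_orange = calcium / (orange + overlap * olive)  # crowding on calcium
        total = fit_olive * olive + fit_orange * orange
        olive = fit_olive * olive / total                  # replicator update
    return olive

print(round(shares(), 3))                       # balanced resources: 0.5
print(round(shares(iron=0.6, calcium=0.4), 3))  # olives gain, oranges persist
```

With a 60:40 resource split the olives settle at about two-thirds of the plot, but the oranges never vanish.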

In the idiom of ecology, the olive and orange species escape the rule of competitive exclusion because they occupy distinct niches, or roles in the ecosystem. They are specialists, preferentially exploiting different resources. The niches do not have to be completely disjoint. In the simulation above they overlap somewhat: The olives need calcium as well as iron, but only 25 percent as much; the oranges have mirror-image requirements.

Will this loophole in the law of competitive exclusion admit more than two species? Yes: *N* competing species can coexist if there are at least *N* independent resources or environmental strictures limiting their growth, and if each species has a different limiting factor. Everybody must have a specialty. It’s like a youth soccer league where every player gets a trophy for some unique, distinguishing talent.

This notion of slicing and dicing an ecosystem into multiple niches is a well-established practice among biologists. It’s how Darwin explained the diversity of finches on the Galapagos islands, where a dozen species distinguish themselves by habitat (ground, shrubs, trees) or diet (insects, seeds and nuts of various sizes). Forest trees might be organized in a similar way, with a number of microenvironments that suit different species. The process of creating such a diverse community is known as niche assembly.

Some niche differentiation is clearly present among forest trees. For example, gums and willows prefer wetter soil. In my local woods, however, I can’t detect any systematic differences in the sites colonized by maples, oaks, hickories and other trees. They are often next-door neighbors, on plots of land with the same slope and elevation, and growing in soil that looks the same to me. Maybe I’m just not attuned to what tickles a tree’s fancy.

Niche assembly is particularly daunting in the tropics, where it requires a hundred or more distinct limiting resources. Each tree species presides over its own little monopoly, claiming first dibs on some environmental factor no one else really cares about. Meanwhile, all the trees are fiercely competing for the most important resources, namely sunlight and water. Every tree is striving to reach an opening in the canopy with a clear view of the sky, where it can spread its leaves and soak up photons all day long. Given the existential importance of winning this contest for light, it seems odd to attribute the distinctive diversity of forest communities to squabbling over other, lesser resources.

Where niche assembly makes every species the winner of its own little race, another theory dispenses with all competition, suggesting the trees are not even trying to outrun their peers. They are just milling about at random. According to this concept, called neutral ecological drift, all the trees are equally well adapted to their environment, and the set of species appearing at any particular place and time is a matter of chance. A site might currently be occupied by an oak, but a maple or a birch would thrive there just as well. Natural selection has nothing to select. When a tree dies and another grows in its place, nature is indifferent to the species of the replacement.

This idea brings us back to a question I sidestepped above: What happens when two competing species are exactly equal in fitness? The answer is the same whether there are two species or ten, so for the sake of visual variety let’s look at a larger community.

If you have run the simulation—and if you’ve been patient enough to wait for it to finish—you are now looking at a monochromatic array of trees. I can’t know what the single color on your screen might be—or in other words which species has taken over the entire forest patch—but I know there’s just one species left. The other nine are extinct. In this case the outcome might be considered at least a little surprising. Earlier we learned that if a species has even a slight advantage over its neighbors, it will take over the entire system. Now we see that no advantage is needed. Even when all the players are exactly equal, one of them will emerge as king of the mountain, and everyone else will be exterminated. Harsh, no?
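The death-and-replacement loop at the heart of such a model fits in a few lines. Here is a minimal sketch in Python (a hypothetical reimplementation, not the code behind the figures), which runs until a single species owns every site:

```python
import random

def neutral_drift(n_sites=625, n_species=10, seed=1):
    """Neutral drift: at each step a random tree dies and is replaced
    by the offspring of another randomly chosen tree. Runs until one
    species holds every site; returns (winning_species, steps)."""
    rng = random.Random(seed)
    forest = [rng.randrange(n_species) for _ in range(n_sites)]
    counts = [forest.count(s) for s in range(n_species)]
    steps = 0
    while max(counts) < n_sites:
        victim = rng.randrange(n_sites)           # site where a tree dies
        parent = forest[rng.randrange(n_sites)]   # species of the replacement
        counts[forest[victim]] -= 1
        counts[parent] += 1
        forest[victim] = parent
        steps += 1
    return forest[0], steps

winner, steps = neutral_drift()
print(f"species {winner} fixed after {steps} steps")
```

Because the grid geometry plays no role in the species counts, the sketch ignores it; only the tally of each species matters.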

Here’s a record of one run of the program, showing the abundance of each species as a function of time:

At the outset, all 10 species are present in roughly equal numbers, clustered close to the average abundance of \(625/10 = 62.5\). As the program starts up, the grid seethes with activity as the sites change color rapidly and repeatedly. Within the first 70,000 time steps, however, all but three species have disappeared. The three survivors trade the lead several times, as waves of contrasting colors wash over the array. Then, after about 250,000 steps, the species represented by the bright green line drops to zero population—extinction. The final one-on-one stage of the contest is highly uneven—the orange species is close to total dominance and the crimson one is bumping along near extinction—but nonetheless the tug of war lasts another 100,000 steps. (Once the system reaches a monospecies state, nothing more can ever change, and so the program halts.)

This lopsided result is not to be explained by any sneaky bias hidden in the algorithm. At all times and for all species, the probability of gaining a member is exactly equal to the probability of losing a member. It’s worth pausing to verify this fact. Suppose species \(X\) has population \(x\), which must lie in the range \(0 \le x \le 625\). A tree chosen at random will be of species \(X\) with probability \(x/625\); therefore the probability that the tree comes from some other species must be \((625 - x)/625\). \(X\) gains one member if it is the replacement species but not the victim species, an event with a combined probability of \(x(625 - x)/625^2\). \(X\) loses one member if it is the victim but not the replacement, which has the same probability.
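A few lines of exact rational arithmetic confirm the balance for every possible population level (an illustrative check, not part of the model itself):

```python
from fractions import Fraction

N = 625
for x in range(N + 1):
    # X gains a member: replacement is X, victim is not X.
    p_gain = Fraction(x, N) * Fraction(N - x, N)
    # X loses a member: victim is X, replacement is not X.
    p_loss = Fraction(N - x, N) * Fraction(x, N)
    assert p_gain == p_loss
print("gain and loss probabilities match for every x from 0 to", N)
```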

It’s a fair game. No loaded dice. Nevertheless, somebody wins the jackpot, and the rest of the players lose everything, every time.

The spontaneous decay of species diversity in this simulated patch of forest is caused entirely by random fluctuations. Think of the population \(x\) as a random walker wandering along a line segment with \(0\) at one end and \(625\) at the other. At each time step the walker moves one unit right \((+1)\) or left \((-1)\) with equal probability; on reaching either end of the segment, the game ends. The most fundamental fact about such a walk is that it *does* always end. A walk that meanders forever between the two boundaries is not impossible, but it has probability \(0\); hitting one wall or the other has probability \(1\).

How long should you expect such a random walk to last? In the simplest case, with a single walker, the expected number of steps starting at position \(x\) is \(x(625 - x)\). This expression has a maximum when the walk starts in the middle of the line segment; the maximum length is just under \(100{,}000\) steps. In the forest simulation with ten species the situation is more complicated because the multiple walks are correlated, or rather anti-correlated: When one walker steps to the right, another must go left. Computational experiments suggest that the median time needed for ten species to be whittled down to one is in the neighborhood of \(320{,}000\) steps.
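The single-walker result is easy to test empirically. The sketch below (illustrative only) compares the average duration of a batch of walks with the formula \(x(n - x)\), using a shorter segment so the runs finish quickly:

```python
import random

def walk_length(x, n, rng):
    """Steps until a +/-1 random walk starting at x hits 0 or n."""
    steps = 0
    while 0 < x < n:
        x += rng.choice((-1, 1))
        steps += 1
    return steps

# The gambler's-ruin result says the expected duration is x * (n - x).
rng = random.Random(7)
n, x, trials = 100, 50, 200
mean = sum(walk_length(x, n, rng) for _ in range(trials)) / trials
print(f"empirical mean: {mean:.0f}   theoretical: {x * (n - x)}")
```

The variance of the duration is large, so a few hundred trials give only a rough match to the theoretical value.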

From these computational models it’s hard to see how neutral ecological drift could be the savior of forest diversity. On the contrary, it seems to guarantee that we’ll wind up with a monoculture, where one species has wiped out all others. But this is not the end of the story.

One issue to keep in mind is the timescale of the process. In the simulation, time is measured by counting cycles of death and replacement among forest trees. I’m not sure how to convert that into calendar years, but I’d guess that 320,000 death-and-replacement events in a tract of 625 trees might take 50,000 years or more. Here in New England, that’s a very long time in the life of a forest. This entire landscape was scraped clean by the Laurentide ice sheet just 20,000 years ago. If the local woodlands are losing species to random drift, they would not yet have had time to reach the end game.

The trouble is, this thesis implies that forests start out diverse and evolve toward a monoculture, which is not supported by observation. If anything, diversity seems to *increase* with time. The cove forests of Tennessee, which are much older than the New England woods, have more species, not fewer. And the hyperdiverse ecosystem of the tropical rain forests is thought to be millions of years old.

Despite these conceptual impediments, a number of ecologists have argued strenuously for neutral ecological drift, most notably Stephen P. Hubbell in a 2001 book, *The Unified Neutral Theory of Biodiversity and Biogeography*. The key to Hubbell’s defense of the idea (as I understand it) is that 625 trees do not make a forest, and certainly not a planet-girdling ecosystem.

Hubbell’s theory of neutral drift was inspired by earlier studies of the biogeography of islands, in particular the collaborative work of Robert H. MacArthur and Edward O. Wilson in the 1960s. Suppose our little plot of \(625\) trees is growing on an island at some distance from a continent. For the most part, the island evolves in isolation, but every now and then a bird carries a seed from the much larger forest on the mainland. We can simulate these rare events by adding a facility for immigration to the neutral-drift model. In the panel below, the slider controls the immigration rate. At the default setting of \(1/100\), every \(100\)th replacement tree comes not from the local forest but from a stable reserve where all \(10\) species have an equal probability of being selected.
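Immigration changes the drift model only slightly: with some small probability, the replacement tree is drawn from the external species pool rather than from the local forest. A hypothetical sketch:

```python
import random

def drift_with_immigration(n_sites=625, n_species=10,
                           imm_rate=0.01, steps=100_000, seed=3):
    """Neutral drift in which each replacement comes, with probability
    imm_rate, from a stable external reserve where every species is
    equally likely. Returns the number of species present at the end."""
    rng = random.Random(seed)
    forest = [rng.randrange(n_species) for _ in range(n_sites)]
    for _ in range(steps):
        victim = rng.randrange(n_sites)
        if rng.random() < imm_rate:
            forest[victim] = rng.randrange(n_species)       # immigrant seed
        else:
            forest[victim] = forest[rng.randrange(n_sites)] # local offspring
    return len(set(forest))

print("species present:", drift_with_immigration())
```

Because absent species keep returning from the reserve, the loop no longer has an absorbing state, so the sketch simply stops after a fixed number of steps and reports a snapshot.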

For the first few thousand cycles, the evolution of the forest looks much like it does in the pure-drift model. There’s a brief period of complete tutti-frutti chaos, then waves of color erupt over the forest as it blushes pink, then deepens to crimson, or fades to a sickly green. What’s different is that none of those expanding species ever succeeds in conquering the entire array. As shown in the timeline graph below, they never grow much beyond 50 percent of the total population before they retreat into the scrum of other species. Later, another tree color makes a bid for empire but meets the same fate. (Because there is no clear endpoint to this process, the simulation is designed to halt after 500,000 cycles. If you haven’t seen enough by then, click *Resume*.)

Immigration, even at a low level, brings a qualitative change to the behavior of the model and the fate of the forest. The big difference is that we can no longer say extinction is forever. A species may well disappear from the 625-tree plot, but eventually it will be reimported from the permanent reserve. Thus the question is not whether a species is living or extinct but whether it is present or absent at a given moment. At an immigration rate of \(1/100\), the average number of species present is about \(9.6\), so none of them disappear for long.

With a higher level of immigration, the 10 species remain thoroughly mixed, and none of them can ever make any progress toward world domination. On the other hand, they have little risk of disappearing, even temporarily. Push the slider control all the way to the left, setting the immigration rate at \(1/10\), and the forest display becomes an array of randomly blinking lights. In the timeline graph below, there’s not a single extinction.

Pushing the slider in the other direction, rarer immigration events allow the species distribution to stray much further from equal abundance. In the trace below, with an immigrant arriving every \(1{,}000\)th cycle, the population is dominated by one or two species for most of the time; other species are often on the brink of extinction—or *over* the brink—but they come back eventually. The average number of living species is about 4.3, and there are moments when only two are present.

Finally, with a rate of \(1/10{,}000\), the effect of immigration is barely noticeable. As in the model without immigration, one species invades all the terrain; in the example recorded below, this takes about \(400{,}000\) steps. After that, occasional immigration events cause a small blip in the curve, but it will be a very long time before another species is able to displace the incumbent.

The island setting of this model makes it easy to appreciate how sporadic, weak connections between communities can have an outsize influence on their development. But islands are not essential to the argument. Trees, being famously immobile, have only occasional long-distance communication, even when there’s no body of water to separate them. (It’s a rare event when Birnam Wood marches off to Dunsinane.) Hubbell formulates a model of ecological drift in which many small patches of forest are organized into a hierarchical metacommunity. Each patch is both an island and part of the larger reservoir of species diversity. If you choose the right patch sizes and the right rates of migration between them, you can maintain multiple species at equilibrium. Hubbell also allows for the emergence of entirely new species, which is also taken to be a random or selection-neutral process.

Niche assembly and neutral ecological drift are theories that elicit mirror-image questions from skeptics. With niche assembly we look at dozens or hundreds of coexisting tree species and ask, “Can every one of them have a unique limiting resource?” With neutral drift we ask, “Can all of those species be exactly equal in fitness?”

Hubbell responds to the latter question by turning it upside down. The very fact that we observe coexistence implies equality:

> All species that manage to persist in a community for long periods with other species must exhibit net long-term population growth rates of nearly zero…. If this were not the case, i.e., if some species should manage to achieve a positive growth rate for a considerable length of time, then from our first principle of the biotic saturation of landscapes, it must eventually drive other species from the community. But if all species have the same net population growth rate of zero on local to regional scales, then ipso facto they must have identical or nearly identical per capita relative fitnesses.

Herbert Spencer proclaimed: Survival of the fittest. Here we have a corollary: If they’re all survivors, they must all be equally fit.

Now for something completely different.

Another theory of forest diversity was devised specifically to address the most challenging case—the extravagant variety of trees in tropical ecosystems. In the early 1970s J. H. Connell and Daniel H. Janzen, field biologists working independently in distant parts of the world, almost simultaneously came up with the same idea.

A tropical rain forest is a tough neighborhood. Trees are under frequent attack by marauding gangs of predators, parasites, and pathogens. (Connell lumped these bad guys together under the label “enemies.”) Many of the enemies are specialists, targeting only trees of a single species. The specialization can be explained by competitive exclusion: Each tree species becomes a unique resource supporting one type of enemy.

Suppose a tree is beset by a dense population of host-specific enemies. The swarm of meanies attacks not only the adult tree but also any offspring of the host that have taken root near their parent. Since young trees are more vulnerable than adults, the entire cohort could be wiped out. Seedlings at a greater distance from the parent should have a better chance of remaining undiscovered until they have grown large and robust enough to resist attack. In other words, evolution might favor the rare apple that falls far from the tree. Janzen illustrated this idea with a graphical model something like the one at right. As distance from the parent increases, the probability that a seed will arrive and take root grows smaller *(red curve)*, but the probability that any such seedling will survive to maturity goes up *(blue curve)*. The overall probability of successful reproduction is the product of these two factors *(purple curve)*; it has a peak where the red and blue curves cross.

The Connell-Janzen theory predicts that trees of the same species will be widely dispersed in the forest, leaving plenty of room in between for trees of other species, which will have a similarly scattered distribution. The process leads to anti-clustering: conspecific trees are farther apart on average than they would be in a completely random arrangement. This pattern was noted by Alfred Russel Wallace in 1878, based on his own long experience in the tropics:

> If the traveller notices a particular species and wishes to find more like it, he may often turn his eyes in vain in every direction. Trees of varied forms, dimensions, and colours are around him, but he rarely sees any one of them repeated. Time after time he goes towards a tree which looks like the one he seeks, but a closer examination proves it to be distinct. He may at length, perhaps, meet with a second specimen half a mile off, or may fail altogether, till on another occasion he stumbles on one by accident.

My toy model of the social-distancing process implements a simple rule. When a tree dies, it cannot be replaced by another tree of the same species, nor may the replacement match the species of any of the eight nearest neighbors surrounding the vacant site. Thus trees of the same species must have at least one other tree between them. To say the same thing in another way, each tree has an exclusion zone around it, where other trees of the same species cannot grow.

It turns out that social distancing is a remarkably effective way of preserving diversity. When you click *Start*, the model comes to life with frenetic activity, blinking away like the front panel of a 1950s Hollywood computer. Then it just keeps blinking; nothing else ever really happens. There are no spreading tides of color as a successful species gains ground, and there are no extinctions. The variance in population size is even lower than it would be with a completely random and uniform assignment of species to sites. This stability is apparent in the timeline graph below, where the 10 species tightly hug the mean abundance of 62.5:
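The replacement rule is simple enough to sketch directly. Here is a hypothetical implementation on a toroidal \(25 \times 25\) grid (not the essay's own code):

```python
import random

SIZE, SPECIES = 25, 10   # 25 x 25 = 625 sites, as in the other models

def neighbors(i, j):
    """The eight sites surrounding (i, j), with toroidal wraparound."""
    return [((i + di) % SIZE, (j + dj) % SIZE)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]

def distancing_step(forest, rng):
    """Kill a random tree; the replacement may match neither the dead
    tree's species nor any species among the eight nearest neighbors."""
    i, j = rng.randrange(SIZE), rng.randrange(SIZE)
    excluded = {forest[nb] for nb in neighbors(i, j)} | {forest[(i, j)]}
    allowed = [s for s in range(SPECIES) if s not in excluded]
    if allowed:   # with 10 species and a 9-site exclusion zone, never empty
        forest[(i, j)] = rng.choice(allowed)

rng = random.Random(5)
# Initial configuration is completely random, ignoring the restrictions.
forest = {(i, j): rng.randrange(SPECIES)
          for i in range(SIZE) for j in range(SIZE)}
for _ in range(50_000):
    distancing_step(forest, rng)
print("species surviving:", len(set(forest.values())))
```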

When I finished writing this program and pressed the button for the first time, long-term survival of all ten species was not what I expected to see. My thoughts were influenced by some pencil-and-paper doodling. I had confirmed that only four colors are needed to create a pattern where no two trees of the same color are adjacent horizontally, vertically, or on either of the diagonals. One such pattern is shown at right. I suspected that the social-distancing protocol might cause the model to condense into such a crystalline state, with the loss of species that don’t appear in the repeated motif. I was wrong. Although four is indeed the minimum number of colors for a socially distanced two-dimensional lattice, there is nothing in the algorithm that encourages the system to seek the minimum.
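The four-color claim is easy to verify mechanically: tiling the plane with a \(2 \times 2\) motif of colors 0 1 / 2 3 never places two like colors on adjacent sites, horizontally, vertically, or diagonally. A quick check (the motif here is one valid choice; the pattern in the figure may differ):

```python
# Repeating the 2 x 2 block  0 1 / 2 3  across a toroidal grid.
SIZE = 24   # any even size works on a torus

def color(i, j):
    return 2 * (i % 2) + (j % 2)

for i in range(SIZE):
    for j in range(SIZE):
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if (di, dj) != (0, 0):
                    ni, nj = (i + di) % SIZE, (j + dj) % SIZE
                    assert color(i, j) != color(ni, nj)
print("4-color pattern passes the adjacency test")
```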

After seeing the program in action, I was able to figure out what keeps all the species alive. There’s an active feedback process that puts a premium on rarity. Suppose that oaks currently have the lowest frequency in the population at large. As a result, oaks are least likely to be present in the exclusion zone surrounding any vacancy in the forest, which means in turn they are most likely to be acceptable as a replacement. As long as the oaks remain rarer than the average, their population will tend to grow. Symmetrically, individuals of an overabundant species will have a harder time finding an open site for their offspring. All departures from the mean population level are self-correcting.

The initial configuration in this model is completely random, ignoring the restrictions on adjacent conspecifics. Typically there are about 200 violations of the exclusion zone in the starting pattern, but they are all eliminated in the first few thousand time steps. Thereafter the rules are obeyed consistently. Note that with ten species and an exclusion zone consisting of nine sites, there is always at least one species available to fill a vacancy. If you try the experiment with nine or fewer species, some vacancies must be left as gaps in the forest. I should also mention that the model uses toroidal boundary conditions: the right edge of the grid is adjacent to the left edge, and the top wraps around to the bottom. This ensures that all sites in the lattice have exactly eight neighbors.

Connell and Janzen envisioned much larger exclusion zones, and correspondingly larger rosters of species. Implementing such a model calls for a much larger computation. A recent paper by Taal Levi *et al.* reports on such a simulation. They find that the number of surviving species and their spatial distribution remain reasonably stable over long periods (200 billion tree replacements).

Could the Connell-Janzen mechanism also work in temperate-zone forests? As in the tropics, the trees of higher latitudes do have specialized enemies, some of them notorious—the vectors of Dutch elm disease and chestnut blight, the emerald ash borer, the gypsy moth caterpillars that defoliate oaks. The hemlocks in my neighborhood are under heavy attack by the woolly adelgid, a sap-sucking bug. Thus the forces driving diversification and anti-clustering in the Connell-Janzen model would seem to be present here. However, the observed spatial structure of the northern forests is somewhat different. Social distancing hasn’t caught on here. The distribution of trees tends to be a little clumpy, with conspecifics gathering in small groves.

Plague-driven diversification is an intriguing idea, but, like the other theories mentioned above, it has certain plausibility challenges. In the case of niche assembly, we need to find a unique limiting resource for every species. In neutral drift, we have to ensure that selection really is neutral, assigning exactly equal fitness to trees that look quite different. In the Connell-Janzen model we need a specialized pest for every species, one that’s powerful enough to suppress all nearby seedlings. Can it be true that *every* tree has its own deadly nemesis?

To see what happens when a species escapes its enemies, click the *Invade* button, which introduces a new, enemy-free species into the forest. You may need to click *Invade* more than once, since a new arrival may die out before becoming established. Also note that I have slowed down this simulation, lest it all be over in a flash.

Lacking enemies, the invader can flout the social-distancing rules, occupying any forest vacancy regardless of neighborhood. Once the invader has taken over a majority of the sites, the distancing rules become less onerous, but by then it’s too late for the other species.

One further half-serious thought on the Connell-Janzen theory: In the war between trees and their enemies, humanity has clearly chosen sides. We would wipe out those insects and fungi and other tree-killing pests if we could figure out how to do so. Everyone would like to bring back the elms and the chestnuts, and save the eastern hemlocks before it’s too late. On this point I’m as sentimental as the next treehugger. But if Connell and Janzen are correct, and if their theory applies to temperate-zone forests, eliminating all the enemies would actually cause a devastating collapse of tree diversity. Without pest pressure, competitive exclusion would be unleashed, and we’d be left with one-tree forests everywhere we look.

Species diversity in the forest is now matched by theory diversity in the biology department. The three ideas I have discussed here—niche assembly, neutral drift, and social distancing—all seem to be coexisting in the minds of ecologists. And why not? Each theory is a success in the basic sense that it can overcome competitive exclusion. Each theory also makes distinctive predictions. With niche assembly, every species must have a unique limiting resource. Neutral drift generates unusual population dynamics, with species continually coming and going, although the overall number of species remains stable. Social distancing entails spatial anticlustering.

How can we choose a winner among these theories (and perhaps others)? Scientific tradition says nature should have the last word. We need to conduct some experiments, or at least go out in the field and make some systematic observations, then compare those results with the theoretical predictions.

There have been quite a few experimental tests of competitive exclusion. For example, Thomas Park and his colleagues ran a decade-long series of experiments with two closely related species of flour beetles. One species or the other always prevailed. In 1969 Francisco Ayala reported on a similar experiment with fruit flies, in which he observed coexistence under circumstances that were thought to forbid it. Controversy flared, but in the end the result was not to overturn the theory but to refine the mathematical description of where exclusion applies.

Wouldn’t it be grand to perform such experiments with trees? Unfortunately, they are not so easily grown in glass vials. And conducting multigenerational studies of organisms that live longer than we do is a tough assignment. With flour beetles, Park had time to observe more than 100 generations in a decade. With trees, the equivalent experiment might take 10,000 years. But field workers in biology are a resourceful bunch, and I’m sure they’ll find a way. In the meantime, I want to say a few more words about theoretical, mathematical, and computational approaches to the problem.

Ecology became a seriously mathematical discipline in the 1920s, with the work of Alfred J. Lotka and Vito Volterra. To explain their methods and ideas, one might begin with the familiar fact that organisms reproduce themselves, thereby causing populations to grow. Mathematized, this observation becomes the differential equation

\[\frac{d x}{d t} = \alpha x,\]

which says that the instantaneous rate of change in the population \(x\) is proportional to \(x\) itself—the more there are, the more there will be. The constant of proportionality \(\alpha\) is called the intrinsic reproduction rate; it is the rate observed when nothing constrains or interferes with population growth. The equation has a solution, giving \(x\) as a function of \(t\):

\[x(t) = x_0 e^{\alpha t},\]

where \(x_0\) is the initial population. This is a recipe for unbounded exponential growth (assuming that \(\alpha\) is positive). In a finite world such growth can’t go on forever, but that needn’t worry us here.
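For readers who like to check such formulas numerically, here is a small comparison of a forward-Euler integration of \(dx/dt = \alpha x\) against the closed-form solution (parameter values chosen arbitrarily for illustration):

```python
import math

# Forward-Euler integration of dx/dt = alpha * x, compared with the
# exact solution x0 * exp(alpha * t).
alpha, x0, t_end, dt = 0.5, 100.0, 10.0, 1e-4
x = x0
for _ in range(int(t_end / dt)):
    x += alpha * x * dt
exact = x0 * math.exp(alpha * t_end)
print(f"Euler: {x:.1f}   exact: {exact:.1f}")
```

With a step this small the two values agree to better than one part in a thousand; the discrepancy shrinks linearly as \(dt\) shrinks, which is the signature of a first-order method.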

Lotka and Volterra coupled a pair of such equations to describe the interaction of a predator and its prey:

\[\begin{align}
\frac{d x}{d t} &= \alpha x -\gamma x y\\
\frac{d y}{d t} &= -\beta y + \delta x y
\end{align}\]

The prey species \(x\) prospers when left to itself, but suffers as the product \(x y\) increases. The situation is just the opposite for the predator \(y\), which can’t get along alone (\(x\) is its only food source) and whose population swells when \(x\) and \(y\) are both abundant.

Competition is a more symmetrical relation: Either species can thrive when alone, and the interaction between them is negative for both parties.

\[\begin{align}
\frac{d x}{d t} &= \alpha x -\gamma x y\\
\frac{d y}{d t} &= \beta y - \delta x y
\end{align}\]

The Lotka-Volterra equations yield some interesting behavior. At any instant \(t\), the state of the two-species system can be represented as a point in the \(x, y\) plane, whose coordinates are the two population levels. For some combinations of the \(\alpha, \beta, \gamma, \delta\) parameters, there’s a point of stable equilibrium. Once the system has reached this point, it stays put, and it returns to the same neighborhood following any small perturbation. Other equilibria are unstable: The slightest departure from the balance point causes a major shift in population levels. And the *really* interesting cases have no stationary point; instead, the state of the system traces out a closed loop in the \(x, y\) plane, continually repeating a cycle of states. The cycles correspond to oscillations in the two population levels. Such oscillations have been observed in many predator-prey systems. Indeed, it was curiosity about the periodic swelling and contraction of populations in the Canadian fur trade and Adriatic fisheries that inspired Lotka and Volterra to work on the problem.
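The closed loops can be confirmed numerically. Along every orbit of the predator-prey equations the quantity \(\delta x - \beta \ln x + \gamma y - \alpha \ln y\) is conserved, so a good integrator should hold it nearly constant while \(x\) and \(y\) cycle. A sketch with arbitrary parameter values:

```python
import math

# Fourth-order Runge-Kutta integration of the predator-prey system
#   dx/dt = a*x - g*x*y,   dy/dt = -b*y + d*x*y
# (a, b, g, d stand in for alpha, beta, gamma, delta).
a, b, g, d = 1.0, 1.0, 0.1, 0.02

def deriv(x, y):
    return a * x - g * x * y, -b * y + d * x * y

def rk4_step(x, y, h):
    k1 = deriv(x, y)
    k2 = deriv(x + h/2 * k1[0], y + h/2 * k1[1])
    k3 = deriv(x + h/2 * k2[0], y + h/2 * k2[1])
    k4 = deriv(x + h * k3[0], y + h * k3[1])
    return (x + h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y + h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def invariant(x, y):
    # Conserved along every orbit; its constancy confirms the closed loops.
    return d * x - b * math.log(x) + g * y - a * math.log(y)

x, y = 30.0, 4.0
v0 = invariant(x, y)
for _ in range(100_000):          # integrate to t = 100, many full cycles
    x, y = rk4_step(x, y, 0.001)
print(f"invariant drift after t=100: {abs(invariant(x, y) - v0):.2e}")
```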

The 1960s and 70s brought more surprises. Studies of equations very similar to the Lotka-Volterra system revealed the phenomenon of “deterministic chaos,” where the point representing the state of the system follows an extremely complex trajectory, though its wanderings are not random. There ensued a lively debate over complexity and stability in ecosystems. Is chaos to be found in natural populations? Is a community with many species and many links between them more or less stable than a simpler one?

Viewed as abstract mathematics, there’s much beauty in these equations, but it’s sometimes a stretch mapping the math back to the biology. For example, when the Lotka-Volterra equations are applied to species competing for resources, the resources appear nowhere in the model. The mathematical structure describes something more like a predator-predator interaction—two species that eat each other.

Even the organisms themselves are only a ghostly presence in these models. The differential equations are defined over the continuum of real numbers, giving us population levels or densities, but not individual plants or animals—discrete things that we can count with integers. The choice of number type is not of pressing importance as long as the populations are large, but it leads to some weirdness when a population falls to, say, 0.001—a millitree. Using finite-difference equations instead of differential equations avoids this problem, but the mathematics gets messier.

Another issue is that the equations are rigidly deterministic. Given the same inputs, you’ll always get exactly the same outputs—even in a chaotic model. Determinism rules out modeling anything like neutral ecological drift. Again, there’s a remedy: stochastic differential equations, which include a source of noise or uncertainty. With models of this kind, the answers produced are not numbers but probability distributions. You don’t learn the population of \(x\) at time \(t\); you get a probability \(P(x, t)\) in a distribution with a certain mean and variance. Another approach, called Markov Chain Monte Carlo (MCMC), uses a source of randomness to sample from such distributions. But the MCMC method moves us into the realm of computational models rather than mathematical ones.

Computational methods generally allow a direct mapping between the elements of the model and the things being modeled. You can open the lid and look inside to find the trees and the resources, the births and the deaths. These computational objects are not quite tangible, but they’re discrete, and always finite. A population is neither a number nor a probability distribution but a collection of individuals. I find models of this kind intellectually less demanding. Writing a differential equation that captures the dynamics of a biological system requires insight and intuition. Writing a program to implement a few basic events in the life of a forest—a tree dies, another takes its place—is far easier.

The six little models included in this essay serve mainly as visualizations; they expend most of their computational energy painting colored dots on the screen. But larger, more ambitious models are certainly feasible, as in the work of Taal Levi et al. mentioned above.

However, if computational models are easier to create, they can also be harder to interpret. If you run a model once and species \(X\) goes extinct, what can you conclude? Not much. On the next run \(X\) and \(Y\) might coexist. To make reliable inferences, you need to do some statistics over a large ensemble of runs—so once again the answer takes the form of a probability distribution.

The concreteness and explicitness of Monte Carlo models is generally a virtue, but it has a darker flip side. Where a differential equation model might apply to any “large” population, that vague description won’t work in a computational context. You have to name a number, even though the choice is arbitrary. The size of my forest models, 625 trees, was chosen for mere convenience. With a larger grid, say \(100 \times 100\), you’d have to wait millions of time steps to see anything interesting happen. Of course the same issue arises with experiments in the lab or in the field.

Both kinds of model are always open to a charge of oversimplifying. A model is the Marie Kondo version of nature—relentlessly decluttered and tidied up. Sometimes important parts get tossed out. In the case of the forest models, it troubles me that trees have no life history. One dies, and another pops up full grown. Also missing from the models are pollination and seed dispersal, and rare events such as hurricanes and fires that can reshape entire forests. Would we learn more if all those aspects of life in the woods had a place in the equations or the algorithms? Perhaps, but where do you stop?

My introduction to models in ecology came through a book of that title by John Maynard Smith, published in 1974. I recently reread it, learning more than I did the first time through. Maynard Smith makes a distinction between simulations, useful for answering questions about specific problems or situations, and models, useful for testing theories. He offers this advice: “Whereas a good simulation should include as much detail as possible, a good model should include as little as possible.”

Ayala, F. J. 1969. Experimental invalidation of the principle of competitive exclusion. *Nature* 224:1076–1079.

Clark, James S. 2010. Individuals and the variation needed for high species diversity in forest trees. *Science* 327:1129–1132.

Connell, J. H. 1971. On the role of natural enemies in preventing competitive exclusion in some marine animals and in rain forest trees. In *Dynamics of Populations*, P. J. Den Boer and G. Gradwell, eds., Wageningen, pp. 298–312.

Gilpin, Michael E., and Keith E. Justice. 1972. Reinterpretation of the invalidation of the principle of competitive exclusion. *Nature* 236:273–301.

Hardin, Garrett. 1960. The competitive exclusion principle. *Science* 131(3409): 1292–1297.

Hubbell, Stephen P. 2001. *The Unified Neutral Theory of Biodiversity and Biogeography*. Princeton, NJ: Princeton University Press.

Hutchinson, G. E. 1959. Homage to Santa Rosalia, or why are there so many kinds of animals? *The American Naturalist* 93:145–159.

Janzen, Daniel H. 1970. Herbivores and the number of tree species in tropical forests. *The American Naturalist* 104(940):501–528.

Kricher, John C. 1988. *A Field Guide to Eastern Forests, North America.* The Peterson Field Guide Series. Illustrated by Gordon Morrison. Boston: Houghton Mifflin.

Levi, Taal, Michael Barfield, Shane Barrantes, Christopher Sullivan, Robert D. Holt, and John Terborgh. 2019. Tropical forests can maintain hyperdiversity because of enemies. *Proceedings of the National Academy of Sciences of the USA* 116(2):581–586.

Levin, Simon A. 1970. Community equilibria and stability, and an extension of the competitive exclusion principle. *The American Naturalist* 104(939):413–423.

MacArthur, R. H., and E. O. Wilson. 1967. *The Theory of Island Biogeography.* Monographs in Population Biology. Princeton, NJ: Princeton University Press.

May, Robert M. 1973. Qualitative stability in model ecosystems. *Ecology* 54(3):638–641.

Maynard Smith, J. 1974. *Models in Ecology.* Cambridge: Cambridge University Press.

Richards, Paul W. 1973. The tropical rain forest. *Scientific American* 229(6):58–67.

Schupp, Eugene W. 1992. The Janzen-Connell model for tropical tree diversity: population implications and the importance of spatial scale. *The American Naturalist* 140(3):526–530.

Strobeck, Curtis. 1973. *N* species competition. *Ecology* 54(3):650–654.

Tilman, David. 2004. Niche tradeoffs, neutrality, and community structure: A stochastic theory of resource competition, invasion, and community assembly. *Proceedings of the National Academy of Sciences of the USA* 101(30):10854–10861.

Wallace, Alfred R. 1878. *Tropical Nature, and Other Essays.* London: Macmillan and Company.

The economy’s swan dive is truly breathtaking. In response to the coronavirus threat we have shut down entire commercial sectors: most retail stores, restaurants, sports and entertainment. Travel and tourism are moribund. Manufacturing is threatened too, not only by concerns about workplace contagion but also by softening demand and disrupted supply chains. All of the automakers have closed their assembly plants in the U.S., and Boeing has stopped production at its plants near Seattle, which employ 70,000. Thus it comes as no great surprise—though it’s still a shock—that 3,283,000 Americans filed claims for unemployment compensation last week. That’s by far the highest weekly tally since the program was created in the 1930s. It’s almost five times the previous record from 1982, and 15 times the average for the first 10 weeks of this year. The graph is a dramatic hockey stick:

Here’s the same graph, updated to include new unemployment claims for the weeks ending 28 March and 4 April. The four-week total of new claims is over 16 million, which is roughly 10 percent of the American workforce. [Edited 2020-04-02 and 2020-04-09.]

I’ve been brooding about the economic collapse for a couple of weeks. I worry that the consequences of unemployment and business failures could be even more dire than the direct harm caused by the virus. Recovering from a deep recession can take years, and those who suffer most are the poor and the young. I don’t want to see millions of lives blighted and the dreams of a generation thwarted. But Covid-19 is still rampant. Relaxing our defenses could swamp the hospitals and elevate the death rate. No one is eager to take that risk (except perhaps Donald Trump, who dreams of an Easter resurrection).

The other day I was squabbling about these economic perils with the person I shelter-in-place with. Yes, she said, we’re facing a steep decline, but what makes you so sure it’s going to last for years? Why can’t the economy bounce back? I patiently mansplained about the irreversibility of events like bankruptcy and eviction and foreclosure, which are almost as hard to undo as death. That argument didn’t settle the matter, but we let the subject drop. (We’re hunkered down 24/7 here; we need to get along.)

In the middle of the night, the question came back to me. Why *won’t* it bounce back? Why can’t we just pause the economy like a video, then a month or two later press the play button to resume where we left off?

One problem with pausing the economy is that people can’t survive in suspended animation. They need a continuous supply of air, water, food, shelter, TV shows, and toilet paper. You’ve got to keep that stuff coming, no matter what. But people are only part of the economy. There are also companies, corporations, unions, partnerships, non-profit associations—all the entities we create to organize the great game of getting and spending. A company, considered as an abstraction, has no need for uninterrupted life support. It doesn’t eat or breathe or get bored. So maybe companies could be put in the deep freeze and then thawed when conditions improve.

Lying awake in the dark, I told myself a story:

Clare owns a little café at the corner of Main and Maple in a New England college town. In the middle of March, when the college sent the students home, she lost half her customers. Then, as the epidemic spread, the governor ordered all restaurants to close. Clare called up Rory the Roaster to cancel her order for coffee beans, pulled her ad from the local newspaper, and taped a “C U Soon” sign to the door. Then she sat down with her only employee, Barry the Barista, to talk about the bad news.

Barry was distraught. “I have rent coming due, and my student loan, and a car payment.”

“I wish I could be more help,” Clare replied. “But the rent on the café is also due. If I don’t pay it, we could lose the lease, and you won’t have a job to come back to. We’ll both be on the street.” They sat glumly in the empty shop, six feet apart. Seeing the lost-puppy look in Barry’s eyes, Clare added: “Let me call up Larry the Landlord and see if we can work something out.”

Larry was sympathetic. He’d been hearing from lots of tenants, and he genuinely wanted to help. But he told Clare what he’d told the rest: “The building has a mortgage. If I don’t pay the bank, I’ll lose the place, and we’ll all be on the street.”

You can guess what Betty the Banker said. “I have obligations to my depositors. Accounts earn interest every month. People are redeeming CDs. If I don’t maintain my cash reserves, the FDIC will come in and seize our assets. We’ll all be on the street.”

Everyone in this little fable wants to do the right thing. No one wants to put Clare out of business or leave Barry without an income. And yet my nocturnal meditations come to a dark end, in which the failure of Clare’s corner coffee shop triggers a worldwide recession. Barry gets evicted, Larry defaults on his loan, Betty’s bank goes belly up. Rory the Roaster also goes under, and the Colombian farm that supplies the beans lays off all its workers. With Clare’s place now an empty storefront, there are fewer shoppers on Main Street, and the bookstore a few doors away folds up. The newspaper where Clare used to advertise ceases publication. The town’s population dwindles. The college closes.

At this point I feel like Ebenezer Scrooge pleading with the Ghost of Christmas Future to save Tiny Tim, or George Bailey desperate to escape the mean streets of Potterville and get back to the human warmth of Bedford Falls. Surely there must be some way to avert this catastrophe.

Here’s my idea. The rent and loan payments that cause all this economic mayhem are different from the transactions that Clare handles at her cash register. In her shopkeeper economy, money comes in only when coffee goes out; the two events are causally connected and simultaneous. And if she’s not selling any coffee, she can stop buying beans. The payment of her rent, on the other hand, is triggered by nothing but the ticking of the clock. She is literally buying time. Now the remedy is obvious: Stop the clock, or reset it. This is easier than you might think. We just go skipping down the Yellow Brick Road and petition the wizard to issue a proclamation. The wizard’s decree says this:

In the year 2020, April 30 shall be followed by April 1.

*Redux* is Latin for “a thing brought back or restored.” The word was introduced—or brought back—into the modern American vocabulary by John Updike’s 1971 novel *Rabbit Redux*, having been used earlier in titles of works by Dryden and Trollope. It’s one of those words I’ve always avoided saying aloud because of doubt about the pronunciation. The OED says it’s *re-ducks*.

How does this fiddling with the calendar help Clare? Consider what happens when the calendar flips from April 30 to April 1 Redux. It’s the first of the month, and the rent is due. But wait! No it’s not. She already paid the rent for April, a month ago. It won’t be due again until May 1, and that’s a month away. It’s the same with Larry’s mortgage payment, and Barry’s car loan. Of course stopping the clock cuts both ways. If you get a monthly pension or Social Security payment, that won’t be coming in April Redux, nor will the bank pay you interest on your deposits.

By means of this sly maneuver we have broken a vicious cycle. Larry doesn’t get a rent check from Clare, but he also doesn’t have to write a mortgage-loan check to Betty, who doesn’t have to make payments to her depositors and creditors. Each of them gets a month’s reprieve. With this extra slack, maybe Clare can keep Barry on the payroll and still have a viable business when her customers finally come out of hiding.

But isn’t this just a sneaky scheme to deprive the creditor class of money they are legally entitled to receive under the terms of contracts that both parties willingly signed? Yes it is, and a clever one at that. It is also a way to more equitably distribute the risks and costs of the present crisis. At the moment the burden falls heavily on Clare and Barry, who are forbidden to sell me a cup of coffee; but Larry and Betty are free to go on collecting their rents and loan payments. In addition to spreading around the financial pain, the scheme might also reduce the likelihood of a major, lasting economic contraction, which none of these characters would enjoy.

In spite of these appeals to the greater good of society as a whole, you may still feel there’s something dishonest about April Redux. If so, we can have the wizard issue a second decree:

In the 30 months from May 2020 through October 2022,

every month shall have one day fewer than the usual number.

During this period every scheduled payment will come due a day sooner than usual. At the end, lenders and borrowers are even-steven.
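The decree’s arithmetic can be checked in a few lines of JavaScript. This is a sketch, not anything from the original scheme; it assumes the trimmed span runs for exactly the 30 months needed to offset April’s 30 repeated days, i.e., May 2020 through October 2022:

```javascript
// Sanity check of the wizard's second decree: repeating April 2020 adds
// 30 days; trimming one day from each of 30 months should take exactly
// 30 days back, leaving lenders and borrowers even-steven.
const extraDays = 30;  // April Redux: a 30-day month happens twice

let trimmedMonths = 0;
for (let year = 2020; year <= 2022; year++) {
  for (let month = 1; month <= 12; month++) {
    const afterStart = year > 2020 || month >= 5;   // May 2020 or later
    const beforeEnd  = year < 2022 || month <= 10;  // October 2022 or earlier
    if (afterStart && beforeEnd) trimmedMonths++;   // this month loses a day
  }
}

const netDays = extraDays - trimmedMonths;
console.log(`${trimmedMonths} trimmed months, net ${netDays} days`);
```

Running the count confirms that the books balance: 30 days in, 30 days out.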

The last time anybody tinkered with the calendar in the English-speaking world was 1752, when the British isles and their colonies finally adopted the Gregorian calendar (introduced elsewhere as early as 1582) and dropped eleven days from the month of September. Legend has it that mobs rioted over the stolen days, but historians have cast doubt on those stories (see *Past & Present*, no. 149, 1995, pp. 95–139; JSTOR, paywall). There certainly *was* concern and controversy about the proper calculation of wages, rents, and interest in the abbreviated September.

Riots in the streets are clearly a no-no in this period of social distancing, so presumably we won’t have to worry about mob action when April repeats itself. Besides, who’s going to complain about having 30 days *added* to their lifespan? I suppose there may be some grumbling from people with April birthdays, who think they are suddenly two years older. And back-to-back April Fool days could test the nation’s patience.

Although my plan for an April do-over is presented in the spirit of the season, I do think it illuminates a serious issue—an aspect of modern commerce that makes the current situation especially dangerous. Our problem is not that we have shut down the whole economy. The problem is that we’ve shut down only *half* the economy. The other half carries on with business as usual, creating imbalances that leave the whole edifice teetering on the brink of collapse.

The $2 trillion rescue package enacted last week addresses some of these issues. The cash handout for individual taxpayers, and a sweetening of unemployment benefits, should help Barry muddle through and pay his bills. A program of loans for small businesses could keep Clare afloat, and the loan would be forgiven if she keeps Barry on the payroll. These are thoughtful and useful measures, and a refreshing change from earlier bailout practices. We are not sending all the funds directly to investment banks and insurance companies. But a big share will wind up there anyway, since we are effectively subsidizing the rent and mortgage payments of individuals and small businesses. I wonder if it wouldn’t be fairer, more effective, and less expensive to curtail some of those payments. I’m not suggesting that we shut down the banks along with the shops; that would make matters worse. But we might require financial institutions to defer or forgo certain payments from distressed small businesses and the employees they lay off.

Voluntary efforts along these lines promise to soften the impact for at least a few lucky workers and businesses that have lost their revenue stream. In my New England college town, some of the banks are offering to defer monthly payments on mortgage loans, and there’s social pressure on landlords to defer rents.

But don’t count on everyone to follow that program. On March 31, following announcements of layoffs and furloughs by Macy’s, Kohl’s, and other large retailers, the *New York Times* reported: “Last week, Taubman, a large owner of shopping malls, sent a letter to its tenants saying that the company expected them to keep paying their rent amid the crisis. Taubman, which oversees well-known properties like the Mall at Short Hills in New Jersey, reminded its tenants that it also had obligations to meet, and was counting on the rent to pay lenders and utilities.” [Added 2020-03-31.]

The coronavirus crisis is being treated as a unique event (and I certainly hope we’ll never see the like of it again). The associated economic crisis is also unique, at least within my memory. Most panics and recessions have their roots in the financial markets. At some point investors realize that tech stocks with an infinite price-to-earnings ratio are not such a bargain after all, or that bundling together thousands of risky mortgages doesn’t actually make them less risky. When the bubble bursts, the first casualties are on Wall Street; only later do the ripple effects reach Clare’s café. Now, we are seeing a rare disturbance that travels in the opposite direction. Do we know how to fix it?

In the early days of the web, the only way to present mathematical notation was to convert each equation into an image and embed it in the page with an `img` tag. The process was cumbersome and the product was ugly. In 2009 I wrote an enthusiastic account of jsMath, a program created by Davide Cervone that typeset TeX right in the browser. You could write the TeX code `e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots`

and it would appear on your screen as:

\[e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots\]

All the work of parsing the TeX code and typesetting the math was done by a JavaScript program downloaded into your browser along with the rest of the web page.

Cervone’s jsMath soon evolved into MathJax, an open-source project initially supported by the AMS and SIAM. There are now about two dozen sponsors, and the project is under the aegis of NumFOCUS.

MathJax has made a big difference in my working life, transforming a problem into a pleasure. Putting math on the web is fun! Sometimes I do it just to show off. Furthermore, the software has served as an inspiration as well as a helpful tool. Until I saw MathJax in action, it simply never occurred to me that interesting computations could be done within the JavaScript environment of a web browser, which I had thought was there mainly to make things blink and jiggle. With the example of MathJax in front of me, I realized that I could not only display mathematical ideas but also explore and animate them within a web page.

Last fall I began hearing rumors about MathJax 3.0, “a complete rewrite of MathJax from the ground up using modern techniques.” It’s the kind of announcement that inspires both excitement and foreboding. What will the new version add? What will it take away? What will it fix? What will it break?

Before committing all of bit-player to the new version, I thought I would try a small-scale experiment. I have a standalone web page that makes particularly tricky use of MathJax. The page is a repository of the Dotster programs extracted from a recent bit-player post, *My God, it’s full of dots*. In January I got the Dotster page running with MathJax 3.

Most math in web documents is static content: An equation needs to be formatted once, when the page is first displayed, and it never changes after that. The initial typesetting is handled automatically by MathJax, in both the old and the new versions. As soon as the page is downloaded from the server, MathJax makes a pass through the entire text, identifying elements flagged as TeX code and replacing them with typeset math. Once that job is done, MathJax can go to sleep.

The Dotster programs are a little different; they include equations that change dynamically in response to user input. Here’s an example:

The slider on the left sets a numerical value that gets plugged into the two equations on the right. Each time the slider is moved, the equations need to be updated and reformatted; MathJax has to wake from its slumbers and run again to typeset the altered content.

The MathJax program running in the little demo above is the older version, 2.7. Cosmetically, the result is not ideal. With each change in the slider value, the two equations contract a bit, as if pinched between somebody’s fingers, and then snap back to their original size. They seem to wink at us.

The winking effect is caused by a MathJax feature called Fast Preview. The system does a quick-and-dirty rendering of the math content without calculating the correct final sizes for the various typographic elements. (Evidently that calculation takes a little time.) You can turn off Fast Preview by right-clicking or control-clicking one of the equations and then navigating through the submenus shown at right. However, you’ll probably judge the result to be worse rather than better. Without Fast Preview, you’ll get a glimpse of the raw TeX commands. Instead of winking, the equations do jumping jacks.

I am delighted to report that all of this visual noise has been eliminated in the new MathJax. On changing a slider setting, the equations are updated in place, with no unnecessary visual fuss. And there’s no need for a progress indication, because the change is so quick it appears to be instantaneous. See for yourself:

Thus version 3 looks like a big win. There’s a caveat: Getting it to work did not go quite as smoothly as I had hoped. Nevertheless, this is a story with a happy ending.

If you have only static math content in your documents, making the switch to MathJax 3 is easy. In your HTML file you change a URL to load the new MathJax version, and convert any configuration options to a new format. As it happens, all the default options work for me, so I had nothing to convert. What’s most important about the upgrade path is what you *don’t* need to do. In most cases you should not have to alter any of the TeX commands present in the HTML files being processed by MathJax. (There are a few small exceptions.)
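For what it’s worth, a version 3 configuration is just a global object defined before the MathJax script loads. Here is a minimal sketch; the option names come from the v3 documentation, but the values shown are illustrative, not the ones bit-player actually uses:

```javascript
// Illustrative MathJax 3 configuration. Define it before loading the
// MathJax script itself, so the library reads it at startup.
window.MathJax = {
  tex: {
    inlineMath: [['\\(', '\\)']]  // delimiters marking in-line TeX
  },
  svg: {
    fontCache: 'global'           // share glyph definitions across equations
  }
};
```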

With dynamic content, further steps are needed. Here is the JavaScript statement I used to reawaken the typesetting engine in MathJax version 2.7:

```
MathJax.Hub.Queue(["Typeset", MathJax.Hub, mathjax_demo_box]);
```

The statement enters a `Typeset` command into a queue of pending tasks. When the command reaches the front of the queue, MathJax will typeset any math found inside the HTML element designated by the identifier `mathjax_demo_box`, ignoring the rest of the document.

In MathJax 3, the documentation suggested I could simply replace this command with a slightly different and more direct one:

```
MathJax.typeset([mathjax_demo_box]);
```

I did that. It didn’t work. When I moved the slider, the displayed math reverted to raw TeX form, and I found an error message in the JavaScript console, complaining of a call to `appendChild` on a `null` element.

What has gone wrong here? JavaScript’s `appendChild` method adds a new node to the treelike structure of an HTML document. It’s like hanging an ornament from some specified branch of a Christmas tree. The error reported here indicates that the specified branch does not exist; it is `null`.

Let’s not tarry over my various false starts and wrong turns as I puzzled over the source of this bug. I eventually found the cause and the solution in the “issues” section of the MathJax repository on GitHub. Back in September of last year Mihai Borobocea had reported a similar problem, along with the interesting observation that the error occurs only when an existing TeX expression is being replaced in a document, not when a new expression is being added. Borobocea had also discovered that invoking the procedure `MathJax.typesetClear()` before `MathJax.typeset()` would prevent the error.

A comment by Cervone explains much of what’s going on:

You are correct that you should use `MathJax.typesetClear()` if you have removed previously typeset math from the page. (In version 3, there is information stored about the math in a list of typeset expressions, and if you remove typeset math from the page and replace it with new math, that list will hold pointers to math that no longer exists in the page. That is what is causing the error you are seeing . . . )

I found that adding `MathJax.typesetClear()` did indeed eliminate the error. As a practical matter, that solved my problem. But Borobocea pointed out a remaining loose end. Whereas `MathJax.typeset([mathjax_demo_box])` operates only on the math inside a specific container, `MathJax.typesetClear()` destroys the list of math objects for the entire document, an act that might later have unwanted consequences. Thus it seemed best to reformat all the math in the document whenever any one expression changes. This is inefficient, but with the 20-some equations in the Dotster web page the typesetting is so fast there’s no perceptible delay.
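In code, the workaround boils down to a two-call handler. Here is a sketch; the stub `MathJax` object merely lets the snippet run outside a browser, where the real MathJax 3 library would define `typesetClear()` and `typeset()`:

```javascript
// Stub standing in for the real MathJax 3 global, so this sketch runs
// outside a browser. The real library provides these same entry points.
const MathJax = {
  cleared: false,
  typesetScope: null,
  typesetClear() { this.cleared = true; },  // forget previously typeset math
  typeset(nodes) { this.typesetScope = nodes ?? 'whole document'; }
};

// Called whenever a slider edits an equation's TeX source in place.
function refreshMath() {
  MathJax.typesetClear();  // drop stale pointers to replaced expressions
  MathJax.typeset();       // re-typeset all math in the document
}

refreshMath();
console.log(MathJax.cleared, MathJax.typesetScope);
```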

In January a fix for this problem was merged into MathJax 3.0.1, which is now the shipping version. Cervone’s comment on this change says that it “prevents the error message,” which left me with the impression that it might suppress the message without curing the error itself. But as far as I can tell the entire issue has been cleared up. There’s no longer any need to invoke `MathJax.typesetClear()`.

In my first experiments with version 3.0 I stumbled onto another bit of weirdness, but it turned out to be a quirk of my own code, not something amiss in MathJax.

I was seeing occasional size variations in typeset math that seemed reminiscent of the winking problem in version 2.7. Sometimes the initial, automatic typesetting would leave the equations in a slightly smaller size; they would grow back to normal as soon as `MathJax.typeset()` was applied. In the image at right I have superimposed the two states, with the correct, larger image colored red. It looks like Fast Preview has come back to haunt us, but that can’t be right, because Fast Preview has been removed entirely from version 3.

My efforts to solve this mystery turned into quite a debugging debacle. I got a promising clue from an exchange on the MathJax wiki, discussing size anomalies when math is composed inside an HTML element temporarily flagged `display: none`, a style rule that makes the math invisible. In that circumstance MathJax has no information about the surrounding text, and so it leaves the typeset math in a default state. The same mechanism might account for what I was seeing—except that my page has no elements with a `display: none` style.

I first observed this problem in the Chrome browser, where it is intermittent; when I repeatedly reloaded the page, the small type would appear about one time out of five. What fun! It takes multiple trials just to know whether an attempted fix has had any effect. Thus I was pleased to discover that in Firefox the shrunken type appears consistently, every time the page is loaded. Testing became a great deal easier.

I soon found a cure, though not a diagnosis. While browsing again in the MathJax issues archive and in a MathJax user forum, I came across suggestions to try a different form of output, with mathematical expressions constructed not from text elements in HTML and style rules in CSS but from paths drawn in Scalable Vector Graphics, or SVG. I found that the SVG expressions were stable and consistent in size, and in other respects indistinguishable from their HTML siblings. Again my problem was solved, but I still wanted to know the underlying cause.

Here’s where the troubleshooting report gets a little embarrassing. Thinking I might have a new bug to report, I set out to build a minimal exemplar—the smallest and simplest program that would trigger the bug. I failed. I was starting from a blank page and adding more and more elements of the original program—`div`s nested inside `div`s in the HTML, various stylesheet rules in the CSS, bigger collections of more complex equations—but none of these additions produced the slightest glitch in typesetting. So I tried working in the other direction, starting with the complex misbehaving program and stripping away elements until the problem disappeared. But it didn’t disappear, even when I reduced the page to a single equation in a plain white box.

As often happens, I found the answer not by banging my head against the problem but by going for a walk. Out in the fresh air, I finally noticed the one oddity that distinguished the failing program from all of the correctly working ones. Because the Dotster program began life embedded in a WordPress blog post, I could not include a link to the CSS stylesheet in the `head` section of the HTML file. Instead, a JavaScript function constructed the link and inserted it into the `head`. That happened *after* MathJax made its initial pass over the text. At the time of typesetting, the elements in which the equations were placed had no styles applied, and so MathJax had no way of determining appropriate sizes.

When Don Knuth unveiled TeX, circa 1980, I was amazed. Back then, typewriter-style word processing was impressive enough. TeX did much more: real typesetting, with multiple fonts (which Knuth also had to create from scratch), automatic hyphenation and justification, and beautiful mathematics.

Thirty years later, when Cervone created MathJax, I was amazed again—though perhaps not for the right reasons. I had supposed that the major programming challenge would be capturing all the finicky rules and heuristics for building up math expressions—placing and sizing superscripts, adjusting the height and width of parentheses or a radical sign to match the dimensions of the expression enclosed, spacing and aligning the elements of a matrix. Those are indeed nontrivial tasks, but they are just the beginning. My recent adventures have helped me see that another major challenge is making TeX work in an alien environment.

In classic TeX, the module that typesets equations has direct access to everything it might ever need to know about the surrounding text—type sizes, line spacing, column width, the amount of interword “glue” needed to justify a line of type. Sharing this information is easy because all the formatting is done by the same program. MathJax faces a different situation. Formatting duties are split, with MathJax handling mathematical content but the browser’s layout engine doing everything else. Indeed, the document is written in two different languages, TeX for the math and HTML/CSS for the rest. Coordinating actions in the two realms is not straightforward.

There are other complications of importing TeX into a web page. The classic TeX system runs in batch mode. It takes some inputs, produces its output, and then quits. Batch processing would not offer a pleasant experience in a web browser. The entire user interface (such as the buttons and sliders in my Dotster programs) would be frozen for the duration. To avoid this kind of rudeness to the user, MathJax is never allowed to monopolize JavaScript’s single thread of execution for more than a fraction of a second. To ensure this cooperative behavior, earlier versions relied on a hand-built scheme of queues (where procedures wait their turn to execute) and callbacks (which signal when a task is complete). Version 3 takes advantage of a new JavaScript construct called a *promise*. When a procedure cannot compute a result immediately, it hands out a promise, which it then redeems when the result becomes available.
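The pattern is easy to sketch. The function below is a stand-in, not MathJax’s own code (version 3’s actual promise-returning entry point is `MathJax.typesetPromise()`), but it shows the shape of the idea:

```javascript
// Stand-in illustrating the promise pattern MathJax 3 relies on. A real
// typesetting call would chunk its work so the browser's single
// JavaScript thread is never blocked for long; the promise is redeemed
// when the rendered math is finally in place.
function typesetPromise() {
  return new Promise(resolve => {
    // ...typesetting work would be scheduled here, a piece at a time...
    resolve('typeset complete');
  });
}

// Callers chain follow-up work instead of waiting:
typesetPromise().then(status => console.log(status));
```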

Wait, there’s more! MathJax is not just a TeX system. It also accepts input written in MathML, a dialect of XML specialized for mathematical notation. Indeed, the internal language of MathJax is based on MathML. And MathJax can also be configured to handle AsciiMath, a cute markup language that aims to make even the raw form of an expression readable. Think of it as math with emoticons: Type `oo` and you’ll get \(\infty\), or `-:` for \(\div\).

MathJax also provides an extensive suite of tools for accessibility. Visually impaired readers can have an equation read aloud. As I learned at the January Joint Math Meetings, there are even provisions for generating Braille output—but that’s a subject that deserves a post of its own.

When I first encountered MathJax, I saw it as a marvel, but I also considered it a workaround or stopgap. Reading a short document that includes a single equation entails downloading the entire MathJax program, which can be much larger than the document itself. And you need to download it all again for every other mathy document (unless your browser cache hangs onto a copy). What an appalling waste of bandwidth.

Several alternatives seemed more promising as a long-term solution. The best approach, it seemed to me then, was to have support for mathematical notation built into the browser. Modern browsers handle images, audio, video, SVG, animations—why not math? But it hasn’t happened. Firefox and Safari have limited support for MathML; none of the browsers I know are equipped to deal with TeX.

Another strategy that once seemed promising was the browser plugin. A plugin could offer the same capabilities as MathJax, but you would download and install it only once. This sounds like a good deal for readers, but it’s not so attractive for the author of web content. If there are multiple plugins in circulation, they are sure to have quirks, and you need to accommodate all of them. Furthermore, you need some sort of fallback plan for those who have not installed a plugin.

Still another option is to run MathJax on the server, rather than sending the whole program to the browser. The document arrives with TeX or MathML already converted to HTML/CSS or SVG for display. This is the preferred modus operandi for several large websites, most notably Wikipedia. I’ve considered it for bit-player, but it has a drawback: Running on the server, MathJax cannot provide the kind of on-demand typesetting seen in the demos above.

As the years go by, I am coming around to the view that MathJax is not just a useful stopgap while we wait for the right thing to come along; it’s quite a good approximation to the right thing. As the author of a web page, I get to write mathematics in a familiar and well-tested notation, and I can expect that any reader with an up-to-date browser will see output that’s much like what I see on my own screen. At the same time, the reader also has control over how the math is rendered, via the context menu. And the program offers accessibility features that I could never match on my own.

To top it off, the software is open-source—freely available to everyone. That is not just an economic advantage but also a social one. The project has a community that stands ready to fix bugs, listen to suggestions and complaints, offer help and advice. Without that resource, I would still be struggling with the hitches and hiccups described above.

Wandering around in these cavernous spaces always leaves me feeling a little disoriented and dislocated. It’s not just that I’m lost, although often enough I am—searching for Lobby D, or Meeting Room 407, or a toilet. I’m also dumbfounded by the very existence of these huge empty boxes, monuments to the human urge to congregate. If you build it, we will come.

It seems every city needs such a place, commensurate with its civic stature or ambitions. It’s no mystery why the cities make the investment. The JMM attracted more than 5,500 mathematicians (plus a few interlopers like me). I would guess we each spent on the order of $1,000 in payments to hotels, restaurants, taxis, and such, and perhaps as much again on airfare and registration fees. The revenue flowing to the city and its businesses and citizens must be well above $5 million. Furthermore, from the city’s point of view it’s all free money; the visitors do not send their children to the local schools or add to the burden on other city services, and they don’t vote in Denver.

However, this calculation tells only half the story. Although visitors to the Colorado Convention Center leave wads of cash in Denver, at the same time Denver residents are flying off to meetings elsewhere, withdrawing funds from the local economy and spreading the money around in Phoenix, Seattle, or Boston. If the convention-going traffic is symmetrical, the exchange will come out even for everyone. So why don’t we all save ourselves a lot of bother—not to mention millions of dollars—and just stay home? From inside the convention center, you may not be able to tell what city you’re in anyway.

While I was in Denver, I looked at the schedule of upcoming events for the convention center. A boat show was getting underway even as the mathematicians were still roaming the corridors, and tickets were also on sale for some sort of motorcycling event. The drillers and frackers were coming to town a few weeks later, and then in March the American Physical Society would hold its biggest annual gathering, with about twice as many participants as the JMM. The APS meeting was scheduled for this week, Monday through Friday (March 2–6). But late last Saturday night the organizers decided to cancel the entire conference because of the coronavirus threat. Some attendees were already in Denver or on their way.

I was taken aback by this decision, which is not to say I believe it was wrong. A year from now, if the world is still recovering from an epidemic that killed many thousands, the decisionmakers at the APS will be seen as prescient, prudent, and public-spirited. On the other hand, if Covid-19 sputters out in a few weeks, they may well be mocked as alarmists who succumbed to panic. But the latter judgment would be a little unfair. After all, the virus might be halted precisely *because* those 11,000 physicists stayed home.

I have not yet heard of other large scientific conferences shutting down, but a number of meetings in the tech industry have been called off, postponed, or gone virtual, along with some sports and entertainment events. The American Chemical Society is “monitoring developments” in advance of their big annual meeting, scheduled for later this month in Philadelphia. [Update: On March 9 the ACS announced "we are cancelling (terminating) the ACS Spring 2020 National Meeting & Expo."] Even if the events go on, some prospective participants will not be able to attend. I’ve just received an email from Harvard with stern warnings and restrictions on university-related travel.

Presumably, the Covid-19 threat will run its course and dissipate, and life will return to something called normal. But it’s also possible we have a new normal, that we have crossed some sort of demographic or epidemiological threshold, and novel pathogens will be showing up more frequently. Furthermore, the biohazard is not the only reason to question the future of megameetings; the ecohazard may be even more compelling.

All in all, it seems an apt moment to reflect on the human urge to come together in these large, temporary encampments, where we share ideas, opinions, news, gossip—and perhaps viruses—before packing up and going home until next year. Can the custom be sustained? If not, what might replace it?

Mathematicians and physicists have not always formed roving hordes to plunder defenseless cities. Until the 20th century there weren’t enough of them to make a respectable motorcycle gang. Furthermore, they had no motorcycles, or any other way to travel long distances in a reasonable time.

Before the airplane and the railroad, meetings between scientists were generally one-on-one. Consider the sad story of Niels Henrik Abel, a young Norwegian mathematician in the 1820s. Feeling cut off from his European colleagues, he undertook a two-year-long trek from Oslo to Berlin and Paris, traveling almost entirely on foot. In Paris he visited Legendre and Cauchy, who received him coolly and did not read his proof of the unsolvability of quintic equations. So Abel walked home again. Somewhere along the way he picked up a case of tuberculosis and died two years later, at age 27, impoverished and probably unaware that his work was finally beginning to be noticed. I like to think the outcome would have been happier if he’d been able to present his results in a contributed-paper session at the JMM.

For Abel, the take-a-hike model of scholarly communication proved ineffective; perhaps more important, it doesn’t scale well. If everyone must make individual *tête-à-tête* visits, then forming connections between \(n\) scientists would require \(n(n - 1) / 2\) trips. Having everyone converge at a central point reduces the number to \(n\). From this point of view, the modern mass meeting looks not like a travel extravagance but like a strategy for minimizing total air miles. Still, staying home would be even more frugal, whether the cost is measured in dollars, kelvins, or epidemiological risk.
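The arithmetic is easy to check with a two-line sketch (the function names are mine, chosen for illustration):

```python
# Compare the travel cost of pairwise visits with that of one central meeting.
def pairwise_trips(n):
    return n * (n - 1) // 2   # one trip per unordered pair of scientists

def central_trips(n):
    return n                  # everyone travels once, to the hub

print(pairwise_trips(5500))   # 15,122,250 individual journeys
print(central_trips(5500))    # 5,500
```

For a JMM-sized crowd of 5,500, the one-on-one model would require more than 15 million journeys.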

Most of the big disciplinary conferences got their start toward the end of the 19th century, and by the 1930s and 40s had hundreds of participants. Writing about mathematical life in that era, Ralph Boas notes: “One reason for going to meetings was that photocopying hadn’t been invented; it was at meetings that one found out what was going on.” But now photocopying *has* been invented—and superseded. There’s no need for a cross-country trip to find out what’s new; on any weekday morning you can just check the arXiv. Yet attendance at these meetings is up by another order of magnitude.

Even in a world with faster channels of communication, there are still moments of high excitement in the big convention halls. At the 1987 March meeting of the APS, the recent discovery of high-temperature superconductivity in cuprate ceramics was presented and discussed in a lively session that lasted past 3 a.m. The event is known as the Woodstock of Physics. I missed it—as well as the original Woodstock. But I was at the JMM in 2014 when progress toward confirming the twin prime conjecture caused a big stir. The conjecture (still unproved) says there are infinitely many pairs of prime numbers, such as 11 and 13, separated by exactly 2. Yitang Zhang had just proved there are infinitely many primes separated by no more than 70 million. Several talks discussed this finding and followup work by others, and Zhang himself spoke to a packed room.

Boas emphasized the motive of *hearing* what’s new, but one must not ignore the equally important impulse to *tell* what’s new. At the recent JMM, with its 5,500 visitors, the book of abstracts listed 2,529 presentations. In other words, almost half the visitors came to *deliver* a talk, which is probably a stronger motivation than hearing what others have to say. (When I first saw those numbers, I had the thought: “So, on average every presentation had one speaker and one listener.” The truth is not quite as bad as that, but it’s still worth keeping in mind that a meeting of this kind is not like a rock concert or a football game, with only a dozen or so performers and thousands in the audience.)

At some gatherings, the aim is not so much to talk about math and science but to *do* it. Groups of three or four huddle around blackboards or whiteboards, collaborating. But this activity is commoner at small, narrowly focused meetings—maybe at Aspen for the physicists or Banff for the mathematicians. No doubt such things also happen at the bigger meetings, but they are not a major item on the agenda for most attendees.

For one subpopulation of meeting-goers the main motivation is very practical: getting a job. Again this is a matter of efficiency. Someone looking for a postdoc position can arrange a dozen interviews at a single meeting.

There are many reasons to make the pilgrimage to the Colorado Convention Center, but I think the most important factor is yet to be stated. Dennis Flanagan, who was my employer, friend, and mentor many years ago at *Scientific American*, wrote that “science is intensely social.”

In an active scientific discipline everyone knows everyone else, if not in person, then by their writings and reputation. Scientists attend at least as many meetings and conventions as salesmen. (*Flanagan’s Version*, 1988, p. 15.)

You might interpret this comment as saying that scientists—like salesmen—are a bunch of genial, gregarious party animals who like to go out on the town, drink to excess, and misbehave. But I’m pretty sure that’s not what Dennis had in mind. He was arguing that social interactions are essential to the *process* of science. Becoming a mathematician or a physicist is tantamount to joining a club, and you can’t do that in isolation. You have to absorb the customs, the tastes, the values of the culture. For example, you need to internalize the community standard for deciding what is true. (It’s rather different in physics and mathematics.) Even subtler is the standard for deciding what is *interesting*—what ideas are worth pursuing, what problems are worth solving.

Meetings and conferences are not the only way of inculcating culture; the apprenticeship system known as graduate school is clearly more important overall. Still, discipline-wide gatherings have a role. By their very nature they are more cosmopolitan than any one university department. They acquaint you with the norms of the population but also with the range of variance, and thereby improve the probability that you’ll figure out where you fit in.

The quintessential big-meeting event is running into someone in the hallway whom you see only once a year. You stop and shake hands, or even hug. (In future we’ll bump elbows.) You’re both in a hurry. If you chat too long, you’ll miss the opening sentences of the next talk, which may be the only sentences you’ll understand. So the exchange of words is brief and unlikely to be deep. As I and my cohort grow older, it often amounts to little more than, “Wow. I’m still alive and so are you!” But sometimes it’s worth traveling a thousand miles to get that human validation.

If we have to dispense with such gatherings, science and math will muddle through somehow. We’ll meet more in the sanitary realm of bits and pixels, less in this fraught environment of atoms. We’ll become more hierarchical, with greater emphasis on local meetings and less on national and international ones. The alternatives can be made to work, and the next generation will view them as perfectly natural, if not inevitable. But I’m going to miss the ugly carpet, the uncomfortable folding/stacking chairs, and the ballrooms where nobody dances.

In mathematics abstraction serves as a kind of stairway to heaven—as well as a test of stamina for those who want to get there.

Some years later you reach higher ground. The symbols representing particular numbers give way to the \(x\)s and \(y\)s that stand for quantities yet to be determined. They are symbols for symbols. Later still you come to realize that this algebra business is not just about “solving for \(x\),” for finding a specific number that corresponds to a specific letter. It’s a magical device that allows you to make blanket statements encompassing *all* numbers: \(x^2 - 1 = (x + 1)(x - 1)\) is true for any value of \(x\).

Continuing onward and upward, you learn to manipulate symbolic expressions in various other ways, such as differentiating and integrating them, or constructing functions of functions of functions. Keep climbing the stairs and eventually you’ll be introduced to areas of mathematics that openly boast of their abstractness. There’s *abstract algebra*, where you build your own collections of numberlike things: groups, fields, rings, vector spaces. And there’s *category theory*, where you’ll find a collection of ideas with the disarming label *abstract nonsense*.

Not everyone is filled with admiration for this Jenga tower of abstractions teetering atop more abstractions. Consider Andrew Wiles’s proof of Fermat’s last theorem, and its reception by the public. The theorem, first stated by Pierre de Fermat in the 1630s, makes a simple claim about powers of integers: If \(x, y, z, n\) are all integers greater than \(0\), then \(x^n + y^n = z^n\) has solutions only if \(n \le 2\). The proof of this claim, published in the 1990s, is not nearly so simple. Wiles (with contributions from Richard Taylor) went on a scavenger hunt through much of modern mathematics, collecting a truckload of tools and spare parts needed to make the proof work: elliptic curves, modular forms, Galois groups, functions on the complex plane, *L*-series. It is truly a *tour de force*.

In outline, the proof proceeds by contradiction: a counterexample to Fermat’s claim would give rise to an elliptic curve *E* with certain properties. But the properties deduced along two independent branches of the argument turn out to be inconsistent, implying that *E* does not exist, nor does the counterexample that gave rise to it.

Is all that heavy machinery really needed to prove such an innocent-looking statement? Many people yearn for a simpler and more direct proof, ideally based on methods that would have been available to Fermat himself. Marilyn vos Savant, the *Parade* columnist, takes an even more extreme position, arguing that Wiles strayed so far from the subject matter of the theorem as to make his proof invalid. (For a critique of her critique, see Boston and Granville.)

Almost all of this grumbling about illegitimate methods and excess complexity comes from outside the community of research mathematicians. Insiders see the Wiles proof differently. For them, the wide-ranging nature of the proof is actually what’s most important. The main accomplishment, in this view, was cementing a connection between those far-flung areas of mathematics; resolving FLT was just a bonus.

Yet even mathematicians can have misgivings about the intricacy of mathematical arguments and the ever-taller skyscrapers of abstraction. Jeremy Gray, a historian of mathematics, believes anxiety over abstraction was already rising in the 19th century, when mathematics seemed to be “moving away from reality, into worlds of arbitrary dimension, for example, and into the habit of supplanting intuitive concepts (curves that touch, neighboring points, velocity) with an opaque language of mathematical analysis that bought rigor at a high cost in intelligibility.”

A more recent protest comes from the mathematician Piper Harron, whose doctoral thesis was discussed in *MAA Focus* by Adriana Salerno. The thesis was to be published in book form last fall by Birkhäuser, but the book doesn’t seem to be available yet. Harron writes:

I like to imagine abstraction (abstractly ha ha ha) as pulling the strings on a marionette. The marionette, being “real life,” is easily accessible. Everyone understands the marionette whether it’s walking or dancing or fighting. We can see it and it makes sense. But watch instead the hands of the puppeteers. Can you look at the hand movements of the puppeteers and know what the marionette is doing?… Imagine it gets worse. Much, much worse. Imagine that the marionettes we see are controlled by marionettoids we don’t see which are in turn controlled by pre-puppeteers which are finally controlled by actual puppeteers.

Keep all those marionettoids in mind. I’ll be coming back to them, but first I want to shift my attention to computer science, where the towers of abstraction are just as tall and teetery, but somehow less scary.

Suppose your computer is about to add two numbers…. No, wait, there’s no need to suppose or imagine. In the orange panel below, type some numbers into the \(a\) and \(b\) boxes, then press the “+” button to get the sum in box \(c\). Now, please describe what’s happening inside the machine as that computation is performed.

[Interactive calculator: input boxes *a* and *b*, a “+” button, and an output box *c*.]

You can probably guess that somewhere behind the curtains there’s a fragment of code that looks like `c = a + b`. And, indeed, that statement appears verbatim in the JavaScript program that’s triggered when you click on the plus button. But if you were to go poking around among the circuit boards under the keyboard of your laptop, you wouldn’t find anything resembling that sequence of symbols. The program statement is a high-level abstraction. If you really want to know what’s going on inside the computing engine, you need to dig deeper—down to something as tangible as a jelly bean.

How about an electron? In principle, you could account for the computation `c = a + b` by tracing the motions of all the electrons (perhaps \(10^{23}\) of them) through all the transistors (perhaps \(10^{11}\)).

To understand how electrons are persuaded to do arithmetic for us, we need to introduce a whole sequence of abstractions.

- First, step back from the focus on individual electrons, and reformulate the problem in terms of continuous quantities: voltage, current, capacitance, inductance.
- Replace the physical transistors, in which voltages and currents change smoothly, with idealized devices that instantly switch from totally off to fully on.
- Interpret the two states of a transistor as logical values (*true* and *false*) or as numerical values (\(1\) and \(0\)).
- Organize groups of transistors into “gates” that carry out basic functions of Boolean logic, such as and, or, and not.
- Assemble the gates into larger functional units, including adders, multipliers, comparators, and other components for doing base-\(2\) arithmetic.
- Build higher-level modules that allow the adders and such to be operated under the control of a program. This is the conceptual level of the instruction-set architecture, defining the basic operation codes (*add, shift, jump*, etc.) recognized by the computer hardware.
- Graduating from hardware to software, design an operating system, a collection of services and interfaces for abstract objects such as files, input and output channels, and concurrent processes.
- Create a compiler or interpreter that knows how to translate programming-language statements such as `c = a + b` into sequences of machine instructions and operating-system requests.
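Several of the middle rungs on this ladder can be made concrete in a few lines of code. Here is a toy sketch, entirely my own, that assembles a ripple-carry adder out of nothing but Boolean gates:

```python
# Gates built from the two-state abstraction (1 and 0).
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a
def XOR(a, b): return OR(AND(a, NOT(b)), AND(NOT(a), b))

def full_adder(a, b, carry_in):
    """One-bit adder assembled from gates; returns (sum_bit, carry_out)."""
    s1 = XOR(a, b)
    return XOR(s1, carry_in), OR(AND(a, b), AND(s1, carry_in))

def add(a, b, bits=8):
    """Ripple-carry addition of two nonnegative ints in base 2 (mod 2**bits)."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result

print(add(3, 5))  # 8
```

Nothing about this sketch resembles real chip design, but it shows how each layer uses only the vocabulary of the layer beneath it.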

From the point of view of most programmers, the abstractions listed above represent computational *infrastructure*: They lie beneath the level where you do most of your thinking—the level where you describe the algorithms and data structures that solve your problem. But computational abstractions are also a tool for building *superstructure*, for creating new functions beyond what the operating system and the programming language provide. For example, if your programming language handles only numbers drawn from the real number line, you can write procedures for doing arithmetic with complex numbers, such as \(3 + 5i\). (Go ahead, try it in the orange box above.) And, in analogy with the mathematical practice of defining functions of functions, we can build compiler compilers and schemes for metaprogramming—programs that act on other programs.
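As a small illustration of such superstructure, here is a sketch (my own, unrelated to the page’s widget) of complex arithmetic defined entirely in terms of real arithmetic:

```python
# Complex numbers built atop the reals. Python has a native complex type;
# this sketch deliberately ignores it to show the layering.
class Cx:
    def __init__(self, re, im):
        self.re, self.im = re, im

    def __add__(self, other):
        return Cx(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Cx(self.re * other.re - self.im * other.im,
                  self.re * other.im + self.im * other.re)

    def __repr__(self):
        return f"{self.re} + {self.im}i"

print(Cx(3, 5) * Cx(3, -5))   # 34 + 0i
```

A program using `Cx` never touches the real and imaginary parts directly; the class is a little abstraction barrier of its own.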

In both mathematics and computation, rising through the various levels of abstraction gives you a more elevated view of the landscape, with wider scope but less detail. Even if the process is essentially the same in the two fields, however, it doesn’t feel that way, at least to me. In mathematics, abstraction can be a source of anxiety; in computing, it is nothing to be afraid of. In math, you must take care not to tangle the puppet strings; in computing, abstractions are a defense against such confusion. For the mathematician, abstraction is an intellectual challenge; for the programmer, it is an aid to clear thinking.

Why the difference? How can abstraction have such a friendly face in computation and such a stern mien in math? One possible answer is that computation is just plain easier than mathematics.

Another possible explanation is that computer systems are engineered artifacts; we can build them to our own specifications. If a concept is just too hairy for the human mind to master, we can break it down into simpler pieces. Math is not so complaisant—not even for those who hold that mathematical objects are invented rather than discovered. We can’t just design number theory so that the Riemann hypothesis will be true.

But I think the crucial distinction between math abstractions and computer abstractions lies elsewhere. It’s not in the abstractions themselves but in the boundaries between them.

I first met the term *abstraction barrier* in Abelson and Sussman’s *Structure and Interpretation of Computer Programs*, circa 1986. The underlying idea is surely older; it’s implicit in the “structured programming” literature of the 1960s and 70s. But *SICP* still offers the clearest and most compelling introduction.

In computer science, *information hiding* is considered a virtue, not an impeachable offense. If a design has a layered structure, with abstractions piled one atop the other, the layers are separated by *abstraction barriers*. A high-level module can reach across the barrier to make use of procedures from lower levels, but it won’t know anything about the implementation of those procedures. When you are writing programs in Lisp or Python, you shouldn’t need to think about how the operating system carries out its chores; and when you’re writing routines for the operating system, you needn’t think about the physics of electrons meandering through the crystal lattice of a semiconductor. Each level of the hierarchy can be treated (almost) independently.

Mathematics also has its abstraction barriers, although I’ve never actually heard the term used by mathematicians. A notable example comes from Giuseppe Peano’s formulation of the foundations of arithmetic, circa 1900. Peano posits the existence of a number \(0\), and a function called *successor*, \(S(n)\), which takes a number \(n\) and returns the next number in the counting sequence. Thus the natural numbers begin \(0, S(0), S(S(0)), S(S(S(0)))\), and so on. Peano deliberately refrains from saying anything more about what these numbers look like or how they work. They might be implemented as sets, with \(0\) being the empty set and successor the operation of adjoining an element to a set. Or they could be unary lists: (), (|), (||), (|||), . . . The most direct approach is to use Church numerals, in which the successor function itself serves as a counting token, and the number \(n\) is represented by \(n\) nested applications of \(S\).

From these minimalist axioms we can define the rest of arithmetic, starting with addition. In calculating \(a + b\), if \(b\) happens to be \(0\), the problem is solved: \(a + 0 = a\). If \(b\) is *not* \(0\), then it must be the successor of some number, which we can call \(c\). Then \(a + S(c) = S(a + c)\). Notice that this definition doesn’t depend in any way on how the number \(0\) and the successor function are represented or implemented. Under the hood, we might be working with sets or lists or abacus beads; it makes no difference. An abstraction barrier separates the levels. From addition you can go on to define multiplication, and then exponentiation, and again abstraction barriers protect you from the lower-level details. There’s never any need to think about how the successor function works, just as the computer programmer doesn’t think about the flow of electrons.
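Peano’s definitions translate almost verbatim into code. Here is a minimal sketch, with one arbitrary representation (unary lists) hidden behind the barrier; swap in any other representation and the arithmetic above the line is untouched:

```python
# ---- below the abstraction barrier: one possible representation ----
ZERO = ()
def S(n): return (n,)          # successor: wrap in another layer
def is_zero(n): return n == ()
def pred(n): return n[0]       # the c such that n = S(c)

# ---- above the barrier: arithmetic that never sees the representation ----
def add(a, b):
    # a + 0 = a;  a + S(c) = S(a + c)
    return a if is_zero(b) else S(add(a, pred(b)))

def mul(a, b):
    # a * 0 = 0;  a * S(c) = a * c + a
    return ZERO if is_zero(b) else add(mul(a, pred(b)), a)

def to_int(n):
    return 0 if is_zero(n) else 1 + to_int(pred(n))

three = S(S(S(ZERO)))
print(to_int(add(three, three)))   # 6
print(to_int(mul(three, three)))   # 9
```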

The importance of not thinking was stated eloquently by Alfred North Whitehead, more than a century ago:

It is a profoundly erroneous truism, repeated by all copybooks and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilisation advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle—they are strictly limited in number, they require fresh horses, and must only be made at decisive moments. (Alfred North Whitehead, *An Introduction to Mathematics*, 1911, pp. 45–46.)

If all of mathematics were like the Peano axioms, we would have a watertight structure, compartmentalized by lots of leakproof abstraction barriers. And abstraction would probably not be considered “the hardest part about math.” But, of course, Peano described only the tiniest corner of mathematics. We also have the puppet strings.

In Piper Harron’s unsettling vision, the puppeteers high above the stage pull strings that control the pre-puppeteers, who in turn operate the marionettoids, who animate the marionettes. Each of these agents can be taken as representing a level of abstraction. The problem is, we want to follow the action at both the top and the bottom of the hierarchy, and possibly at the middle levels as well. The commands coming down from the puppeteers on high embody the abstract ideas that are needed to build theorems and proofs, but the propositions to be proved lie at the level of the marionettes. There’s no separating these levels; the puppet strings tie them together.

In the case of Fermat’s Last Theorem, you might choose to view the Wiles proof as nothing more than an elevated statement about elliptic curves and modular forms, but the proof is famous for something else—for what it tells us about the elementary equation \(x^n + y^n = z^n\). Thus the master puppeteers work at the level of algebraic geometry, but our eyes are on the dancing marionettes of simple number theory. What I’m suggesting, in other words, is that abstraction barriers in mathematics sometimes fail because events on both sides of the barrier make simultaneous claims on our interest.

In computer science, the programmer can ignore the trajectories of the electrons because those details really are of no consequence. Indeed, the electronic guts of the computing machinery could be ripped out and replaced by fluidic devices or fiber optics or hamsters in exercise wheels, and that brain transplant would have no effect on the outcome of the computation. Few areas of mathematics can be so cleanly floated away and rebuilt on a new foundation.

Can this notion of leaky abstraction barriers actually explain why higher mathematics looks so intimidating to most of the human population? It’s surely not the whole story, but maybe it has a role.

In closing I would like to point out an analogy with a few other areas of science, where problems that cross abstraction barriers seem to be particularly difficult. Physics, for example, deals with a vast range of spatial scales. At one end of the spectrum are the quarks and leptons, which rattle around comfortably inside a particle with a radius of \(10^{-15}\) meter; at the other end are galaxy clusters spanning \(10^{24}\) meters. In most cases, effective abstraction barriers separate these levels. When you’re studying celestial mechanics, you don’t have to think about the atomic composition of the planets. Conversely, if you are looking at the interactions of elementary particles, you are allowed to assume they will behave the same way anywhere in the universe. But there are a few areas where the barriers break down. For example, near a critical point where liquid and gas phases merge into an undifferentiated fluid, forces at all scales from molecular to macroscopic become equally important. Turbulent flow is similar, with whirls upon whirls upon whirls. It’s not a coincidence that critical phenomena and turbulence are notoriously difficult to describe.

Biology also covers a wide swath of territory, from molecules and single cells to whole organisms and ecosystems on a planetary scale. Again, abstraction barriers usually allow the biologist to focus on one realm at a time. To understand a predator-prey system you don’t need to know about the structure of cytochrome *c*. But the barriers don’t always hold. Evolution spans all these levels. It depends on molecular events (mutations in DNA), and determines the shape and fate of the entire tree of life. We can’t fully grasp what’s going on in the biosphere without keeping all these levels in mind at once.


Picture a square that is gradually filling with disks of diminishing size, added one at a time. The disks are scattered randomly, except that no disk is allowed to overlap another disk or extend beyond the boundary of the square. Once a disk has been placed, it never moves, so each later disk has to find a home somewhere in the nooks and crannies between the earlier arrivals. Can this go on forever?

The search for a vacant spot would seem to grow harder as the square gets more crowded, so you might expect the process to get stuck at some point, with no open site large enough to fit the next disk. On the other hand, because the disks get progressively smaller, later ones can squeeze into tighter quarters. In the specific filling protocol shown here, these two trends are in perfect balance. The process of adding disks, one after another, never seems to stall. Yet as the number of disks goes to infinity, they completely fill the box provided for them. There’s a place for every last dot, but there’s no blank space left over.

Or at least that’s the mathematical ideal. The computer program that fills the square above never attains this condition of perfect plenitude. It shuts down after placing just 5,000 disks, which cover about 94 percent of the square’s area. This early exit is a concession to the limits of computer precision and human patience, but we can still dream of how it would work in a world without such tiresome constraints.
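To make the protocol concrete, here is a minimal sketch of a filling program of this kind. The power-law area rule and the parameter values (\(c\), \(N\)) are my own reading of Shier’s method, not the exact settings behind any figure:

```python
import math, random

# Disk i (counting from 0) gets area 1/(zeta * (i + N)^c), where zeta is
# the Hurwitz zeta value that normalizes the areas to sum to 1, the area
# of the unit square. The exponent c and offset N are my choices.
c, N, M = 1.5, 2.5, 10000
zeta = sum((i + N) ** -c for i in range(M)) + (M + N) ** (1 - c) / (c - 1)

random.seed(1)

def fill(n_disks, tries=200000):
    disks = []                                    # (x, y, r) triples
    for i in range(n_disks):
        r = math.sqrt(1.0 / (zeta * (i + N) ** c) / math.pi)
        for _ in range(tries):                    # blind random search
            x, y = random.uniform(r, 1 - r), random.uniform(r, 1 - r)
            if all((x - u) ** 2 + (y - v) ** 2 >= (r + s) ** 2
                   for u, v, s in disks):
                disks.append((x, y, r))
                break
        else:
            raise RuntimeError(f"no room for disk {i}")
    return disks

disks = fill(40)
covered = sum(math.pi * r * r for _, _, r in disks)
print(f"{covered:.3f} of the unit square covered by 40 disks")
```

Note that the fraction covered after \(n\) disks is fixed in advance by the area rule; only the positions are random.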

This scheme for filling space with randomly placed objects is the invention of John Shier, a physicist who worked for many years in the semiconductor industry and who has also taught at Normandale Community College near Minneapolis. He explains the method and the mathematics behind it in a recent book, *Fractalize That! A Visual Essay on Statistical Geometry*. (For bibliographic details see the links and references at the end of this essay.) I learned of Shier’s work from my friend Barry Cipra.

Shier hints at the strangeness of these doings by imagining a set of 100 round tiles in graduated sizes, with a total area approaching one square meter. He would give the tiles to a craftsman with these instructions:

“Mark off an area of one square meter, either a circle or a square. Start with the largest tile, and attach it permanently anywhere you wish in the marked-off area. Continue to attach the tiles anywhere you wish, proceeding always from larger to smaller. There will always be a place for every tile regardless of how you choose to place them.”

How many experienced tile setters would believe this?

Shier’s own creations go way beyond squares and circles filled with simple shapes such as disks. He has shown that the algorithm also works with an assortment of more elaborate designs, including nonconvex figures and even objects composed of multiple disconnected pieces. We get snowflakes, nested rings, stars, butterflies, fish eating lesser fish, faces, letters of the alphabet, and visual salads bringing together multiple ingredients. Shier’s interest in these patterns is aesthetic as well as mathematical, and several of his works have appeared in art exhibits; one of them won a best-of-show award at the 2017 Joint Mathematics Meeting.

Shier and his colleagues have also shown that the algorithm can be made to work in three-dimensional space. The book’s cover is adorned with a jumble of randomly placed toruses filling the volume of a transparent cube. If you look closely, you’ll notice that some of the rings are linked; they cannot be disentangled without breaking at least one ring. (The 3D illustration was created by Paul Bourke, who has more examples online, including 3D-printed models.)

After reading Shier’s account of his adventures, and admiring the pictures, I had to try it for myself. The experiments I’m presenting in this essay have no high artistic ambitions. I stick with plain-vanilla circular disks in a square frame, all rendered with the same banal blue-to-red color scheme. My motive is merely to satisfy my curiosity—or perhaps to overcome my skepticism. When I first read the details of how these graphics are created, I couldn’t quite believe it would work. Writing my own programs and seeing them in action has helped persuade me. So has a proof by Christopher Ennis, which I’ll return to below.

Filling a region of the plane with disks is not in itself such a remarkable trick. One well-known way of doing it goes by the name Apollonian circles. Start with three disks that are all tangent to one another, leaving a spiky three-pointed vacancy between them. Draw a new disk in the empty patch, tangent to all three of the original disks; this is the largest disk that can possibly fit in the space. Adding the new disk creates three smaller triangular voids, where you can draw three more triply tangent disks. There’s nothing to stop you from going on in this way indefinitely, approaching a limiting configuration where the entire area is filled.

There are randomized versions of the Apollonian model. For example, you might place zero-diameter seed disks at random unoccupied positions and then allow them to grow until they touch one (or more) of their neighbors. This process, too, is space-filling in the limit. And it can never fail: Because the disks are custom-fitted to the space available, you can never get stuck with a disk that can’t find a home.

Shier’s algorithm is different. You are given disks one at a time in a predetermined order, starting with the largest, then the second-largest, and so on. To place a disk in the square, you choose a point at random and test to see if the disk will fit at that location without bumping into its neighbors or poking beyond the boundaries of the square. If the tests fail, you pick another random point and try again. It’s not obvious that this haphazard search will always succeed—and indeed it works only if the successive disks get smaller according to a specific mathematical rule. But if you follow that rule, you can keep adding disks forever. Furthermore, as the number of disks goes to infinity, the fraction of the area covered approaches \(1\). It’s convenient to have a name for series of disks that meet these two criteria; I have taken to calling them *fulfilling* series.
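To make the procedure concrete, here is a minimal Python sketch of this trial-and-error placement. The function names, the try budget, and the crude zeta estimate are my own inventions, not Shier’s; his actual implementation surely differs in detail.

```python
import math
import random

def zeta_approx(s, n=100_000):
    # Partial sum of 1/k**s plus an integral estimate of the omitted tail.
    return sum(k**-s for k in range(1, n)) + n**(1 - s) / (s - 1) + 0.5 * n**-s

def shier_fill(s=1.2939615, max_disks=50, max_tries=200_000, seed=1):
    """Place disks of area 1/k**s, k = 1, 2, 3, ..., at random points
    in a square of area zeta(s), rejecting any spot where the disk
    would overlap a neighbor or poke beyond the boundary."""
    rng = random.Random(seed)
    side = math.sqrt(zeta_approx(s))             # square of area zeta(s)
    disks = []                                   # placed disks: (x, y, radius)
    for k in range(1, max_disks + 1):
        r = math.sqrt(1.0 / k**s / math.pi)      # disk k has area 1/k**s
        for _ in range(max_tries):
            x = rng.uniform(r, side - r)         # keep the disk inside the square
            y = rng.uniform(r, side - r)
            if all(math.hypot(x - cx, y - cy) >= r + cr for cx, cy, cr in disks):
                disks.append((x, y, r))
                break
        else:
            return disks, "jammed"               # try budget used up
    return disks, "exhausted"                    # all requested disks placed
```

Note that the only tunable ingredient is the rule assigning disk areas; the placement itself is nothing but rejection sampling.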

In exploring these ideas computationally, it makes sense to start with the simplest case: disks that are all the same size. This version of the process clearly *cannot* be fulfilling. No matter how the disks are arranged, their aggregate area will eventually exceed that of any finite container. Click in the gray square below to start filling it with equal-size disks. The square box has area \(A_{\square} = 4\). The slider in the control panel determines the area of the individual disks \(A_k\), in a range from \(0.0001\) to \(1.0\).

Sorry, the program will not run in this browser.

If you play with this program for a while, you’ll find that the dots bloom quickly at first, but the process invariably slows down and eventually ends in a state labeled “Jammed,” indicating that the program has been unable to find an open spot for the next disk within its budget of random trials.

The densest possible packing of equal-size disks places the centers on a triangular lattice with spacing equal to the disk diameter. The resulting density (for an infinite number of disks on an infinite plane) is \(\pi \sqrt{3}\, /\, 6 \approx 0.9069\), which means more than 90 percent of the area is covered. A random filling in a finite square is much looser. My first few trials all halted with a filling fraction fairly close to one-half, and so I wondered if that nice round number might be the expectation value of the probabilistic process. Further experiments suggested otherwise. Over a broad range of disk sizes, from \(0.0001\) up to about \(0.01\), the area covered varied from one run to the next, but the average was definitely above one-half—perhaps \(0.54\). After some rummaging through the voluminous literature on circle packing, I think I may have a clue to the exact expectation value: \(\pi / (3 + 2 \sqrt{2}) \approx 0.539012\). Where does that weird number come from? The answer has nothing to do with Shier’s algorithm, but I think it’s worth a digression.

Consider an adversarial process: Alice is filling a unit square with \(n\) equal-size disks and wants to cover as much of the area as possible. Bob, who wants to minimize the area covered, gets to choose \(n\). If Bob chooses \(n = 1\), Alice can produce a single disk that just fits inside the square and covers about \(79\) percent of the space. Can Bob do better? Yes, if Bob specifies \(n = 2\), Alice’s best option is to squeeze the two disks into diagonally opposite corners of the square as shown in the diagram at right. These disks are bounded by right isosceles triangles, which makes it easy to calculate their radii as \(r = 1 / (2 + \sqrt{2}) \approx 0.2929\). Their combined area works out to that peculiar number \(\pi / (3 + 2 \sqrt{2}) \approx 0.54\).
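The arithmetic behind that peculiar number takes only a few lines to verify. This snippet checks that the corner-disk radius \(r = 1/(2 + \sqrt{2})\) really does make the two disks mutually tangent, and that their combined area reduces to \(\pi / (3 + 2\sqrt{2})\):

```python
import math

r = 1 / (2 + math.sqrt(2))            # radius of each corner disk in a unit square
# Centers sit at (r, r) and (1 - r, 1 - r); tangency means the
# center-to-center distance equals 2r.
assert abs(math.hypot(1 - 2 * r, 1 - 2 * r) - 2 * r) < 1e-12

two_disks = 2 * math.pi * r**2        # combined area of the two disks
closed_form = math.pi / (3 + 2 * math.sqrt(2))
print(two_disks, closed_form)         # both ≈ 0.539012
```

Algebraically, \(2\pi r^2 = 2\pi/(2+\sqrt{2})^2 = 2\pi/(6 + 4\sqrt{2}) = \pi/(3 + 2\sqrt{2})\).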

If two disks are better than one (from Bob’s point of view), could three be better still? Or four, or some larger number? Apparently not. In 2010, Erik Demaine, Sándor Fekete, and Robert Lang conjectured that the two-disk configuration shown above represents the worst case for any number of equal-size disks. In 2017 Fekete, Sebastian Morr, and Christian Scheffer proved this result.

Is it just a coincidence that the worst-case density for packing disks into a square also appears to be the expected density when equal-size disks are placed randomly until no more will fit? Wish I knew.

Let us return to the questions raised in Shier’s *Fractalize That!* If we want to fit infinitely many disks into a finite square, our only hope is to work with disks that get smaller and smaller as the process goes on. The disk areas must come from some sequence of ever-diminishing numbers. Among such sequences, the one that first comes to mind is \(\frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\) These fractions have been known since antiquity as the harmonic numbers. (They are the wavelengths of the overtones of a plucked string.)

To see what happens when successive disks are sized according to the harmonic sequence, click in the square below.

Sorry, the program will not run in this browser.

Again, the process halts when no open space is large enough to accommodate the next disk in the sequence. If you move the slider all the way to the right, you’ll see a sequence of disks with areas drawn from the start of the full harmonic sequence, \(\frac{1}{1} , \frac{1}{2}, \frac{1}{3}, \dots\); at this setting, you’ll seldom get beyond eight or nine disks. Moving the slider to the left omits the largest disks at the beginning of the sequence, leaving the infinite tail of smaller disks. For example, setting the slider to \(1/20\) skips all the disks from \(\frac{1}{1}\) through \(\frac{1}{19}\) and begins filling the square with disks of area \(\frac{1}{20}, \frac{1}{21}, \frac{1}{22}, \dots\) Such truncated series go on longer, but eventually they also end in a jammed configuration.

The slider goes no further than 1/50, but even if you omitted the first 500 disks, or the first 5 million, the result would be the same. This is a consequence of the most famous property of the harmonic numbers: Although the individual terms \(1/k\) dwindle away to zero as \(k\) goes to infinity, the sum of all the terms,

\[\sum_{k = 1}^{\infty}\frac{1}{k} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots,\]

does not converge to a finite value. As long as you keep adding terms, the sum will keep growing, though ever more slowly. This curious fact was proved in the 14th century by the French bishop and scholar Nicole Oresme. The proof is simple but ingenious. Oresme pointed out that the harmonic series

\[\frac{1}{1} + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots\]

is greater than

\[\frac{1}{1} + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \cdots\]

The latter series is equivalent to \(1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} \cdots\), and so it is clearly divergent. Since each grouped term of the harmonic series is at least as great, its sum too must exceed any finite bound.

The divergence of the harmonic series implies that disks whose areas are generated by the series will eventually overflow any enclosing container. Dropping a finite prefix of the sequence, such as the first 50 disks, does not change this fact.
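A few lines of Python make both halves of this story vivid: the partial sums are unbounded (Oresme’s grouping guarantees \(H(2^m) \ge 1 + m/2\)), yet they creep upward only logarithmically, since \(H(n)\) is approximately \(\ln n + 0.5772\) (Euler’s constant).

```python
import math

def H(n):
    # n-th partial sum of the harmonic series
    return sum(1.0 / k for k in range(1, n + 1))

# Unbounded growth, but painfully slow: each tenfold increase in n
# adds only about ln(10) ≈ 2.3 to the sum.
for n in (10, 1_000, 100_000):
    print(n, H(n))
```
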

Let me note in passing that just as the filling fraction for fixed-size disks seems to converge to a specific constant, 0.5390, disks in harmonic series also seem to have a favored filling fraction, roughly 0.71. Can this be explained by some simple geometric argument? Again, I wish I knew.

Evidently we need to make the disks shrink faster than the harmonic numbers do. Here’s an idea: Square each element of the harmonic series, yielding this:

\[\sum_{k = 1}^{\infty}\frac{1}{k^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots.\]

Click below (or press the Start button) to see how this one turns out, again in a square of area 4.

Sorry, the program will not run in this browser.

At last we have a process that won’t get stuck in a situation where there’s no place to put another disk. In principle the program *could* run forever, but of course it doesn’t. It quits when the area of the next disk shrinks down to about a tenth of the size of a single pixel on a computer display. The stopped state is labeled “Exhausted” rather than “Jammed.” Yet the process is still not *fulfilling*. The disks are scattered sparsely in the square, leaving vast open spaces unoccupied. The configuration reminds me of deep-sky images made by large telescopes.

Why does this outcome look so different from the others? Unlike the harmonic numbers, the infinite series \(1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots\) converges to a finite sum. In the 18th century the task of establishing this fact (and determining the exact sum) was known as the Basel Problem, after the hometown of the Bernoulli family, who put much effort into the problem but never solved it. The answer came in 1735 from Leonhard Euler (another native of Basel, though he was working in St. Petersburg), who showed that the sum is equal to \(\pi^2 / 6\). This works out to about \(1.645\); since the area of the square we want to fill is \(4\), even an infinite series of disks would cover only about \(41\) percent of the territory.
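Euler’s sum, and the 41 percent figure, are easy to check numerically. The first hundred thousand terms already come within about \(10^{-5}\) of \(\pi^2/6\):

```python
import math

basel = sum(1.0 / k**2 for k in range(1, 100_001))
print(basel, math.pi**2 / 6)     # partial sum closes in on pi^2/6 ≈ 1.6449
print(math.pi**2 / 6 / 4)        # fraction of an area-4 square covered ≈ 0.41
```
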

Given that the numbers \(\frac{1}{1^1}, \frac{1}{2^1}, \frac{1}{3^1}, \dots\) diminish too slowly, whereas \(\frac{1}{1^2}, \frac{1}{2^2}, \frac{1}{3^2}, \dots\) shrink too fast, it makes sense to try an exponent somewhere between \(1\) and \(2\) in the hope of finding a Goldilocks solution. The computation performed below in Program 4 is meant to facilitate the search for such a happy medium. Here the disk sizes are elements of the sequence \(\frac{1}{1^s}, \frac{1}{2^s}, \frac{1}{3^s}, \dots\), where the value of the exponent \(s\) is determined by the setting of the slider, with a range of \(1 \lt s \le 2\). We already know what happens at the extremes of this range. What is the behavior in the middle?

Sorry, the program will not run in this browser.

If you try the default setting of \(s = 1.5\), you’ll find you are still in the regime where the disks dwindle away so quickly that the box never fills up; if you’re willing to wait long enough, the program will end in an exhausted state rather than a jammed one. Reducing the exponent to \(s = 1.25\) puts you on the other side of the balance point, where the disks remain too large and at some point one of them will not fit into any available space. By continuing to shuttle the slider back and forth, you could carry out a binary search, closing in, step by step, on the “just right” value of \(s\). This strategy can succeed, but it’s not quick. As you get closer to the critical value, the program will run longer and longer before halting. (After all, running forever is the behavior we’re seeking.) To save you some tedium, I offer a spoiler: the optimum setting is between 1.29 and 1.30.

At this point we have wandered into deeper mathematical waters. A rule of the form \(A_k = 1/k^s\) is called a power law, since each \(k\) is raised to the same power. And series of the form \(\sum 1/k^s\) are known as zeta functions, denoted \(\zeta(s)\). Zeta functions have quite a storied place in mathematics. The harmonic numbers correspond to \(\zeta(1) = \sum 1/k^1\), which does not converge.

Today, Riemann’s version of the zeta function is the engine (or enigma!) driving a major mathematical industry. Shier’s use of this apparatus in making fractal art is far removed from that heavy-duty research enterprise—but no less fascinating. Think of it as the zeta function on vacation.

If a collection of disks are to fill a square exactly, their aggregate area must equal the area of the square. This is a necessary condition though not a sufficient one. In all the examples I’ve presented so far, the containing square has an area of 4, so what’s needed is to find a value of \(s\) that satisfies the equation:

\[\zeta(s) = \sum_{k = 1}^{\infty}\frac{1}{k^s} = 4\]

Except for isolated values of \(s\), there is no known way to evaluate \(\zeta(s)\) in closed form, but numerical methods readily solve the equation to any precision we might want. The value that works is \(s \approx 1.2939615\).

Having this result in hand solves one part of the square-filling problem. It tells us how to construct an infinite set of disks whose total area is just enough to cover a square of area \(4\), with adequate precision for graphical purposes. We assign each disk \(k\) (starting at \(k = 1\)) an area of \(1/k^{1.2939615}.\) This sequence begins 1.000, 0.408, 0.241, 0.166, 0.125, 0.098,…
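Finding that exponent is a pleasant little computation in its own right. Here is a sketch of one way to do it (the helper names and the truncation scheme are mine): approximate \(\zeta(s)\) by a partial sum plus an integral estimate of the discarded tail, then exploit the fact that \(\zeta(s)\) decreases monotonically on \((1, 2)\) to solve \(\zeta(s) = 4\) by bisection.

```python
def zeta(s, n=100_000):
    # Partial sum of 1/k**s plus an integral estimate of the truncated
    # tail; plenty accurate for s in (1, 2).
    return sum(k**-s for k in range(1, n)) + n**(1 - s) / (s - 1) + 0.5 * n**-s

def solve_s(target, lo=1.001, hi=2.0, tol=1e-9):
    # zeta(s) decreases monotonically on (1, 2), so bisection works.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if zeta(mid) > target:
            lo = mid        # sum too big: move toward larger s
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_s(4.0))   # ≈ 1.2939615
```
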

In the graph above, the maroon curve with \(s = 1.29396\) converges to a sum very close to 4. Admittedly, the rate of convergence is not quick. More than 3 million terms are needed to get within 1 percent of the target.

Our off-label use of the zeta function defines an infinite sequence of disks whose aggregate area is equal to \(4\). The disks in this unique collection will exactly fill our square box (assuming they can be properly arranged). It’s satisfying to have a way of reliably achieving this result, after our various earlier failures. On the other hand, there’s something irksome about that number \(4\) appearing in the equation. It’s so arbitrary! I don’t dispute that \(4\) is a perfectly fine and foursquare number, but there are many other sizes of squares we might want to fill with dots. Why give all our attention to the \(2 \times 2\) variety?

This is all my fault. When I set out to write some square-filling programs, I knew I couldn’t use the unit square—which seems like the obvious default choice—because of the awkward fact that \(\zeta(s) = 1\) has no finite solution. The unit square is also troublesome in the case of the harmonic numbers; the first disk, with area \(A_1 = 1\), is too large to fit. So I picked the next squared integer for the box size in those first programs. Having made my choice, I stuck with it, but now I feel hemmed in by that decision made with too little forethought.

We have all the tools we need to fill squares of other sizes (as long as the area is greater than \(1\)). Given a square of area \(A_{\square}\), we just solve \(\zeta(s) = A_{\square}\) for \(s\). A square of area 8 can be covered by disks sized according to the rule \(A_k = 1/k^s\) with \(s \approx 1.1349\); for \(A_{\square} = 100\), the corresponding exponent is \(s \approx 1.0101\). For any \(A_{\square} \gt 1\) there is an \(s\) that yields a fulfilling set of disks, and vice versa for any \(s \gt 1\).

This relation between the exponent \(s\) and the box area \(A_{\square}\) suggests a neat way to evade the whole bother of choosing a specific container size. We can just scale the disks to fit the box, or else scale the box to accommodate the disks. Shier adopts the former method. Each disk in the infinite set is assigned an area of

\[A_k = \frac{A_{\square}}{\zeta(s)} \frac{1}{k^s},\]

where the first factor is a scaling constant that adjusts the disk sizes to fit the container. In my first experiments with these programs I followed the same approach. Later, however, when I began writing this essay, it seemed easier to think about the scaling—and explain it—if I transformed the size of the box rather than the sizes of the disks. In this scheme, the area of disk \(k\) is simply \(1 / k^s\), and the area of the container is \(A_{\square} = \zeta(s)\). (The two scaling procedures are mathematically equivalent; it’s only the ratio of disk size to container size that matters.)

Program 5 offers an opportunity to play with such scaled zeta functions.

Sorry, the program will not run in this browser.

At the other end of the scale, if you push the value of \(s\) up beyond about \(1.40\), you’ll discover something else: The program more often than not halts after placing just a few disks. At \(s = 1.50\) or higher, it seldom gets beyond the first disk. This failure is similar to what we saw with the harmonic numbers, but more interesting. In the case of the harmonic numbers, the total area of the disks is unbounded, making an overflow inevitable. With this new scaled version of the zeta function, the total area of the disks is always equal to that of the enclosing square. In principle, all the disks could be made to fit, if you could find the right arrangement. I’ll return below to the question of why that doesn’t happen.

In *Fractalize That!* Shier introduces another device for taming space-filling sets. He not only scales the object sizes so that their total area matches the space available; he also adopts a variant zeta function that has two adjustable parameters rather than just one:

\[\zeta(s, a) = \sum_{k = 0}^{\infty} \frac{1}{(a + k)^{s}}.\]

This is the Hurwitz zeta function, named for the German mathematician Adolf Hurwitz (1859–1919). Before looking into the details of the function, let’s play with the program and see what happens. Try a few settings of the \(s\) and \(a\) controls:

Sorry, the program will not run in this browser.

Different combinations of \(s\) and \(a\) produce populations of disks with different size distributions. The separate contributions of the two parameters are not always easy to disentangle, but in general decreasing \(s\) or increasing \(a\) leads to a pattern dominated by smaller disks. Here are snapshots of four outcomes:

Within the parameter range shown in these four panels, the filling process always continues to exhaustion, but at higher values of \(s\) it can jam, just as it does with the scaled Riemann zeta function.

Hurwitz wrote just one paper on the zeta function. It was published in 1882, when he was still quite young and just beginning his first academic appointment, at the University of Göttingen. (The paper is available from the Göttinger Digitalisierungszentrum; see pp. 86–101.)

Hurwitz modified the Riemann zeta function in two ways. First, the constant \(a\) is added to each term, turning \(1/k^s\) into \(1/(a + k)^s\). Second, the summation begins with \(k = 0\) rather than \(k = 1\). By letting \(a\) take on any value in the range \(0 \lt a \le 1\) we gain access to a continuum of zeta functions. The elements of the series are no longer just reciprocals of integers but reciprocals of real numbers. Suppose \(a = \frac{1}{3}\). Then \(\zeta(s, a)\) becomes:

\[\frac{1}{\left(\frac{1}{3} + 0\right)^s} + \frac{1}{\left(\frac{1}{3} + 1\right)^s} + \frac{1}{\left(\frac{1}{3} + 2\right)^s} + \cdots\ = \left(\frac{3}{1}\right)^s + \left(\frac{3}{4}\right)^s + \left(\frac{3}{7}\right)^s + \cdots\]

The Riemann zeta function and the Hurwitz zeta function differ substantially only for small values of \(k\) or large values of \(a\). When \(k\) is large, adding a small \(a\) to it makes little difference in the value of the function. Thus as \(k\) grows toward infinity, the two functions are asymptotically equal, as suggested in the graph at right. When the Hurwitz function is put to work packing disks into a square, a rule with \(a > 1\) causes the first several disks to be smaller than they would be with the Riemann rule. A value of \(a\) between \(0\) and \(1\) enlarges the early disks. In either case, the later disks in the sequence are hardly affected at all.

If \(a\) is a positive integer, the interpretation of \(\zeta(s, a)\) is even simpler. The case \(a = 1\) corresponds to the Riemann zeta sum. When \(a\) is a larger integer, the effect is to omit the first \(a - 1\) entries, leaving only the tail of the series. For example,

\[\zeta(s, 5) = \frac{1}{5^s} + \frac{1}{6^s} + \frac{1}{7^s} + \cdots.\]
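These properties are easy to confirm numerically. The sketch below (my own helper, using the same partial-sum-plus-tail trick as before) evaluates the Hurwitz series; setting \(a = 1\) recovers the Riemann value, an integer \(a \gt 1\) merely lops off the head of the series, and a fractional \(a\) such as \(\frac{1}{3}\) gives something genuinely new.

```python
def hurwitz_zeta(s, a, n=100_000):
    # Partial sum of 1/(a + k)**s for k = 0 .. n-1, plus an integral
    # estimate of the truncated tail.
    head = sum((a + k)**-s for k in range(n))
    return head + (a + n)**(1 - s) / (s - 1) + 0.5 * (a + n)**-s

print(hurwitz_zeta(1.5, 1.0))    # a = 1 recovers zeta(1.5) = 2.6123...
print(hurwitz_zeta(1.5, 5.0))    # integer a > 1 drops the first a-1 terms
print(hurwitz_zeta(1.5, 1 / 3))  # the fractional case from the text
```
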

In his fractal artworks, Shier chooses various values of \(a\) as a way of controlling the size distribution of the placed objects, and thereby fine-tuning the appearance of the patterns. Having this adjustment knob available is very convenient, but in the interests of simplicity, I am going to revert to the Riemann function in the rest of this essay.

Before going on, however, I also have to confess that I don’t really understand the place of the Hurwitz zeta function in modern mathematical research, or what Hurwitz himself had in mind when he formulated it. Zeta functions have been an indispensable tool in the long struggle to understand how the prime numbers are sprinkled among the integers. The connection between these two realms was made by Euler, with his remarkable equation linking a sum of powers of integers with a product of powers of primes:

\[\sum_{k = 1}^{\infty} \frac{1}{k^{s}} = \prod_{p\ \mathrm{prime}} \frac{1}{1 - p^{-s}}.\]

Riemann went further, showing that everything we might want to know about the distribution of primes is encoded in the undulations of the zeta function over the complex plane. Indeed, if we could simply pin down all the complex values of \(s\) for which \(\zeta(s) = 0\), we would have a master key to the primes. Hurwitz, in his 1882 paper, was clearly hoping to make some progress toward this goal, but I have not been able to figure out how his work fits into the larger story. The Hurwitz zeta function gets almost no attention in standard histories and reference works (in contrast to the Riemann version, which is everywhere). Wikipedia notes: “At rational arguments the Hurwitz zeta function may be expressed as a linear combination of Dirichlet *L*-functions and vice versa”—which sounds interesting, but I don’t know if it’s useful or important. A recent article by Nicola Oswald and Jörn Steuding puts Hurwitz’s work in historical context, but it does not answer these questions—at least not in a way I’m able to understand.

But again I digress. Back to dots in boxes.

If a set of circular disks and a square container have the same total area, can you always arrange the disks so that they completely fill the square without overflowing? Certainly not! Suppose the set consists of a single disk with area equal to that of the square; the disk’s diameter is greater than the side length of the square, so it will bulge through the sides while leaving the corners unfilled. A set of two disks won’t work either, no matter how you apportion the area between them. Indeed, when you are putting round pegs in a square hole, no finite set of disks can ever fill all the crevices.

Only an infinite set—a set with no smallest disk—can possibly fill the square completely. But even with an endless supply of ever-smaller disks, it seems like quite a delicate task to find just the right arrangement, so that every gap is filled and every disk has a place to call home. It’s all the more remarkable, then, that simply plunking down the disks at random locations seems to produce exactly the desired result. This behavior is what intrigued and troubled me when I first saw Shier’s pictures and read about his method for generating them. If a *random* arrangement works, it’s only a small step to the proposition that *any* arrangement works. Could that possibly be true?

Computational experiments offer strong hints on this point, but they can never be conclusive. What we need is a proof, and Christopher Ennis has supplied one. His article appeared in *Math Horizons*, a publication of the Mathematical Association of America, which keeps it behind a paywall. If you have no library access and won’t pay the $50 ransom, I can recommend a video of Ennis explaining his proof in a talk at St. Olaf College.

As a warm-up exercise, Ennis proves a one-dimensional version of the area-filling conjecture, where the geometry is simpler and some of the constraints are easier to satisfy. In one dimension a disk is merely a line segment; its area is its length, and its radius is half that length. As in the two-dimensional model, disks are placed in descending order of size at random positions, with the usual proviso that no disk can overlap another disk or extend beyond the end points of the containing interval. In Program 7 you can play with this scheme.

Sorry, the program will not run in this browser.

I have given the line segment some vertical thickness to make it visible. The resulting pattern of stripes may look like a supermarket barcode or an atomic spectrum, but please imagine it as one-dimensional.

If you adjust the slider in this program, you’ll notice a difference from the two-dimensional system. In 2D, the algorithm is fulfilling only if the exponent \(s\) is less than a critical value, somewhere in the neighborhood of 1.4. In one dimension, the process continues without impediment for all values of \(s\) throughout the range \(1 \lt s \lt 2\). Try as you might, you won’t find a setting that produces a jammed state. (In practice, the program halts after placing no more than 10,000 disks, but the reason is exhaustion rather than jamming.)
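A quick simulation backs up this observation. The sketch below (again my own code, not Ennis’s or Shier’s) runs the one-dimensional process with hundreds of segments and a generous try budget; consistent with Ennis’s theorem, it never jams for \(1 \lt s \lt 2\).

```python
import random

def zeta_est(s, n=100_000):
    # Partial sum of 1/k**s plus an integral estimate of the omitted tail.
    return sum(k**-s for k in range(1, n)) + n**(1 - s) / (s - 1) + 0.5 * n**-s

def fill_1d(s, n_disks=300, max_tries=1_000_000, seed=2):
    """1D analogue: segment k has length 1/k**s, placed at a random
    position in an interval of length zeta(s), with no overlaps.
    Returns the placed segments, or None on a jam (which, per Ennis,
    should never happen for 1 < s < 2)."""
    rng = random.Random(seed)
    L = zeta_est(s)
    placed = []                                  # (center, half-length)
    for k in range(1, n_disks + 1):
        r = 0.5 / k**s
        for _ in range(max_tries):
            c = rng.uniform(r, L - r)            # keep the segment inside
            if all(abs(c - ci) >= r + ri for ci, ri in placed):
                placed.append((c, r))
                break
        else:
            return None                          # jammed
    return placed
```
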

Ennis titles his *Math Horizons* article “(Always) room for one more.” He proves this assertion by keeping track of the set of points where the center of a new disk can legally be placed, and showing the set is never empty. Suppose \(n - 1\) disks have already been randomly scattered in the container. The next disk to be placed, disk \(n\), will have an area (or length) of \(A_n = 1 / n^s\). Since the geometry is one-dimensional, the corresponding disk radius is simply \(r_n = A_n / 2\). The center of this new disk cannot lie any closer than \(r_n\) to the perimeter of another disk. It must also be at a distance of at least \(r_n\) from the boundary of the containing segment. We can visualize these constraints by adding bumpers, or buffers, of thickness \(r_n\) to the outside of each existing disk and to the inner edges of the containing segment. A few stages of the process are illustrated below.

Placed disks are blue, the excluded buffer areas are orange, and open areas—the set of all points where the center of the next disk could be placed—are black. In the top line, before any disks have been placed, the entire containing segment is open except for the two buffers at the ends. Each of these buffers has a length equal to \(r_1\), the radius of the first disk to be placed; the center of that disk cannot lie in the orange regions because the disk would then overhang the end of the containing segment. After the first disk has been placed *(second line)*, the extent of the open area is reduced by the area of the disk itself and its appended buffers. On the other hand, all of the buffers have also shrunk; each buffer is now equal to the radius of disk \(2\), which is smaller than disk \(1\). The pattern continues as subsequent disks are added. Note that although the blue disks cannot overlap, the orange buffers can.

For another view of how this process evolves, click on the *Next* button in Program 8. Each click inserts one more disk into the array and adjusts the buffer and open areas accordingly.

Sorry, the program will not run in this browser.

Because the blue disks are never allowed to overlap, the total blue area must increase monotonically as disks are added. It follows that the orange and black areas, taken together, must steadily decrease. But there’s nothing steady about the process when you keep an eye on the separate area measures for the orange and black regions. Changes in the amount of buffer overlap cause erratic, seesawing tradeoffs between the two subtotals. If you keep clicking the *Next* button (especially with \(s\) set to a high value), you may see the black area falling below \(1\) percent. Can we be sure it will never vanish entirely, leaving no opening at all for the next disk?

Ennis answers this question through worst-case analysis. He considers only configurations in which no buffers overlap, thereby squeezing the black area to its smallest possible extent. If the black area is always positive under these conditions, it cannot be smaller when buffer overlaps are allowed.

The basic idea of the proof is to account for the areas of the three kinds of regions. In the worst case, with \(n - 1\) disks placed and no overlapping buffers,

\[A_{\square} = \zeta(s), \quad A_{\color{blue}{\mathrm{blue}}} = \sum_{k=1}^{n - 1} \frac{1}{k^s}, \quad A_{\color{orange}{\mathrm{orange}}} = 2(n-1)r_{n}.\]

Then we need to prove that

\[A_{\square} - (A_{\color{blue}{\mathrm{blue}}} + A_{\color{orange}{\mathrm{orange}}}) \gt 0.\]

A direct proof of this statement would require an exact, closed-form expression for \(\zeta(s)\), which we already know is problematic. Ennis evades this difficulty by turning to calculus. He needs to evaluate the remaining tail of the zeta series, \(\sum_{k = n}^\infty 1/k^s\), but this discrete sum is intractable. On the other hand, by shifting from a sum to an integral, the problem becomes an exercise in undergraduate calculus. Exchanging the discrete variable \(k\) for a continuous variable \(x\), we want to find the area under the curve \(1/x^s\) in the interval from \(n\) to infinity; this will provide a lower bound on the corresponding discrete sum. Evaluating the integral yields:

\[\int_{x = n}^{\infty} \frac{1}{x^{s}} d x = \frac{1}{(s-1) n^{s-1}}.\]

Some further manipulation reveals that the area of the black regions is never smaller than

\[\frac{2 - s}{(s - 1)n^{s - 1}}.\]

If \(s\) lies strictly between \(1\) and \(2\), this expression must be greater than zero, since both the numerator and the denominator will be positive. Thus for all \(n\) there is at least one black point where the center of a new disk can be placed.
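For the record, the “further manipulation” can be filled in as follows. (This is my reconstruction; Ennis may organize the algebra differently.) With \(r_n = A_n / 2 = 1/(2n^s)\), the worst-case black area is

\[
\begin{aligned}
A_{\mathrm{black}} &= A_{\square} - A_{\color{blue}{\mathrm{blue}}} - A_{\color{orange}{\mathrm{orange}}}
  = \sum_{k=n}^{\infty} \frac{1}{k^{s}} - \frac{n-1}{n^{s}} \\[4pt]
&\gt \int_{n}^{\infty} \frac{dx}{x^{s}} - \frac{n}{n^{s}}
  = \frac{1}{(s-1)\,n^{s-1}} - \frac{1}{n^{s-1}}
  = \frac{2-s}{(s-1)\,n^{s-1}}.
\end{aligned}
\]

Both replacements only shrink the estimate—the integral is smaller than the tail sum, and \(n\) is larger than \(n - 1\)—so the final expression is a valid lower bound.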

Ennis’s proof is a stronger one than I expected. When I first learned there was a proof, I guessed that it would take a probabilistic approach, showing that although a jammed configuration may exist, it has probability zero of turning up in a random placement of the disks. Instead, Ennis shows that no such arrangement exists at all. Even if you replaced the randomized algorithm with an adversarial one that tries its best to block every disk, the process would still run to fulfillment.

The proof for a two-dimensional system follows the same basic line of argument, but it gets more complicated for geometric reasons. In one dimension, as the successive disk areas get smaller, the disk radii diminish in simple proportion: \(r_k = A_k / 2\). In two dimensions, disk radius falls off only as the square root of the disk area: \(r_k = \sqrt{A_k / \pi}\). As a result, the buffer zone surrounding a disk excludes neighbors at a greater distance in two dimensions than it would in one dimension. There is still a range of \(s\) values where the process is provably unstoppable, but it does not extend across the full interval \(1 \lt s \lt 2\).

Program 9, running in the panel below, is one I find very helpful in gaining intuition into the behavior of Shier’s algorithm. As in the one-dimensional model of Program 8, each press of the *Next* button adds a single disk to the containing square, and shows the forbidden buffer zones surrounding the disks.

*(Program 9 runs here in the interactive version of this article.)*

Move the \(s\) slider to a position somewhere near 1.40, then press *Next* repeatedly. Most runs jam after only a handful of disks have been placed, but a run that survives its first dozen or so placements is likely to continue for a very long time. Shier describes this phenomenon as “infant mortality”: If the placement process survives the high-risk early period, it is all but immortal.

There’s a certain whack-a-mole dynamic to the behavior of this system. Maybe the first disk covers all but one small corner of the black zone. It looks like the next disk will completely obliterate that open area. And so it does—but at the same time the shrinking of the orange buffer rings opens up another wedge of black elsewhere. The third disk blots out that spot, but again the narrowing of the buffers allows a black patch to peek out from still another corner. Later on, when there are dozens of disks, there are also dozens of tiny black spots where there’s room for another disk. You can often guess which of the openings will be filled next, because the random search process is likely to land in the largest of them. Again, however, as these biggest targets are buried, many smaller ones are born.

Ennis’s two-dimensional proof addresses the case of circular disks inside a circular boundary, rather than a square one. (The higher symmetry and the absence of corners streamlines certain calculations.) The proof strategy, again, is to show that after \(n - 1\) disks have been placed, there is still room for the \(n\)th disk, for any value of \(n \ge 1\). The argument follows the same logic as in one dimension, relying on an integral to provide a lower bound for the sum of a zeta series. But because of the \(\pi r^2\) area relation, the calculation now includes quadratic as well as linear terms. As a result, the proof covers only a part of the range of \(s\) values. The black area is provably nonempty if \(s\) is greater than \(1\) but less than roughly \(1.1\); outside that interval, the proof has nothing to say.

As mentioned above, Ennis’s proof applies only to circular disks in a circular enclosure. Nevertheless, in what follows I am going to assume the same ideas carry over to disks in a square frame, although the location of the boundary will doubtless be somewhat different. I have recently learned that Ennis has written a further paper on the subject, expected to be published in the *American Mathematical Monthly*. Perhaps he addresses this question there.

With Program 9, we can explore the entire spectrum of behavior for packing disks into a square. The possibilities are summarized in the candybar graph below.

- The leftmost band, in darker green, is the interval for which Ennis’s proof might hold. The question mark at the upper boundary line signifies that we don’t really know where it lies.
- In the lighter green region no proof is known, but in Shier’s extensive experiments the system never jams there.
- The transition zone sees the probability of jamming rise from \(0\) to \(1\) as \(s\) goes from about \(1.3\) to about \(1.5\).
- Beyond \(s \approx 1.5\), experiments suggest that the system *always* halts in a jammed configuration.
- At \(s \approx 1.6\) we enter a regime where the buffer zone surrounding the first disk invariably blocks the entire black region, leaving nowhere to place a second disk. Thus we have a simple proof that the system always jams.
- Still another barrier arises at \(s \approx 2.7\). Beyond this point, not even one disk will fit. The diameter of a disk with area \(1\) is greater than the side length of the enclosing square.

Can we pin down the exact locations of the various threshold points in the diagram above? This problem is tractable in those situations where the placement of the very first disk determines the outcome. At high values of \(s\) (and thus low values of \(\zeta(s)\)), the first disk can obliterate the black zone and thereby preclude placement of a second disk. What is the lowest value of \(s\) for which this can happen? As in the image at right, the disk must lie at the center of the square box, and the orange buffer zone surrounding it must extend just far enough out to cover the corners of the inner black square, which defines the locus of all points that could accommodate the center of the second disk. Finding the value of \(s\) that satisfies this condition is a messy but straightforward bit of geometry and algebra. With the help of SageMath I get the answer \(s = 1.282915\). This value—let’s call it \(\overline{s}\)—is an upper bound on the “never jammed” region. Above this limit there is always a nonzero probability that the filling process will end after placing a single disk.

The value of \(\overline{s}\) lies quite close to the experimentally observed boundary between the never-jammed range and the transition zone, where jamming first appears. Is it possible that \(\overline{s}\) actually marks the edge of the transition zone—that below this value of \(s\) the program can never fail? To prove that conjecture, you would have to show that when the first disk is successfully placed, the process never stalls on a subsequent disk. That’s certainly not true in higher ranges of \(s\). Yet the empirical evidence near the threshold is suggestive. In my experiments I have yet to see a jammed outcome at \(s \lt \overline{s}\), not even in a million trials just below the threshold, at \(s = 0.999 \overline{s}\). In contrast, at \(s = 1.001 \overline{s}\), a million trials produced 53 jammed results—all of them occurring immediately after the first disk was placed.

The same kind of analysis leads to a lower bound on the region where *every* run ends after the first disk *(medium pink in the diagram above)*. In this case the critical situation puts the first disk as close as possible to a corner of the square frame, rather than in the middle. If the disk and its orange penumbra are large enough to block the second disk in this extreme configuration, then they will also block it in any other position. Putting a number on this bound again requires some fiddly equation wrangling; the answer I get is \(\underline{s} = 1.593782\). No process with higher \(s\) can possibly live forever, since it will die with the second disk. In analogy with the lower-bound conjecture, one might propose that the probability of being jammed remains below \(1\) until \(s\) reaches \(\underline{s}\). If both conjectures were true, the transition region would extend from \(\overline{s}\) to \(\underline{s}\).

The final landmark, way out at \(s \approx 2.7\), marks the point where the first disk threatens to burst the bounds of the enclosing square. In this case the game is over before it begins. In program 9, if you push the slider far to the right, you’ll find that the black square in the middle of the orange field shrinks away and eventually winks out of existence. This extinction event comes when the diameter of the disk equals the side length of the square. Given a disk of area \(1\), and thus radius \(1/\sqrt{\pi}\), we want to find the value of \(s\) that satisfies the equation

\[\frac{2}{\sqrt{\pi}} = \sqrt{\zeta(s)}.\]
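For the curious, the equation is easy to solve numerically. Here's a sketch of my own in JavaScript; the zeta approximation is a crude truncated sum with an integral correction for the tail, ample for this purpose but not a production-quality implementation.

```javascript
// Solve zeta(s) = 4/pi by bisection: squaring both sides of the
// equation above gives 4/pi = zeta(s).
function zetaApprox(s, N = 100000) {
  let sum = 0;
  for (let k = 1; k <= N; k++) sum += Math.pow(k, -s);
  return sum + Math.pow(N + 0.5, 1 - s) / (s - 1); // integral tail estimate
}
let lo = 2.0, hi = 3.0; // zeta is decreasing on this interval
for (let i = 0; i < 50; i++) {
  const mid = (lo + hi) / 2;
  if (zetaApprox(mid) > 4 / Math.PI) lo = mid; else hi = mid;
}
console.log(lo); // a hair above 2.70, and below e
```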

Experiments with Program 9 show that the value is just a tad more than 2.7. That’s an interesting numerical neighborhood, no? A famous number lives nearby. Do you suppose?

Another intriguing set of questions concerns the phenomenon that Shier calls infant mortality. If you scroll back up to Program 5 and set the slider to \(s = 1.45\), you’ll find that roughly half the trials jam. The vast majority of these failures come early in the process, after no more than a dozen disks have been placed. At \(s = 1.50\) death at an early age is even more common; three-fourths of all the trials end with the very first disk. On the other hand, if a sequence of disks does manage to dodge all the hazards of early childhood, it may well live on for a very long time—perhaps forever.

Should we be surprised by this behavior? I am. As Shier points out, the patterns formed by our graduated disks are fractals, and one of their characteristic properties is self-similarity, or scale invariance. If you had a fully populated square—one filled with infinitely many disks—you could zoom in on any region to any magnification, and the arrangement of disks would look the same as it does in the full-size square. By “look the same” I don’t mean the disks would be in the same positions, but they would have the same size distribution and the same average number of neighbors at the same distances. This is a statistical concept of identity. And since the pattern looks the same and has the same statistics, you would think that the challenge of finding a place for a new disk would also be the same at any scale. Slipping in a tiny disk late in the filling operation would be no different from plopping down a large disk early on. The probability of jamming ought to be constant from start to finish.

But there’s a rejoinder to this argument: Scale invariance is broken by the presence of the enclosing square. The largest disks are strongly constrained by the boundaries, whereas most of the smaller disks are nowhere near the edges and are little influenced by them. The experimental data offer some support for this view. The graph below summarizes the outcomes of \(20{,}000\) trials at \(s = 1.50\). The red bars show the absolute numbers of trials ending after placing \(n\) disks, for each \(n\) from \(0\) through \(35\). The blue lollipops indicate the proportion of trials reaching disk \(n\) that halted after placing disk \(n\). This ratio can be interpreted (if you’re a frequentist!) as the probability of stopping at \(n\).

It certainly looks like there’s something odd happening on the left side of this graph. More than three fourths of the trials end after a single disk, but none at all jam at the second or third disks, and very few (a total of \(23\)) at disks \(4\) and \(5\). Then, suddenly, \(1{,}400\) more fall by the wayside at disk \(6\), and serious attrition continues through disk \(11\).

Geometry can explain some of this weirdness. It has to do with the squareness of the container; other shapes would produce different results.

At \(s = 1.50\) we are between \(\overline{s}\) and \(\underline{s}\), in a regime where the first disk is large enough to block off the entire black zone but not so large that it *must* do so. This is enough to explain the tall red bar at \(n = 1\): When you place the first disk randomly, roughly \(75\) percent of the time it will block the entire black region, ending the parade of disks. If the first disk *doesn’t* foreclose all further action, it must be tucked into one of the four corners of the square, leaving enough room for a second disk in the diagonally opposite corner. The sequence of images below (made with Program 9) tells the rest of the story.

The placement of the second disk blocks off the open area in that corner, but the narrowing of the orange buffers also creates two tiny openings in the cross-diagonal corners. The third and fourth disks occupy these positions, and simultaneously allow the black background to peek through in two other spots. Finally the fifth and sixth disks close off the last black pixels, and the system jams.

This stereotyped sequence of disk placements accounts for the near absence of mortality at ages \(n = 2\) through \(n = 5\), and the sudden upsurge at age \(6\). The elevated levels at \(n = 7\) through \(11\) are part of the same pattern; depending on the exact positioning of the disks, it may take a few more to expunge the last remnants of black background.

At still higher values of \(n\)—for the small subset of trials that get there—the system seems to shift to a different mode of behavior. Although numerical noise makes it hard to draw firm conclusions, it doesn’t appear that any of the \(n\) values beyond \(n = 12\) are more likely jamming points than others. Indeed, the data are consistent with the idea that the probability of jamming remains constant as each additional disk is added to the array, just as scale invariance would suggest.

A much larger data set would be needed to test this conjecture, and collecting such data is painfully slow. Furthermore, when it comes to rare events, I don’t have much trust in the empirical data. During one series of experiments, I noticed a program run that stalled after \(290\) disks—unusually late. The 290-disk configuration, produced at \(s = 1.47\), is shown at left below.

I wondered if it was *truly* jammed. My program gives up on finding a place for a disk after \(10^7\) random attempts. Perhaps if I had simply persisted, it would have gone on. So I reset the limit on random attempts to \(10^9\), and sat back to wait. After some minutes the program discovered a place where disk \(291\) would fit, and then another for disk \(292\), and kept going as far as \(300\) disks. The program had an afterlife! Could I revive it again? Upping the limit to \(10^{10}\) allowed another \(14\) disks to squeeze in. The final configuration is shown at right above (with the original \(290\) disks faded, in order to make the \(24\) posthumous additions more conspicuous).

Is it really finished now, or is there still room for one more? I have no reliable way to answer that question. Checking \(10\) billion random locations sounds like a lot, but it is still a very sparse sampling of the space inside the square box. Using 64-bit floating-point numbers to define the coordinate system allows for more than \(10^{30}\) distinguishable points. And to settle the question mathematically, we would need unbounded precision.

We know from Ennis’s proof that at values of \(s\) not too far above \(1.0\), the filling process can always go on forever. And we know that beyond \(s \approx 1.6\), every attempt to fill the square is doomed. There must be some kind of transition between these two conditions, but the details are murky. The experimental evidence gathered so far suggests a smooth transition along a sigmoid curve, with the probability of jamming gradually increasing from \(0\) to \(1\). As far as I can tell, however, nothing we know for certain rules out a single hard threshold, below which all disk sequences are immortal and above which all of them die. Thus the phase diagram would be reduced to this simple form:

The softer transition observed in computational experiments would be an artifact of our inability to perform infinite random searches or place infinite sequences of disks.

Here’s a different approach to understanding the random dots-in-a-box phenomenon. It calls for a mental reversal of figure and ground. Instead of placing disks on a square surface, we drill holes in a square metal plate. And the focus of attention is not the array of disks or holes but rather the spaces between them. Shier has a name for the perforated plate: the gasket.

Program 10 allows you to observe a gasket as it evolves from a solid black square to a delicate lace doily with less than 1 percent of its original substance.

*(Program 10 runs here in the interactive version of this article.)*

The gasket is quite a remarkable object. When the number of holes becomes infinite, the gasket must disappear entirely; its area falls to zero. Up until that very moment, however, it retains its structural integrity.

As the gasket is etched away, can we measure the average thickness of the surviving wisps and tendrils? I can think of several methods that involve elaborate sampling schemes. Shier has a much simpler and more ingenious proposal: To find the average thickness of the gasket, divide its area by its perimeter. It was not immediately obvious to me why this number would serve as an appropriate measure of the width, but at least the units come out right: We are dividing a length squared by a length and so we get a length. And the operation does make basic sense: The area of the gasket represents the amount of substance in it, and the perimeter is the distance over which it is stretched. (The widths calculated in Program 10 differ slightly from those reported by Shier. The reason, I think, is that I include the outer boundary of the square in the perimeter, and he does not.)

Calculating the area and perimeter of a complicated shape such as a many-holed gasket looks like a formidable task, but it’s easy if we just keep track of these quantities as we go along. Initially (before any holes are drilled), the gasket area \(A_0^g\) is the area of the full square, \(A_\square\). The initial gasket perimeter \(P_0^g\) is four times the side length of the square, which is \(\sqrt{A_\square}\). Thereafter, as each hole is drilled, we subtract the new hole’s area from \(A^g\) and add its perimeter to \(P^g\). The quotient of these quantities is our measure of the average gasket width after drilling hole \(k\): \(\widehat{W}_k^g\). Since the gasket area is shrinking while the perimeter is growing, \(\widehat{W}_k^g\) must dwindle away as \(k\) increases.

The importance of \(\widehat{W}_k^g\) is that it provides a clue to how large a vacant space we’re likely to find for the next disk or hole. If we take the idea of “average” seriously, there must always be at least one spot in the gasket with a width equal to or greater than \(\widehat{W}_k^g\). From this observation Shier makes the leap to a whole new space-filling algorithm. Instead of choosing disk diameters according to a power law and then measuring the resulting average gasket width, he determines the radius of the next disk from the observed \(\widehat{W}_k^g\):

\[r_{k+1} = \gamma \widehat{W}_k^g = \gamma \frac{A_k^g}{P_k^g}.\]

Here \(\gamma\) is a fixed constant of proportionality that determines how tightly the new disks or holes fit into the available openings.
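Here's a minimal sketch of how the bookkeeping and the recursion fit together, tracking only the size sequence (not the geometric placement of the disks). The values of \(\gamma\) and the first radius are illustrative choices of mine, not Shier's.

```javascript
// Area-perimeter recursion: each new radius is gamma times the
// current average gasket width A/P.  gamma and the starting radius
// are illustrative, not taken from Shier's published code.
const gamma = 0.3;
let area = 1;          // gasket area: unit square minus the holes
let perimeter = 4;     // outer boundary of the unit square
let r = 0.2;           // size of the first disk (fairly arbitrary)
const radii = [];
for (let k = 0; k < 10; k++) {
  radii.push(r);
  area -= Math.PI * r * r;          // drill the hole
  perimeter += 2 * Math.PI * r;     // its circumference joins the edge
  r = gamma * (area / perimeter);   // r_{k+1} = gamma * W-hat_k
}
console.log(radii); // radii dwindle as the gasket width shrinks
```

Since the area only shrinks and the perimeter only grows, the width \(A/P\) falls monotonically, and so does the sequence of radii.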

The area-perimeter algorithm has a recursive structure, in which each disk’s radius depends on the state produced by the previous disks. This raises the question of how to get started: What is the size of the first disk? Shier has found that it doesn’t matter very much. Initial disks in a fairly wide range of sizes yield jam-proof and aesthetically pleasing results.

Graphics produced by the original power-law algorithm and by the new recursive one look very similar. One way to understand why is to rearrange the equation of the recursion:

\[\frac{1}{2\gamma} = \frac{\widehat{W}_k^g}{2 r_{k+1}}.\]

On the right side of this equation we are dividing the average gasket width by the diameter of the next disk to be placed. The result is a dimensionless number—dividing a length by a length cancels the units. More important, the quotient is a constant, unchanging for all \(k\). If we calculate this same dimensionless gasket width when using the power-law algorithm, it also turns out to be nearly constant in the limit of large \(k\), showing that the two methods yield sequences with similar statistics.

Setting aside Shier’s recursive algorithm, all of the patterns we’ve been looking at are generated by a power law (or zeta function), with the crucial requirement that the series must converge to a finite sum. The world of mathematics offers many other convergent series in addition to power laws. Could some of them also create fulfilling patterns? The question is one that Ennis discusses briefly in his talk at St. Olaf and that Shier also mentions.

Among the obvious candidates are geometric series such as \(\frac{1}{1}, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots\) A geometric series is a close cousin of a power law, defined in a similar way but exchanging the roles of \(s\) and \(k\). That is, a geometric series is the sum:

\[\sum_{k=0}^{\infty} \frac{1}{s^k} = \frac{1}{s^0} + \frac{1}{s^1} + \frac{1}{s^2} + \frac{1}{s^3} + \cdots\]

For any \(s > 1\), the infinite geometric series has a finite sum, namely \(\frac{s}{s - 1}\). Thus our task is to construct an infinite set of disks with individual areas \(1/s^k\) that we can pack into a square of area \(\frac{s}{s - 1}\). Can we find a range of \(s\) for which the series is fulfilling? As it happens, this is where Shier began his adventures; his first attempts were not with power laws but with geometric series. They didn’t turn out well. You are welcome to try your own hand in Program 11.
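The target area is easy to confirm numerically; this little snippet (my own, not from any of the programs) checks that the partial sums of the geometric series approach \(s/(s-1)\):

```javascript
// Partial sums of the geometric series 1/s^k approach s/(s-1);
// at s = 1.01 the disks' total area approaches 101.
function geometricPartial(s, terms) {
  let sum = 0;
  for (let k = 0; k < terms; k++) sum += Math.pow(s, -k);
  return sum;
}
console.log(geometricPartial(1.01, 10000)); // very close to 101
```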

*(Program 11 runs here in the interactive version of this article.)*

There’s a curious pattern to the failures you’ll see in this program. No matter what value you assign to \(s\) (within the available range \(1 \lt s \le 2\)), the system jams when the number of disks reaches the neighborhood of \(A_\square = \frac{s}{s-1}\). For example, at \(s = 1.01\), \(\frac{s}{s - 1}\) is 101 and the program typically gets stuck somewhere between \(k = 95\) and \(k = 100\). At \(s = 1.001\), \(\frac{s}{s - 1}\) is \(1{,}001\) and there’s seldom progress beyond about \(k = 1{,}000\).

For a clue to what’s going wrong here, consider the graph at right, plotting the values of \(1 / k^s\) *(red)* and \(1 / s^k\) *(blue)* for \(s = 1.01\). These two series converge on nearly the same sum (roughly \(100\)), but they take very different trajectories in getting there. On this log-log plot, the power-law series \(1 / k^s\) is a straight line. The geometric series \(1 / s^k\) falls off much more slowly at first, but there’s a knee in the curve at about \(k = 100\) *(dashed mauve line)*, where it steepens dramatically. If only we could get beyond this turning point, it looks like the rest of the filling process would be smooth sledding, but in fact we never get there. Whereas the first \(100\) disks of the power-law series fill up only about \(5\) percent of the available area, they occupy \(63\) percent in the geometric case. This is where the filling process stalls.
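The 5 percent and 63 percent figures are easy to verify. Here's a rough check of my own; for \(\zeta(s)\) near \(s = 1\) it leans on the standard approximation \(\zeta(1 + \epsilon) \approx 1/\epsilon + \gamma\), where \(\gamma\) is the Euler–Mascheroni constant.

```javascript
// Fraction of the total area occupied by the first 100 disks at
// s = 1.01: power-law total is zeta(s), geometric total is s/(s-1).
function powerFraction(s, n) {
  let partial = 0;
  for (let k = 1; k <= n; k++) partial += Math.pow(k, -s);
  // near s = 1, zeta(s) is close to 1/(s-1) + Euler-Mascheroni constant
  const zeta = 1 / (s - 1) + 0.5772156649;
  return partial / zeta;
}
function geomFraction(s, n) {
  const partial = (1 - Math.pow(s, -n)) / (1 - 1 / s); // first n terms
  return partial / (s / (s - 1));                      // over the full sum
}
console.log(powerFraction(1.01, 100)); // about 0.05
console.log(geomFraction(1.01, 100));  // about 0.63
```

(Note that the geometric fraction simplifies to \(1 - s^{-n}\), which is why the series has nearly exhausted its budget by \(k \approx s/(s-1)\).)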

Even in one dimension, the geometric series quickly succumbs. (This is in sharp contrast to the one-dimensional power-law model, where any \(s\) between \(1\) and \(2\) yields a provably infinite progression of disks.)

*(An interactive demonstration of the one-dimensional geometric model runs here in the web version of this article.)*

And just in case you think I’m pulling a fast one here, let me demonstrate that those same one-dimensional disks will indeed fit in the available space, if packed efficiently. In Program 13 they are placed in order of size from left to right.

*(Program 13 runs here in the interactive version of this article.)*

I have made casual attempts to find fulfillment with a few other convergent series, such as the reciprocals of the Fibonacci numbers (which converge to about \(3.36\)) and the reciprocals of the factorials (whose sum is \(e \approx 2.718\)). Both series jam after the first disk. There are plenty of other convergent series one might try, but I doubt this is a fruitful line of inquiry.

All the variations discussed above leave one important factor unchanged: The objects being fitted together are all circular. Exploring the wider universe of shapes has been a major theme of Shier’s work. He asks: What properties of a shape make it suitable for forming a statistical fractal pattern? And what shapes (if any) refuse to cooperate with this treatment? (The images in this section were created by John Shier and are reproduced here with his permission.)

Shier’s first experiments were with circular disks and axis-parallel squares; the filling algorithm worked splendidly in both cases. He also succeeded with axis-parallel rectangles of various aspect ratios, even when he mixed vertical and horizontal orientations in the same tableau. In collaboration with Paul Bourke he tried randomizing the orientation of squares as well as their positions. Again the outcome was positive, as the illustration above left shows.

Equilateral triangles were less cooperative, and at first Shier believed the algorithm would consistently fail with this shape. The triangles tended to form orderly arrays with the sharp point of one triangle pressed close against the broad side of another, leaving little “wiggle room.” Further efforts showed that the algorithm was not truly getting stuck but merely slowing down. With an appropriate choice of parameters in the Hurwitz zeta function, and with enough patience, the triangles did come together in boundlessly extendable space-filling patterns.

The casual exploration of diverse shapes eventually became a deliberate quest to stake out the limits of the space-filling process. Surely there must be *some* geometric forms that the algorithm would balk at, failing to pack an infinite number of objects into a finite area. Perhaps nonconvex shapes such as stars and snowflakes and flowers would expose a limitation—but no, the algorithm worked just fine with these figures, fitting smaller stars into the crevices between the points of larger stars. The next obvious test was “hollow” objects, such as annular rings, where an internal void is not part of the object and is therefore available to be filled with smaller copies. The image at right is my favorite example of this phenomenon. The bowls of the larger nines have smaller nines within them. It’s nines all the way down. When we let the process continue indefinitely, we have a whimsical visual proof of the proposition that \(.999\dots = 1\).

These successes with nonconvex forms and objects with holes led to an *Aha* moment, as Shier describes it. The search for a shape that would break the algorithm gave way to a suspicion that no such shape would be found, and then the suspicion gradually evolved into a conviction that any “reasonably compact” object is suitable for the *Fractalize That!* treatment. The phrase “reasonably compact” would presumably exclude shapes that are in fact dispersed sets of points, such as Cantor dust. But Shier has shown that shapes formed of disconnected pieces, such as the words in the pair of images below, present no special difficulty.

*Fractalize That!* is not all geometry and number theory. Shier is eager to explain the mathematics behind these curious patterns, but he also presents the algorithm as a tool for self-expression. MATH and ART both have their place.

Finally, I offer some notes on what’s needed to turn these algorithms into computer programs. Shier’s book includes a chapter for do-it-yourselfers that explains his strategy and provides some crucial snippets of code (written in C). My own source code (in JavaScript) is available on GitHub. And if you’d like to play with the programs without all the surrounding verbiage, try the GitHub Pages version.

The inner loop of a typical program looks something like this:

```
// Make up to maxAttempts tries at placing the new disk.
let attempt = 1;
while (attempt <= maxAttempts) {
  disk.x = randomCoord();        // random center coordinates
  disk.y = randomCoord();
  if (isNotOverlapping(disk)) {
    return disk;                 // success: the disk stays put
  }
  attempt++;
}
return false;                    // give up; the pattern may be jammed
```

We generate a pair of random \(x\) and \(y\) coordinates, which mark the center point of the new disk, and check for overlaps with other disks already in place. If no overlaps are discovered, the disk stays put and the program moves on. Otherwise the disk is discarded and we jump back to the top of the loop to try a new \(xy\) pair.

The main computational challenge lies in testing for overlaps. For any two specific disks, the test is easy enough: They overlap if the sum of their radii is greater than the distance between their centers. The problem is that the test might have to be repeated many millions of times. My program makes \(10\) million attempts to place a disk before giving up. If it has to test for overlap with \(100{,}000\) other disks on each attempt, that’s a trillion tests. A trillion is too many for an interactive program where someone is staring at the screen waiting for things to happen. To speed things up a little I divide the square into a \(32 \times 32\) grid of smaller squares. The largest disks—those whose diameter is greater than the width of a grid cell—are set aside in a special list, and all new candidate disks are checked for overlap with them. Below this size threshold, each disk is allocated to the grid cell in which its center lies. A new candidate is checked against the disks in its own cell and in that cell’s eight neighbors. The net result is an improvement by two orders of magnitude—lowering the worst-case total from \(10^{12}\) overlap tests to about \(10^{10}\).
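In outline, the grid scheme looks something like this. It's a sketch with illustrative names for a unit square, not the exact code from the GitHub repository; the correctness of the 3 × 3 neighborhood search depends on the fact that disk sizes decrease monotonically, so two binned disks can overlap only if their cells are adjacent.

```javascript
// Grid bucketing for overlap tests (illustrative sketch, unit square).
const GRID = 32, CELL = 1 / GRID;
const cells = Array.from({ length: GRID * GRID }, () => []);
const bigDisks = [];   // diameter > CELL: checked against every candidate

function overlaps(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y) < a.r + b.r;
}

function conflicts(disk) {
  if (bigDisks.some(d => overlaps(disk, d))) return true;
  const i0 = Math.floor(disk.x / CELL), j0 = Math.floor(disk.y / CELL);
  for (let dj = -1; dj <= 1; dj++) {
    for (let di = -1; di <= 1; di++) {
      const i = i0 + di, j = j0 + dj;
      if (i < 0 || i >= GRID || j < 0 || j >= GRID) continue;
      if (cells[j * GRID + i].some(d => overlaps(disk, d))) return true;
    }
  }
  return false;
}

function addDisk(disk) {
  if (2 * disk.r > CELL) {
    bigDisks.push(disk);   // large disk: goes in the special list
  } else {
    const i = Math.min(GRID - 1, Math.floor(disk.x / CELL));
    const j = Math.min(GRID - 1, Math.floor(disk.y / CELL));
    cells[j * GRID + i].push(disk);
  }
}
```

A candidate is compared only with the large disks and with the handful of small disks in its own neighborhood, rather than with every disk placed so far.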

All of this works smoothly with circular disks. Devising overlap tests for the variety of shapes that Shier has been working with is much harder.

From a theoretical point of view, the whole rigmarole of overlap testing is hideously wasteful and unnecessary. If the box is already 90 percent full, then we know that 90 percent of the random probes will fail. A smarter strategy would be to generate random points only in the “black zone” where new disks can legally be placed. If you could do that, you would never need to generate more than one point per disk, and there’d be no need to check for overlaps. But keeping track of the points that comprise the black zone—scattered throughout multiple, oddly shaped, transient regions—would be a serious exercise in computational geometry.

For the actual drawing of the disks, Shier relies on the technology known as SVG, or scalable vector graphics. As the name suggests, these drawings retain full resolution at any size, and they are definitely the right choice if you want to create works of art. They are less suitable for the interactive programs embedded in this document, mainly because they consume too much memory. The images you see here rely on the HTML *canvas* element, which is simply a fixed-size pixel array.

Another point of possible interest is the evaluation of the zeta function. If we want to scale the disk sizes to match the box size (or vice versa), we need to compute a good approximation of the Riemann function \(\zeta(s)\) or the Hurwitz function \(\zeta(s, a)\). I didn’t know how to do that, and most of the methods I read about seemed overwhelming. Before I could get to zeta, I’d have to hack my way through thickets of polygamma functions and Stieltjes constants. For the Riemann zeta function I found a somewhat simpler algorithm published by Peter Borwein in 1995. It’s based on a polynomial approximation that yields ample precision and runs in less than a millisecond. For the Hurwitz zeta function I stayed with a straightforward translation of Shier’s code, which takes more of a brute-force approach. (There are alternatives for Hurwitz too, but I couldn’t understand them well enough to make them work.)

The JavaScript file in the GitHub repository has more discussion of implementation details.

Shier, John. 2018. *Fractalize That! A Visual Essay on Statistical Geometry*. Singapore: World Scientific. Publisher’s website.

Shier, John. Website: http://www.john-art.com/

Shier, John. 2011. The dimensionless gasket width \(b(c,n)\) in statistical geometry. http://www.john-art.com/gasket_width.pdf

Shier, John. 2012. Random fractal filling of a line segment. http://www.john-art.com/gasket_width.pdf

Dunham, Douglas, and John Shier. 2014. The art of random fractals. In *Proceedings of Bridges 2014: Mathematics, Music, Art, Architecture, Culture* pp. 79–86. PDF.

Shier, John. 2015. A new recursion for space-filling geometric fractals. http://www.john-art.com/gasket_width.pdf

Dunham, Douglas, and John Shier. 2015. An algorithm for creating aesthetic random fractal patterns. Talk delivered at the Joint Mathematics Meetings January 2015, San Antonio, Texas.

Dunham, Douglas, and John Shier. 2018. A property of area and perimeter. In *ICGG 2018: Proceedings of the 18th International Conference on Geometry and Graphics*, Milano, August 2018, pp. 228–237.

Dunham, Douglas, and John Shier. 2017. New kinds of fractal patterns. In *Proceedings of Bridges 2017: Mathematics, Art, Music, Architecture, Education, Culture*, pp. 111–116. Preprint.

Shier, John, and Paul Bourke. 2013. An algorithm for random fractal filling of space. *Computer Graphics Forum* 32(8):89–97. PDF. Preprint.

Ennis, Christopher. 2016. (Always) room for one more. *Math Horizons* 23(3):8–12. PDF (paywalled).

Dodds, Peter Sheridan, and Joshua S. Weitz. 2002. Packing-limited growth. *Physical Review E* 65:056108.

Lagarias, Jeffrey C., Colin L. Mallows, and Allan R. Wilks. 2001. Beyond the Descartes circle theorem. https://arxiv.org/abs/math/0101066. (Also published in *American Mathematical Monthly*, 2002, 109:338–361.)

Mackenzie, Dana. 2010. A tisket, a tasket, an Apollonian gasket. *American Scientist* 98:10–14. https://www.americanscientist.org/article/a-tisket-a-tasket-an-apollonian-gasket.

Manna, S. S. 1992. Space filling tiling by random packing of discs. *Physica A* 187:373–377.

Bailey, David H., and Jonathan M. Borwein. 2015. Crandall’s computation of the incomplete Gamma function and the Hurwitz zeta function, with applications to Dirichlet L-series. *Applied Mathematics and Computation* 268:462–477.

Borwein, Peter. 1995. An efficient algorithm for the Riemann zeta function. http://www.cecm.sfu.ca/personal/pborwein/PAPERS/P155.pdf

Coffey, Mark W. 2009. An efficient algorithm for the Hurwitz zeta and related functions. *Journal of Computational and Applied Mathematics* 225:338–346.

Hurwitz, Adolf. 1882. Einige Eigenschaften der Dirichletschen Funktionen \(F(s) = \sum \left(\frac{D}{n}\right)\frac{1}{n^s}\), die bei der Bestimmung der Klassenzahlen binärer quadratischer Formen auftreten. *Zeitschrift für Mathematik und Physik* 27:86–101. https://gdz.sub.uni-goettingen.de/id/PPN599415665_0027.

Oswald, Nicola, and Jörn Steuding. 2015. Aspects of zeta-function theory in the mathematical works of Adolf Hurwitz. https://arxiv.org/abs/1506.00856.

Xu, Andy. 2018. Approximating the Hurwitz zeta function.

Disclaimer: The investigations of the MAX 8 disasters are in an early stage, so much of what follows is based on secondary sources—in other words, on leaks and rumors and the speculations of people who may or may not know what they’re talking about. As for my own speculations: I’m not an aeronautical engineer, or an airframe mechanic, or a control theorist. I’m not even a pilot. Please keep that in mind if you choose to read on.

Early on the morning of October 29, 2018, Lion Air Flight 610 departed Jakarta, Indonesia, with 189 people on board. The airplane was a four-month-old 737 MAX 8—the latest model in a line of Boeing aircraft that goes back to the 1960s. Takeoff and climb were normal to about 1,600 feet, where the pilots retracted the flaps (wing extensions that increase lift at low speed). At that point the aircraft unexpectedly descended to 900 feet. In radio conversations with air traffic controllers, the pilots reported a “flight control problem” and asked about their altitude and speed as displayed on the controllers’ radar screens. Cockpit instruments were giving inconsistent readings. The pilots then redeployed the flaps and climbed to 5,000 feet, but when the flaps were stowed again, the nose dipped and the plane began to lose altitude. Over the next six or seven minutes the pilots engaged in a tug of war with their own aircraft, as they struggled to keep the nose level but the flight control system repeatedly pushed it down. In the end the machine won. The airplane plunged into the sea at high speed, killing everyone aboard.

The second crash happened March 8, when Ethiopian Airlines Flight 302 went down six minutes after taking off from Addis Ababa, killing 157. The aircraft was another MAX 8, just two months old. The pilots reported control problems, and data from a satellite tracking service showed sharp fluctuations in altitude. The similarities to the Lion Air crash set off alarm bells: If the same malfunction or design flaw caused both accidents, it might also cause more. Within days, the worldwide fleet of 737 MAX aircraft was grounded. Data recovered since then from the Flight 302 wreckage has reinforced the suspicion that the two accidents are closely related.

The grim fate of Lion Air 610 can be traced in brightly colored squiggles extracted from the flight data recorder. (The chart was published in November in a preliminary report from the Indonesian National Committee on Transportation Safety.)

The outline of the story is given in the altitude traces at the bottom of the chart. The initial climb is interrupted by a sharp dip; then a further climb is followed by a long, erratic roller coaster ride. At the end comes the dive, as the aircraft plunges 5,000 feet in a little more than 10 seconds. (Why are there two altitude curves, separated by a few hundred feet? I’ll come back to that question at the end of this long screed.)

All those ups and downs were caused by movements of the horizontal stabilizer, the small winglike control surface at the rear of the fuselage. The stabilizer controls the airplane’s pitch attitude—nose-up vs. nose-down. On the 737 it does so in two ways. A mechanism for pitch *trim* tilts the entire stabilizer, whereas pushing or pulling on the pilot’s control yoke moves the elevator, a hinged tab at the rear of the stabilizer. In either case, moving the trailing edge of the surface upward tends to force the nose of the airplane up, and vice versa. Here we’re mainly concerned with trim changes rather than elevator movements.

Commands to the pitch-trim system and their effect on the airplane are shown in three traces from the flight data, which I reproduce here for convenience:

The line labeled “trim manual” *(light blue)* reflects the pilots’ inputs, “trim automatic” *(orange)* shows commands from the airplane’s electronic systems, and “pitch trim position” *(dark blue)* represents the tilt of the stabilizer, with higher position on the scale denoting a nose-up command. This is where the tug of war between man and machine is clearly evident. In the latter half of the flight, the automatic trim system repeatedly commands nose down, at intervals of roughly 10 seconds. In the breaks between those automated commands, the pilots dial in nose-up trim, using buttons on the control yoke. In response to these conflicting commands, the position of the horizontal stabilizer oscillates with a period of 15 or 20 seconds. The see-sawing motion continues for at least 20 cycles, but toward the end the unrelenting automatic nose-down adjustments prevail over the briefer nose-up commands from the pilots. The stabilizer finally reaches its limiting nose-down deflection and stays there as the airplane plummets into the sea.

What’s to blame for the perverse behavior of the automatic pitch trim system? The accusatory finger is pointing at something called MCAS, a new feature of the 737 MAX series. MCAS stands for Maneuvering Characteristics Augmentation System—an impressively polysyllabic name that tells you nothing about what the thing is or what it does. As I understand it, MCAS is not a piece of hardware; there’s no box labeled MCAS in the airplane’s electronic equipment bays. MCAS consists entirely of software. It’s a program running on a computer.

MCAS has just one function. It is designed to help prevent an aerodynamic stall, a situation in which an airplane has its nose pointed up so high with respect to the surrounding airflow that the wings can’t keep it aloft. A stall is a little like what happens to a bicyclist climbing a hill that keeps getting steeper and steeper: Eventually the rider runs out of oomph, wobbles a bit, and then rolls back to the bottom. Pilots are taught to recover from stalls, but it’s not a skill they routinely practice with a planeful of passengers. In commercial aviation the emphasis is on *avoiding* stalls—forestalling them, so to speak. Airliners have mechanisms to detect an imminent stall and warn the pilot with lights and horns and a “stick shaker” that vibrates the control yoke. On Flight 610, the captain’s stick was shaking almost from start to finish.

Some aircraft go beyond mere warnings when a stall threatens. If the aircraft’s nose continues to pitch upward, an automated system intervenes to push it back down—if necessary overriding the manual control inputs of the pilot. MCAS is designed to do exactly this. It is armed and ready whenever two criteria are met: The flaps are up (generally true except during takeoff and landing) and the airplane is under manual control (not autopilot). Under these conditions the system is triggered whenever an aerodynamic quantity called angle of attack, or AoA, rises into a dangerous range.
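Described procedurally, the arming-and-trigger logic amounts to a simple predicate. This is only a schematic sketch of the two criteria listed above; the trigger angle is a hypothetical stand-in, since Boeing has not published the actual activation threshold.

```javascript
// Schematic sketch of the MCAS arming/trigger conditions described in the
// text. The 12-degree trigger angle is a hypothetical placeholder, NOT a
// published Boeing figure.
const AOA_TRIGGER_DEG = 12; // hypothetical

function mcasCommandsNoseDown(state) {
  // Arming criteria: flaps up, and manual control (no autopilot).
  const armed = state.flapsUp && !state.autopilotEngaged;
  // Trigger: angle of attack in the dangerous range.
  return armed && state.angleOfAttackDeg > AOA_TRIGGER_DEG;
}

// A flaps-up, hand-flown state with a wildly high AoA reading fires MCAS:
mcasCommandsNoseDown({ flapsUp: true, autopilotEngaged: false, angleOfAttackDeg: 20 });
```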

Angle of attack is a concept subtle enough to merit a diagram:

The various angles at issue are rotations of the aircraft body around the pitch axis, a line parallel to the wings, perpendicular to the fuselage, and passing through the airplane’s center of gravity. If you’re sitting in an exit row, the pitch axis might run right under your seat. Rotation about the pitch axis tilts the nose up or down. *Pitch attitude* is defined as the angle of the fuselage with respect to a horizontal plane. The *flight-path angle* is measured between the horizontal plane and the aircraft’s velocity vector, thus showing how steeply it is climbing or descending. *Angle of attack* is the difference between pitch attitude and flight-path angle. It is the angle at which the aircraft is moving through the surrounding air (assuming the air itself is motionless, *i.e.*, no wind).

AoA affects both lift (the upward force opposing the downward tug of gravity) and drag (the dissipative force opposing forward motion and the thrust of the engines). As AoA increases from zero, lift is enhanced because of air impinging on the underside of the wings and fuselage. For the same reason, however, drag also increases. As the angle of attack grows even steeper, the flow of air over the wings becomes turbulent; beyond that point lift diminishes but drag continues increasing. That’s where the stall sets in. The critical angle for a stall depends on speed, weight, and other factors, but usually it’s no more than 15 degrees.
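The geometric relation in the preceding paragraphs is just a subtraction, but it is worth writing down alongside a rough stall criterion. The 15-degree threshold below is the approximate figure cited above, not a number for any particular aircraft.

```javascript
// Angle of attack per the definition above: pitch attitude minus
// flight-path angle (all in degrees, still-air assumption).
function angleOfAttack(pitchAttitudeDeg, flightPathAngleDeg) {
  return pitchAttitudeDeg - flightPathAngleDeg;
}

// Rough stall test using the ~15-degree figure mentioned in the text;
// the true critical angle varies with speed, weight, and other factors.
function nearStall(aoaDeg, criticalDeg = 15) {
  return aoaDeg >= criticalDeg;
}

// Nose up 12 degrees while climbing along a 4-degree path: AoA = 8 degrees.
const aoa = angleOfAttack(12, 4);
```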

Neither the Lion Air nor the Ethiopian flight was ever in danger of stalling, so if MCAS was activated, it must have been by mistake. The working hypothesis mentioned in many press accounts is that the system received and acted upon erroneous input from a failed AoA sensor.

A sensor to measure angle of attack is conceptually simple. It’s essentially a weathervane poking out into the airstream. In the photo below, the angle-of-attack sensor is the small black vane just forward of the “737 MAX” legend. Hinged at the front, the vane rotates to align itself with the local airflow and generates an electrical signal that represents the vane’s angle with respect to the axis of the fuselage. The 737 MAX has two angle-of-attack vanes, one on each side of the nose. (The protruding devices above the AoA vane are pitot tubes, used to measure air speed. Another device below the word MAX is probably a temperature sensor.)

Angle of attack was not among the variables displayed to the pilots of the Lion Air 737, but the flight data recorder did capture signals derived from the two AoA sensors:

There’s something dreadfully wrong here. The left sensor is indicating an angle of attack about 20 degrees steeper than the right sensor. That’s a huge discrepancy. There’s no plausible way those disparate readings could reflect the true state of the airplane’s motion through the air, with the left side of the nose pointing sky-high and the right side near level. One of the measurements must be wrong, and the higher reading is the suspect one. If the true angle of attack ever reached 20 degrees, the airplane would already be in a deep stall. Unfortunately, on Flight 610 MCAS was taking data only from the left-side AoA sensor. It interpreted the nonsensical measurement as a valid indicator of aircraft attitude, and worked relentlessly to correct it, up to the very moment the airplane hit the sea.

The tragedies in Jakarta and Addis Ababa are being framed as a cautionary tale of automation run amok, with computers usurping the authority of pilots. The *Washington Post* editorialized:

A second fatal airplane accident involving a Boeing 737 MAX 8 may have been a case of man vs. machine…. The debacle shows that regulators should apply extra review to systems that take control away from humans when safety is at stake.

Tom Dieusaert, a Belgian journalist who writes often on aviation and computation, offered this opinion:

What can’t be denied is that the Boeing of Flight JT610 had serious computer problems. And in the hi-tech, fly-by-wire world of aircraft manufacturers, where pilots are reduced to button pushers and passive observers, these accidents are prone to happen more in the future.

The button-pushing pilots are particularly irate. Gregory Travis, who is both a pilot and software developer, summed up his feelings in this acerbic comment:

“Raise the nose, HAL.”

“I’m sorry, Dave, I can’t do that.”

Even Donald Trump tweeted on the issue:

Airplanes are becoming far too complex to fly. Pilots are no longer needed, but rather computer scientists from MIT. I see it all the time in many products. Always seeking to go one unnecessary step further, when often old and simpler is far better. Split second decisions are….

….needed, and the complexity creates danger. All of this for great cost yet very little gain. I don’t know about you, but I don’t want Albert Einstein to be my pilot. I want great flying professionals that are allowed to easily and quickly take control of a plane!

There’s considerable irony in the complaint that the 737 is too automated; in many respects the aircraft is in fact quaintly old-fashioned. The basic design goes back more than 50 years, and even in the latest MAX models quite a lot of 1960s technology survives. The primary flight controls are hydraulic, with a spider web of high-pressure tubing running directly from the control yokes in the cockpit to the ailerons, elevator, and rudder. If the hydraulic systems should fail, there’s a purely mechanical backup, with cables and pulleys to operate the various control surfaces. For stabilizer trim the primary actuator is an electric motor, but again there’s a mechanical fallback, with crank wheels near the pilots’ knees pulling on cables that run all the way back to the tail.

Other aircraft are much more dependent on computers and electronics. The 737's principal competitor, the Airbus A320, is a thoroughgoing fly-by-wire vehicle. The pilot flies the computer, and the computer flies the airplane. Specifically, the pilot decides where to go—up, down, left, right—but the computer decides how to get there, choosing which control surfaces to deflect and by how much. Boeing’s own more recent designs, the 777 and 787, also rely on digital controls. Indeed, the latest models from both companies go a step beyond fly-by-wire to fly-by-network. Most of the communication from sensors to computers and onward to control surfaces consists of digital packets flowing through a variant of Ethernet. The airplane is a computer peripheral.

Thus if you want to gripe about the dangers and indignities of automation on the flight deck, the 737 is not the most obvious place to start. And a Luddite campaign to smash all the avionics and put pilots back in the seat of their pants would be a dangerously off-target response to the current predicament. There’s no question the 737 MAX has a critical problem. It’s a matter of life and death for those who would fly in it and possibly also for the Boeing Company. But the problem didn’t start with MCAS. It started with earlier decisions that made MCAS necessary. Furthermore, the problem may not end with the remedy that Boeing has proposed—a software update that will hobble MCAS and leave more to the discretion of pilots.

The 737 flew its first passengers in 1968. It was (and still is) the smallest member of the Boeing family of jet airliners, and it is also the most popular by far. More than 10,000 have been sold, and Boeing has orders for another 4,600. Of course there have been changes over the years, especially to engines and instruments. A 1980s update came to be known as 737 Classic, and a 1997 model was called 737 NG, for “next generation.” (Now, with the MAX, the NG has become the *previous* generation.) Through all these revisions, however, the basic structure of the airframe has hardly changed.

Ten years ago, it looked like the 737 had finally come to the end of its life. Boeing announced it would develop an all-new design as a replacement, with a hull built of lightweight composite materials rather than aluminum. Competitive pressures forced a change of course. Airbus had a head start on the A320neo, an update that would bring more efficient engines to their entry in the same market segment. The revised Airbus would be ready around 2015, whereas Boeing’s clean-slate project would take a decade. Customers were threatening to defect. In particular, American Airlines—long a Boeing loyalist—was negotiating a large order of A320neos.

In 2011 Boeing scrapped the plan for an all-new design and elected to do the same thing Airbus was doing: bolt new engines onto an old airframe. This would eliminate most of the up-front design work, as well as the need to build tooling and manufacturing facilities. Testing and certification by the FAA would also go quicker, so that the first deliveries might be made in five or six years, not too far behind Airbus.

*(left)* Bryan via Wikimedia, CC BY 2.0; *(right)* Steve Lynes via Wikimedia, CC BY 2.0.

The original 1960s 737 had two cigar-shaped engines, long and skinny, tucked up under the wings *(left photo above)*. Since then, jet engines have grown fat and stubby. They derive much of their thrust not from the jet exhaust coming out of the tailpipe but from “bypass” air moved by a large-diameter fan. Such engines would scrape on the ground if they were mounted under the wings of the 737; instead they are perched on pylons that extend forward from the leading edge of the wing. The engines on the MAX models *(right photo)* are the fattest yet, with a fan 69 inches in diameter. Compared with the NG series, the MAX engines are pushed a few inches farther forward and hang a few inches lower.

A New York Times article by David Gelles, Natalie Kitroeff, Jack Nicas, and Rebecca R. Ruiz describes the plane’s development as hurried and hectic.

Months behind Airbus, Boeing had to play catch-up. The pace of the work on the 737 Max was frenetic, according to current and former employees who spoke with The New York Times…. Engineers were pushed to submit technical drawings and designs at roughly double the normal pace, former employees said.

The *Times* article also notes: “Although the project had been hectic, current and former employees said they had finished it feeling confident in the safety of the plane.”

Sometime during the development of the MAX series, Boeing got an unpleasant surprise. The new engines were causing unwanted pitch-up movements under certain flight conditions. When I first read about this problem, soon after the Lion Air crash, I found the following explanation in an article by Sean Broderick and Guy Norris in *Aviation Week and Space Technology* (Nov. 26–Dec. 9, 2018, pp. 56–57):

Like all turbofan-powered airliners in which the thrust lines of the engines pass below the center of gravity (CG), any change in thrust on the 737 will result in a change of flight path angle caused by the vertical component of thrust.

In other words, the low-slung engines not only push the airplane forward but also tend to twirl it around the pitch axis. It’s like a motorcycle doing wheelies. Because the MAX engines are mounted farther below and in front of the center of gravity, they act through a longer lever arm and cause more severe pitch-up motions.

I found more detail on this effect in an earlier *Aviation Week* article, a 2017 pilot report by Fred George, describing his first flight at the controls of the new MAX 8.

The aircraft has sufficient natural speed stability through much of its flight envelope. But with as much as 58,000 lb. of thrust available from engines mounted well below the center of gravity, there is pronounced thrust-versus-pitch coupling at low speeds, especially with aft center of gravity (CG) and at light gross weights. Boeing equips the aircraft with a speed-stability augmentation function that helps to compensate for the coupling by automatically trimming the horizontal stabilizer according to indicated speed, thrust lever position and CG. Pilots still must be aware of the effect of thrust changes on pitching moment and make purposeful control-wheel and pitch-trim inputs to counter it.

The reference to an “augmentation function” that works by “automatically trimming the horizontal stabilizer” sounded awfully familiar, but it turns out this is *not* MCAS. The system that compensates for thrust-pitch coupling is known as *speed-trim*. Like MCAS, it works “behind the pilot’s back,” making adjustments to control surfaces that were not directly commanded. There’s yet another system of this kind called *mach-trim* that silently corrects a different pitch anomaly when the aircraft reaches transonic speeds, at about mach 0.6. Neither of these systems is new to the MAX series of aircraft; they have been part of the control algorithm at least since the NG came out in 1997. MCAS runs on the same computer as speed-trim and mach-trim and is part of the same software system, but it is a distinct function. And according to what I’ve been reading in the past few weeks, it addresses a different problem—one that seems more sinister.

Most aircraft have the pleasant property of static stability. When an airplane is properly trimmed for level flight, you can let go of the controls—at least briefly—and it will continue on a stable path. Moreover, if you pull back on the control yoke to point the nose up, then let go again, the pitch angle should return to neutral. The layout of the airplane’s various airfoil surfaces accounts for this behavior. When the nose goes up, the tail goes down, pushing the underside of the horizontal stabilizer into the airstream. The pressure of the air against this tail surface provides a restoring force that brings the tail back up and the nose back down. (That’s why it’s called a *stabilizer*!) This negative feedback loop is built into the structure of the airplane, so that any departure from equilibrium creates a force that opposes the disturbance.

However, the tail surface, with its helpful stabilizing influence, is not the only structure that affects the balance of aerodynamic forces. Jet engines are not designed to contribute lift to the airplane, but at high angles of attack they can do so, as the airstream impinges on the lower surface of each engine’s outer covering, or nacelle. When the engines are well forward of the center of gravity, the lift creates a pitch-up turning moment. If this moment exceeds the counterbalancing force from the tail, the aircraft is unstable. A nose-up attitude generates forces that raise the nose still higher, and positive feedback takes over.

Is the 737 MAX vulnerable to such runaway pitch excursions? The possibility had not occurred to me until I read a commentary on MCAS on the Boeing 737 Technical Site, a web publication produced by Chris Brady, a former 737 pilot and flight instructor. He writes:

MCAS is a longitudinal stability enhancement. It is not for stall prevention or to make the MAX handle like the NG; it was introduced to counteract the non-linear lift of the LEAP-1B engine nacelles and give a steady increase in stick force as AoA increases. The LEAP engines are both larger and relocated slightly up and forward from the previous NG CFM56-7 engines to accommodate their larger fan diameter. This new location and size of the nacelle cause the vortex flow off the nacelle body to produce lift at high AoA; as the nacelle is ahead of the CofG this lift causes a slight pitch-up effect (ie a reducing stick force) which could lead the pilot to further increase the back pressure on the yoke and send the aircraft closer towards the stall. This non-linear/reducing stick force is not allowable under FAR §25.173 “Static longitudinal stability”. MCAS was therefore introduced to give an automatic nose down stabilizer input during steep turns with elevated load factors (high AoA) and during flaps up flight at airspeeds approaching stall.

(FAR stands for Federal Air Regulations; Part 25 sets airworthiness standards for transport category airplanes.)

Brady cites no sources for this statement, and as far as I know Boeing has neither confirmed nor denied. But *Aviation Week*, which earlier mentioned the thrust-pitch linkage, has more recently (issue of March 20) gotten behind the nacelle-lift instability hypothesis:

The MAX’s larger CFM Leap 1 engines create more lift at high AOA and give the aircraft a greater pitch-up moment than the CFM56-7-equipped NG. The MCAS was added as a certification requirement to minimize the handling difference between the MAX and NG.

Assuming the Brady account is correct, an interesting question is when Boeing noticed the instability. Were the designers aware of this hazard from the outset? Did it emerge during early computer simulations, or in wind tunnel testing of scale models? A story by Dominic Gates in the *Seattle Times* hints that Boeing may not have recognized the severity of the problem until flight tests of the first completed aircraft began in 2015.

According to Gates, the safety analysis that Boeing submitted to the FAA specified that MCAS would be allowed to move the horizontal stabilizer by no more than 0.6 degree. In the airplane ultimately released to the market, MCAS can go as far as 2.5 degrees, and it can act repeatedly until reaching the mechanical limit of motion at about 5 degrees. Gates writes:

That limit was later increased after flight tests showed that a more powerful movement of the tail was required to avert a high-speed stall, when the plane is in danger of losing lift and spiraling down.

The behavior of a plane in a high angle-of-attack stall is difficult to model in advance purely by analysis and so, as test pilots work through stall-recovery routines during flight tests on a new airplane, it’s not uncommon to tweak the control software to refine the jet’s performance.

The high-AoA instability of the MAX appears to be a property of the aerodynamic form of the entire aircraft, and so a direct way to suppress it would be to alter that form. For example, enlarging the tail surface might restore static stability. But such airframe modifications would have delayed the delivery of the airplane, especially if the need for them was discovered only after the first prototypes were already flying. Structural changes might also jeopardize inclusion of the new model under the old type certificate. Modifying software instead of aluminum must have looked like an attractive alternative. Someday, perhaps, we’ll learn how the decision was made.

By the way, according to Gates, the safety document filed with the FAA specifying a 0.6 degree limit has yet to be amended to reflect the true range of MCAS commands.

Instability is not necessarily the kiss of death in an airplane. There have been at least a few successful unstable designs, starting with the 1903 Wright Flyer. The Wright brothers deliberately put the horizontal stabilizer in front of the wing rather than behind it because their earlier experiments with kites and gliders had shown that what we call stability can also be described as sluggishness. The Flyer’s forward control surfaces (known as canards) tended to amplify any slight nose-up or nose-down motions. Maintaining a steady pitch attitude demanded high alertness from the pilot, but it also allowed the airplane to respond more quickly when the pilot *wanted* to pitch up or down. (The pros and cons of the design are reviewed in a 1984 paper by Fred E. C. Culick and Henry R. Jex.)

Another dramatically unstable aircraft was the Grumman X-29, a research platform designed in the 1980s. The X-29 had its wings on backwards; to make matters worse, the primary surfaces for pitch control were canards mounted in front of the wings, as in the Wright Flyer. The aim of this quirky project was to explore designs with exceptional agility, sacrificing static stability for tighter maneuvering. No unaided human pilot could have mastered such a twitchy vehicle. It required a digital fly-by-wire system that sampled the state of the airplane and adjusted the control surfaces up to 80 times per second. The controller was successful—perhaps too much so. It allowed the airplane to be flown safely, but in taming the instability it also left the plane with rather tame handling characteristics.

I have a glancing personal connection with the X-29 project. In the 1980s I briefly worked as an editor with members of the group at Honeywell who designed and built the X-29 control system. I helped prepare publications on the control laws and on their implementation in hardware and software. That experience taught me just enough to recognize something odd about MCAS: It is way too slow to be suppressing aerodynamic instability in a jet aircraft. Whereas the X-29 controller had a response time of 25 milliseconds, MCAS takes 10 seconds to move the 737 stabilizer through a 2.5-degree adjustment. At that pace, it cannot possibly keep up with forces that tend to flip the nose upward in a positive feedback loop.

There’s a simple explanation. MCAS is not meant to control an unstable aircraft. It is meant to restrain the aircraft from entering the regime where it becomes unstable. This is the same strategy used by other mechanisms of stall prevention—intervening before the angle of attack reaches the critical point. However, if Brady is correct about the instability of the 737 MAX, the task is more urgent for MCAS. Instability implies a steep and slippery slope. MCAS is a guard rail that bounces you back onto the road when you’re about to drive over the cliff.

Which brings up the question of Boeing’s announced plan to fix the MCAS problem. Reportedly, the revised system will not keep reactivating itself so persistently, and it will automatically disengage if it detects a large difference between the two AoA sensors. These changes should prevent a recurrence of the recent crashes. But do they provide adequate protection against the kind of mishap that MCAS was designed to prevent in the first place? With MCAS shut down, either manually or automatically, there’s nothing to stop an unwary or misguided pilot from wandering into the corner of the flight envelope where the MAX becomes unstable.
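In code terms, the reported sensor cross-check in the fix is a one-line comparison. This is only a sketch, with a made-up tolerance—the reporting doesn’t give Boeing’s actual disagreement threshold:

```javascript
// Sketch of the reported MCAS fix: inhibit the system when the two AoA
// vanes disagree by more than some tolerance. The 6-degree tolerance is
// a hypothetical placeholder, not a published Boeing figure.
const MAX_AOA_DISAGREEMENT_DEG = 6; // hypothetical

function mcasPermitted(leftAoADeg, rightAoADeg) {
  return Math.abs(leftAoADeg - rightAoADeg) <= MAX_AOA_DISAGREEMENT_DEG;
}

// Readings like Lion Air 610's, roughly 20 degrees apart, would shut MCAS down:
mcasPermitted(21, 1);
```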

Without further information from Boeing, there’s no telling how severe the instability might be—if indeed it exists at all. The Brady article at the Boeing 737 Technical Site implies the problem is partly pilot-induced. Normally, to make the nose go higher and higher you have to pull harder and harder on the control yoke. In the unstable region, however, the resistance to pulling suddenly fades, and so the pilot may unwittingly pull the yoke to a more extreme position.

Is this human interaction a *necessary* part of the instability, or is it just an exacerbating factor? In other words, without the pilot in the loop, would there still be positive feedback causing runaway nose-up pitch? I have yet to find answers.

Another question: If the root of the problem is a deceptive change in the force resisting nose-up movements of the control yoke, why not address that issue directly?

Even after the spurious activation of MCAS on Lion Air 610, the crash and the casualties would have been avoided if the pilots had simply turned the damn thing off. Why didn’t they? Apparently because they had never heard of MCAS, and didn’t know it was installed on the airplane they were flying, and had not received any instruction on how to disable it. There’s no switch or knob in the cockpit labeled “MCAS ON/OFF.” The Flight Crew Operation Manual does not mention it (except in a list of abbreviations), and neither did the transitional training program the pilots had completed before switching from the 737 NG to the MAX. The training consisted of either one or two hours (reports differ) with an iPad app.

Boeing’s explanation of these omissions was captured in a *Wall Street Journal* story:

One high-ranking Boeing official said the company had decided against disclosing more details to cockpit crews due to concerns about inundating average pilots with too much information—and significantly more technical data—than they needed or could digest.

To call this statement disingenuous would be disingenuous. What it is is preposterous. In the first place, Boeing did not withhold “more details”; they failed to mention the very existence of MCAS. And the too-much-information argument is silly. I don’t have access to the Flight Crew Operation Manual for the MAX, but the NG edition runs to more than 1,300 pages, plus another 800 for the Quick Reference Handbook. A few paragraphs on MCAS would not have sunk any pilot who wasn’t already drowning in TMI. Moreover, the manual carefully documents the speed-trim and mach-trim features, which seem to fall in the same category as MCAS: They act autonomously, and offer the pilot no direct interface for monitoring or adjusting them.

In the aftermath of the Lion Air accident, Boeing stated that the procedure for disabling MCAS was spelled out in the manual, even though MCAS itself wasn’t mentioned. That procedure is given in a checklist for “runaway stabilizer trim.” It is not complicated: Hang onto the control yoke, switch off the autopilot and autothrottles if they’re on; then, if the problem persists, flip two switches labeled “STAB TRIM” to the “CUTOUT” position. Only the last step will actually matter in the case of an MCAS malfunction.

This checklist is considered a “memory item”; pilots must be able to execute the steps without looking it up in the handbook. The Lion Air crew should certainly have been familiar with it. But could they recognize that it was the right checklist to apply in an airplane whose behavior was unlike anything they had seen in their training or previous 737 flying experience? According to the handbook, the condition that triggers use of the runaway checklist is “Uncommanded stabilizer trim movement occurs continuously.” The MCAS commands were not continuous but repetitive, so some leap of inference would have been needed to make this diagnosis.

By the time of the Ethiopian crash, 737 pilots everywhere knew all about MCAS and the procedure for disabling it. A preliminary report issued last week by Ethiopian Airlines indicates that after a few minutes of wrestling with the control yoke, the pilots on Flight 302 did invoke the checklist procedure, and moved the STAB TRIM switches to CUTOUT. The stabilizer then stopped responding to MCAS nose-down commands, but the pilots were unable to regain control of the airplane.

It’s not entirely clear why they failed or what was going on in the cockpit in those last minutes. One factor may be that the cutout switch disables not only automatic pitch trim movements but also manual ones requested through the buttons on the control yoke. The switch cuts all power to the electric motor that moves the stabilizer. In this situation the only way to adjust the trim is to turn the hand crank wheels near the pilots’ knees. During the crisis on Flight 302 that mechanism may have been too slow to correct the trim in time, or the pilots may have been so fixated on pulling the control yoke back with maximum force that they did not try the manual wheels. It’s also possible that they flipped the switches back to the NORMAL setting, restoring power to the stabilizer motor. The report’s narrative doesn’t mention this possibility, but the graph from the flight data recorder suggests it *(see below)*.

There’s room for debate on whether the MCAS system is a good idea when it is operating correctly, but when it activates *mistakenly* and sends an airplane diving into the sea, no one would defend it. By all appearances, the rogue behavior in both the Lion Air and the Ethiopian accidents was triggered by a malfunction in a single sensor. That’s not supposed to happen in aviation. It’s unfathomable that any aircraft manufacturer would knowingly build a vehicle in which the failure of a single part would lead to a fatal accident.

Protection against single failures comes from redundancy, and the 737 is so committed to this principle that it almost amounts to two airplanes wrapped up in a single skin. There are two—and in some cases *three*—of everything: sensors, computers, and actuators.

There’s one asterisk in this roster of redundancy: A device called the flight control computer, or FCC, apparently gets special treatment. There are two FCCs, but according to the Boeing 737 Technical Site only one of them operates during any given flight. All the other duplicated components run in parallel, receiving independent inputs, doing independent computations, emitting independent control actions. But for each flight just one FCC does all the work, and the other is put on standby. The scheme for choosing the active computer seems strangely arbitrary. Each day when the airplane is powered up, the left side FCC gets control for the first flight, then the right side unit takes over for the second flight of the day, and the two sides alternate until the power is shut off. After a restart, the alternation begins again with the left FCC.
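If I've read that description correctly, the alternation rule is simple enough to state in a line of code. Here's a sketch in Julia; the function name and the `:left`/`:right` labels are my own inventions, not Boeing's:

```
# Which FCC is active, given the number of flights since power-up?
# Illustrative only; based on my reading of the Boeing 737 Technical Site.
active_fcc(flights_since_powerup) = isodd(flights_since_powerup) ? :left : :right
```

The first flight after power-up gets the left FCC, the second gets the right, and a restart resets the count, so the left unit always leads off.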

Aspects of this scheme puzzle me. I don’t understand why redundant FCC units are treated differently from other components. If one FCC dies, does the other automatically take over? Can the pilots switch between them in flight? If so, would that be an effective way to combat MCAS misbehavior? I’ve tried to find answers in the manuals, but I don’t trust my interpretation of what I read.

I’ve also had a hard time learning anything about the FCC itself. I don’t know who makes it, or what it looks like, or how it is programmed. On a website called Closet Wonderfuls an item identified as a 737 flight control computer is on offer for $43.82, with free shipping.

In the context of the MAX crashes, the flight control computer is important for two reasons. First, it’s where MCAS lives; this is the computer on which the MCAS software runs. Second, the curious procedure for choosing a different FCC on alternating flights also winds up choosing which AoA sensor is providing input to MCAS. The left and right sensors are connected to the corresponding FCCs.

If the two FCCs are used in alternation, that raises an interesting question about the history of the aircraft that crashed in Indonesia. The preliminary crash report describes trouble with various instruments and controls on five flights over four days (including the fatal flight). All of the problems were on the left side of the aircraft or involved a disagreement between the left and right sides.

date | route | trouble reports | maintenance |
---|---|---|---|
Oct 26 | Tianjin → Manado | left side: no airspeed or altitude indications | test left Stall Management and Yaw Damper computer; passed |
? | Manado → Denpasar | ? | ? |
Oct 27 | Denpasar → Manado | left side: no airspeed or altitude indications; speed trim and mach trim warning lights | test left Stall Management and Yaw Damper computer; failed. reset left Air Data and Inertial Reference Unit. retest left Stall Management and Yaw Damper computer; passed. clean electrical connections |
Oct 27 | Manado → Denpasar | left side: no airspeed or altitude indications; speed trim and mach trim warning lights; autothrottle disconnect | test left Stall Management and Yaw Damper computer; failed. reset left Air Data and Inertial Reference Unit. replace left AoA sensor |
Oct 28 | Denpasar → Jakarta | left/right disagree warning on airspeed and altitude; stick shaker [MCAS activation] | flush left pitot tube and static port. clean electrical connectors on elevator “feel” computer |
Oct 29 | Jakarta → Pangkal Pinang | stick shaker [MCAS activation] | |

Which of the five flights had the left-side FCC as active computer? The final two flights *(red)*, where MCAS activated, were both first-of-the-day flights and so presumably under control of the left FCC. For the rest it’s hard to tell, especially since maintenance operations may have entailed full shutdowns of the aircraft, which would have reset the alternation sequence.

The revised MCAS software will reportedly consult signals from both AoA sensors. What will it do with the additional information? Only one clue has been published so far: If the readings differ by more than 5.5 degrees, MCAS will shut down. What if the readings differ by 4 or 5 degrees?
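From that single published clue, the revised activation test might amount to nothing more than a threshold comparison. This is a hypothetical sketch; only the 5.5-degree figure comes from the reports, and everything else is guesswork:

```
# Hypothetical: MCAS stays enabled only while the two AoA readings
# agree to within 5.5 degrees. Only the threshold has been disclosed.
mcas_enabled(aoa_left, aoa_right) = abs(aoa_left - aoa_right) < 5.5
```

A disagreement of 4 or 5 degrees sails right through such a test.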

The present MCAS system, with its alternating choice of left and right, has a 50 percent chance of disaster when a single random failure causes an AoA sensor to spew out falsely high data. With the same one-sided random failure, the updated MCAS will have a 100 percent chance of ignoring a pilot’s excursion into stall territory. Is that an improvement?

Although a faulty sensor should not bring down an airplane, I would still like to know what went wrong with the AoA vane.

It’s no surprise that AoA sensors can fail. They are mechanical devices operating in a harsh environment: winds exceeding 500 miles per hour and temperatures below –40. A common failure mode is a stuck vane, often caused by ice (despite a built-in de-icing heater). But a seized vane would produce a constant output, regardless of the real angle of attack, which is not the symptom seen in Flight 610. The flight data recorder shows small fluctuations in the signals from both the left and the right instruments. Furthermore, the jiggles in the two curves are closely aligned, suggesting they are both tracking the same movements of the aircraft. In other words, the left-hand sensor appears to be functioning; it’s just giving measurements offset by a constant deviation of roughly 20 degrees.

Is there some other failure mode that might produce the observed offset? Sure: Just bend the vane by 20 degrees. Maybe a catering truck or an airport jetway blundered into it. Another creative thought is that the sensor might have been installed wrong, with the entire unit rotated by 20 degrees. Several writers on a website called the Professional Pilots Rumour Network explored this possibility, but they ultimately concluded it was impossible. The manufacturer, doubtless aware of the risk, placed the mounting screws and locator pins asymmetrically, so the unit will only go into the hull opening one way.

You might get the same effect through an assembly error during the manufacture of the sensor. The vane could be incorrectly attached to the shaft, or else the internal transducer that converts angular position into an electrical signal might be mounted wrong. Did the designers also ensure that such mistakes are impossible? I don’t know; I haven’t been able to find any drawings or photographs of the sensor’s innards.

Looking for other ideas about what might have gone wrong, I made a quick, scattershot survey of FAA airworthiness directives that call for servicing or replacing AoA sensors. I found dozens of them, including several that discuss the same sensor installed on the 737 MAX (the Rosemount 0861). But none of the reports I read describes a malfunction that could cause a consistent 20-degree error.

For a while I thought that the fault might lie not in the sensor itself but farther along the data path. It could be something as simple as a bad cable or connector. Signals from the AoA sensor go to the Air Data and Inertial Reference Unit (ADIRU), where the sine and cosine components are combined and digitized to yield a number representing the measured angle of attack. The ADIRU also receives inputs from other sensors, including the pitot tubes for measuring airspeed and the static ports for air pressure. And it houses the gyroscopes and accelerometers of an inertial guidance system, which can keep track of aircraft motion without reference to external cues. (There’s a separate ADIRU for each side of the airplane.) Maybe there was a problem with the digitizer—a stuck bit rather than a stuck vane.

Further information has undermined this idea. For one thing, the AoA sensor removed by the Lion Air maintenance crew on October 27 is now in the hands of investigators. According to news reports, it was “deemed to be defective,” though I’ve heard no hint of what the defect might be. Also, it turns out that one element of the control system, the Stall Management and Yaw Damper (SMYD) computer, receives the raw sine and cosine voltages directly from the sensor, not a digitized angle calculated by the ADIRU. It is the SMYD that controls the stick-shaker function. On both the Lion Air and the Ethiopian flights the stick shaker was active almost continuously, so those undigitized sine and cosine voltages must have been indicating a high angle of attack. In other words the error already existed before the signals reached the ADIRU.

I’m still stumped by the fixed angular offset in the Lion Air data, but the question now seems a little less important. The release of the preliminary report on Ethiopian Flight 302 shows that the left-side AoA sensor on that aircraft also failed badly, but in a way that looks totally different. Here are the relevant traces from the flight data recorder:

The readings from the AoA sensors are the uppermost lines, red for the left sensor and blue for the right. At the left edge of the graph they differ somewhat when the airplane has just begun to move, but they fall into close coincidence once the roll down the runway has built up some speed. At takeoff, however, they suddenly diverge dramatically, as the left vane begins reading an utterly implausible 75 degrees nose up. Later it comes down a few degrees but otherwise shows no sign of the ripples that would suggest a response to airflow. At the very end of the flight there are some more unexplained excursions.

By the way, in this graph the light blue trace of automatic trim commands offers another clue to what might have happened in the last moments of Flight 302. Around the middle of the graph, the STAB TRIM switches were pulled, with the result that an automatic nose-down command had no effect on the stabilizer position. But at the far right, another automatic nose-down command does register in the trim-position trace, suggesting that the switches may have been flipped back to NORMAL, restoring power to the stabilizer motor.

There’s so much I still don’t understand.

Puzzle 1. If the Lion Air and Ethiopian accidents were both caused by faulty AoA sensors, then there were three parts with similar defects in brand new aircraft (including the replacement sensor installed by Lion Air on October 27). A recent news item says the replacement was not a new part but one that had been refurbished by a Florida shop called XTRA Aerospace. This fact offers us somewhere else to point the accusatory finger, but presumably the two sensors installed by Boeing were not retreads, so XTRA can’t be blamed for all of them.

There are roughly 400 MAX aircraft in service, with 800 AoA sensors. Is a failure rate of 3 out of 800 unusual or unacceptable? Does that judgment depend on whether or not it’s the same defect in all three cases?

Puzzle 2. Let’s look again at the traces for pitch trim and angle of attack in the Lion Air 610 data. The conflicting manual and automatic commands in the second half of the flight have gotten lots of attention, but I’m also baffled by what was going on in the first few minutes.

During the roll down the runway, the pitch trim system was set near its maximum pitch-up position *(dark blue line)*. Immediately after takeoff, the automatic trim system began calling for further pitch-up movement, and the stabilizer probably reached its mechanical limit. At that point the pilots manually trimmed it in the pitch-down direction, and the automatic system replied with a rapid sequence of up adjustments. In other words, there was already a tug-of-war underway, but the pilots and the automated controls were pulling in directions opposite to those they would choose later on. All this happened while the flaps were still deployed, which means that MCAS could not have been active. Some other element of the control system must have been issuing those automatic pitch-up orders. Deepening the mystery, the left side AoA sensor was already feeding its spurious high readings to the left-side flight control computer. If the FCC was acting on that data, it should not have been commanding nose-up trim.

Puzzle 3. The AoA readings are not the only peculiar data in the chart from the Lion Air preliminary report. Here are the altitude and speed traces:

The left-side altitude readings *(red)* are low by at least a few hundred feet. The error looks like it might be multiplicative rather than additive, perhaps 10 percent. The left and right computed airspeeds also disagree, although the chart is too squished to allow a quantitative comparison. It was these discrepancies that initially upset the pilots of Flight 610; they could see them on their instruments. (They had no angle of attack indicators in the cockpit, so that conflict was invisible to them.)

Altitude, airspeed, and angle of attack are all measured by different sensors. Could they all have gone haywire at the same time? Or is there some common point of failure that might explain all the weird behavior? In particular, is it possible a single wonky AoA sensor caused all of this havoc? My guess is yes. The sensors for altitude and airspeed and even temperature are influenced by angle of attack. The measured speed and pressure are therefore adjusted to compensate for this confounding variable, using the output of the AoA sensor. That output was wrong, and so the adjustments allowed one bad data stream to infect all of the air data measurements.

Six months ago, I was writing about another disaster caused by an out-of-control control system. In that case the trouble spot was a natural gas distribution network in Massachusetts, where a misconfigured pressure-regulating station caused fires and explosions in more than 100 buildings, with one fatality and 20 serious injuries. I lamented: “The special pathos of technological tragedies is that the engines of our destruction are machines that we ourselves design and build.”

In a world where defective automatic controls are blowing up houses and dropping aircraft out of the sky, it’s hard to argue for *more* automation, for adding further layers of complexity to control systems, for endowing machines with greater autonomy. Public sentiment leans the other way. Like President Trump, most of us trust pilots more than we trust computer scientists. We don’t want MCAS on the flight deck. We want Chesley Sullenberger III, the hero of US Airways Flight 1549, who guided his crippled A320 to a dead-stick landing in the Hudson River and saved all 155 souls on board. No amount of cockpit automation could have pulled off that feat.

Nevertheless, a cold, analytical view of the statistics suggests a different reaction. The human touch doesn’t always save the day. On the contrary, pilot error is responsible for more fatal crashes than any other cause. One survey lists pilot error as the initiating event in 40 percent of fatal accidents, with equipment failure accounting for 23 percent. No one is (yet) advocating a pilotless cockpit, but at this point in the history of aviation technology that’s a nearer prospect than a computer-free cockpit.

The MCAS system of the 737 MAX represents a particularly awkward compromise between fully manual and fully automatic control. The software is given a large measure of responsibility for flight safety and is even allowed to override the decisions of the pilot. And yet when the system malfunctions, it’s entirely up to the pilot to figure out what went wrong and how to fix it—and the fix had better be quick, before MCAS can drive the plane into the ground.

Two lost aircraft and 346 deaths are strong evidence that this design was not a good idea. But what to do about it? Boeing’s plan is a retreat from automatic control, returning more responsibility and authority to the pilots:

- Flight control system will now compare inputs from both AOA sensors. If the sensors disagree by 5.5 degrees or more with the flaps retracted, MCAS will not activate. An indicator on the flight deck display will alert the pilots.
- If MCAS is activated in non-normal conditions, it will only provide one input for each elevated AOA event. There are no known or envisioned failure conditions where MCAS will provide multiple inputs.
- MCAS can never command more stabilizer input than can be counteracted by the flight crew pulling back on the column. The pilots will continue to always have the ability to override MCAS and manually control the airplane.

A statement from Dennis Muilenburg, Boeing’s CEO, says the software update “will ensure accidents like that of Lion Air Flight 610 and Ethiopian Airlines Flight 302 never happen again.” I hope that’s true, but what about the accidents that MCAS was designed to prevent? I also hope we will not be reading about a 737 MAX that stalled and crashed because the pilots, believing MCAS was misbehaving, kept hauling back on the control yokes.

If Boeing were to take the opposite approach—not curtailing MCAS but enhancing it with still more algorithms that fiddle with the flight controls—the plan would be greeted with hoots of outrage and derision. Indeed, it seems like a terrible idea. MCAS was installed to prevent pilots from wandering into hazardous territory. A new supervisory system would keep an eye on MCAS, stepping in if it began acting suspiciously. Wouldn’t we then need another custodian to guard the custodians, ad infinitum? Moreover, with each extra layer of complexity we get new side effects and unintended consequences and opportunities for something to break. The system becomes harder to test, and impossible to prove correct.

Those are serious objections, but the problem being addressed is also serious.

Suppose the 737 MAX didn’t have MCAS but did have a cockpit indicator of angle of attack. On the Lion Air flight, the captain would have felt the stick-shaker warning him of an incipient stall and would have seen an alarmingly high angle of attack on his instrument panel. His training would have impelled him to do the same thing MCAS did: Push the nose down to get the wings working again. Would he have continued pushing it down until the plane crashed? Surely not. He would have looked out the window, he would have cross-checked the instruments on the other side of the cockpit, and after some scary moments he would have realized it was a false alarm. (In darkness or low visibility, where the pilot can lose track of the horizon, the outcome might be worse.)

I see two lessons in this hypothetical exercise. First, erroneous sensor data is dangerous, whether the airplane is being flown by a computer or by Chesley Sullenberger. A prudently designed instrument and control system would take steps to detect (and ideally correct) such errors. At the moment, redundancy is the only defense against these failures—and in the unpatched version of MCAS even that protection is compromised. It’s not enough. One key to the superiority of human pilots is that they exercise judgment and sometimes skepticism about what the instruments tell them. That kind of reasoning is not beyond the reach of automated systems. There’s plenty of information to be exploited. For example, inconsistencies between AoA sensors, pitot tubes, static pressure ports, and air temperature probes not only signal that something’s wrong but can offer clues about *which* sensor has failed. The inertial reference unit provides an independent check on aircraft attitude; even GPS signals might be brought to bear. Admittedly, making sense of all this data and drawing a valid conclusion from it—a problem known as sensor fusion—is a major challenge.

Second, a closed-loop controller has yet another source of information: an implicit model of the system being controlled. If you change the angle of the horizontal stabilizer, the state of the airplane is expected to change in known ways—in angle of attack, pitch angle, airspeed, altitude, and in the rate of change in all these parameters. If the result of the control action is not consistent with the model, something’s not right. To persist in issuing the same commands when they don’t produce the expected results is not reasonable behavior. Autopilots include rules to deal with such situations; the lower-level control laws that run in manual-mode flight could incorporate such sanity checks as well.
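As a toy illustration of the idea—entirely my own invention, with no pretense of resembling any real control law—a controller could stop repeating a command once the observed response disagrees with the model's prediction several times in a row:

```
# Toy sanity check: issue up to length(observed) commands, but give up
# after three consecutive responses that fail to match the predicted
# change (here, a nominal -0.1 units per command). All numbers invented.
function commands_issued(observed; predicted = -0.1, tol = 0.05)
    strikes = 0
    issued = 0
    for delta in observed
        issued += 1
        strikes = abs(delta - predicted) > tol ? strikes + 1 : 0
        strikes >= 3 && break   # model repeatedly violated: stop commanding
    end
    return issued
end
```

A controller of this kind would behave normally while the stabilizer responds as expected, but would quit after three anomalous responses instead of trimming all the way to the stops.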

I don’t claim to have the answer to the MCAS problem. And I don’t want to fly in an airplane I designed. (Neither do you.) But there’s a general principle here that I believe should be taken to heart: If an autonomous system makes life-or-death decisions based on sensor data, it ought to verify the validity of the data.

Boeing continues to insist that MCAS is “not a stall-protection function and not a stall-prevention function. It is a handling-qualities function. There’s a misconception it is something other than that.” This statement comes from Mike Sinnett, who is vice president of product development and future airplane development at Boeing; it appears in an *Aviation Week* article by Guy Norris published online April 9.

I don’t know exactly what “handling qualities” means in this context. To me the phrase connotes something that might affect comfort or aesthetics or pleasure more than safety. An airplane with different handling qualities would feel different to the pilot but could still be flown without risk of serious mishap. Is Sinnett implying something along those lines? If so—if MCAS is not critical to the safety of flight—I’m surprised that Boeing wouldn’t simply disable it temporarily, as a way of getting the fleet back in the air while they work out a permanent solution.

The Norris article also quotes Sinnett as saying: “The thing you are trying to avoid is a situation where you are pulling back and all of a sudden it gets easier, and you wind up overshooting and making the nose higher than you want it to be.” That situation, with the nose higher than you want it to be, sounds to me like an airplane that might be approaching a stall.

A story by Jack Nicas, David Gelles, and James Glanz in today’s *New York Times* offers a quite different account, suggesting that “handling qualities” may have motivated the first version of MCAS, but stall risks were part of the rationale for later beefing it up.

The system was initially designed to engage only in rare circumstances, namely high-speed maneuvers, in order to make the plane handle more smoothly and predictably for pilots used to flying older 737s, according to two former Boeing employees who spoke on the condition of anonymity because of the open investigations.

For those situations, MCAS was limited to moving the stabilizer—the part of the plane that changes the vertical direction of the jet—about 0.6 degrees in about 10 seconds.

It was around that design stage that the F.A.A. reviewed the initial MCAS design. The planes hadn’t yet gone through their first test flights.

After the test flights began in early 2016, Boeing pilots found that just before a stall at various speeds, the Max handled less predictably than they wanted. So they suggested using MCAS for those scenarios, too, according to one former employee with direct knowledge of the conversations.

Finally, another *Aviation Week* story by Guy Norris, published yesterday, gives a convincing account of what happened to the angle of attack sensor on Ethiopian Airlines Flight 302. According to Norris’s sources, the AoA vane was sheared off moments after takeoff, probably by a bird strike. This hypothesis is consistent with the traces extracted from the flight data recorder, including the strange-looking wiggles at the very end of the flight. I wonder if there’s hope of finding the lost vane, which shouldn’t be far from the end of the runway.

The moment I saw it, I had to stop in my tracks, grab a scratch pad, and check out the formula. The result made sense in a rough-and-ready sort of way. Since the multiplicative version of \(n!\) goes to infinity as \(n\) increases, the “divisive” version should go to zero. And \(\frac{n^2}{n!}\) does exactly that; the polynomial function \(n^2\) grows slower than the exponential function \(n!\) for large enough \(n\):

\[\frac{1}{1}, \frac{4}{2}, \frac{9}{6}, \frac{16}{24}, \frac{25}{120}, \frac{36}{720}, \frac{49}{5040}, \frac{64}{40320}, \frac{81}{362880}, \frac{100}{3628800}.\]
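The ten fractions above are easy to reproduce in Julia, using exact rational arithmetic (`//`) and the built-in `factorial`:

```
# The first ten values of n^2 / n!, kept as exact rationals.
divfac = [n^2 // factorial(n) for n in 1:10]
```

Julia reduces each fraction to lowest terms, so \(\frac{16}{24}\) prints as `2//3`, but the values match the list above.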

But why does the quotient take the particular form \(\frac{n^2}{n!}\)? Where does the \(n^2\) come from?

To answer that question, I had to revisit the long-ago trauma of learning to divide fractions, but I pushed through the pain. Proceeding from left to right through the formula in the tweet, we first get \(\frac{n}{n-1}\). Then, dividing that quantity by \(n-2\) yields

\[\cfrac{\frac{n}{n-1}}{n-2} = \frac{n}{(n-1)(n-2)}.\]

Continuing in the same way, we ultimately arrive at:

\[n \mathbin{/} (n-1) \mathbin{/} (n-2) \mathbin{/} (n-3) \mathbin{/} \cdots \mathbin{/} 1 = \frac{n}{(n-1) (n-2) (n-3) \cdots 1} = \frac{n}{(n-1)!}\]

To recover the tweet’s stated result of \(\frac{n^2}{n!}\), just multiply numerator and denominator by \(n\). (To my taste, however, \(\frac{n}{(n-1)!}\) is the more perspicuous expression.)
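This chain of left-to-right divisions is exactly what Julia's `foldl` computes, so the identity is easy to check (rationals keep the arithmetic exact):

```
# n / (n-1) / (n-2) / ... / 1, applied left to right,
# equals n / (n-1)!, or equivalently n^2 / n!.
descending_divfac(n) = foldl(//, n:-1:1)
```

For \(n = 5\) this gives `5//24`, which is indeed \(5/4! = 25/5!\).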

I am a card-carrying factorial fanboy. You can keep your fancy Fibonaccis; *this* is my favorite function. Every time I try out a new programming language, my first exercise is to write a few routines for calculating factorials. Over the years I have pondered several variations on the theme, such as replacing \(\times\) with \(+\) in the definition (which produces triangular numbers). But I don’t think I’ve ever before considered substituting \(\mathbin{/}\) for \(\times\). It’s messy. Because multiplication is commutative and associative, you can define \(n!\) simply as the product of all the integers from \(1\) through \(n\), without worrying about the order of the operations. With division, order can’t be ignored. In general, \(x \mathbin{/} y \ne y \mathbin{/}x\), and \((x \mathbin{/} y) \mathbin{/} z \ne x \mathbin{/} (y \mathbin{/} z)\).

The Fermat’s Library tweet puts the factors in descending order: \(n, n-1, n-2, \ldots, 1\). The most obvious alternative is the ascending sequence \(1, 2, 3, \ldots, n\). What happens if we define the divisive factorial as \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\)? Another visit to the schoolroom algorithm for dividing fractions yields this simple answer:

\[1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n = \frac{1}{2 \times 3 \times 4 \times \cdots \times n} = \frac{1}{n!}.\]

In other words, when we repeatedly divide while counting up from \(1\) to \(n\), the final quotient is the reciprocal of \(n!\). (I wish I could put an exclamation point at the end of that sentence!) If you’re looking for a canonical answer to the question, “What do you get if you divide instead of multiplying in \(n!\)?” I would argue that \(\frac{1}{n!}\) is a better candidate than \(\frac{n}{(n - 1)!}\). Why not embrace the symmetry between \(n!\) and its inverse?
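The ascending version checks out the same way with `foldl`:

```
# 1 / 2 / 3 / ... / n, applied left to right, equals 1/n!.
ascending_divfac(n) = foldl(//, 1:n)
```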

Of course there are many other ways to arrange the \(n\) integers in the set \(\{1 \ldots n\}\). How many ways? As it happens, \(n!\) of them! Thus it would seem there are \(n!\) distinct ways to define the divisive \(n!\) function. However, looking at the answers for the two permutations discussed above suggests there’s a simpler pattern at work. Whatever element of the sequence happens to come first winds up in the numerator of a big fraction, and the denominator is the product of all the other elements. As a result, there are really only \(n\) different outcomes—assuming we stick to performing the division operations from left to right. For any integer \(k\) between \(1\) and \(n\), putting \(k\) at the head of the queue creates a divisive \(n!\) equal to \(k\) divided by all the other factors. We can write this out as:

\[\cfrac{k}{\frac{n!}{k}}, \text{ which can be rearranged as } \frac{k^2}{n!}.\]

And thus we also solve the minor mystery of how \(\frac{n}{(n-1)!}\) became \(\frac{n^2}{n!}\) in the tweet.
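To check the \(\frac{k^2}{n!}\) claim, here is a small sketch (the function name `lead_with` is my own invention) that puts \(k\) at the head of the queue and then divides through the remaining factors from left to right:

```
# Move k to the front of 1..n, then divide left to right.
function lead_with(k, n)
    rest = [i for i in 1:n if i != k]
    foldl(//, rest; init = k // 1)
end
```

For every choice of \(k\) from \(1\) to \(n\), the result reduces to \(\frac{k^2}{n!}\), as claimed.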

It’s worth noting that all of these functions converge to zero as \(n\) goes to infinity. Asymptotically speaking, \(\frac{1^2}{n!}, \frac{2^2}{n!}, \ldots, \frac{n^2}{n!}\) are all alike.

Ta dah! Mission accomplished. Problem solved. Done and dusted. Now we know everything there is to know about divisive factorials, right?

Well, maybe there’s one more question. What does the computer say? If you take your favorite factorial algorithm, and do as the tweet suggests, replacing any appearance of the \(\times\) (or `*`) operator with `/`, what happens? Which of the \(n\) variants of divisive \(n!\) does the program produce?

Here’s *my* favorite algorithm for computing factorials, in the form of a Julia program:

```
function mul!(n)
    if n == 1
        return 1
    else
        return n * mul!(n - 1)
    end
end
```

This is the algorithm that has introduced generations of nerds to the concept of recursion. In narrative form it says: If \(n\) is \(1\), then `mul!(n)` is \(1\). Otherwise, evaluate the function `mul!(n - 1)`, then multiply the result by \(n\).

The function can be written more tersely using Julia’s one-liner style of definition:

`mul!(n) = n == 1 ? 1 : n * mul!(n - 1)`

The right side of the assignment statement is a conditional expression, or ternary operator, which has the form `a ? b : c`. Here `a` is a boolean test clause, which must return a value of either `true` or `false`. If `a` is `true`, clause `b` is evaluated, and the result becomes the value of the entire expression. Otherwise clause `c` is evaluated.

Just to be sure I’ve got this right, here are the first 10 factorials, as calculated by this program:

```
[mul!(n) for n in 1:10]
10-element Array{Int64,1}:
1
2
6
24
120
720
5040
40320
362880
3628800
```

Now let’s edit that definition and convert the single occurrence of `*` to a `/`, leaving everything else (except the name of the function) unchanged.

`div!(n) = n == 1 ? 1 : n / div!(n - 1)`

And here’s what comes back when we run the program for values of \(n\) from \(1\) through \(20\):

```
[div!(n) for n in 1:20]
20-element Array{Real,1}:
1
2.0
1.5
2.6666666666666665
1.875
3.2
2.1875
3.657142857142857
2.4609375
4.063492063492063
2.70703125
4.432900432900433
2.9326171875
4.773892773892774
3.14208984375
5.092152292152292
3.338470458984375
5.391690662278897
3.523941040039063
5.675463855030418
```

Huh? That sure doesn’t look like it’s converging to zero—not as \(\frac{1}{n!}\) or as \(\frac{n}{(n - 1)!}\). As a matter of fact, it doesn’t look like it’s going to converge at all. The graph below suggests the sequence is made up of two alternating components, both of which appear to be slowly growing toward infinity as well as diverging from one another.

In trying to make sense of what we’re seeing here, it helps to change the output type of the `div!` function. Instead of applying the division operator `/`, which returns the quotient as a floating-point number, we can substitute the `//` operator, which returns an exact rational quotient, reduced to lowest terms.

`div!(n) = n == 1 ? 1 : n // div!(n - 1)`

Here’s the sequence of values for `n in 1:20`:

```
20-element Array{Real,1}:
1
2//1
3//2
8//3
15//8
16//5
35//16
128//35
315//128
256//63
693//256
1024//231
3003//1024
2048//429
6435//2048
32768//6435
109395//32768
65536//12155
230945//65536
262144//46189
```

The list is full of curious patterns. It’s a double helix, with even numbers and odd numbers zigzagging in complementary strands. The even numbers are not just even; they are all powers of \(2\). Also, they appear in pairs—first in the numerator, then in the denominator—and their sequence is nondecreasing. But there are gaps; not all powers of \(2\) are present. The odd strand looks even more complicated, with various small prime factors flitting in and out of the numbers. (The primes *have* to be small—smaller than \(n\), anyway.)

This outcome took me by surprise. I had really expected to see a much tamer sequence, like those I worked out with pencil and paper. All those jagged, jitterbuggy ups and downs made no sense. Nor did the overall trend of unbounded growth in the ratio. How could you keep dividing and dividing, and wind up with bigger and bigger numbers?

At this point you may want to pause before reading on, and try to work out your own theory of where these zigzag numbers are coming from. If you need a hint, you can get a strong one—almost a spoiler—by looking up the sequence of numerators or the sequence of denominators in the Online Encyclopedia of Integer Sequences.

Here’s another hint. A small edit to the `div!` program completely transforms the output. Just flip the final clause, changing `n // div!(n - 1)` into `div!(n - 1) // n`.

`div!(n) = n == 1 ? 1 : div!(n - 1) // n`

Now the results look like this:

```
10-element Array{Real,1}:
1
1//2
1//6
1//24
1//120
1//720
1//5040
1//40320
1//362880
1//3628800
```

This is the inverse factorial function we’ve already seen, the series of quotients generated when you march left to right through an ascending sequence of divisors \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\).

It’s no surprise that flipping the final clause in the procedure alters the outcome. After all, we know that division is not commutative or associative. What’s not so easy to see is why the sequence of quotients generated by the original program takes that weird zigzag form. What mechanism is giving rise to those paired powers of 2 and the alternation of odd and even?

I have found that it’s easier to explain what’s going on in the zigzag sequence when I describe an iterative version of the procedure, rather than the recursive one. (This is an embarrassing admission for someone who has argued that recursive definitions are easier to reason about, but there you have it.) Here’s the program:

```
function div!_iter(n)
    q = 1
    for i in 1:n
        q = i // q
    end
    return q
end
```

I submit that this looping procedure is operationally identical to the recursive function, in the sense that if `div!(n)` and `div!_iter(n)` both return a result for some positive integer `n`, it will always be the same result. Here’s my evidence:

```
[div!(n) for n in 1:20] [div!_iter(n) for n in 1:20]
1 1//1
2//1 2//1
3//2 3//2
8//3 8//3
15//8 15//8
16//5 16//5
35//16 35//16
128//35 128//35
315//128 315//128
256//63 256//63
693//256 693//256
1024//231 1024//231
3003//1024 3003//1024
2048//429 2048//429
6435//2048 6435//2048
32768//6435 32768//6435
109395//32768 109395//32768
65536//12155 65536//12155
230945//65536 230945//65536
262144//46189 262144//46189
```

To understand the process that gives rise to these numbers, consider the successive values of the variables \(i\) and \(q\) each time the loop is executed. Before the loop begins, \(q\) is set to \(1\); on the first passage through the loop \(i\) is also \(1\), so the statement `q = i // q` gives \(q\) the value \(\frac{1}{1}\). Next time around, \(i = 2\) and \(q = \frac{1}{1}\), so \(q\)’s new value is \(\frac{2}{1}\). On the third iteration, \(i = 3\) and \(q = \frac{2}{1}\), yielding \(\frac{i}{q} \rightarrow \frac{3}{2}\). If this is still confusing, try thinking of \(\frac{i}{q}\) as \(i \times \frac{1}{q}\). The crucial observation is that on every passage through the loop, \(q\) is inverted, becoming \(\frac{1}{q}\).

If you unwind these operations, and look at the multiplications and divisions that go into each element of the series, a pattern emerges:

\[\frac{1}{1}, \quad \frac{2}{1}, \quad \frac{1 \cdot 3}{2}, \quad \frac{2 \cdot 4}{1 \cdot 3}, \quad \frac{1 \cdot 3 \cdot 5}{2 \cdot 4}, \quad \frac{2 \cdot 4 \cdot 6}{1 \cdot 3 \cdot 5}\]

The general form is:

\[\frac{1 \cdot 3 \cdot 5 \cdots n}{2 \cdot 4 \cdots (n-1)} \quad (\text{odd } n) \qquad \frac{2 \cdot 4 \cdot 6 \cdots n}{1 \cdot 3 \cdot 5 \cdots (n-1)} \quad (\text{even } n).\]

The functions \(1 \cdot 3 \cdot 5 \cdots n\) for odd \(n\) and \(2 \cdot 4 \cdot 6 \cdots n\) for even \(n\) have a name! They are known as double factorials, with the notation \(n!!\): the double factorial \(n!!\) is defined as the product of \(n\) and all smaller positive integers of the same parity. Thus our peculiar sequence of zigzag quotients is simply \(\frac{n!!}{(n-1)!!}\).
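In code (the function names here are my own, built on the recurrence \(n!! = n \cdot (n-2)!!\)), the zigzag sequence is:

```
# n!! = n · (n - 2) · (n - 4) · ... down to 1 or 2
doublefactorial(n) = n <= 1 ? 1 : n * doublefactorial(n - 2)

# the zigzag quotient n!! / (n - 1)!!
zigzag(n) = doublefactorial(n) // doublefactorial(n - 1)
```

Broadcasting `zigzag.(1:6)` reproduces the start of the list above: \(1, 2, \frac{3}{2}, \frac{8}{3}, \frac{15}{8}, \frac{16}{5}\).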

A 2012 article by Henry W. Gould and Jocelyn Quaintance (behind a paywall, regrettably) surveys the applications of double factorials. They turn up more often than you might guess. In the middle of the 17th century John Wallis came up with this identity:

\[\frac{\pi}{2} = \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots} = \lim_{n \rightarrow \infty} \frac{((2n)!!)^2}{(2n + 1)!!(2n - 1)!!}\]

An even weirder series, involving the cube of a quotient of double factorials, sums to \(\frac{2}{\pi}\). That one was discovered by (who else?) Srinivasa Ramanujan.

Gould and Quaintance also discuss the double factorial counterpart of binomial coefficients. The standard binomial coefficient is defined as:

\[\binom{n}{k} = \frac{n!}{k! (n-k)!}.\]

The double version is:

\[\left(\!\binom{n}{k}\!\right) = \frac{n!!}{k!! (n-k)!!}.\]

Note that our zigzag numbers fit this description and therefore qualify as double factorial binomial coefficients. Specifically, they are the numbers:

\[\left(\!\binom{n}{1}\!\right) = \left(\!\binom{n}{n - 1}\!\right) = \frac{n!!}{1!! (n-1)!!}.\]

The regular binomial \(\binom{n}{1}\) is not very interesting; it is simply equal to \(n\). But the doubled version \(\left(\!\binom{n}{1}\!\right)\), as we’ve seen, dances a livelier jig. And, unlike the single binomial, it is not always an integer. (The only integer values are \(1\) and \(2\).)

Seeing the zigzag numbers as ratios of double factorials explains quite a few of their properties, starting with the alternation of evens and odds. We can also see why all the even numbers in the sequence are powers of 2. Consider the case of \(n = 6\), where the quotient is \(\frac{6!!}{5!!}\). The numerator is \(2 \cdot 4 \cdot 6 = 48\), which acquires a factor of \(3\) from the \(6\). But the denominator is \(1 \cdot 3 \cdot 5 = 15\). The \(3\)s above and below cancel, leaving \(\frac{16}{5}\). Such cancellations will happen in every case. Whenever an odd factor \(m\) enters the even sequence, it must do so in the form \(2 \cdot m\), but at that point \(m\) itself must already be present in the odd sequence.
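We can spot-check the power-of-2 claim mechanically (a sketch; the helper name `df` is made up): after cancelling the shared odd part with `gcd`, the surviving even numerator should satisfy Julia’s built-in `ispow2`.

```
df(n) = n <= 1 ? 1 : n * df(n - 2)   # double factorial

for n in 2:2:20
    even_part = div(df(n), gcd(df(n), df(n - 1)))   # numerator after reduction
    @assert ispow2(even_part)
end
```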

Is the sequence of zigzag numbers a reasonable answer to the question, “What happens when you divide instead of multiply in \(n!\)?” Or is the computer program that generates them just a buggy algorithm? My personal judgment is that \(\frac{1}{n!}\) is a more intuitive answer, but \(\frac{n!!}{(n - 1)!!}\) is more interesting.

Furthermore, the mere existence of the zigzag sequence broadens our horizons. As noted above, if you insist that the division algorithm must always chug along the list of \(n\) factors in order, at each stop dividing the number on the left by the number on the right, then there are only \(n\) possible outcomes, and they all look much alike. But the zigzag solution suggests wilder possibilities. We can formulate the task as follows. Take the set of factors \(\{1 \dots n\}\), select a subset, and invert all the elements of that subset; now multiply all the factors, both the inverted and the upright ones. If the inverted subset is empty, the result is the ordinary factorial \(n!\). If *all* of the factors are inverted, we get the inverse \(\frac{1}{n!}\). And if every second factor is inverted, starting with \(n - 1\), the result is an element of the zigzag sequence.
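This formulation is easy to play with in Julia; `invert_subset` below is my own name for the procedure, not anything from the original post. Pick a subset of \(\{1 \dots n\}\) to invert, then multiply everything together:

```
# Invert the chosen factors, leave the rest upright, multiply them all.
function invert_subset(n, inverted)
    prod(i in inverted ? 1 // i : i // 1 for i in 1:n)
end
```

Inverting nothing gives \(n!\), inverting everything gives \(\frac{1}{n!}\), and inverting every second factor starting with \(n - 1\) lands on the zigzag sequence.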

These are only a few among the many possible choices; in total there are \(2^n\) subsets of \(n\) items. For example, you might invert every number that is prime or a power of a prime \((2, 3, 4, 5, 7, 8, 9, 11, \dots)\). For small \(n\), the result jumps around but remains consistently less than \(1\):

If I were to continue this plot to larger \(n\), however, it would take off for the stratosphere. Prime powers get sparse farther out on the number line.

Here’s a question. We’ve seen factorial variants that go to zero as \(n\) goes to infinity, such as \(1/n!\). We’ve seen other variants grow without bound as \(n\) increases, including \(n!\) itself, and the zigzag numbers. Are there any versions of the factorial process that converge to a finite bound other than zero?

My first thought was this algorithm:

```
function greedy_balance(n)
    q = 1
    while n > 0
        q = q > 1 ? q / n : q * n
        n -= 1
    end
    return q
end
```

We loop through the integers from \(n\) down to \(1\), calculating the running product/quotient \(q\) as we go. At each step, if the current value of \(q\) is greater than \(1\), we divide by the next factor; otherwise, we multiply. This scheme implements a kind of feedback control or target-seeking behavior. If \(q\) gets too large, we reduce it; too small and we increase it. I conjectured that as \(n\) goes to infinity, \(q\) would settle into an ever-narrower range of values near \(1\).

Running the experiment gave me another surprise:

That sawtooth wave is not quite what I expected. One minor peculiarity is that the curve is not symmetric around \(1\); the excursions above have higher amplitude than those below. But this distortion is more visual than mathematical. Because \(q\) is a ratio, the distance from \(1\) to \(10\) is the same as the distance from \(1\) to \(\frac{1}{10}\), but it doesn’t look that way on a linear scale. The remedy is to plot the log of the ratio:

Now the graph is symmetric, or at least approximately so, centered on \(0\), which is the logarithm of \(1\). But a larger mystery remains. The sawtooth waveform is very regular, with a period of \(4\), and it shows no obvious signs of shrinking toward the expected limiting value of \(\log q = 0\). Numerical evidence suggests that as \(n\) goes to infinity the peaks of this curve converge on a value just above \(q = \frac{5}{3}\), and the troughs approach a value just below \(q = \frac{3}{5}\). (The corresponding base-\(10\) logarithms are roughly \(\pm0.222\).) I have not worked out why this should be so. Perhaps someone will explain it to me.

The failure of this greedy algorithm doesn’t mean we can’t find a divisive factorial that converges to \(q = 1\).

Instead of choosing greedily at each step, we can search all \(2^n\) ways of splitting the factors into a multiplied set and a divided set, and keep the partition whose quotient comes closest to \(1\). I have computed the optimal partitionings up to \(n = 30\), where there are a billion possibilities to choose from.
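The post doesn’t show its search code, but a brute-force sketch for small \(n\) might look like this (names and details are my own): enumerate every subset of inverted factors as a bitmask and keep the quotient whose logarithm is closest to \(0\). The search is exponential in \(n\), so reaching \(n = 30\) would take patience or pruning.

```
# Try all 2^n ways of inverting a subset of {1..n}; return the quotient
# whose log is closest to 0 (i.e., the product closest to 1).
function closest_to_one(n)
    best = factorial(n) // 1                # mask 0: invert nothing
    for mask in 1:(2^n - 1)
        q = prod((mask >> (i - 1)) & 1 == 1 ? 1 // i : i // 1 for i in 1:n)
        if abs(log(q)) < abs(log(best))
            best = q
        end
    end
    return best
end
```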

The graph is clearly flatlining. You could use the same method to force convergence to any other value between \(0\) and \(n!\).

And thus we have yet another answer to the question in the tweet that launched this adventure. What happens when you divide instead of multiply in \(n!\)? Anything you want.

On my visit to Baltimore for the Joint Mathematics Meetings a couple of weeks ago, I managed to score a hotel room with a spectacular scenic view. My seventh-floor perch overlooked the Greene Street substation of the Baltimore Gas and Electric Company, just around the corner from the Camden Yards baseball stadium.

Some years ago, writing about such technological landscapes, I argued that you can understand what you’re looking at if you’re willing to invest a little effort:

At first glance, a substation is a bewildering array of hulking steel machines whose function is far from obvious. Ponderous tanklike or boxlike objects are lined up in rows. Some of them have cooling fins or fans; many have fluted porcelain insulators poking out in all directions…. If you look closer, you will find there is a logic to this mélange of equipment. You can make sense of it. The substation has inputs and outputs, and with a little study you can trace the pathways between them.

If I were writing that passage now, I would hedge or soften my claim that an electrical substation will yield its secrets to casual observation. Each morning in Baltimore I spent a few minutes peering into the Greene Street enclosure. I was able to identify all the major pieces of equipment in the open-air part of the station, and I know their basic functions. But making sense of the circuitry, finding the logic in the arrangement of devices, tracing the pathways from inputs to outputs—I have to confess, with a generous measure of chagrin, that I failed to solve the puzzle. I think I have the answers now, but finding them took more than eyeballing the hardware.

Basics first. A substation is not a generating plant. BGE does not “make” electricity here. The substation receives electric power in bulk from distant plants and repackages it for retail delivery. At Greene Street the incoming supply is at 115,000 volts (or 115 kV). The output voltage is about a tenth of that: 13.8 kV. How do I know the voltages? Not through some ingenious calculation based on the size of the insulators or the spacing between conductors. In an enlargement of one of my photos I found an identifying plate with the blurry and partially obscured but still legible notation “115/13.8 KV.”

The biggest hunks of machinery in the yard are the transformers *(photo below)*, which do the voltage conversion. Each transformer is housed in a steel tank filled with oil, which serves as both insulator and coolant. Immersed in the oil bath are coils of wire wrapped around a massive iron core. Stacks of radiator panels, with fans mounted underneath, help cool the oil when the system is under heavy load. A bed of crushed stone under the transformer is meant to soak up any oil leaks and reduce fire hazards.

Electricity enters and leaves the transformer through the ribbed gray posts, called bushings, mounted atop the casing. A bushing is an insulator with a conducting path through the middle. It works like the rubber grommet that protects the power cord of an appliance where it passes through the steel chassis. The high-voltage inputs attach to the three tallest bushings, with red caps; the low-voltage bushings, with dark gray caps, are shorter and more closely spaced. Notice that each high-voltage input travels over a single slender wire, whereas each low-voltage output has three stout conductors. That’s because reducing the voltage to one-tenth increases the current tenfold.

What about the three slender gray posts just to the left of the high-voltage bushings? They are lightning arresters, shunting sudden voltage surges into the earth to protect the transformer from damage.

Perhaps the most distinctive feature of this particular substation is what’s *not* to be seen. There are no tall towers carrying high-voltage transmission lines to the station. Clearing a right of way for overhead lines would be difficult and destructive in an urban center, so the high-voltage “feeders” run underground. In the photo at right, near the bottom left corner, a bundle of three metal-sheathed cables emerges from the earth. Each cable, about as thick as a human forearm, has a copper or aluminum conductor running down the middle, surrounded by insulation. I suspect these cables are insulated with layers of paper impregnated with oil under pressure; some of the other feeders entering the station may be of a newer design, with solid plastic insulation. Each cable plugs into the bottom of a ceramic bushing, which carries the current to a copper wire at the top. (You can tell the wire is copper because of the green patina.)

Connecting the feeder input to the transformer is a set of three hollow aluminum conductors called bus bars, held high overhead on steel stanchions and ceramic insulators. At both ends of the bus bars are mechanical switches that open like hinged doors to break the circuit. I don’t know whether these switches can be opened when the system is under power or whether they are just used to isolate components for maintenance after a feeder has been shut down. Beyond the bus bars, and hidden behind a concrete barrier, we can glimpse the bushings atop a different kind of switch, which I’ll return to below.

At this point you might be asking, why does everything come in sets of three—the bus bars, the feeder cables, the terminals on the transformer? It’s because electric power is distributed as three-phase alternating current. Each conductor carries a voltage oscillating at 60 Hertz, with the three waves offset by one-third of a cycle. If you recorded the voltage between each of the three pairs of conductors *(AB, AC, BC)*, you’d see a waveform like the one above at left.

At the other end of the conducting pathway, connected to three more bus bars on the low-voltage side of the transformer, is an odd-looking stack of three large drums. These are current-limiting reactors (no connection with nuclear reactors). They are coils of thick conductors wound on a stout concrete armature. Under normal operating conditions they have little effect on the transmission of power, but in the milliseconds following a short circuit, the sudden rush of current generates a strong magnetic field in the coils, absorbing the energy of the fault current and preventing damage to other equipment.

So those are the main elements of the substation I was able to spot from my hotel window. They all made sense individually, and yet I realized over the course of a few days that I didn’t really understand how it all works together. My doubts are easiest to explain with the help of a bird’s eye view of the substation layout, cribbed from Google Maps:

My window vista was from off to the right, beyond the eastern edge of the compound. In the Google Maps view, the underground 115 kV feeders enter at the bottom or southern edge, and power flows northward through the transformers and the reactor coils, finally entering the building that occupies the northeast corner of the lot. Neither Google nor I can see inside this windowless building, but I know what’s in there, in a general way. That’s where the low-voltage (13.8 kV) distribution lines go underground and fan out to their various destinations in the neighborhood.

Let’s look more closely at the outdoor equipment. There are four high-voltage feeders, four transformers, and four sets of reactor coils. Apart from minor differences in geometry (and one newer-looking, less rusty transformer), these four parallel pathways all look alike. It’s a symmetric four-lane highway. Thus my first hypothesis was that four independent 115 kV feeders supply power to the station, presumably bringing it from larger substations and higher-voltage transmission lines outside the city.

However, something about the layout continued to bother me. If we label the four lanes of the highway from left to right, then on the high-voltage side, toward the bottom of the map view, it looks like there’s something connecting lanes 1 and 2, and there’s a similar link between lanes 3 and 4. From my hotel window the view of this device is blocked by a concrete barricade, and unfortunately the Google Maps image does not show it clearly either. (If you zoom in for a closer view, the goofy Google compression algorithm will turn the scene into a dreamscape where all the components have been draped in Saran Wrap.) Nevertheless, I’m quite sure of what I’m looking at. The device connecting the pairs of feeders is a high-voltage three-phase switch, or circuit breaker, something like the ones seen in the image at right (photographed at another substation, in Missouri). The function of this device is essentially the same as that of a circuit breaker in your home electrical panel. You can turn it off manually to shut down a circuit, but it may also “trip” automatically in response to an overload or a short circuit. The concrete barriers flanking the two high-voltage breakers at Greene Street hint at one of the problems with such switches. Interrupting a current of hundreds of amperes at more than 100,000 volts is like stopping a runaway truck: It requires absorbing a lot of energy. The switch does not always survive the experience.

When I first looked into the Greene Street substation, I was puzzled by the *absence* of breakers at the input end of each main circuit. I expected to see them there to protect the transformers and other components from overloads or lightning strikes. I think there are breakers on the low-voltage side, tucked in just behind the transformers and thus not clearly visible from my window. But there’s nothing on the high side. I could only guess that such protection is provided by breakers near the output of the next substation upstream, the one that sends the 115 kV feeders into Greene Street.

That leaves the question of why pairs of circuits within the substation are cross-linked by breakers. I drew a simplified diagram of how things are wired up:

Two adjacent 115 kV circuits run from bottom to top; the breaker between them connects corresponding conductors—left to left, middle to middle, right to right. But what’s the point of doing so?

I had some ideas. If one transformer were out of commission, the pathway through the breaker could allow power to be rerouted through the remaining transformer (assuming it could handle the extra load). Indeed, maybe the entire design simply reflects a high level of redundancy. There are four incoming feeders and four transformers, but perhaps only two are expected to operate at any given time. The breaker provides a means of switching between them, so that you could lose a circuit (or maybe two) and still keep all the lights on. After all, this is a substation supplying power to many large facilities—the convention center (where the math meetings were held), a major hospital, large hotels, the ball park, theaters, museums, high-rise office buildings. Reliability is important here.

After further thought, however, this scheme seemed highly implausible. There are other substation layouts that would allow any of the four feeders to power any of the four transformers, allowing much greater flexibility in handling failures and making more efficient use of all the equipment. Linking the incoming feeders in pairs made no sense.

I would love to be able to say that I solved this puzzle on my own, just by dint of analysis and deduction, but it’s not true. When I got home and began looking at the photographs, I was still baffled. The answer eventually came via Google, though it wasn’t easy to find. Before revealing where I went wrong, I’ll give a couple of hints, which might be enough for you to guess the answer.

Hint 1. I was led astray by a biased sample. I am much more familiar with substations out in the suburbs or the countryside, partly because they’re easier to see into. Most of them are surrounded by a chain-link fence rather than a brick wall. But country infrastructure differs from the urban stuff.

Hint 2. I was also fooled by geometry when I should have been thinking about topology. To understand what you’re seeing in the Greene Street compound, you have to get beyond individual components and think about how it’s all connected to the rest of the network.

The web offers marvelous resources for the student of infrastructure, but finding them can be a challenge. You might suppose that the BGE website would have a list of the company’s facilities, and maybe a basic tutorial on where Baltimore’s electricity comes from. There’s nothing of the sort (although the utility’s parent company does offer thumbnail descriptions of some of their generating plants). Baltimore City websites were a little more helpful—not that they explained any details of substation operation, but they did report various legal and regulatory filings concerned with proposals for new or updated facilities. From those reports I learned the names of several BGE installations, which I could take back to Google to use as search terms.

One avenue I pursued was figuring out where the high-voltage feeders entering Greene Street come from. I discovered a substation called Pumphrey about five miles south of the city, near BWI airport, which seemed to be a major nexus of transmission lines. In particular, four 115 kV feeders travel north from Pumphrey to a substation in the Westport neighborhood, which is about a mile south of downtown. The Pumphrey-Westport feeders are overhead lines, and I had seen them already. Their right of way parallels the light rail route I had taken into town from the airport. Beyond the Westport substation, which is next to a light rail stop of the same name, the towers disappear. An obvious hypothesis is that the four feeders dive underground at Westport and come up at Greene Street. This guess was partly correct: Power does reach Greene Street from Westport, but not exclusively.

At Westport BGE has recently built a small, gas-fired generating plant, to help meet peak demands. The substation is also near the Baltimore RESCO waste-to-energy power plant *(photo above)*, which has become a local landmark. (It’s the only garbage burner I know that turns up on postcards sold in tourist shops.) Power from both of these sources could also make its way to the Greene Street substation, via Westport.

I finally began to make sense of the city’s wiring diagram when I stumbled upon some documents published by the PJM Interconnection, the administrator and coordinator of the power “pool” in the mid-Atlantic region. PJM stands for Pennsylvania–New Jersey–Maryland, but it covers a broader territory, including Delaware, Ohio, West Virginia, most of Virginia, and parts of Kentucky, Indiana, Michigan, and Illinois. Connecting to such a pool has important advantages for a utility. If an equipment failure means you can’t meet your customers’ demands for electricity, you can import power from elsewhere in the pool to make up the shortage; conversely, if you have excess generation, you can sell the power to another utility. The PJM supervises the market for such exchanges.

The idea behind power pooling is that neighbors can prop each other up in times of trouble; however, they can also knock each other down. As a condition of membership in the pool, utilities have to maintain various standards for engineering and reliability. PJM committees review plans for changes or additions to a utility’s network. It was a set of Powerpoint slides prepared for one such committee that first alerted me to my fundamental misconception. One of the slides included the map below, tracing the routes of 115 kV feeders *(green lines)* in and around downtown Baltimore.

I had been assuming—even though I should have known better—that the distribution network is essentially treelike, with lines radiating from each node to other nodes but never coming back together. For low-voltage distribution lines in sparsely settled areas, this assumption is generally correct. If you live in the suburbs or in a small town, there is one power line that runs from the local substation to your neighborhood; if a tree falls on it, you’re in the dark until the problem is fixed. There is no alternative route of supply. But that is *not* the topology of higher-voltage circuits. The Baltimore network consists of rings, where power can reach most nodes by following either of two pathways.

In the map we can see the four 115 kV feeders linking Pumphrey to Westport. From Westport, two lines run due north to Greene Street, then make a right turn to another station named Concord Street.

This double-ring architecture calls for a total reinterpretation of how the Greene Street substation works. I had imagined the four 115 kV inputs as four lanes of one-way traffic, all pouring into the substation and dead-ending in the four transformers. In reality we have just two roadways, both of which enter the substation and then leave again, continuing on to further destinations. And they are not one-way; they can both carry traffic in either direction. The transformers are like exit ramps that siphon off a portion of the traffic while the main stream passes by.

At Greene Street, two of the underground lines entering the compound come from Westport, but the other two proceed to Concord Street, the next station around the ring. What about the breakers that sit between the incoming and outgoing branches of each circuit? They open up the ring to isolate any section that experiences a serious failure. For example, a short circuit in one of the cables running between Greene Street and Concord Street would cause breakers at both of those stations to open up, but both stations would continue to receive power coming around the other branch of the loop.

This revised interpretation was confirmed by another document made available by PJM, this one written by BGE engineers as an account of their engineering practices for transmission lines and substations. It includes a schematic diagram of a typical downtown Baltimore substation. The diagram makes no attempt to reproduce the geometric layout of the components; it rearranges them to make the topology clearer.

The two 115 kV feeders that run through the substation are shown as horizontal lines; the solid black squares in the middle are the breakers that join the pairs of feeders and thereby close the two rings that run through all the downtown substations. The transformers are the W-shaped symbols at the ends of the branch lines.

A mystery remains. One symbol in the diagram represents a disconnect switch, a rather simple mechanical device that generally cannot be operated when the power line is under load. Another symbol is identified in the BGE document as a *circuit switcher*, a more elaborate device capable of interrupting a heavy current. In the Greene Street photos, however, the switches at the two ends of the high-voltage bus bars appear almost identical. I’m not seeing any circuit switchers there. But, as should be obvious by now, I’m capable of misinterpreting what I see.