This flimsy slip of paper seems like an odd scrap to preserve for the ages, but when I pulled it out of the envelope, I knew instantly where it came from and why I had saved it.

The year was 1967. I was 17 then; I’m 71 now. Transposing those two digits takes just a flick of the fingertips. I can blithely skip back and forth from one prime number to the other. But the span of lived time between 1967 and 2021 is a chasm I cannot so easily leap across. At 17 I was in a great hurry to grow up, but I couldn’t see as far as 71; I didn’t even try. Going the other way—revisiting the mental and emotional life of an adolescent boy—is also a journey deep into alien territory. But the straw wrapper helps—it’s a Proustian *aide-mémoire*.

In the spring of 1967 I had a girlfriend, Lynn. After school we would meet at the Maple Diner, where the booths had red leatherette upholstery and formica tabletops with a boomerang motif. We’d order two Cokes and a plate of french fries to share. The waitress liked us; she’d make sure we had a full bottle of ketchup. I mention the ketchup because it was a token of our progress toward intimacy. On our first dates Lynn had put only a dainty dab on her fries, but by April we were comfortable enough to reveal our true appetites.

One afternoon I noticed she was fiddling intently with the wrapper from her straw, folding and refolding. I had no idea what she was up to. A teeny paper airplane she would sail over my head? When she finished, she pushed her creation across the table:

What a wallop there was in that little wad of paper. At that point in our romance, the words had not yet been spoken aloud.

How did I respond to Lynn’s folded declaration? I can’t remember; the words are lost. But evidently I got through that awkward moment without doing any permanent damage. A year later Lynn and I were married.

Today, at 71, with the preserved artifact in front of me, my chief regret is that I failed to take up the challenge implicit in the word game Lynn had invented. Why didn’t I craft a reply by folding my own straw wrapper? There are quite a few messages I could have extracted by strategic deletions from “It’s a pleasure to serve you.”

itsapleasuretoserveyou ==> I love you.

itsapleasuretoserveyou ==> I please you.

itsapleasuretoserveyou ==> I tease you.

itsapleasuretoserveyou ==> I pleasure you.

itsapleasuretoserveyou ==> I pester you.

itsapleasuretoserveyou ==> I peeve you.

itsapleasuretoserveyou ==> I salute you.

itsapleasuretoserveyou ==> I leave you.

Not all of those statements would have been suited to the occasion of our rendezvous at the Maple Diner, but over the course of our years together—17 years, as it turned out—there came a moment for each of them.

How many words can we form by making folds in the straw-paper slogan? I could not have answered that question in 1967. I couldn’t have even asked it. But times change. Enumerating all the foldable messages now strikes me as an obvious thing to do when presented with the straw wrapper. Furthermore, I have the computational means to do it—although the project was not quite as easy as I expected.

A first step is to be explicit about the rules of the game. We are given a source text, in this case “It’s a pleasure to serve you.” Let us ignore the spaces between words as well as all punctuation and capitalization; in this way we arrive at the normalized text “itsapleasuretoserveyou”. A word is *foldable* if all of its letters appear in the normalized text in the correct order (though not necessarily consecutively). The folding operation amounts to an editing process in which our only permitted act is deletion of letters; we are not allowed to insert, substitute, or permute. If two or more foldable words are to be combined to make a phrase or sentence, they must follow one another in the correct order without overlaps.
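The normalization step is easy to mechanize. Here is a plausible Python version, consistent with how the procedures later in this article call it; the exact definition is my own sketch:

```python
def normalize(text):
    # keep only the letters, lowercased; drop spaces, punctuation, apostrophes
    return "".join(ch.lower() for ch in text if ch.isalpha())
```

Applied to the source text, it yields the 22-letter string the rest of the analysis works with.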

So much for foldability. Next comes the fraught question: What is a word? Linguists and lexicographers offer many subtly divergent opinions on this point, but for present purposes a very simple definition will suffice: A finite sequence of characters drawn from the 26-letter English alphabet is a word if it can legally be played in a game of Scrabble. I have been working with a word list from the 2015 edition of Collins Scrabble Words, which has about 270,000 entries. (There are a number of alternative lists, which I discuss in an appendix at the end of this article.)

Scrabble words range in length from 2 to 15 letters. The upper limit—determined by the size of the game board—is not much of a concern. You’re unlikely to meet a straw-paper text that folds to yield words longer than *sesquipedalian*. The absence of 1-letter words is more troubling, but the remedy is easy: I simply added the words *a*, *I*, and *O* to my copy of the Scrabble list.

My first computational experiments with foldable words searched for examples at random. Writing a program for random sampling is often easier than taking an exact census of a population, and the sample offers a quick glimpse of typical results. The following Python procedure generates random foldable sequences of letters drawn from a given source text, then returns those sequences that are found in the Scrabble word list. (The parameter *k* is the length of the words to be generated, and *reps* specifies the number of random trials.)

```
import random  # needed for random.sample

def randomFoldableWords(text, lexicon, k, reps):
    normtext = normalize(text)
    n = len(normtext)
    findings = []
    for i in range(reps):
        indices = random.sample(range(n), k)  # k distinct positions
        indices.sort()                        # restore text order
        letters = ""
        for idx in indices:
            letters += normtext[idx]
        if letters in lexicon:
            findings.append(letters)
    return findings
```

Here are the six-letter foldable words found by invoking the program as `randomFoldableWords(text, scrabblewords, 6, 10000)`:

please, plater, searer, saeter, parter, sleety, sleeve, parser, purvey, laster, islets, taster, tester, slarts, paseos, tapers, saeter, eatery, salute, tsetse, setose, salues, sparer

Note that the word *saeter* (you could look it up—I had to) appears twice in this list. The frequency of such repetitions can yield an estimate of the total population size. A variant of the mark-and-recapture method, well known in wildlife ecology, led me to an estimate of 92 six-letter foldable Scrabble words in the straw-wrapper slogan. The actual number turns out to be 106.
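I haven’t shown the estimation step. Here is a minimal sketch of the Lincoln-Petersen version of mark and recapture, treating two batches of random trials as the “capture” and “recapture” samples; the function name and interface are my own invention:

```python
def estimatePopulation(sample1, sample2):
    # Lincoln-Petersen estimator: N is roughly n1 * n2 / m, where m is
    # the number of distinct words "recaptured" in both samples
    caught1, caught2 = set(sample1), set(sample2)
    recaptured = len(caught1 & caught2)
    if recaptured == 0:
        return None   # no overlap, no estimate
    return len(caught1) * len(caught2) / recaptured
```

With small samples the estimate is noisy, which is consistent with the gap between 92 (estimated) and 106 (actual).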

Samples and estimates are helpful, but they leave me wondering, What am I missing? What strange and beautiful word has failed to turn up in any of the samples, like the big fish that never takes the bait? I had to have an exhaustive list.

In many word games, the tool of choice for computer-aided playing (or cheating) is the regular expression, or regex. A regex is a pattern defining a set of strings, or character sequences; from a collection of strings, a regex search will pick out those that match the pattern. For example, the regular expression `^.*love.*$` selects from the Scrabble word list all words that have the letter sequence *love* somewhere within them. There are 137 such words, including some that I would not have thought of, such as *rollover* and *slovenly*. The regex `^.*l.*o.*v.*e.*$` finds all words in which *l, o, v,* and *e* appear in sequence, whether or not they are adjacent. The set has 267 members, including such secret-lover gems as *bloviate*, *electropositive*, and *leftovers*.

A solution to the foldable words problem could surely be crafted with regular expressions, but I am not a regex wizard. In search of a more muggles-friendly strategy, my first thought was to extend the idea behind the random-sampling procedure. Instead of selecting foldable sequences at random, I’d generate all of them, and check each one against the word list.
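For what it’s worth, one regex recipe is short: build a pattern from each candidate word rather than from the text, joining the word’s letters with `.*`. This is only a sketch, and the function name is mine:

```python
import re

def foldableByRegex(word, normtext):
    # "taste" becomes the pattern "t.*a.*s.*t.*e": the word's
    # letters in order, with arbitrary gaps allowed between them
    pattern = ".*".join(word)
    return re.search(pattern, normtext) is not None
```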

The procedure below generates all three-letter strings that can be folded from the given text, and returns the subset of those strings that appear in the Scrabble word list:

```
def foldableStrings3(lexicon, text):
    normtext = normalize(text)
    n = len(normtext)
    words = []
    for i in range(n - 2):
        for j in range(i + 1, n - 1):
            for k in range(j + 1, n):
                s = normtext[i] + normtext[j] + normtext[k]
                if s in lexicon:
                    words.append(s)
    return words
```

At the heart of the procedure are three nested loops that methodically step through all the foldable combinations: For any initial letter `text[i]` we can choose any following letter `text[j]` with `j > i`; likewise `text[j]` can be followed by any `text[k]` with `k > j`. This scheme works perfectly well, finding 348 instances of three-letter words. I speak of “instances” because some words appear in the list more than once; for example, *pee* can be formed in three ways. If we count only unique words, there are 137.

Following this model, we could write a separate routine for each word length from 1 to 15 letters, but that looks like a dreary and repetitious task. Nobody wants to write a procedure with loops nested 15 deep. An alternative is to write a meta-procedure, which would generate the appropriate procedure for each word length. I made a start on that exercise in advanced loopology, but before I got very far I realized there’s an easier way.

It begins with a question: In a text of *n* letters, how many foldable substrings exist—whether or not they are recognizable words? There are several ways of answering this question, but to me the most illuminating argument comes from an inclusion/exclusion principle. Consider the first letter of the text, which in our case is the letter *I*. In the set of all foldable strings, half include this letter and half exclude it. The same is true of the second letter, and the third, and so on. Thus each letter added to the text doubles the number of foldable strings, which means the total number of strings is simply \(2^n\). (Included in this count is the empty string, made up of no letters.)
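The \(2^n\) count is easy to check by brute force on a short text. This sketch enumerates every foldable string, including the empty one, by choosing each subset of positions in turn:

```python
from itertools import combinations

def allFoldableStrings(normtext):
    # choose any subset of positions and keep those letters in order;
    # there are 2^n subsets, hence 2^n strings (counted with multiplicity)
    n = len(normtext)
    strings = []
    for k in range(n + 1):
        for idxs in combinations(range(n), k):
            strings.append("".join(normtext[i] for i in idxs))
    return strings
```

For a text with repeated letters, some of the \(2^n\) strings coincide, which is exactly why word *instances* outnumber unique words.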

This observation suggests a simple algorithm for generating all the foldable strings in any *n*-letter text. Just count from \(0\) to \(2^{n} - 1\), and for each value along the way line up the binary representation of the number with the letters of the text. Then select those letters that correspond to a `1` bit, like so:

```
itsapleasuretoserveyou
0000100000110011111000
```

And so we see that the word *preserve* corresponds to the binary representation of the number 134392.

Counting is something that computers are good at, so a word-search procedure based on this principle is straightforward:

```
def foldablesByCounting(lexicon, text):
    normtext = normalize(text)
    n = len(normtext)
    words = []
    for i in range(2**n):        # count from 0 through 2^n - 1
        charSeq = ''
        positions = positionsOf1Bits(i, n)  # indices of the 1 bits in i
        for p in positions:
            charSeq += normtext[p]
        if charSeq in lexicon:
            words.append(charSeq)
    return words
```
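The helper `positionsOf1Bits` isn’t shown above. A minimal version, assuming the leftmost letter of the text corresponds to the most significant bit, might be:

```python
def positionsOf1Bits(i, n):
    # text position p corresponds to the bit of weight 2^(n-1-p),
    # so the leftmost letter lines up with the most significant bit
    return [p for p in range(n) if (i >> (n - 1 - p)) & 1]
```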

The outer loop (variable `i`) counts from \(0\) to \(2^{n} - 1\); for each of these numbers the inner loop (variable `p`) picks out the letters corresponding to 1 bits. The program produces the expected output. Unfortunately, it does so very slowly. For every character added to the text, the running time roughly doubles. I haven’t the patience to plod through the \(2^{22}\) patterns in “itsapleasuretoserveyou”; estimates based on shorter phrases suggest the running time would be more than three hours.

In the middle of the night I realized my approach to this problem was totally backwards. Instead of blindly generating all possible character strings and filtering out the few genuine words, I could march through the list of Scrabble words and test each of them to see if it’s foldable. At worst I would have to try some 270,000 words. I could speed things up even more by making a preliminary pass through the Scrabble list, discarding all words that include characters not present in the normalized text. For the text “It’s a pleasure to serve you,” the character set has just 12 members: `aeiloprstuvy`. Allowing only words formed from these letters slashes the Scrabble list down to a length of 12,816.
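That preliminary pass amounts to a set comparison. A sketch, taking the already-normalized text as input (the function name is mine):

```python
def prefilter(lexicon, normtext):
    # discard any word that uses a letter absent from the text
    allowed = set(normtext)
    return [w for w in lexicon if set(w) <= allowed]
```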

To make this algorithm work, we need a procedure to report whether or not a word can be formed by folding the given text. The simplest approach is to slide the candidate word along the text, looking for a match for each character in turn:

```
 t a    s   t  e
itsapleasuretoserveyou
```

If every letter of the word finds a mate in the text, the word is foldable, as in the case of `taste`, shown above. But an attempt to match `tastes` would fall off the end of the text looking for a second `s`, which does not exist.

The following code implements this idea:

```
def wordIsFoldable(word, text):
    normtext = normalize(text)
    t = 0  # pointer to positions in normtext
    w = 0  # pointer to positions in word
    while t < len(normtext):
        if word[w] == normtext[t]:  # matching chars in word and text
            w += 1                  # move to next char in word
            if w == len(word):      # matched all chars in word
                return True         # so: thumbs up
        t += 1                      # move to next char in text
    return False                    # fell off the end: thumbs down
```

All we need to do now is embed this procedure in a loop that steps through all the candidate Scrabble words, collecting those for which `wordIsFoldable` returns `True`.
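A compact, self-contained version of that loop. This sketch replaces the two-pointer scan with Python’s iterator-consuming `in` idiom, which is functionally equivalent: each membership test resumes where the previous match left off.

```python
def allFoldableWords(lexicon, normtext):
    # collect every lexicon word that is a subsequence of the text
    def folds(word):
        it = iter(normtext)               # a fresh scan for each word
        return all(ch in it for ch in word)
    return [w for w in lexicon if folds(w)]
```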

There’s still some waste motion here, since we are searching letter-by-letter through the same text, and repeating the same searches thousands of times. The source code (available on GitHub as a Jupyter notebook) explains some further speedups. But even the simple version shown here runs in less than two tenths of a second, so there’s not much point in optimizing.

I can now report that there are 778 unique foldable Scrabble words in “It’s a pleasure to serve you” (including the three one-letter words I added to the list). Words that can be formed in multiple ways bring the total count to 899.

And so we come to the tah-dah! moment—the unveiling of the complete list. I have organized the words into groups based on each word’s starting position within the text. (By Python convention, the positions are numbered from 0 through \(n-1\).) Within each group, the words are sorted according to the position of their last character; that position is given in the subscript following the word. For example, *tapestry* is in Group 1 because it begins at position 1 in the text (the *t* in *It’s*), and it carries the subscript 19 because it ends at position 19 (the *y* in *you*).

This arrangement of the words is meant to aid in constructing multiword phrases. If a word ends at position \(m\), the next word in the phrase must come from a group numbered \(m+1\) or greater.
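The no-overlap rule means a multiword phrase folds exactly when the concatenation of its words is itself foldable. A hypothetical checker, again using a shared iterator so that each word picks up where the previous one ended:

```python
def phraseIsFoldable(words, normtext):
    # the words must appear in order without sharing letters, which is
    # the same as asking whether their concatenation is a subsequence
    remaining = iter(normtext)
    return all(ch in remaining for ch in "".join(words))
```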

**Group 0:** i_{0} it_{1} is_{2} its_{2} ita_{3} isle_{6} ilea_{7} isles_{8} itas_{8} ire_{11} issue_{11} iure_{11} islet_{12} io_{13} iso_{13} ileus_{14} ios_{14} ires_{14} islets_{14} isos_{14} issues_{14} issuer_{16} ivy_{19}

**Group 1:** ta_{3} tap_{4} tae_{6} tale_{6} tape_{6} te_{6} tala_{7} talea_{7} tapa_{7} tea_{7} taes_{8} talas_{8} tales_{8} tapas_{8} tapes_{8} taps_{8} tas_{8} teas_{8} tes_{8} tapu_{9} tau_{9} talar_{10} taler_{10} taper_{10} tar_{10} tear_{10} tsar_{10} taleae_{11} tare_{11} tease_{11} tee_{11} tapet_{12} tart_{12} tat_{12} taut_{12} teat_{12} test_{12} tet_{12} tret_{12} tut_{12} tao_{13} taro_{13} to_{13} talars_{14} talers_{14} talus_{14} taos_{14} tapers_{14} tapets_{14} tapus_{14} tares_{14} taros_{14} tars_{14} tarts_{14} tass_{14} tats_{14} taus_{14} tauts_{14} tears_{14} teases_{14} teats_{14} tees_{14} teres_{14} terts_{14} tests_{14} tets_{14} tres_{14} trets_{14} tsars_{14} tuts_{14} tasse_{15} taste_{15} tate_{15} terete_{15} terse_{15} teste_{15} tete_{15} toe_{15} tose_{15} tree_{15} tsetse_{15} taperer_{16} tapster_{16} tarter_{16} taser_{16} taster_{16} tater_{16} tauter_{16} tearer_{16} teaser_{16} teer_{16} teeter_{16} terser_{16} tester_{16} tor_{16} tutor_{16} tav_{17} tarre_{18} testee_{18} tore_{18} trove_{18} tutee_{18} tapestry_{19} tapstry_{19} tarry_{19} tarty_{19} tasty_{19} tay_{19} teary_{19} terry_{19} testy_{19} toey_{19} tory_{19} toy_{19} trey_{19} troy_{19} try_{19} too_{20} toro_{20} toyo_{20} tatou_{21} tatu_{21} tutu_{21}

**Group 2:** sap_{4} sal_{5} sae_{6} sale_{6} sea_{7} spa_{7} sales_{8} sals_{8} saps_{8} seas_{8} spas_{8} sau_{9} sar_{10} sear_{10} ser_{10} slur_{10} spar_{10} spear_{10} spur_{10} sur_{10} salse_{11} salue_{11} seare_{11} sease_{11} seasure_{11} see_{11} sere_{11} sese_{11} slae_{11} slee_{11} slue_{11} spae_{11} spare_{11} spue_{11} sue_{11} sure_{11} salet_{12} salt_{12} sat_{12} saut_{12} seat_{12} set_{12} slart_{12} slat_{12} sleet_{12} slut_{12} spart_{12} spat_{12} speat_{12} spet_{12} splat_{12} spurt_{12} st_{12} suet_{12} salto_{13} so_{13} salets_{14} salses_{14} saltos_{14} salts_{14} salues_{14} sapless_{14} saros_{14} sars_{14} sass_{14} sauts_{14} sears_{14} seases_{14} seasures_{14} seats_{14} sees_{14} seres_{14} sers_{14} sess_{14} sets_{14} slaes_{14} slarts_{14} slats_{14} sleets_{14} slues_{14} slurs_{14} sluts_{14} sos_{14} spaes_{14} spares_{14} spars_{14} sparts_{14} spats_{14} spears_{14} speats_{14} speos_{14} spets_{14} splats_{14} spues_{14} spurs_{14} spurts_{14} sues_{14} suets_{14} sures_{14} sus_{14} salute_{15} saree_{15} sasse_{15} sate_{15} saute_{15} setose_{15} slate_{15} sloe_{15} sluse_{15} sparse_{15} spate_{15} sperse_{15} spree_{15} saeter_{16} salter_{16} saluter_{16} sapor_{16} sartor_{16} saser_{16} searer_{16} seater_{16} seer_{16} serer_{16} serr_{16} slater_{16} sleer_{16} spaer_{16} sparer_{16} sparser_{16} spearer_{16} speer_{16} spuer_{16} spurter_{16} suer_{16} surer_{16} sutor_{16} sav_{17} sov_{17} salve_{18} save_{18} serre_{18} serve_{18} slave_{18} sleave_{18} sleeve_{18} slove_{18} sore_{18} sparre_{18} sperre_{18} splore_{18} spore_{18} stere_{18} sterve_{18} store_{18} stove_{18} salary_{19} salty_{19} sassy_{19} saury_{19} savey_{19} say_{19} serry_{19} sesey_{19} sey_{19} slatey_{19} slaty_{19} slavey_{19} slay_{19} sleety_{19} sley_{19} slurry_{19} sly_{19} soy_{19} sparry_{19} spay_{19} speary_{19} splay_{19} spry_{19} spurrey_{19} spurry_{19} spy_{19} stey_{19} storey_{19} story_{19} sty_{19} 
suety_{19} surety_{19} surrey_{19} survey_{19} salvo_{20} servo_{20} stereo_{20} sou_{21} susu_{21}

**Group 3:** a_{3} al_{5} ae_{6} ale_{6} ape_{6} aa_{7} ala_{7} aas_{8} alas_{8} ales_{8} als_{8} apes_{8} as_{8} alu_{9} alar_{10} aper_{10} ar_{10} alae_{11} alee_{11} alure_{11} apse_{11} are_{11} aue_{11} alert_{12} alt_{12} apart_{12} apert_{12} apt_{12} aret_{12} art_{12} at_{12} aero_{13} also_{13} alto_{13} apo_{13} apso_{13} auto_{13} aeros_{14} alerts_{14} altos_{14} alts_{14} alures_{14} alus_{14} apers_{14} apos_{14} apres_{14} apses_{14} apsos_{14} apts_{14} ares_{14} arets_{14} ars_{14} arts_{14} ass_{14} ats_{14} aures_{14} autos_{14} alate_{15} aloe_{15} arete_{15} arose_{15} arse_{15} ate_{15} alastor_{16} alerter_{16} alter_{16} apter_{16} aster_{16} arere_{18} ave_{18} aery_{19} alary_{19} alay_{19} aleatory_{19} apay_{19} apery_{19} arsey_{19} arsy_{19} artery_{19} artsy_{19} arty_{19} ary_{19} ay_{19} aloo_{20} arvo_{20} avo_{20} ayu_{21}

**Group 4:** pe_{6} pa_{7} pea_{7} plea_{7} pas_{8} peas_{8} pes_{8} pleas_{8} plu_{9} par_{10} pear_{10} per_{10} pur_{10} pare_{11} pase_{11} peare_{11} pease_{11} pee_{11} pere_{11} please_{11} pleasure_{11} plue_{11} pre_{11} pure_{11} part_{12} past_{12} pat_{12} peart_{12} peat_{12} pert_{12} pest_{12} pet_{12} plast_{12} plat_{12} pleat_{12} pst_{12} put_{12} pareo_{13} paseo_{13} peso_{13} pesto_{13} po_{13} pro_{13} pareos_{14} pares_{14} pars_{14} parts_{14} paseos_{14} pases_{14} pass_{14} pasts_{14} pats_{14} peares_{14} pears_{14} peases_{14} peats_{14} pees_{14} peres_{14} perts_{14} pesos_{14} pestos_{14} pests_{14} pets_{14} plats_{14} pleases_{14} pleasures_{14} pleats_{14} plues_{14} plus_{14} pos_{14} pros_{14} pures_{14} purs_{14} pus_{14} puts_{14} parse_{15} passe_{15} paste_{15} pate_{15} pause_{15} perse_{15} plaste_{15} plate_{15} pose_{15} pree_{15} prese_{15} prose_{15} puree_{15} purse_{15} parer_{16} parr_{16} parser_{16} parter_{16} passer_{16} paster_{16} pastor_{16} pater_{16} pauser_{16} pearter_{16} peer_{16} perter_{16} pester_{16} peter_{16} plaster_{16} plater_{16} pleaser_{16} pleasurer_{16} pleater_{16} poser_{16} pretor_{16} proser_{16} puer_{16} purer_{16} purr_{16} purser_{16} parev_{17} pav_{17} perv_{17} pareve_{18} parore_{18} parve_{18} passee_{18} pave_{18} peeve_{18} perve_{18} petre_{18} pore_{18} preeve_{18} preserve_{18} preve_{18} prore_{18} prove_{18} parry_{19} party_{19} pastry_{19} pasty_{19} patsy_{19} paty_{19} pay_{19} peatery_{19} peaty_{19} peavey_{19} peavy_{19} peeoy_{19} peery_{19} perry_{19} pervy_{19} pesty_{19} plastery_{19} platy_{19} play_{19} ploy_{19} plurry_{19} ply_{19} pory_{19} posey_{19} posy_{19} prey_{19} prosy_{19} pry_{19} pursy_{19} purty_{19} purvey_{19} puy_{19} parvo_{20} poo_{20} proo_{20} proso_{20} pareu_{21} patu_{21} poyou_{21}

**Group 5:** la_{7} lea_{7} las_{8} leas_{8} les_{8} leu_{9} lar_{10} lear_{10} lur_{10} lare_{11} lase_{11} leare_{11} lease_{11} leasure_{11} lee_{11} lere_{11} lure_{11} last_{12} lat_{12} least_{12} leat_{12} leet_{12} lest_{12} let_{12} lo_{13} lares_{14} lars_{14} lases_{14} lass_{14} lasts_{14} lats_{14} leares_{14} lears_{14} leases_{14} leasts_{14} leasures_{14} leats_{14} lees_{14} leets_{14} leres_{14} leses_{14} less_{14} lests_{14} lets_{14} los_{14} lues_{14} lures_{14} lurs_{14} laree_{15} late_{15} leese_{15} lose_{15} lute_{15} laer_{16} laser_{16} laster_{16} later_{16} leaser_{16} leer_{16} lesser_{16} lor_{16} loser_{16} lurer_{16} luser_{16} luter_{16} lav_{17} lev_{17} luv_{17} lave_{18} leave_{18} lessee_{18} leve_{18} lore_{18} love_{18} lurve_{18} lay_{19} leary_{19} leavy_{19} leery_{19} levy_{19} ley_{19} lory_{19} lovey_{19} loy_{19} lurry_{19} laevo_{20} lasso_{20} levo_{20} loo_{20} lassu_{21} latu_{21} lou_{21}

**Group 6:** ea_{7} eas_{8} es_{8} eau_{9} ear_{10} er_{10} ease_{11} ee_{11} ere_{11} east_{12} eat_{12} est_{12} et_{12} euro_{13} ears_{14} eases_{14} easts_{14} eats_{14} eaus_{14} eres_{14} eros_{14} ers_{14} eses_{14} ess_{14} ests_{14} euros_{14} erose_{15} esse_{15} easer_{16} easter_{16} eater_{16} err_{16} ester_{16} erev_{17} eave_{18} eve_{18} easy_{19} eatery_{19} eery_{19} estro_{20} evo_{20}

**Group 7:** a_{7} as_{8} ar_{10} ae_{11} are_{11} aue_{11} aret_{12} art_{12} at_{12} auto_{13} ares_{14} arets_{14} ars_{14} arts_{14} ass_{14} ats_{14} aures_{14} autos_{14} arete_{15} arose_{15} arse_{15} ate_{15} aster_{16} arere_{18} ave_{18} aery_{19} arsey_{19} arsy_{19} artery_{19} artsy_{19} arty_{19} ary_{19} ay_{19} aero_{20} arvo_{20} avo_{20} ayu_{21}

**Group 8:** sur_{10} sue_{11} sure_{11} set_{12} st_{12} suet_{12} so_{13} sets_{14} sos_{14} sues_{14} suets_{14} sures_{14} sus_{14} see_{15} sese_{15} setose_{15} seer_{16} ser_{16} suer_{16} surer_{16} sutor_{16} sov_{17} sere_{18} serve_{18} sore_{18} stere_{18} sterve_{18} store_{18} stove_{18} sesey_{19} sey_{19} soy_{19} stey_{19} storey_{19} story_{19} sty_{19} suety_{19} surety_{19} surrey_{19} survey_{19} servo_{20} stereo_{20} sou_{21} susu_{21}

**Group 9:** ur_{10} ure_{11} ut_{12} ures_{14} us_{14} uts_{14} use_{15} ute_{15} ureter_{16} user_{16} uey_{19} utu_{21}

**Group 10:** re_{11} ret_{12} reo_{13} reos_{14} res_{14} rets_{14} ree_{15} rete_{15} roe_{15} rose_{15} rev_{17} reeve_{18} resee_{18} reserve_{18} retore_{18} rore_{18} rove_{18} retry_{19} rory_{19} rosery_{19} rosy_{19} retro_{20} roo_{20}

**Group 11:** et_{12} es_{14} ee_{15} er_{16} ere_{18} eve_{18} eery_{19} evo_{20}

**Group 12:** to_{13} te_{15} toe_{15} tose_{15} tor_{16} tee_{18} tore_{18} toey_{19} tory_{19} toy_{19} trey_{19} try_{19} too_{20} toro_{20} toyo_{20}

**Group 13:** o_{13} os_{14} oe_{15} ose_{15} or_{16} ore_{18} oy_{19} oo_{20} ou_{21}

**Group 14:** ser_{16} see_{18} sere_{18} serve_{18} sey_{19} servo_{20} so_{20} sou_{21}

**Group 15:** er_{16} ee_{18} ere_{18} eve_{18} evo_{20}

**Group 16:** re_{18} reo_{20}

**Group 17:**

**Group 18:**

**Group 19:** yo_{20} you_{21} yu_{21}

**Group 20:** o_{20} ou_{21}

**Group 21:**

Naturally, I’ve tried out the code on a few other well-known phrases.

If Lynn and I had met at a different dining establishment, she might have found a straw with the statement, “It takes two hands to handle a Whopper.” There’s quite a diverse assortment of possible messages lurking in this text, with 1,154 unique foldable words and almost 2,000 word instances. Perhaps she would have chosen the upbeat “Inhale hope.” Or, in a darker mood, “I taste woe.”

If we had been folding dollar bills instead of straw wrappers, “In God We Trust” might have become the forward-looking proclamation, “I go west!” Horace Greeley’s marching order on the same theme, “Go west, young man,” gives us the enigmatic “O, wet yoga!” or, perhaps more aptly, “Gunman.”

Jumping forward from 1967 to 2021—from the Summer of Love to the Winter of COVID—I can turn “Wear a mask. Wash your hands.” into the plaintive, “We ask: Why us?” With “Maintain social distance,” the best I can do is “A nasal dance” or “A sad stance.”

And then there’s “Make America Great Again.” It yields “Meme rage.” Also “Make me ragtag.”

In a project like this one, you might think that getting a suitable list of English words would be the easy part. In fact it seems to be the main trouble spot.

The Scrabble lexicon I’ve been relying on derives from a word list known as SOWPODS, compiled by two associations of Scrabble players starting in the 1980s. Current editions of the list are distributed by a commercial publisher, Collins Dictionaries. If I understand correctly, all versions of the list are subject to copyright (see discussion on Stack Exchange) and cannot legally be distributed without permission. But no one seems to be much bothered by that fact. Copies of the lists in plain-text format, with one word per line, are easy to find on the internet—and not just on dodgy sites that specialize in pirated material.

There are alternative lists without legal encumbrances. Indeed, there’s a good chance you already have one such list pre-installed on your computer. A file called `words` is included in most distributions of the Unix operating system, including MacOS; my copy of the file lives in `/usr/share/dict/words`. If you don’t have or can’t find the Unix `words` file, I suggest downloading the Natural Language Toolkit, a suite of data files and Python programs that includes a lexicon almost identical to Unix `words`, as well as many other linguistic resources.

The Scrabble list has one big advantage over `words`: It includes plurals and inflected forms of verbs—not just *test* but also *tests*, *tested*, and *testing*. [Bad example; see comments below.] The `words` file is more like a list of dictionary head words, with only the stem form explicitly included. On the other hand, `words` has an abundance of names and other proper nouns, as well as abbreviations, which are excluded from the Scrabble list since they are not legal plays in the board game.

How about combining the two word lists? Their union has just under 400,000 entries—quite a large lexicon. Using this augmented list for the analysis of “It’s a pleasure to serve you,” my program finds an additional 219 foldable words, beyond the 778 found with the Scrabble list alone. Here they are:

aaru aer aerose aes alares alaster alea alerse aleut alo alose alur aly ao apa apar aperu apus aro arry aru ase asor asse ast astor atry aueto aurore aus ausu aute e eastre eer erse esere estre eu ey iao ie ila islay ist isuret itala itea iter ito iyo l laet lao larry larve lastre lasty latro laur leo ler lester lete leto loro lu lue luo lut luteo lutose ly oer ory ovey p parsee parto passo pastose pato pau paut pavo pavy peasy perty peru pess peste pete peto petr plass platery pluto poe poy presee pretry pu purre purry puru r reve ro roer roey roy s sa saa salar salat salay saltee saltery salvy sao sapa saple sapo sare sart saur sauty sauve se seary seave seavy seesee sero sert sesuto sla slare slav slete sloo sluer soe sory soso spary spass spave spleet splet splurt spor spret sprose sput ssu stero steve stre strey stu sueve suto sutu suu t taa taar tal talao talose taluto tapeats tapete taplet tapuyo tarr tarse tartro tarve tasser tasu taur tave tavy teaer teaey teart teasy teaty teave teet teety tereu tess testor toru torve tosy tou treey tsere tst tu tue tur turr turse tute tutory u uro urs uru usee v vu y

Many of the proper nouns in this list are present in the vocabulary of most English speakers: *Aleut, Peru, Pluto, Slav*; the same is true of personal names such as *Larry, Leo, Stu, Tess*. But the rest of the words are very unlikely to turn up in the smalltalk of teenage sweethearts. Indeed, the list is full of letter sequences I simply don’t recognize as English words. Please define *isuret, ovey, spleet,* or *sput*.

There are even bigger word lists out there. In 2006 Google extracted 13.5 million unique English words from public web pages. (The sheer number implies a very liberal definition of *English* and *word*.) A good place to start exploring this archive is Peter Norvig’s website, which offers a file with the 333,333 most frequent words from the corpus. The list begins as you might expect: *the, of, and, to, a, in, for*…; but the weirdness creeps in early. The single letters *c, e, s,* and *x* are all listed among the 100 most common “words,” and the rest of the alphabet turns up soon after. By the time we get to the end of the file, it’s mostly typos *(mepquest, halloweeb, scholarhips)*, run-together words *(dietsdontwork, weightlossdrugs)*, and hundreds of letter strings that have some phonetic or orthographic resemblance to *Google* or *Yahoo!* or both *(hoogol, googgl, yahhol, gofool, yogol)*. (I suspect that much of this rubbish was scraped not from the visible text of web pages but from metadata stuffed into headers for purposes of search-engine optimization.)

Applying the Google list to the search for foldable words more than doubles the volume of results, but it contributes almost nothing to the stock of words that might form interesting messages. I found 1,543 new words, beyond those that are also present in the union of the Scrabble and Unix lists. In alphabetical order, the additions begin: *aae, aao, aaos, aar, aare, aaro, aars, aart, aarts, aase, aass, aast, aasu, aat, aats, aatsr, aau, aaus, aav, aave, aay, aea, aeae….* I’m not going to be folding up any straw wrappers with those words for my sweetheart.

What we really need, I begin to think, is not a longer word list but a shorter and more discriminating one.

The tableau presented below is a product of my amateur efforts to address these questions. It’s a simple exercise in the mechanics of probability. I take a sample of the U.S. population, roughly 10,000 people, and randomly assign them to clusters of size \(n\), where \(n\) can range from 1 to 32. (In any single run of the model, \(n\) is fixed; all the groups are the same size.) Each cluster represents a Thanksgiving gathering. If a cluster includes someone infected with SARS-CoV-2, the disease may spread to the uninfected and susceptible members of the same group.

With the model’s default settings, \(n = 12\). The population sample consists of 9,900 people, represented as tiny colored dots arranged in 825 clusters of 12 dots each. Most of the dots are green, indicating susceptible individuals. Red dots are the infectious spreaders. Purple dots represent the unfortunates who are newly infected as a result of mingling with spreaders in these holiday get-togethers. I count the purple dots and estimate the rate of new infections per 100,000 population.

You can explore the model on your own. Twiddle with the sliders in the control panel, then press the “Go” button to generate a new sample population and a new cycle of infections. For example, by moving the group-size slider you can get a thousand clusters of 10 persons each, or 400 clusters of 25 each.

Before going any further with this discussion, I should make clear that the simulation is *not* offered as a prediction of how Covid-19 will spread during tomorrow’s Thanksgiving festivities. This is not a guide to personal risk assessment. If you play around with the controls, you’ll soon discover you can make the model say anything you wish. Depending on the settings you choose, the result can lie anywhere along the entire spectrum of possible outcomes, from nobody-gets-sick to everybody’s-got-it. There are settings that lead to impossible states, such as infection rates beyond 100 percent. Even so, I’m not totally convinced that the model is useless. It might point to combinations of parameters that would limit the damage.

The crucial input that drives the model is the daily tally of Covid cases for the entire country, expressed as a rate of new infections per 100,000 population. The official version of this statistic is published by the CDC; a few other organizations, including Johns Hopkins and the New York Times, maintain their own daily counts. The CDC report for November 24 cites a seven-day rolling average of 52.3 new cases per 100,000 people. For the model I set the default rate at 50, but the slider marked “daily new cases per 100,000 population” will accommodate any value between 0 and 500.

From the daily case rate we can estimate the prevalence of the disease: the total number of active cases at a given moment. In the model, the prevalence is simply 14 times the daily case rate. In effect, I am assuming (or pretending) that the daily rate is unchanging and that everyone’s illness lasts 14 days from the moment of infection to full recovery. Neither of these assumptions is true. In a model of ongoing disease propagation, where today’s events determine what happens next week, the steady-state approximation would be unacceptable. But this model produces only a snapshot on one particular day of the year, and so dynamics are not very important.

What we *do* need to consider in more detail is the sequence of stages in a case of Covid-19. The archetypal model in epidemiology has three stages: susceptible *(S)*, infected *(I)*, and recovered *(R)*; *R* stands for “removed,” acknowledging that recovery isn’t the only possible end of an illness. But I am going to look away from the grimmer aspects of this story. For present purposes the infected stage needs to be subdivided into incubating *(U)*, infectious *(I)*, and symptomatic *(Q)*, which gives us a SUIQR model. An incubating patient has been infected but is not yet producing enough virus particles to infect others. The infectious stage is the most dangerous period: Patients have no conspicuous symptoms and are still unaware of their own infection, but nonetheless they are spewing virus particles with every breath.

During the symptomatic phase, patients know they are sick and should be in quarantine; hence the letter *Q*. For the purposes of the model I assume that everyone in category *Q* will decline the invitation to Thanksgiving dinner. In the tableau they are marked with a red *x*, which I think of as an empty chair at the dinner table. The purple dots for newly acquired infections add a sixth category to the model, although they really belong to the incubating *U* class.

A parameter of some importance is the duration of the presymptomatic infectious stage, since the red-dot people in that category are the only ones actually spreading the disease in my model of Thanksgiving gatherings. I made a foray into the medical literature to pin down this number, but what I learned is that after a year of intense scrutiny there’s still a lot we don’t know about Covid-19. The typical period from infection to the onset of symptoms (encompassing both the *U* and *I* stages of my model) is four or five days, but apparently it can range from less than two days to three weeks. The graph below is based on a paper by Conor McAloon and colleagues that aggregates results of eight studies carried out early in the pandemic (when it was easier to determine the date of infection, since cases were rare and geographically isolated).

Ultimately I decided, for the sake of simplicity (or lazy convenience), to collapse this distribution to its median, which is about five days. Then there’s the question of when within this period an infected person becomes dangerous to those nearby. Various sources [Harvard, MIT, Fox News] suggest that infected individuals begin spreading the virus two or three days before they show symptoms, and that the moment of maximum infectiousness comes shortly before symptom onset. I chose to interpret “two or three days” as 2.5 days.

What all this boils down to is the following relation: If the national new-case rate is 50 per 100,000, then among Thanksgiving celebrants in the model, 125 per 100,000 are Covid spreaders. That’s 0.125 percent. Turn to the person on your left. Turn to your right. Are you feeling lucky?
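The chain of conversions can be checked in a few lines of Python. (The variable names are my own; the 2.5-day infectious window and 14-day case duration are the assumptions described above.)

```python
daily_new_cases = 50      # reported new cases per day, per 100,000 people
infectious_days = 2.5     # presymptomatic infectious window (assumed)
case_duration = 14        # days from infection to recovery (assumed)

# People currently in the silent, presymptomatic infectious stage:
spreaders_per_100k = daily_new_cases * infectious_days

# Total active cases at any moment (the prevalence used by the model):
prevalence_per_100k = daily_new_cases * case_duration

print(spreaders_per_100k)    # 125.0 spreaders per 100,000, or 0.125 percent
print(prevalence_per_100k)   # 700 active cases per 100,000
```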

The model’s default settings assume a new-case rate of \(50\) per \(100,000\), a Thanksgiving group size of \(12\), and a \(0.25\) probability of transmitting the virus from an infectious person to a susceptible person. Let’s do some back-of-the-envelope calculating. As noted above, the \(50/100{,}000\) new case rate translates into \(125/100{,}000\) infectious individuals. Among the \(\approx 10,000\) members of the model population, we should expect to see \(12\) or \(13\) red-dot *I*s. Because the number of *I*s is much smaller than the number of groups \((825)\), it’s unlikely that more than one red dot will turn up in any single group of \(12\). In each group with a single spreader, we can expect the virus to jump to \(0.25 \times 11 = 2.75\) of the spreader’s companions. This assumes that all the companions are green-dot susceptibles, which isn’t quite true. There are also yellow-dot incubating and blue-dot recovered people, as well as the red-*x* empty chairs of those in quarantine. But these are small corrections. The envelope estimate gives \(344/100{,}000\) new infections on Thanksgiving day; the computer model yields 325 per 100,000, when averaged over many runs.
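A stripped-down Monte Carlo sketch of the model confirms the envelope estimate. This is not the full model: it keeps only the red and green dots, treating every non-spreader as susceptible (no incubating, recovered, or quarantined people), so its average should land near 344 per 100,000 rather than 325. The function name and parameters here are my own.

```python
import random

def run_once(pop=9900, group_size=12, daily_rate=50 / 100_000,
             infectious_days=2.5, p_transmit=0.25, rng=random):
    """One simulated holiday: count infections passed at the gatherings."""
    p_spreader = daily_rate * infectious_days      # 0.00125 of the population
    new_cases = 0
    for _ in range(pop // group_size):             # 825 tables of 12
        k = sum(rng.random() < p_spreader for _ in range(group_size))
        if k == 0:
            continue                               # a safe table
        # A guest escapes all k spreaders with probability (1 - p_transmit)**k
        p_infected = 1 - (1 - p_transmit) ** k
        new_cases += sum(rng.random() < p_infected
                         for _ in range(group_size - k))
    return new_cases

random.seed(2021)
runs = 400
mean = sum(run_once() for _ in range(runs)) / runs
print(f"{mean:.1f} new infections per run, "
      f"about {mean / 9900 * 100_000:.0f} per 100,000")
```

Setting `daily_rate` to the Hawaii or Great Plains figures mentioned below reproduces the regional extremes as well.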

But the average doesn’t tell the whole story. The variance of these outcomes is quite high, as you’ll see if you press the “Go” button repeatedly. Counting the number of new infections in each of a million runs of the model, the distribution looks like this:

The peak of the curve is at 30 new infections per model run, which corresponds to about 300 cases per 100,000 population, but you shouldn’t be surprised to see a result of 150 or 500.

If the effect of Thanksgiving gatherings in the real world matches the results of this model, we’re in serious trouble. A rate of 300 cases per 100,000 people corresponds to just under a million new cases in the U.S. population. All of those infections would arise on a single day (although few of them would be detected until about a week later). That’s an outburst of contagion more than five times bigger than the worst daily toll recorded so far.

But there are plenty of reasons to be skeptical of this result.

Even in a “normal” year, not everyone in America sits down at a table for 12 to exchange gossip and germs, and surely many more will be sitting out this year’s events. According to a survey attributed

Another potential mitigating factor is that people invited to your holiday celebration are probably not selected at random from the whole population, as they are in the model. Guests tend to come in groups, often family units. If your aunt and uncle and their three kids all live together, they probably get sick together, too. Thus a gathering of 12 individuals might better be treated as an assembly of three or four “pods.” One way to introduce this idea into the computer model is to enforce nonzero correlations between the people selected for each group. If one attendee is infectious, that raises the probability that others will also be infectious, and vice versa. As the correlation coefficient increases, groups are increasingly homogeneous. If lots of spreaders are crowded in one group, they can’t infect the vulnerable people in other groups. In the model, a correlation coefficient of 0.5 reduces the average number of new cases from 32.5 to 23.5. (Complete or perfect correlation eliminates contagion altogether, but this is highly unrealistic.)

Geography should also be considered. The national average case rate of 50 per 100,000 conceals huge local and regional variations. In Hawaii the rate is about 5 cases, so if you and all your guests are Hawaiians, you’ll have to be quite unlucky to pick up a Covid case at the Thanksgiving luau. At the other end of the scale, there are counties in the Great Plains states that have approached 500 cases per 100,000 in recent weeks. A meal with a dozen attendees in one of those hotspots looks fairly calamitous: The model shows 3,000 new cases per 100,000, or 3 percent of the population.

If you are determined to have a big family meal tomorrow and you want to minimize the risks, there are two obvious strategies. You can reduce the chance that your gathering includes someone infectious, or you can reduce the likelihood that any infectious person who happens to be present will transmit the virus to others. Most of the recommendations I’ve read in the newspaper and on health-care websites focus on the latter approach. They urge us to wear masks, to keep everyone at arm’s length, to wash our hands, to open all the windows (or better yet to hold the whole affair outdoors). Making it a briefer event should also help.

In the model, any such measures are implemented by nudging the slider for transmission probability toward smaller values. The effect is essentially linear over a broad range of group sizes. Reducing the transmission probability by half reduces the number of new infections proportionally.

The trouble is, I have no firm idea of what the actual transmission probability might be, or how effective those practices would be in reducing it. A recent study by a group at Vanderbilt University found a transmission rate within households of greater than 50 percent. I chose 25 percent as the default value in the model on the grounds that spending a single day together should be less risky than living permanently under the same roof. But the range of plausible values remains quite wide. Perhaps studies done in the aftermath of this Thanksgiving will yield better data.

As for reducing the chance of having an infectious guest, one approach is simply to reduce the size of the group. In this case the effect is better than linear, but only slightly so. Splitting that 12-person meal into two separate 6-seat gatherings cuts the infection rate by a little more than half, from 32.5 to 15.2. And, predictably, larger groups have worse outcomes. Pack in 24 people per group and you can expect 70 infections. Neither of these strategies seems likely to cut the infection rate by a factor of 10 or more. Unless, of course, everyone eats alone. Set the group-size slider to 1, and no one gets sick.

Another factor to keep in mind is that this model counts only infections passed from person to person during a holiday get-together. Leaving all those cases aside, the country has quite a fierce rate of “background” transmission happening on days with no special events. If the Thanksgiving cases are to be added to the background cases, we’re even worse off than the model would suggest. But the effect could be just the opposite. A family holiday is an occasion when most people skip some ordinary activities that can also be risky. Most of us have the day off from work. We are less likely to go out to a bar or a restaurant. It’s even possible that the holiday will actually suppress the overall case rate. But don’t bet your life on it.

There’s one more wild card to be taken into account. A tacit assumption in the structure of the model is that the reported Covid case count accurately reflects the prevalence of the disease in the population. This is surely not quite true. There are persistent reports of asymptomatic cases—people who are infected and infectious, but who never feel unwell. Those cases are unlikely to be recorded. Others may be ill and suspect the cause is Covid but avoid getting medical care for one reason or another. (For example, they may fear losing their job.) All in all, it seems likely the CDC is under-reporting the number of infections.

Early in the course of the epidemic, a group at Georgia Tech led by Aroon Chande built a risk-estimating web tool based on case rates for individual U.S. counties. They included an adjustment for “ascertainment bias” to compensate for cases omitted from official public health estimates. Their model multiplies the reported case counts by a factor of either 5 or 10. This adjustment may well have been appropriate last spring, when Covid testing was hard to come by even for those with reasonable access to medical services. It seems harder to justify such a large multiplier now, but the model, which is still being maintained, continues to insert a fivefold or tenfold adjustment. Out of curiosity, I have included a slider that can be set to make a similar adjustment.

Is it possible that we are still counting only a tenth of all the cases? If so, the cumulative total of infections since the virus first came ashore in the U.S. is 10 times higher than official estimates. Instead of 12.5 million total cases, we’ve experienced 125 million; more than a third of the population has already been through the ordeal and (mostly) come out the other side. We’ll know the answer soon. At the present infection rate (multiplied by 10), we will have burned through another third of the population in just a few weeks, and infection rates should fall dramatically through herd immunity. (I’m not betting my life on this one either.)

One other element of the Covid story that ought to be in the model is testing, which provides another tool for improving the chances that we socialize only with safe companions. If tests were completely reliable, their effect would merely be to move some fraction of the dangerous red-dot category into the less-dangerous red-*x* quarantined camp. But false-positive and false-negative testing results complicate the situation. (If the actual infection rate is low, false positives may outnumber true positives.)

I offer no conclusions or advice as a result of my little adventure in computational epidemiology. You should not make life-or-death decisions based on the writings of some doofus at a website called bit-player. (Nor based on a tweet from @realDonaldTrump.)

I *do* have some stray thoughts about the nature of holidays in Covid times. In the U.S. most of our holidays, both religious and secular, are intensely social, convivial occasions. Thanksgiving is a feast, New Year’s Eve is a party, Mardi Gras is a parade, St. Patrick’s Day is a pub crawl, July Fourth is a picnic. I’m not asking to abolish these traditions, some of which I enjoy myself. But they are not helping matters in the midst of a raging epidemic. Every one of these occasions can be expected to produce a spike in that curve we’re supposed to be flattening.

I wish we could find a spot on the calendar for a new kind of holiday—a day or a weekend for silent and solitary contemplative respite. Close the door, or go off by yourself. Put a dent in the curve.

In that essay I also mentioned three other questions about trees that have long been bothering me. In this sequel I want to poke at those other questions a bit more deeply.

Botanists have an elaborate vocabulary for describing leaf shapes: *cordate* (like a Valentine heart), *cuneate* (wedgelike), *ensiform* (sword shaped), *hastate* (like an arrowhead, with barbs), *lanceolate* (like a spearhead), *oblanceolate* (a backwards spearhead), *palmate* (leaflets radiating like fingers), *pandurate* (violin shaped), *reniform* (kidney shaped), *runcinate* (saw-toothed), *spatulate* (spoonlike). That’s not, by any means, a complete list.

Steven Vogel, in his 2012 book *The Life of a Leaf*, enumerates many factors and forces that might have an influence on leaf shape. For example, leaves can’t be too heavy, or they would break the limbs that hold them aloft. On the other hand, they can’t be too delicate and wispy, or they’ll be torn to shreds by the wind. Leaves also must not generate too much aerodynamic drag, or the whole tree might topple in a storm.

Job One for a leaf is photosynthesis: gathering sunlight, bringing together molecules of carbon dioxide and water, synthesizing carbohydrates. Doing that efficiently puts further constraints on the design. As much as possible, the leaf should turn its face to the sun, maximizing the flux of photons absorbed. But temperature control is also important; the biosynthetic apparatus shuts down if the leaf is too hot or too cold.

Vogel points out that subtle features of leaf shape can have a measurable impact on thermal and aerodynamic performance. For example, convective cooling is most effective near the margins of a leaf; temperature rises with distance from the nearest edge. In environments where overheating is a risk, shapes that minimize this distance—such as the *lobate* forms of oak leaves—would seem to have an advantage over simpler, disklike shapes. But the choice between frilly and compact forms depends on other factors as well. Broad leaves with convex shapes intercept the most sunlight, but that may not always be a good thing. Leaves with a lacy design let dappled sunlight pass through, allowing multiple layers of leaves to share the work of photosynthesis.

Natural selection is a superb tool for negotiating a compromise among such interacting criteria. If there is some single combination of traits that works best for leaves growing in a particular habitat, I would expect evolution to find it. But I see no evidence of convergence on an optimal solution. On the contrary, even closely related species put out quite distinctive leaves.

Take a look at the three oak leaves in the upper-left quadrant of the image above. They are clearly variations on a theme. What the leaves have in common is a sequence of peninsular protrusions springing alternately to the left and the right of the center line. The variations on the theme have to do with the number of peninsulas (three to five per side in these specimens), their shape (rounded or pointy), and the depth of the coves between peninsulas. Those variations could be attributed to genetic differences at just a few loci. But *why* have the leaves acquired these different characteristics? What evolutionary force makes rounded lobes better for white oak trees and pointy ones better for red oak and pin oak?

Much has been learned about the developmental mechanisms that generate leaf shapes. Biochemically, the main actors are the plant hormones known as auxins; their spatial distribution and their transport through plant tissues regulate local growth rates and hence the pattern of development. (A 2014 review article by Jeremy Dkhar and Ashwani Pareek covers these aspects of leaf form in great detail.) On the mathematical and theoretical side, Adam Runions, Miltos Tsiantis, and Przemyslaw Prusinkiewicz have devised an algorithm that can generate a wide spectrum of leaf shapes with impressive verisimilitude. (Their 2017 paper, along with source code and videos, is at algorithmicbotany.org/papers/leaves2017.html.) With different parameter values the same program yields shapes that are recognizable as oaks, maples, sycamores, and so on. Again, however, all this work addresses questions of *how*, not *why*.

Another property of tree leaves—their size—*does* seem to respond in a simple way to evolutionary pressures. Across all land plants (not just trees), leaf area varies by a factor of a million—from about 1 square millimeter per leaf to 1 square meter. A 2017 paper by Ian J. Wright and colleagues reports that this variation is strongly correlated with climate. Warm, moist regions favor large leaves; think of the banana. Cold, dry environments, such as alpine ridges, host mainly tiny plants with even tinier leaves. So natural selection is alive and well in the realm of tree leaves; it just appears to have no clear preferences when it comes to shape.

Or am I missing something important? Elsewhere in nature we find flamboyant variations that seem gratuitous if you view them strictly in the grim context of survival-of-the-fittest. I’m thinking of the fancy-dress feathers of birds, for example. Cardinals and bluejays both frequent my back yard, but I don’t spend much time wondering whether red or blue is the optimal color for survival in that habitat. Nor do I expect the two species to converge on some shade of purple. Their gaudy plumes are not adaptations to the physical environment but elements of a communication system; they send signals to rivals or potential mates. Could something similar be going on with leaf shape? Do the various oak species maintain distinctive leaves to identify themselves to animals that help with pollination or seed dispersal? I rate this idea unlikely, but I don’t have a better one.

Surely this question is too easy! We know why trees grow tall. They reach for the sky. It’s their only hope of escaping the gloomy depths of the forest’s lower stories and getting a share of the sunshine. In other words, if you are a forest tree, you need to grow tall because your neighbors are tall; they overshadow you. And the neighbors grow tall because you’re tall. It’s a classic arms race. Vogel has an acute commentary on this point:

In every lineage that has independently come up with treelike plants, a variety of species achieve great height. That appears to me to be the height of stupidity…. We’re looking at, almost surely, an object lesson in the limitations of evolutionary design….

A trunk limitation treaty would permit all individuals to produce more seeds and to start producing seeds at earlier ages. But evolution, stupid process that it is, hasn’t figured that out—foresight isn’t exactly its strong suit.

Vogel’s trash-talking of Darwinian evolution is meant playfully, of course. But I think the question of height-limitation treaties (*tree*ties?) deserves more serious attention.

Forest trees in the eastern U.S. often grow to a height of 25 or 30 meters, approaching 100 feet. It takes a huge investment of material and energy to erect a structure that tall. To ensure sufficient strength and stiffness, the girth of the trunk must increase as the \(\frac{3}{2}\) power of the height, and so the cross-sectional area \((\pi r^2)\) grows as the cube of the height. It follows that doubling the height of a tree trunk multiplies its mass by a factor of 16.
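Spelled out, the scaling argument runs:

```latex
r \propto h^{3/2}
\quad\Rightarrow\quad
A = \pi r^{2} \propto h^{3}
\quad\Rightarrow\quad
M \propto A \cdot h \propto h^{4},
\qquad
\frac{M(2h)}{M(h)} = 2^{4} = 16.
```

That is, the trunk’s mass goes as its cross-sectional area times its height, and so grows as the fourth power of height.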

Great height imposes another, ongoing, metabolic cost. Every day, a living tree must lift 500 liters of water—weighing 500 kilograms—from the root zone at ground level up to the leaves in the crown. It’s like carrying enough water to fill four or five bathtubs from the basement of a building to the 10th floor.

Height also exacerbates certain hazards to the life and health of the tree. A taller trunk forms a longer lever arm for any force that might tend to overturn the tree. Compounding the risk, average wind speed increases with distance above the ground.

Standing on the forest floor, I tilt my head back and stare dizzily upward toward the leafy crowns, perched atop great pillars of wood. I can’t help seeing these plants on stilts as a colossal waste of resources. It’s even sillier than the needlelike towers of apartments for billionaires that now punctuate the Manhattan skyline. In those buildings, all the floors are put to *some* use. In the forest, the tree trunks are denuded of leaves and sometimes of branches over 90 percent of their length; only the penthouses are occupied.

If the trees could somehow get together and negotiate a deal—a zoning ordinance or a building code—they would *all* benefit. Perhaps they could decree a maximum height of 10 meters. Nothing would change about the crowns of the trees; the rule would simply chop off the bottom 20 meters of the trunk.

If every tree would gain from the accord, why don’t we see such amputated forests evolving in nature? The usual response to this why-can’t-everybody-get-along question is that evolution just doesn’t work that way. Natural selection is commonly taken to be utterly selfish and individualist, even when it hurts. A tree reaching the 10-meter limit would say to itself: “Yes, this is good; I’m getting plenty of light without having to stand on tiptoe. But it could be even better. If I stretched my trunk another meter or two, I’d collect an even bigger share of solar energy.” Of course the other trees reason with themselves in exactly the same way, and so the futile arms race resumes. As Vogel said, foresight is not evolution’s strong suit.

I am willing to accept this dour view of evolution, but I am not at all sure it actually explains what we see in the forest. If evolution has no place for cooperative action in a situation like this one, how does it happen that all the trees do in fact stop growing at about the same height? Specifically, if an agreement to limit height to 10 meters would be spoiled by rampant cheating, why doesn’t the same thing happen at 30 meters?

One might conjecture that 30 meters is a physiological limit, that the trees would grow taller if they could, but some physical constraint prevents it. Perhaps they just can’t lift the water any higher. I would consider this a very promising hypothesis if it weren’t for the sequoias and the coast redwoods in the western U.S. Those trees have not heard about any such physical barriers. They routinely grow to 70 or 80 meters, and a few specimens have exceeded 100 meters. Thus the question for the East Coast trees is not just “Why are you so tall?” but also “Why aren’t you taller?”

I can think of at least one good reason for forest trees to grow to a uniform height. If a tree is shorter than average, it will suffer for being left in the shade. But standing head and shoulders above the crowd also has disadvantages: Such a standout tree is exposed to stronger winds, a heavier load of ice and snow, and perhaps higher odds of lightning strikes. Thus straying too far either below or above the mean height may be punished by lower reproductive success. But the big question remains: How do all the trees reach consensus on what height is best?

Another possibility: Perhaps the height of forest trees is not a result of an arms-race after all but instead is a response to predation. The trees are holding their leaves on high to keep them away from herbivores. I can’t say this is wrong, but it strikes me as unlikely. No giraffes roam the woods of North America (and if they did, 10 meters would be more than enough to put the leaves out of reach). Most of the animals that nibble on tree leaves are arthropods, which can either fly (adult insects) or crawl up the trunk (caterpillars and other larvae). Thus height cannot fully protect the leaves; at best it might provide a deterrent. Tree leaves are not a nutritious diet; perhaps some small herbivores consider them worth a climb of 10 meters, but not 30.

To a biologist, a tree is a woody plant of substantial height. To a mathematician, a tree is a graph without loops. It turns out that math-trees and bio-trees have some important properties in common.

The diagram below shows two mathematical graphs. They are collections of dots (known more formally as vertices), linked by line segments (called edges). A graph is said to be *connected* if you can travel from any vertex to any other vertex by following some sequence of edges. Both of the graphs shown here are connected. Trees form a subspecies of connected graphs. They are *minimally* connected: Between any two vertices there is exactly one path, where a path never revisits a vertex (so *x, y, x, z* is not a path). In the tree at left there is a single route from *a* to *b*. The graph at right is not a tree: There are two routes from *a* to *b* (red and yellow lines).

Here’s another way to describe a math-tree. It’s a graph that obeys the antimatrimonial rule: What branching puts asunder, let no one join together again. Bio-trees generally work the same way: Two limbs that branch away from the trunk will not later return to the trunk or fuse with each other. In other words, there are no cycles, or closed loops. The pattern of radiating branches that never reconverge is evident in the highly regular structure of the bio-tree pictured below. (The tree is a Norfolk Island pine, native to the South Pacific, but this specimen was photographed on Sardinia.)
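The antimatrimonial rule is easy to state in code. Here is a minimal sketch (the function name and the 0 … n−1 vertex labels are my own) that tests whether a graph is a math-tree by refusing any edge that would rejoin two vertices branching has already put asunder:

```python
def is_tree(n, edges):
    """True iff the graph on vertices 0..n-1 with these edges is a tree:
    connected and acyclic (equivalently, connected with n-1 edges)."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))          # union-find forest, one root per vertex

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:          # a and b are already connected:
            return False      # adding this edge would close a loop
        parent[ra] = rb       # otherwise, merge the two components
    return True
```

With exactly n−1 loop-free edges, connectivity follows automatically, which is why the function needs no separate reachability check.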

Trees have achieved great success without loops in their branches. Why would a plant ever want to have its structural elements growing in circles?

I can think of two reasons. The first is mechanical strength and stability. Engineers know the value of triangles (the smallest closed loops) in building rigid structures. Also arches, where two vertical elements that could not stand alone lean on each other. Trees can’t take advantage of these tricks; their limbs are cantilevers, supported only at the point of juncture with the trunk or the parent limb. Loopy structures would allow for various kinds of bracing and buttressing.

The second reason is reliability. Providing multiple channels from the roots to the leaves would improve the robustness of the tree’s circulatory system. An injury near the base of a limb would no longer doom all the structures beyond the point of damage.

Networks with multiple paths between nodes are exploited elsewhere in nature, and even in other aspects of the anatomy of trees. The reticulated channels in the image below are veins distributing fluids and nutrients within a leaf from a red oak tree. The very largest veins (or ribs) have a treelike arrangement, but the smaller channels form a nested hierarchy of loops within loops. (The pattern reminds me of a map of an ancient city.) Because of the many redundant pathways, an insect taking a chomp out of the middle of this network will not block communication with the rest of the leaf.

The absence of loops in the larger-scale structure of trunk and branches may be a natural consequence of the developmental program that guides the growth of a tree. Aristid Lindenmayer, a Hungarian-Dutch biologist, invented a family of formal languages (now called L-systems) for describing such growth. The languages are rewriting systems: You start with a single symbol (the *axiom*) and replace it with a string of symbols specified by the rules of a grammar. Then the string resulting from this substitution becomes a new input to the same rewriting process, with each of its symbols being replaced by another string formed according to the grammar rules. In the end, the symbols are interpreted as commands for constructing a geometric figure.

Here’s an L-system grammar for drawing cartoonish two-dimensional trees:

f ⟶ f [r f] [l f]
l ⟶ l
r ⟶ r

The symbols `f`, `l`, and `r` are the basic elements of the language; when interpreted as drawing commands, they stand for *forward*, *left*, and *right*. The first rule of the grammar replaces any occurrence of `f` with the string `f [r f] [l f]`; the second and third rules change nothing, replacing `l` and `r` with themselves. Square brackets enclose a subprogram. On reaching a left bracket, the system makes note of its current position and orientation in the drawing. Then it executes the instructions inside the brackets, and finally on reaching the right bracket backtracks to the saved position and orientation.

Starting with the axiom `f`, the grammar yields a succession of ever-more-elaborate command sequences:

Stage 0: f
Stage 1: f [r f] [l f]
Stage 2: f [r f] [l f] [r f [r f] [l f]] [l f [r f] [l f]]
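The rewriting step itself is nearly a one-liner. This Python sketch applies the grammar character by character; spaces and brackets are copied through unchanged, which implements the do-nothing rules for every symbol other than `f`:

```python
RULES = {"f": "f [r f] [l f]"}   # l, r, brackets, and spaces map to themselves

def rewrite(s):
    """Apply one stage of the L-system to the command string s."""
    return "".join(RULES.get(ch, ch) for ch in s)

stage = "f"                      # the axiom
for n in range(3):
    print(f"Stage {n}: {stage}")
    stage = rewrite(stage)
```

Running it reproduces the stage 0 through stage 2 listings above.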

When this rewriting process is continued for a few further stages and then converted to graphic output, we see a sapling growing into a young tree, with a shape reminiscent of an elm. (At each stage the length of a *forward* step is reduced by a factor of 0.6. And all turns, both *left* and *right*, are through an angle of 20 degrees.)

L-systems like this one can produce a rich variety of branching structures. More elaborate versions of the same program can create realistic images of biological trees. (The Algorithmic Botany website at the University of Calgary has an abundance of examples.) What the L-systems *can’t* do is create closed loops. That would require a fundamentally different kind of grammar, such as a transformation rule that takes two symbols or strings as input and produces a conjoined result. (Note that in the stage 5 diagram above, two branches of the tree appear to overlap, but they are not joined. The graph has no vertex at the intersection point.)

If the biochemical mechanisms governing the growth and development of trees operate with the same constraints as L-systems, we have a tidy explanation for the absence of loops in the branching of bio-trees. But perhaps the explanation is a little too tidy. I’ve been saying that trees don’t do loops, and it’s generally true. But what about the tree pictured below—a crepe myrtle I photographed some years ago on a street in Raleigh, North Carolina? (It reminds me of a sinewy Henry Moore sculpture.)

This plant is a tree in the botanical sense, but it’s certainly not a mathematical tree. A single trunk comes out of the ground and immediately divides. At waist height there are four branches, then three of them recombine. At chest height, there’s another split and yet another merger. This rogue tree is flouting all the canons and customs of treedom.

And the crepe myrtle is not the only offender. Banyan trees, native to India, have their horizontal branches propped up by numerous outrigger supports that drop to the ground. The banyan shown below, in Hilo, Hawaii, has a hollowed-out cavity where the trunk ought to be, surrounded by dozens or hundreds of supporting shoots, with cross-braces overhead. The L-system described above could never create such a network. But if the banyan can do this, why don’t other trees adopt the same trick?

In biology, the question “Why *x*?” is shorthand for “What is the evolutionary advantage of *x*?” or “How does *x* contribute to the survival and reproductive success of the organism?” Answering such questions often calls for a leap of imagination. We look at the mottled brown moth clinging to tree bark and propose that its coloration is camouflage, concealing the insect from predators. We look at a showy butterfly and conclude that its costume is aposematic—a warning that says, “I am toxic; you’ll be sorry if you eat me.”

These explanations risk turning into just-so stories, *les contes des pourquoi*.

And if we have a hard time imagining the experiences of animals, the lives of plants are even further beyond our ken. Does the flower lust for the pollen-laden bee? Does the oak tree grieve when its acorns are eaten by squirrels? How do trees feel about woodpeckers? Confronted with these questions, I can only shrug. I have no idea what plants desire or dread.

Others claim to know much more about vegetable sensibilities. Peter Wohlleben, a German forester, has published a book titled *The Hidden Life of Trees: What They Feel, How They Communicate*. He reports that trees suckle their young, maintain friendships with their neighbors, and protect sick or wounded members of their community. To the extent these ideas have a scientific basis, they draw heavily on work done in the laboratory of Suzanne Simard at the University of British Columbia. Simard, leader of the Mother Tree project, studies communication networks formed by tree roots and their associated soil fungi.

I find Simard’s work interesting. I find the anthropomorphic rhetoric unhelpful and offensive. The aim, I gather, is to make us care more about trees and forests by suggesting they are a lot like us; they have families and communities, friendships, alliances. In my view that’s exactly wrong. What’s most intriguing about trees is that they are aliens among us, living beings whose long, immobile, mute lives bear no resemblance to our own frenetic toing-and-froing. Trees are deeply mysterious all on their own, without any overlay of humanizing sentiment.

Dkhar, Jeremy, and Ashwani Pareek. 2014. What determines a leaf’s shape? *EvoDevo* 5:47.

McMahon, Thomas A. 1975. The mechanical design of trees. *Scientific American* 233(1):93–102.

Osnas, Jeanne L. D., Jeremy W. Lichstein, Peter B. Reich, and Stephen W. Pacala. 2013. Global leaf trait relationships: mass, area, and the leaf economics spectrum. *Science* 340:741–744.

Prusinkiewicz, Przemyslaw, and Aristid Lindenmayer, with James S. Hanan, F. David Fracchia, Deborah Fowler, Martin J. M. de Boer, and Lynn Mercer. 1990. *The Algorithmic Beauty of Plants*. New York: Springer-Verlag. PDF edition available at http://algorithmicbotany.org/papers/.

Runions, Adam, Martin Fuhrer, Brendan Lane, Pavol Federl, Anne-Gaëlle Rolland-Lagan, and Przemyslaw Prusinkiewicz. 2005. Modeling and visualization of leaf venation patterns. *ACM Transactions on Graphics* 24(3):702–711.

Runions, Adam, Miltos Tsiantis, and Przemyslaw Prusinkiewicz. 2017. A common developmental program can produce diverse leaf shapes. *New Phytologist* 216:401–418. Preprint and source code.

Tadrist, Loïc, and Baptiste Darbois Texier. 2016. Are leaves optimally designed for self-support? An investigation on giant monocots. arXiv:1602.03353.

Vogel, Steven. 2012. *The Life of a Leaf*. University of Chicago Press.

Wright, Ian J., Ning Dong, Vincent Maire, I. Colin Prentice, Mark Westoby, Sandra Díaz, Rachael V. Gallagher, Bonnie F. Jacobs, Robert Kooyman, Elizabeth A. Law, Michelle R. Leishman, Ülo Niinemets, Peter B. Reich, Lawren Sack, Rafael Villar, Han Wang, and Peter Wilf. 2017. Global climatic drivers of leaf size. *Science* 357:917–921.

Yamazaki, Kazuo. 2011. Gone with the wind: trembling leaves may deter herbivory. *Biological Journal of the Linnean Society* 104:738–747.

Young, David A. 2010 preprint. Growth-algorithm model of leaf shape. arXiv:1004.4388.

When I follow Frost’s trail, it leads me into an unremarkable patch of Northeastern woodland, wedged between highways and houses and the town dump. It’s nowhere dark and deep enough to escape the sense of human proximity. This is not the forest primeval. Still, it is woodsy enough to bring to mind not only the rhymes of overpopular poets but also some tricky questions about trees and forests—questions I’ve been poking at for years, and that keep poking back. Why are trees so tall? Why aren’t they taller? Why do their leaves come in so many different shapes and sizes? Why are the trees trees (in the graph theoretical sense of that word) rather than some other kind of structure? And then there’s the question I want to discuss today:

Taking a quick census along the Frost trail, I catalog hemlock, sugar maple, at least three kinds of oak (red, white, and pin), beech and birch, shagbark hickory, white pine, and two other trees I can’t identify with certainty, even with the help of a Peterson guidebook and iNaturalist. The stand of woods closest to my home is dominated by hemlock, but on hillsides a few miles down the trail, broadleaf species are more common. The photograph below shows a saddle point (known locally as the Notch) between two peaks of the Holyoke Range, south of Amherst. I took the picture on October 15 last year—in a season when fall colors make it easier to detect the species diversity.

Forests like this one cover much of the eastern half of the United States. The assortment of trees varies with latitude and altitude, but at any one place the forest canopy is likely to include eight or ten species. A few isolated sites are even richer; certain valleys in the southern Appalachians, known as cove forests, have as many as 25 canopy species. And tropical rain forests are populated by 100 or even 200 tall tree species.

From the standpoint of ecological theory, all this diversity is puzzling. You’d think that in any given environment, one species would be slightly better adapted and would therefore outcompete all the others, coming to dominate the landscape. After all, the trees are all drawing on the same resources—sunlight, CO\(_2\), water, various mineral nutrients—so the persistence of mixed-species woodlands begs for explanation.

Here’s a little demo of competitive exclusion. Two tree species—let’s call them olive and orange—share the same patch of forest, a square plot that holds 625 trees.

Initially, each site is randomly assigned a tree of one species or the other. When you click the *Start* button (or just tap on the array of trees), you launch a cycle of death and renewal. At each time step, one tree is chosen—entirely at random and without regard to species—to get the axe. Then another tree is chosen as the parent of the replacement, thereby determining its species. This latter choice is not purely random, however; there’s a bias. One of the species is better adapted to its environment, exploiting the available resources more efficiently, and so it has an elevated chance of reproducing and putting its offspring into the vacant site. In the control panel below the array of trees is a slider labeled “fitness bias”; nudging it left favors the orange species, right the olives.

The outcome of this experiment should not come as a surprise. The two species are playing a zero-sum game: Whatever territory olive wins, orange must lose, and vice versa. One site at a time, the fitter species conquers all. If the advantage is very slight, the process may take a while, but in the end the less-efficient organism is always banished. (What if the two species are exactly equal? I’ll return to that question in a moment, but for now let’s just pretend it never happens. And I have deviously jiggered the simulation so that you can’t set the bias to zero.)
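The whole death-and-renewal cycle fits in a dozen lines of code. Here's a sketch in Python—not the code behind the interactive version, just the same logic, with `bias` standing in for the slider setting:

```python
import random

def compete(n=625, bias=0.1, seed=1):
    """Biased replacement: olives (1) vs. oranges (0).

    bias > 0 gives olives a fitness edge; the run ends at fixation
    and returns the winning species.
    """
    rng = random.Random(seed)
    forest = [rng.randrange(2) for _ in range(n)]
    while 0 < sum(forest) < n:
        victim = rng.randrange(n)            # death is species-blind
        olives = sum(forest)
        # The parent species is chosen in proportion to
        # fitness-weighted abundance.
        w_olive = (1 + bias) * olives
        w_total = w_olive + (n - olives)
        forest[victim] = 1 if rng.random() * w_total < w_olive else 0
    return forest[0]
```

With any nonzero `bias`, one species eventually reaches population 625 and the other reaches zero; the only question is how long it takes.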

Competitive exclusion does not forbid *all* cohabitation. Suppose olive and orange rely on two mineral nutrients in the soil—say, iron and calcium. Assume both of these elements are in short supply, and their availability is what limits growth in the populations of the trees. If olive trees are better at taking up iron and oranges assimilate calcium more effectively, then the two species may be able to reach an accommodation where both survive.

In this model, neither species is driven to extinction. At the default setting of the slider control, where iron and calcium are equally abundant in the environment, olive and orange trees also maintain roughly equal numbers on average. Random fluctuations carry them away from this balance point, but not very far or for very long. The populations are stabilized by a negative feedback loop. If a random perturbation increases the proportion of olive trees, each one of those trees gets a smaller share of the available iron, thereby reducing the species’ potential for further population growth. The orange trees are less affected by an iron deficiency, and so their population rebounds. But if the oranges then overshoot, they will be restrained by overuse of the limited calcium supply.

Moving the slider to the left or right alters the balance of iron and calcium in the environment. A 60:40 proportion favoring iron will shift the equilibrium between the two tree species, allowing the olives to occupy more of the territory. But, as long as the resource ratio is not too extreme, the minority species is in no danger of extinction. The two kinds of trees have a live-and-let-live arrangement.

In the idiom of ecology, the olive and orange species escape the rule of competitive exclusion because they occupy distinct niches, or roles in the ecosystem. They are specialists, preferentially exploiting different resources. The niches do not have to be completely disjoint. In the simulation above they overlap somewhat: The olives need calcium as well as iron, but only 25 percent as much; the oranges have mirror-image requirements.
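One plausible way to encode this two-niche feedback in code (my own sketch—the 25 percent overlap comes from the text, but the exact fitness function is an assumption): each species' per-capita reproductive weight is its resource supply divided by the total demand on that resource.

```python
import random

def two_niche(n=625, iron=0.5, calcium=0.5, steps=100_000, seed=2):
    """Olives (1) specialize in iron, oranges (0) in calcium.

    Each species also draws on the other's resource at a 0.25 rate,
    so per-capita fitness falls as a species' own numbers grow.
    Returns the olive population after the run.
    """
    rng = random.Random(seed)
    forest = [rng.randrange(2) for _ in range(n)]
    olives = sum(forest)
    for _ in range(steps):
        oranges = n - olives
        # Per-capita weight: resource supply over total demand on it.
        w_olive = iron / (olives + 0.25 * oranges)
        w_orange = calcium / (oranges + 0.25 * olives)
        b_olive = olives * w_olive              # total olive birth weight
        b_total = b_olive + oranges * w_orange
        victim = rng.randrange(n)
        olives -= forest[victim]
        forest[victim] = 1 if rng.random() * b_total < b_olive else 0
        olives += forest[victim]
    return olives
```

When olives become overabundant, their births saturate while their deaths (proportional to abundance) keep climbing, so the population is pulled back toward the balance point; that is the negative feedback loop described above.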

Will this loophole in the law of competitive exclusion admit more than two species? Yes: *N* competing species can coexist if there are at least *N* independent resources or environmental strictures limiting their growth, and if each species has a different limiting factor. Everybody must have a specialty. It’s like a youth soccer league where every player gets a trophy for some unique, distinguishing talent.

This notion of slicing and dicing an ecosystem into multiple niches is a well-established practice among biologists. It’s how Darwin explained the diversity of finches on the Galapagos islands, where a dozen species distinguish themselves by habitat (ground, shrubs, trees) or diet (insects, seeds and nuts of various sizes). Forest trees might be organized in a similar way, with a number of microenvironments that suit different species. The process of creating such a diverse community is known as niche assembly.

Some niche differentiation is clearly present among forest trees. For example, gums and willows prefer wetter soil. In my local woods, however, I can’t detect any systematic differences in the sites colonized by maples, oaks, hickories and other trees. They are often next-door neighbors, on plots of land with the same slope and elevation, and growing in soil that looks the same to me. Maybe I’m just not attuned to what tickles a tree’s fancy.

Niche assembly is particularly daunting in the tropics, where it requires a hundred or more distinct limiting resources. Each tree species presides over its own little monopoly, claiming first dibs on some environmental factor no one else really cares about. Meanwhile, all the trees are fiercely competing for the most important resources, namely sunlight and water. Every tree is striving to reach an opening in the canopy with a clear view of the sky, where it can spread its leaves and soak up photons all day long. Given the existential importance of winning this contest for light, it seems odd to attribute the distinctive diversity of forest communities to squabbling over other, lesser resources.

Where niche assembly makes every species the winner of its own little race, another theory dispenses with all competition, suggesting the trees are not even trying to outrun their peers. They are just milling about at random. According to this concept, called neutral ecological drift, all the trees are equally well adapted to their environment, and the set of species appearing at any particular place and time is a matter of chance. A site might currently be occupied by an oak, but a maple or a birch would thrive there just as well. Natural selection has nothing to select. When a tree dies and another grows in its place, nature is indifferent to the species of the replacement.

This idea brings us back to a question I sidestepped above: What happens when two competing species are exactly equal in fitness? The answer is the same whether there are two species or ten, so for the sake of visual variety let’s look at a larger community.

If you have run the simulation—and if you’ve been patient enough to wait for it to finish—you are now looking at a monochromatic array of trees. I can’t know what the single color on your screen might be—or in other words which species has taken over the entire forest patch—but I know there’s just one species left. The other nine are extinct. In this case the outcome might be considered at least a little surprising. Earlier we learned that if a species has even a slight advantage over its neighbors, it will take over the entire system. Now we see that no advantage is needed. Even when all the players are exactly equal, one of them will emerge as king of the mountain, and everyone else will be exterminated. Harsh, no?
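A sketch of the neutral model (again my reconstruction; the essential point is that species identity never enters into the odds):

```python
import random

def neutral_drift(n=625, nspecies=10, seed=3):
    """Unbiased death-and-replacement, run until one species remains.

    Returns the surviving species and the number of steps taken.
    """
    rng = random.Random(seed)
    forest = [rng.randrange(nspecies) for _ in range(n)]
    count = [forest.count(s) for s in range(nspecies)]
    steps = 0
    while max(count) < n:
        victim, parent = rng.randrange(n), rng.randrange(n)
        count[forest[victim]] -= 1
        forest[victim] = forest[parent]    # replacement copies the parent
        count[forest[victim]] += 1
        steps += 1
    return forest[0], steps
```

Every run ends in a monoculture; only the identity of the winner and the waiting time vary from seed to seed.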

Here’s a record of one run of the program, showing the abundance of each species as a function of time:

At the outset, all 10 species are present in roughly equal numbers, clustered close to the average abundance of \(625/10\). As the program starts up, the grid seethes with activity as the sites change color rapidly and repeatedly. Within the first 70,000 time steps, however, all but three species have disappeared. The three survivors trade the lead several times, as waves of contrasting colors wash over the array. Then, after about 250,000 steps, the species represented by the bright green line drops to zero population—extinction. The final one-on-one stage of the contest is highly uneven—the orange species is close to total dominance and the crimson one is bumping along near extinction—but nonetheless the tug of war lasts another 100,000 steps. (Once the system reaches a monospecies state, nothing more can ever change, and so the program halts.)

This lopsided result is not to be explained by any sneaky bias hidden in the algorithm. At all times and for all species, the probability of gaining a member is exactly equal to the probability of losing a member. It’s worth pausing to verify this fact. Suppose species \(X\) has population \(x\), which must lie in the range \(0 \le x \le 625\). A tree chosen at random will be of species \(X\) with probability \(x/625\); therefore the probability that the tree comes from some other species must be \((625 - x)/625\). \(X\) gains one member if it is the replacement species but not the victim species, an event with a combined probability of \(x(625 - x)/625^2\). \(X\) loses one member if it is the victim but not the replacement, which has the same probability.
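The equality is easy to confirm exactly, for every possible population size, with rational arithmetic:

```python
from fractions import Fraction

n = 625
for x in range(n + 1):
    # Victim and replacement parent are drawn independently.
    p_gain = Fraction(x, n) * Fraction(n - x, n)   # replacement is X, victim is not
    p_loss = Fraction(n - x, n) * Fraction(x, n)   # victim is X, replacement is not
    assert p_gain == p_loss == Fraction(x * (n - x), n * n)
```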

It’s a fair game. No loaded dice. Nevertheless, somebody wins the jackpot, and the rest of the players lose everything, every time.

The spontaneous decay of species diversity in this simulated patch of forest is caused entirely by random fluctuations. Think of the population \(x\) as a random walker wandering along a line segment with \(0\) at one end and \(625\) at the other. At each time step the walker moves one unit right \((+1)\) or left \((-1)\) with equal probability; on reaching either end of the segment, the game ends. The most fundamental fact about such a walk is that it *does* always end. A walk that meanders forever between the two boundaries is not impossible, but it has probability \(0\); hitting one wall or the other has probability \(1\).

How long should you expect such a random walk to last? In the simplest case, with a single walker, the expected number of steps starting at position \(x\) is \(x(625 - x)\). This expression has a maximum when the walk starts in the middle of the line segment; the maximum length is just under \(100{,}000\) steps. In the forest simulation with ten species the situation is more complicated because the multiple walks are correlated, or rather anti-correlated: When one walker steps to the right, another must go left. Computational experiments suggest that the median time needed for ten species to be whittled down to one is in the neighborhood of \(320{,}000\) steps.
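The formula \(x(625 - x)\) can be checked against the defining recurrence for the expected duration: \(E(0) = E(625) = 0\) at the absorbing ends, and \(E(x) = 1 + \tfrac{1}{2}E(x-1) + \tfrac{1}{2}E(x+1)\) everywhere in between. The quadratic satisfies it exactly:

```python
# Verify that E(x) = x(n - x) solves the expected-duration recurrence
# for a fair random walk absorbed at 0 and n.
n = 625
E = lambda x: x * (n - x)

assert E(0) == 0 and E(n) == 0
for x in range(1, n):
    # E(x-1) + E(x+1) = 2E(x) - 2, so the sum is always even.
    assert E(x) == 1 + (E(x - 1) + E(x + 1)) // 2
```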

From these computational models it’s hard to see how neutral ecological drift could be the savior of forest diversity. On the contrary, it seems to guarantee that we’ll wind up with a monoculture, where one species has wiped out all others. But this is not the end of the story.

One issue to keep in mind is the timescale of the process. In the simulation, time is measured by counting cycles of death and replacement among forest trees. I’m not sure how to convert that into calendar years, but I’d guess that 320,000 death-and-replacement events in a tract of 625 trees might take 50,000 years or more. Here in New England, that’s a very long time in the life of a forest. This entire landscape was scraped clean by the Laurentide ice sheet just 20,000 years ago. If the local woodlands are losing species to random drift, they would not yet have had time to reach the end game.

The trouble is, this thesis implies that forests start out diverse and evolve toward a monoculture, which is not supported by observation. If anything, diversity seems to *increase* with time. The cove forests of Tennessee, which are much older than the New England woods, have more species, not fewer. And the hyperdiverse ecosystem of the tropical rain forests is thought to be millions of years old.

Despite these conceptual impediments, a number of ecologists have argued strenuously for neutral ecological drift, most notably Stephen P. Hubbell in a 2001 book, *The Unified Neutral Theory of Biodiversity and Biogeography*. The key to Hubbell’s defense of the idea (as I understand it) is that 625 trees do not make a forest, and certainly not a planet-girdling ecosystem.

Hubbell’s theory of neutral drift was inspired by earlier studies of the biogeography of islands, in particular the collaborative work of Robert H. MacArthur and Edward O. Wilson in the 1960s. Suppose our little plot of \(625\) trees is growing on an island at some distance from a continent. For the most part, the island evolves in isolation, but every now and then a bird carries a seed from the much larger forest on the mainland. We can simulate these rare events by adding a facility for immigration to the neutral-drift model. In the panel below, the slider controls the immigration rate. At the default setting of \(1/100\), every \(100\)th replacement tree comes not from the local forest but from a stable reserve where all \(10\) species have an equal probability of being selected.
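Adding immigration to the drift model takes only one extra branch in the loop. A sketch, with `m` standing for the immigration rate (the names and structure are my own):

```python
import random

def drift_with_immigration(n=625, nspecies=10, m=0.01,
                           steps=500_000, seed=4):
    """Neutral drift, except that with probability m the replacement
    tree comes from a mainland reserve where all species are
    equally likely. Returns the final population of each species."""
    rng = random.Random(seed)
    forest = [rng.randrange(nspecies) for _ in range(n)]
    for _ in range(steps):
        victim = rng.randrange(n)
        if rng.random() < m:
            forest[victim] = rng.randrange(nspecies)   # immigrant
        else:
            forest[victim] = forest[rng.randrange(n)]  # local parent
    return [forest.count(s) for s in range(nspecies)]
```

Because the reserve never changes, no species can be permanently lost; extinction in the plot is always temporary.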

For the first few thousand cycles, the evolution of the forest looks much like it does in the pure-drift model. There’s a brief period of complete tutti-frutti chaos, then waves of color erupt over the forest as it blushes pink, then deepens to crimson, or fades to a sickly green. What’s different is that none of those expanding species ever succeeds in conquering the entire array. As shown in the timeline graph below, they never grow much beyond 50 percent of the total population before they retreat into the scrum of other species. Later, another tree color makes a bid for empire but meets the same fate. (Because there is no clear endpoint to this process, the simulation is designed to halt after 500,000 cycles. If you haven’t seen enough by then, click *Resume*.)

Immigration, even at a low level, brings a qualitative change to the behavior of the model and the fate of the forest. The big difference is that we can no longer say extinction is forever. A species may well disappear from the 625-tree plot, but eventually it will be reimported from the permanent reserve. Thus the question is not whether a species is living or extinct but whether it is present or absent at a given moment. At an immigration rate of \(1/100\), the average number of species present is about \(9.6\), so none of them disappear for long.

With a higher level of immigration, the 10 species remain thoroughly mixed, and none of them can ever make any progress toward world domination. On the other hand, they have little risk of disappearing, even temporarily. Push the slider control all the way to the left, setting the immigration rate at \(1/10\), and the forest display becomes an array of randomly blinking lights. In the timeline graph below, there’s not a single extinction.

Pushing the slider in the other direction, rarer immigration events allow the species distribution to stray much further from equal abundance. In the trace below, with an immigrant arriving every \(1{,}000\)th cycle, the population is dominated by one or two species for most of the time; other species are often on the brink of extinction—or *over* the brink—but they come back eventually. The average number of living species is about 4.3, and there are moments when only two are present.

Finally, with a rate of \(1/10{,}000\), the effect of immigration is barely noticeable. As in the model without immigration, one species invades all the terrain; in the example recorded below, this takes about \(400{,}000\) steps. After that, occasional immigration events cause a small blip in the curve, but it will be a very long time before another species is able to displace the incumbent.

The island setting of this model makes it easy to appreciate how sporadic, weak connections between communities can have an outsize influence on their development. But islands are not essential to the argument. Trees, being famously immobile, have only occasional long-distance communication, even when there’s no body of water to separate them. (It’s a rare event when Birnam Wood marches off to Dunsinane.) Hubbell formulates a model of ecological drift in which many small patches of forest are organized into a hierarchical metacommunity. Each patch is both an island and part of the larger reservoir of species diversity. If you choose the right patch sizes and the right rates of migration between them, you can maintain multiple species at equilibrium. Hubbell also allows for the emergence of entirely new species, which is also taken to be a random or selection-neutral process.

Niche assembly and neutral ecological drift are theories that elicit mirror-image questions from skeptics. With niche assembly we look at dozens or hundreds of coexisting tree species and ask, “Can every one of them have a unique limiting resource?” With neutral drift we ask, “Can all of those species be exactly equal in fitness?”

Hubbell responds to the latter question by turning it upside down. The very fact that we observe coexistence implies equality:

All species that manage to persist in a community for long periods with other species must exhibit net long-term population growth rates of nearly zero…. If this were not the case, i.e., if some species should manage to achieve a positive growth rate for a considerable length of time, then from our first principle of the biotic saturation of landscapes, it must eventually drive other species from the community. But if all species have the same net population growth rate of zero on local to regional scales, then ipso facto they must have identical or nearly identical per capita relative fitnesses.

Herbert Spencer proclaimed: Survival of the fittest. Here we have a corollary: If they’re all survivors, they must all be equally fit.

Now for something completely different.

Another theory of forest diversity was devised specifically to address the most challenging case—the extravagant variety of trees in tropical ecosystems. In the early 1970s J. H. Connell and Daniel H. Janzen, field biologists working independently in distant parts of the world, almost simultaneously came up with the same idea.

A tropical rain forest is a tough neighborhood. Trees are under frequent attack by marauding gangs of predators, parasites, and pathogens. (Connell lumped these bad guys together under the label “enemies.”) Many of the enemies are specialists, targeting only trees of a single species. The specialization can be explained by competitive exclusion: Each tree species becomes a unique resource supporting one type of enemy.

Suppose a tree is beset by a dense population of host-specific enemies. The swarm of meanies attacks not only the adult tree but also any offspring of the host that have taken root near their parent. Since young trees are more vulnerable than adults, the entire cohort could be wiped out. Seedlings at a greater distance from the parent should have a better chance of remaining undiscovered until they have grown large and robust enough to resist attack. In other words, evolution might favor the rare apple that falls far from the tree. Janzen illustrated this idea with a graphical model something like the one at right. As distance from the parent increases, the probability that a seed will arrive and take root grows smaller *(red curve)*, but the probability that any such seedling will survive to maturity goes up *(blue curve)*. The overall probability of successful reproduction is the product of these two factors *(purple curve)*; it has a peak where the red and blue curves cross.
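In symbols: if \(A(d)\) is the probability of a seed arriving at distance \(d\) and \(S(d)\) is the probability of surviving there, recruitment is the product \(R(d) = A(d)\,S(d)\). Any decreasing \(A\) paired with an increasing \(S\) that starts near zero puts the peak at an intermediate distance. A quick check with hypothetical curves (exponential arrival, saturating survival—my choices, not Janzen's):

```python
import math

# Hypothetical curves: seed arrival falls off exponentially with
# distance; survival saturates as seedlings escape the parent's enemies.
arrival  = lambda d: math.exp(-d)
survival = lambda d: 1 - math.exp(-d / 5)

ds = [i / 100 for i in range(1001)]          # distances 0.00 .. 10.00
recruit = [arrival(d) * survival(d) for d in ds]
peak = recruit.index(max(recruit))
assert 0 < peak < len(ds) - 1                # the maximum is interior
```

Recruitment is zero at the trunk (everything is eaten) and near zero far away (nothing arrives); the sweet spot lies in between.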

The Connell-Janzen theory predicts that trees of the same species will be widely dispersed in the forest, leaving plenty of room in between for trees of other species, which will have a similarly scattered distribution. The process leads to anti-clustering: conspecific trees are farther apart on average than they would be in a completely random arrangement. This pattern was noted by Alfred Russel Wallace in 1878, based on his own long experience in the tropics:

If the traveller notices a particular species and wishes to find more like it, he may often turn his eyes in vain in every direction. Trees of varied forms, dimensions, and colours are around him, but he rarely sees any one of them repeated. Time after time he goes towards a tree which looks like the one he seeks, but a closer examination proves it to be distinct. He may at length, perhaps, meet with a second specimen half a mile off, or may fail altogether, till on another occasion he stumbles on one by accident.

My toy model of the social-distancing process implements a simple rule. When a tree dies, it cannot be replaced by another tree of the same species, nor may the replacement match the species of any of the eight nearest neighbors surrounding the vacant site. Thus trees of the same species must have at least one other tree between them. To say the same thing in another way, each tree has an exclusion zone around it, where other trees of the same species cannot grow.
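Here's a sketch of that replacement rule on a \(25 \times 25\) toroidal grid (my reconstruction; the grid size follows from the 625 trees, the rest is an assumption). The key invariant is that a vacancy can never be filled by a species present in its nine-site exclusion zone:

```python
import random

SIZE, NSPECIES = 25, 10

def neighbors(r, c):
    # Eight neighbors on a torus: edges wrap around.
    return [((r + dr) % SIZE, (c + dc) % SIZE)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

def step(grid, rng):
    r, c = rng.randrange(SIZE), rng.randrange(SIZE)
    # The replacement may not match the dead tree or any of its eight
    # neighbors; with 10 species, at least one choice always remains.
    excluded = {grid[nr][nc] for nr, nc in neighbors(r, c)} | {grid[r][c]}
    grid[r][c] = rng.choice([s for s in range(NSPECIES) if s not in excluded])

# Start from a pattern with no conspecific neighbors and run a while.
grid = [[(2 * r + c) % NSPECIES for c in range(SIZE)] for r in range(SIZE)]
rng = random.Random(5)
for _ in range(100_000):
    step(grid, rng)

# The exclusion rule is preserved: it still holds everywhere.
for r in range(SIZE):
    for c in range(SIZE):
        assert all(grid[nr][nc] != grid[r][c] for nr, nc in neighbors(r, c))
```

Unlike the earlier models, this loop has no stopping condition; it never converges to anything, which is precisely the point.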

It turns out that social distancing is a remarkably effective way of preserving diversity. When you click *Start*, the model comes to life with frenetic activity, blinking away like the front panel of a 1950s Hollywood computer. Then it just keeps blinking; nothing else ever really happens. There are no spreading tides of color as a successful species gains ground, and there are no extinctions. The variance in population size is even lower than it would be with a completely random and uniform assignment of species to sites. This stability is apparent in the timeline graph below, where the 10 species tightly hug the mean abundance of 62.5:

When I finished writing this program and pressed the button for the first time, long-term survival of all ten species was not what I expected to see. My thoughts were influenced by some pencil-and-paper doodling. I had confirmed that only four colors are needed to create a pattern where no two trees of the same color are adjacent horizontally, vertically, or on either of the diagonals. One such pattern is shown at right. I suspected that the social-distancing protocol might cause the model to condense into such a crystalline state, with the loss of species that don’t appear in the repeated motif. I was wrong. Although four is indeed the minimum number of colors for a socially distanced two-dimensional lattice, there is nothing in the algorithm that encourages the system to seek the minimum.

After seeing the program in action, I was able to figure out what keeps all the species alive. There’s an active feedback process that puts a premium on rarity. Suppose that oaks currently have the lowest frequency in the population at large. As a result, oaks are least likely to be present in the exclusion zone surrounding any vacancy in the forest, which means in turn they are most likely to be acceptable as a replacement. As long as the oaks remain rarer than the average, their population will tend to grow. Symmetrically, individuals of an overabundant species will have a harder time finding an open site for their offspring. All departures from the mean population level are self-correcting.

The initial configuration in this model is completely random, ignoring the restrictions on adjacent conspecifics. Typically there are about 200 violations of the exclusion zone in the starting pattern, but they are all eliminated in the first few thousand time steps. Thereafter the rules are obeyed consistently. Note that with ten species and an exclusion zone consisting of nine sites, there is always at least one species available to fill a vacancy. If you try the experiment with nine or fewer species, some vacancies must be left as gaps in the forest. I should also mention that the model uses toroidal boundary conditions: the right edge of the grid is adjacent to the left edge, and the top wraps around to the bottom. This ensures that all sites in the lattice have exactly eight neighbors.

Connell and Janzen envisioned much larger exclusion zones, and correspondingly larger rosters of species. Implementing such a model calls for a much larger computation. A recent paper by Taal Levi *et al.* reports on such a simulation. They find that the number of surviving species and their spatial distribution remain reasonably stable over long periods (200 billion tree replacements).

Could the Connell-Janzen mechanism also work in temperate-zone forests? As in the tropics, the trees of higher latitudes do have specialized enemies, some of them notorious—the vectors of Dutch elm disease and chestnut blight, the emerald ash borer, the gypsy moth caterpillars that defoliate oaks. The hemlocks in my neighborhood are under heavy attack by the woolly adelgid, a sap-sucking bug. Thus the forces driving diversification and anti-clustering in the Connell-Janzen model would seem to be present here. However, the observed spatial structure of the northern forests is somewhat different. Social distancing hasn’t caught on here. The distribution of trees tends to be a little clumpy, with conspecifics gathering in small groves.

Plague-driven diversification is an intriguing idea, but, like the other theories mentioned above, it has certain plausibility challenges. In the case of niche assembly, we need to find a unique limiting resource for every species. In neutral drift, we have to ensure that selection really is neutral, assigning exactly equal fitness to trees that look quite different. In the Connell-Janzen model we need a specialized pest for every species, one that’s powerful enough to suppress all nearby seedlings. Can it be true that *every* tree has its own deadly nemesis?

What happens when a species escapes its nemesis? To find out, press the *Invade* button. You may need to press it more than once, since a new arrival may die out before becoming established. Also note that I have slowed down this simulation, lest it all be over in a flash.

Lacking enemies, the invader can flout the social-distancing rules, occupying any forest vacancy regardless of neighborhood. Once the invader has taken over a majority of the sites, the distancing rules become less onerous, but by then it’s too late for the other species.

One further half-serious thought on the Connell-Janzen theory: In the war between trees and their enemies, humanity has clearly chosen sides. We would wipe out those insects and fungi and other tree-killing pests if we could figure out how to do so. Everyone would like to bring back the elms and the chestnuts, and save the eastern hemlocks before it’s too late. On this point I’m as sentimental as the next treehugger. But if Connell and Janzen are correct, and if their theory applies to temperate-zone forests, eliminating all the enemies would actually cause a devastating collapse of tree diversity. Without pest pressure, competitive exclusion would be unleashed, and we’d be left with one-tree forests everywhere we look.

Species diversity in the forest is now matched by theory diversity in the biology department. The three ideas I have discussed here—niche assembly, neutral drift, and social distancing—all seem to be coexisting in the minds of ecologists. And why not? Each theory is a success in the basic sense that it can overcome competitive exclusion. Each theory also makes distinctive predictions. With niche assembly, every species must have a unique limiting resource. Neutral drift generates unusual population dynamics, with species continually coming and going, although the overall number of species remains stable. Social distancing entails spatial anti-clustering.

How can we choose a winner among these theories (and perhaps others)? Scientific tradition says nature should have the last word. We need to conduct some experiments, or at least go out in the field and make some systematic observations, then compare those results with the theoretical predictions.

There have been quite a few experimental tests of competitive exclusion. For example, Thomas Park and his colleagues ran a decade-long series of experiments with two closely related species of flour beetles. One species or the other always prevailed. In 1969 Francisco Ayala reported on a similar experiment with fruit flies, in which he observed coexistence under circumstances that were thought to forbid it. Controversy flared, but in the end the result was not to overturn the theory but to refine the mathematical description of where exclusion applies.

Wouldn’t it be grand to perform such experiments with trees? Unfortunately, they are not so easily grown in glass vials. And conducting multigenerational studies of organisms that live longer than we do is a tough assignment. With flour beetles, Park had time to observe more than 100 generations in a decade. With trees, the equivalent experiment might take 10,000 years. But field workers in biology are a resourceful bunch, and I’m sure they’ll find a way. In the meantime, I want to say a few more words about theoretical, mathematical, and computational approaches to the problem.

Ecology became a seriously mathematical discipline in the 1920s, with the work of Alfred J. Lotka and Vito Volterra. To explain their methods and ideas, one might begin with the familiar fact that organisms reproduce themselves, thereby causing populations to grow. Mathematized, this observation becomes the differential equation

\[\frac{d x}{d t} = \alpha x,\]

which says that the instantaneous rate of change in the population \(x\) is proportional to \(x\) itself—the more there are, the more there will be. The constant of proportionality \(\alpha\) is called the intrinsic reproduction rate; it is the rate observed when nothing constrains or interferes with population growth. The equation has a solution, giving \(x\) as a function of \(t\):

\[x(t) = x_0 e^{\alpha t},\]

where \(x_0\) is the initial population. This is a recipe for unbounded exponential growth (assuming that \(\alpha\) is positive). In a finite world such growth can’t go on forever, but that needn’t worry us here.
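As a quick sanity check on that solution (a sketch of my own, with arbitrary values \(\alpha = 0.5\) and \(x_0 = 10\)), we can integrate the differential equation directly with small Euler steps and compare against the closed form:

```javascript
// Euler integration of dx/dt = a * x, to be compared with the exact
// solution x(t) = x0 * exp(a * t). The values of a, x0, t are arbitrary.
function eulerGrowth(x0, a, t, steps) {
  const dt = t / steps;
  let x = x0;
  for (let i = 0; i < steps; i++) {
    x += a * x * dt;                // dx = a * x * dt
  }
  return x;
}

// With a = 0.5 and x0 = 10, the exact value at t = 2 is 10 * e, about 27.18.
const grown = eulerGrowth(10, 0.5, 2, 100000);
```

Lotka and Volterra took the next step, coupling two such equations so that each species' growth rate depends on the other's population. For a predator and its prey, the system becomes: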

\[\begin{align}

\frac{d x}{d t} &= \alpha x -\gamma x y\\

\frac{d y}{d t} &= -\beta y + \delta x y

\end{align}\]

The prey species \(x\) prospers when left to itself, but suffers as the product \(x y\) increases. The situation is just the opposite for the predator \(y\), which can’t get along alone (\(x\) is its only food source) and whose population swells when \(x\) and \(y\) are both abundant.

Competition is a more symmetrical relation: Either species can thrive when alone, and the interaction between them is negative for both parties.

\[\begin{align}

\frac{d x}{d t} &= \alpha x -\gamma x y\\

\frac{d y}{d t} &= \beta y - \delta x y

\end{align}\]

The Lotka-Volterra equations yield some interesting behavior. At any instant \(t\), the state of the two-species system can be represented as a point in the \(x, y\) plane, whose coordinates are the two population levels. For some combinations of the \(\alpha, \beta, \gamma, \delta\) parameters, there’s a point of stable equilibrium. Once the system has reached this point, it stays put, and it returns to the same neighborhood following any small perturbation. Other equilibria are unstable: The slightest departure from the balance point causes a major shift in population levels. And the *really* interesting cases have no stationary point; instead, the state of the system traces out a closed loop in the \(x, y\) plane, continually repeating a cycle of states. The cycles correspond to oscillations in the two population levels. Such oscillations have been observed in many predator-prey systems. Indeed, it was curiosity about the periodic swelling and contraction of populations in the Canadian fur trade and Adriatic fisheries that inspired Lotka and Volterra to work on the problem.
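The closed loops are easy to demonstrate numerically. Here is a sketch of my own (the parameter values and starting point are arbitrary): with \(\alpha = \beta = \gamma = \delta = 1\), the quantity \(V = x - \ln x + y - \ln y\) is exactly conserved along predator-prey trajectories, so a fourth-order Runge-Kutta integrator should carry the state around a loop with essentially no drift in \(V\).

```javascript
// Predator-prey equations with all four parameters set to 1:
//   dx/dt = x - x*y,   dy/dt = -y + x*y
// The quantity V = x - ln x + y - ln y is conserved along trajectories,
// so orbits are closed loops. Parameters and starting point are arbitrary.
function deriv([x, y]) {
  return [x - x * y, -y + x * y];
}

// One fourth-order Runge-Kutta step of size dt.
function rk4Step([x, y], dt) {
  const k1 = deriv([x, y]);
  const k2 = deriv([x + (dt / 2) * k1[0], y + (dt / 2) * k1[1]]);
  const k3 = deriv([x + (dt / 2) * k2[0], y + (dt / 2) * k2[1]]);
  const k4 = deriv([x + dt * k3[0], y + dt * k3[1]]);
  return [
    x + (dt / 6) * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
    y + (dt / 6) * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
  ];
}

function conserved([x, y]) {
  return x - Math.log(x) + y - Math.log(y);
}

let state = [2.0, 1.0];          // prey = 2, predators = 1
const v0 = conserved(state);
for (let i = 0; i < 10000; i++) state = rk4Step(state, 0.01);
// After 10,000 steps (t = 100) the populations have cycled many times,
// yet V is essentially unchanged.
```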

The 1960s and 70s brought more surprises. Studies of equations very similar to the Lotka-Volterra system revealed the phenomenon of “deterministic chaos,” where the point representing the state of the system follows an extremely complex trajectory, though its wanderings are not random. There ensued a lively debate over complexity and stability in ecosystems. Is chaos to be found in natural populations? Is a community with many species and many links between them more or less stable than a simpler one?

Viewed as abstract mathematics, there’s much beauty in these equations, but it’s sometimes a stretch mapping the math back to the biology. For example, when the Lotka-Volterra equations are applied to species competing for resources, the resources appear nowhere in the model. The mathematical structure describes something more like a predator-predator interaction—two species that eat each other.

Even the organisms themselves are only a ghostly presence in these models. The differential equations are defined over the continuum of real numbers, giving us population levels or densities, but not individual plants or animals—discrete things that we can count with integers. The choice of number type is not of pressing importance as long as the populations are large, but it leads to some weirdness when a population falls to, say, 0.001—a millitree. Using finite-difference equations instead of differential equations avoids this problem, but the mathematics gets messier.

Another issue is that the equations are rigidly deterministic. Given the same inputs, you’ll always get exactly the same outputs—even in a chaotic model. Determinism rules out modeling anything like neutral ecological drift. Again, there’s a remedy: stochastic differential equations, which include a source of noise or uncertainty. With models of this kind, the answers produced are not numbers but probability distributions. You don’t learn the population of \(x\) at time \(t\); you get a probability \(P(x, t)\) in a distribution with a certain mean and variance. Another approach, called Markov Chain Monte Carlo (MCMC), uses a source of randomness to sample from such distributions. But the MCMC method moves us into the realm of computational models rather than mathematical ones.

Computational methods generally allow a direct mapping between the elements of the model and the things being modeled. You can open the lid and look inside to find the trees and the resources, the births and the deaths. These computational objects are not quite tangible, but they’re discrete, and always finite. A population is neither a number nor a probability distribution but a collection of individuals. I find models of this kind intellectually less demanding. Writing a differential equation that captures the dynamics of a biological system requires insight and intuition. Writing a program to implement a few basic events in the life of a forest—a tree dies, another takes its place—is far easier.

The six little models included in this essay serve mainly as visualizations; they expend most of their computational energy painting colored dots on the screen. But larger, more ambitious models are certainly feasible, as in the work of Taal Levi et al. mentioned above.

However, if computational models are easier to create, they can also be harder to interpret. If you run a model once and species \(X\) goes extinct, what can you conclude? Not much. On the next run \(X\) and \(Y\) might coexist. To make reliable inferences, you need to do some statistics over a large ensemble of runs—so once again the answer takes the form of a probability distribution.
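The point about ensembles can be made with a toy example of my own (not one of the six models in this essay): neutral drift between two species in a tiny population. Any single run ends with one species or the other taking over; only by repeating the experiment many times can we see that, starting from a 50–50 split, each species wins about half the time.

```javascript
// Toy neutral-drift (Moran-type) process: N individuals, two species.
// At each event a random individual dies and is replaced by the
// offspring of a random survivor. The run ends when one species owns
// everything. All names and numbers here are illustrative.
function runDrift(N) {
  let x = N / 2;                       // species X starts with half the sites
  while (x > 0 && x < N) {
    const dieX = Math.random() < x / N;
    const poolX = dieX ? (x - 1) / (N - 1) : x / (N - 1);
    const bornX = Math.random() < poolX;
    x += (bornX ? 1 : 0) - (dieX ? 1 : 0);
  }
  return x === N;                      // did X reach fixation?
}

// A single run proves nothing; an ensemble reveals the distribution.
let wins = 0;
const runs = 2000;
for (let i = 0; i < runs; i++) if (runDrift(50)) wins++;
const pFix = wins / runs;              // expect a value near 0.5
```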

The concreteness and explicitness of Monte Carlo models is generally a virtue, but it has a darker flip side. Where a differential equation model might apply to any “large” population, that vague description won’t work in a computational context. You have to name a number, even though the choice is arbitrary. The size of my forest models, 625 trees, was chosen for mere convenience. With a larger grid, say \(100 \times 100\), you’d have to wait millions of time steps to see anything interesting happen. Of course the same issue arises with experiments in the lab or in the field.

Both kinds of model are always open to a charge of oversimplifying. A model is the Marie Kondo version of nature—relentlessly decluttered and tidied up. Sometimes important parts get tossed out. In the case of the forest models, it troubles me that trees have no life history. One dies, and another pops up full grown. Also missing from the models are pollination and seed dispersal, and rare events such as hurricanes and fires that can reshape entire forests. Would we learn more if all those aspects of life in the woods had a place in the equations or the algorithms? Perhaps, but where do you stop?

My introduction to models in ecology came through a book of that title by John Maynard Smith, published in 1974. I recently reread it, learning more than I did the first time through. Maynard Smith makes a distinction between simulations, useful for answering questions about specific problems or situations, and models, useful for testing theories. He offers this advice: “Whereas a good simulation should include as much detail as possible, a good model should include as little as possible.”

Ayala, F. J. 1969. Experimental invalidation of the principle of competitive exclusion. *Nature* 224:1076–1079.

Clark, James S. 2010. Individuals and the variation needed for high species diversity in forest trees. *Science* 327:1129–1132.

Connell, J. H. 1971. On the role of natural enemies in preventing competitive exclusion in some marine animals and in rain forest trees. In *Dynamics of Populations*, P. J. Den Boer and G. Gradwell, eds., Wageningen, pp. 298–312.

Gilpin, Michael E., and Keith E. Justice. 1972. Reinterpretation of the invalidation of the principle of competitive exclusion. *Nature* 236:273–301.

Hardin, Garrett. 1960. The competitive exclusion principle. *Science* 131(3409): 1292–1297.

Hubbell, Stephen P. 2001. *The Unified Neutral Theory of Biodiversity and Biogeography*. Princeton, NJ: Princeton University Press.

Hutchinson, G. E. 1959. Homage to Santa Rosalia, or why are there so many kinds of animals? *The American Naturalist* 93:145–159.

Janzen, Daniel H. 1970. Herbivores and the number of tree species in tropical forests. *The American Naturalist* 104(940):501–528.

Kricher, John. C. 1988. *A Field Guide to Eastern Forests, North America.* The Peterson Field Guide Series. Illustrated by Gordon Morrison. Boston: Houghton Mifflin.

Levi, Taal, Michael Barfield, Shane Barrantes, Christopher Sullivan, Robert D. Holt, and John Terborgh. 2019. Tropical forests can maintain hyperdiversity because of enemies. *Proceedings of the National Academy of Sciences of the USA* 116(2):581–586.

Levin, Simon A. 1970. Community equilibria and stability, and an extension of the competitive exclusion principle. *The American Naturalist*, 104(939):413–423.

MacArthur, R. H., and E. O. Wilson. 1967. *The Theory of Island Biogeography.* Monographs in Population Biology. Princeton University Press, Princeton, NJ.

May, Robert M. 1973. Qualitative stability in model ecosystems. *Ecology*, 54(3):638–641.

Maynard Smith, J. 1974. *Models in Ecology.* Cambridge: Cambridge University Press.

Richards, Paul W. 1973. The tropical rain forest. *Scientific American* 229(6):58–67.

Schupp, Eugene W. 1992. The Janzen-Connell model for tropical tree diversity: population implications and the importance of spatial scale. *The American Naturalist* 140(3):526–530.

Strobeck, Curtis. 1973. *N* species competition. *Ecology*, 54(3):650–654.

Tilman, David. 2004. Niche tradeoffs, neutrality, and community structure: A stochastic theory of resource competition, invasion, and community assembly. *Proceedings of the National Academy of Sciences of the USA* 101(30):10854–10861.

Wallace, Alfred R. 1878. *Tropical Nature, and Other Essays.* London: Macmillan and Company.

The economy’s swan dive is truly breathtaking. In response to the coronavirus threat we have shut down entire commercial sectors: most retail stores, restaurants, sports and entertainment. Travel and tourism are moribund. Manufacturing is threatened too, not only by concerns about workplace contagion but also by softening demand and disrupted supply chains. All of the automakers have closed their assembly plants in the U.S., and Boeing has stopped production at its plants near Seattle, which employ 70,000. Thus it comes as no great surprise—though it’s still a shock—that 3,283,000 Americans filed claims for unemployment compensation last week. That’s by far the highest weekly tally since the program was created in the 1930s. It’s almost five times the previous record from 1982, and 15 times the average for the first 10 weeks of this year. The graph is a dramatic hockey stick:

Here’s the same graph, updated to include new unemployment claims for the weeks ending 28 March and 4 April. The four-week total of new claims is over 16 million, which is roughly 10 percent of the American workforce. [Edited 2020-04-02 and 2020-04-09.]

I’ve been brooding about the economic collapse for a couple of weeks. I worry that the consequences of unemployment and business failures could be even more dire than the direct harm caused by the virus. Recovering from a deep recession can take years, and those who suffer most are the poor and the young. I don’t want to see millions of lives blighted and the dreams of a generation thwarted. But Covid-19 is still rampant. Relaxing our defenses could swamp the hospitals and elevate the death rate. No one is eager to take that risk (except perhaps Donald Trump, who dreams of an Easter resurrection).

The other day I was squabbling about these economic perils with the person I shelter-in-place with. Yes, she said, we’re facing a steep decline, but what makes you so sure it’s going to last for years? Why can’t the economy bounce back? I patiently mansplained about the irreversibility of events like bankruptcy and eviction and foreclosure, which are almost as hard to undo as death. That argument didn’t settle the matter, but we let the subject drop. (We’re hunkered down 24/7 here; we need to get along.)

In the middle of the night, the question came back to me. Why *won’t* it bounce back? Why can’t we just pause the economy like a video, then a month or two later press the play button to resume where we left off?

One problem with pausing the economy is that people can’t survive in suspended animation. They need a continuous supply of air, water, food, shelter, TV shows, and toilet paper. You’ve got to keep that stuff coming, no matter what. But people are only part of the economy. There are also companies, corporations, unions, partnerships, non-profit associations—all the entities we create to organize the great game of getting and spending. A company, considered as an abstraction, has no need for uninterrupted life support. It doesn’t eat or breathe or get bored. So maybe companies could be put in the deep freeze and then thawed when conditions improve.

Lying awake in the dark, I told myself a story:

Clare owns a little café at the corner of Main and Maple in a New England college town. In the middle of March, when the college sent the students home, she lost half her customers. Then, as the epidemic spread, the governor ordered all restaurants to close. Clare called up Rory the Roaster to cancel her order for coffee beans, pulled her ad from the local newspaper, and taped a “C U Soon” sign to the door. Then she sat down with her only employee, Barry the Barista, to talk about the bad news.

Barry was distraught. “I have rent coming due, and my student loan, and a car payment.”

“I wish I could be more help,” Clare replied. “But the rent on the café is also due. If I don’t pay it, we could lose the lease, and you won’t have a job to come back to. We’ll both be on the street.” They sat glumly in the empty shop, six feet apart. Seeing the lost-puppy look in Barry’s eyes, Clare added: “Let me call up Larry the Landlord and see if we can work something out.”

Larry was sympathetic. He’d been hearing from lots of tenants, and he genuinely wanted to help. But he told Clare what he’d told the rest: “The building has a mortgage. If I don’t pay the bank, I’ll lose the place, and we’ll all be on the street.”

You can guess what Betty the Banker said. “I have obligations to my depositors. Accounts earn interest every month. People are redeeming CDs. If I don’t maintain my cash reserves, the FDIC will come in and seize our assets. We’ll all be on the street.”

Everyone in this little fable wants to do the right thing. No one wants to put Clare out of business or leave Barry without an income. And yet my nocturnal meditations come to a dark end, in which the failure of Clare’s corner coffee shop triggers a worldwide recession. Barry gets evicted, Larry defaults on his loan, Betty’s bank goes bottom up. Rory the Roaster also goes under, and the Colombian farm that supplies the beans lays off all its workers. With Clare’s place now an empty storefront, there are fewer shoppers on Main Street, and the bookstore a few doors away folds up. The newspaper where Clare used to advertise ceases publication. The town’s population dwindles. The college closes.

At this point I feel like Ebenezer Scrooge pleading with the Ghost of Christmas Future to save Tiny Tim, or George Bailey desperate to escape the mean streets of Potterville and get back to the human warmth of Bedford Falls. Surely there must be some way to avert this catastrophe.

Here’s my idea. The rent and loan payments that cause all this economic mayhem are different from the transactions that Clare handles at her cash register. In her shopkeeper economy, money comes in only when coffee goes out; the two events are causally connected and simultaneous. And if she’s not selling any coffee, she can stop buying beans. The payment of her rent, on the other hand, is triggered by nothing but the ticking of the clock. She is literally buying time. Now the remedy is obvious: Stop the clock, or reset it. This is easier than you might think. We just go skipping down the Yellow Brick Road and petition the wizard to issue a proclamation. The wizard’s decree says this:

In the year 2020, April 30 shall be followed by April 1.

*Redux* is Latin for “a thing brought back or restored.” The word was introduced—or brought back—into the modern American vocabulary by John Updike’s 1971 novel *Rabbit Redux*, having been used earlier in titles of works by Dryden and Trollope. It’s one of those words I’ve always avoided saying aloud because of doubt about the pronunciation. The OED says it’s *re-ducks*.

How does this fiddling with the calendar help Clare? Consider what happens when the calendar flips from April 30 to April 1 Redux. It’s the first of the month, and the rent is due. But wait! No it’s not. She already paid the rent for April, a month ago. It won’t be due again until May 1, and that’s a month away. It’s the same with Larry’s mortgage payment, and Barry’s car loan. Of course stopping the clock cuts both ways. If you get a monthly pension or Social Security payment, that won’t be coming in April Redux, nor will the bank pay you interest on your deposits.

By means of this sly maneuver we have broken a vicious cycle. Larry doesn’t get a rent check from Clare, but he also doesn’t have to write a mortgage-loan check to Betty, who doesn’t have to make payments to her depositors and creditors. Each of them gets a month’s reprieve. With this extra slack, maybe Clare can keep Barry on the payroll and still have a viable business when her customers finally come out of hiding.

But isn’t this just a sneaky scheme to deprive the creditor class of money they are legally entitled to receive under the terms of contracts that both parties willingly signed? Yes it is, and a clever one at that. It is also a way to more equitably distribute the risks and costs of the present crisis. At the moment the burden falls heavily on Clare and Barry, who are forbidden to sell me a cup of coffee; but Larry and Betty are free to go on collecting their rents and loan payments. In addition to spreading around the financial pain, the scheme might also reduce the likelihood of a major, lasting economic contraction, which none of these characters would enjoy.

In spite of these appeals to the greater good of society as a whole, you may still feel there’s something dishonest about April Redux. If so, we can have the wizard issue a second decree:

In the 30 months from May 2020 through November 2022, every month shall have one day fewer than the usual number.

During this period every scheduled payment will come due a day sooner than usual. At the end, lenders and borrowers are even-steven.

The last time anybody tinkered with the calendar in the English-speaking world was 1752, when the British Isles and their colonies finally adopted the Gregorian calendar (introduced elsewhere as early as 1582), dropping eleven days from September. The legend of mobs rioting to demand "Give us our eleven days!" is probably just that, a legend (see *Past & Present*, no. 149, 1995, pp. 95–139; JSTOR, paywall), but there *was* concern and controversy about the proper calculation of wages, rents, and interest in the abbreviated September.

Riots in the streets are clearly a no-no in this period of social distancing, so presumably we won’t have to worry about mob action when April repeats itself. Besides, who’s going to complain about having 30 days *added* to their lifespan? I suppose there may be some grumbling from people with April birthdays, who think they are suddenly two years older. And back-to-back April Fool days could test the nation’s patience.

Although my plan for an April do-over is presented in the spirit of the season, I do think it illuminates a serious issue—an aspect of modern commerce that makes the current situation especially dangerous. Our problem is not that we have shut down the whole economy. The problem is that we’ve shut down only *half* the economy. The other half carries on with business as usual, creating imbalances that leave the whole edifice teetering on the brink of collapse.

The $2 trillion rescue package enacted last week addresses some of these issues. The cash handout for individual taxpayers, and a sweetening of unemployment benefits, should help Barry muddle through and pay his bills. A program of loans for small businesses could keep Clare afloat, and the loan would be forgiven if she keeps Barry on the payroll. These are thoughtful and useful measures, and a refreshing change from earlier bailout practices. We are not sending all the funds directly to investment banks and insurance companies. But a big share will wind up there anyway, since we are effectively subsidizing the rent and mortgage payments of individuals and small businesses. I wonder if it wouldn’t be fairer, more effective, and less expensive to curtail some of those payments. I’m not suggesting that we shut down the banks along with the shops; that would make matters worse. But we might require financial institutions to defer or forgo certain payments from distressed small businesses and the employees they lay off.

Voluntary efforts along these lines promise to soften the impact for at least a few lucky workers and businesses that have lost their revenue stream. In my New England college town, some of the banks are offering to defer monthly payments on mortgage loans, and there’s social pressure on landlords to defer rents.

But don’t count on everyone to follow that program. On March 31, following announcements of layoffs and furloughs by Macy’s, Kohl’s, and other large retailers, the *New York Times* reported: “Last week, Taubman, a large owner of shopping malls, sent a letter to its tenants saying that the company expected them to keep paying their rent amid the crisis. Taubman, which oversees well-known properties like the Mall at Short Hills in New Jersey, reminded its tenants that it also had obligations to meet, and was counting on the rent to pay lenders and utilities.” [Added 2020-03-31.]

The coronavirus crisis is being treated as a unique event (and I certainly hope we’ll never see the like of it again). The associated economic crisis is also unique, at least within my memory. Most panics and recessions have their roots in the financial markets. At some point investors realize that tech stocks with an infinite price-to-earnings ratio are not such a bargain after all, or that bundling together thousands of risky mortgages doesn’t actually make them less risky. When the bubble bursts, the first casualties are on Wall Street; only later do the ripple effects reach Clare’s café. Now, we are seeing a rare disturbance that travels in the opposite direction. Do we know how to fix it?

In the early days of the web, the only way to display an equation was to render it as an image and embed it in the page with an `img` tag. The process was cumbersome and the product was ugly. In 2009 I switched to jsMath, a program by Davide Cervone that did the rendering in the browser. You could write a line of TeX such as

`e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots`

and it would appear on your screen as:

\[e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots\]

All the work of parsing the TeX code and typesetting the math was done by a JavaScript program downloaded into your browser along with the rest of the web page.

Cervone’s jsMath soon evolved into MathJax, an open-source project initially supported by the AMS and SIAM. There are now about two dozen sponsors, and the project is under the aegis of NumFOCUS.

MathJax has made a big difference in my working life, transforming a problem into a pleasure. Putting math on the web is fun! Sometimes I do it just to show off. Furthermore, the software has served as an inspiration as well as a helpful tool. Until I saw MathJax in action, it simply never occurred to me that interesting computations could be done within the JavaScript environment of a web browser, which I had thought was there mainly to make things blink and jiggle. With the example of MathJax in front of me, I realized that I could not only display mathematical ideas but also explore and animate them within a web page.

Last fall I began hearing rumors about MathJax 3.0, “a complete rewrite of MathJax from the ground up using modern techniques.” It’s the kind of announcement that inspires both excitement and foreboding. What will the new version add? What will it take away? What will it fix? What will it break?

Before committing all of bit-player to the new version, I thought I would try a small-scale experiment. I have a standalone web page that makes particularly tricky use of MathJax. The page is a repository of the Dotster programs extracted from a recent bit-player post, My God, it’s full of dots. In January I got the Dotster page running with MathJax 3.

Most math in web documents is static content: An equation needs to be formatted once, when the page is first displayed, and it never changes after that. The initial typesetting is handled automatically by MathJax, in both the old and the new versions. As soon as the page is downloaded from the server, MathJax makes a pass through the entire text, identifying elements flagged as TeX code and replacing them with typeset math. Once that job is done, MathJax can go to sleep.

The Dotster programs are a little different; they include equations that change dynamically in response to user input. Here’s an example:

The slider on the left sets a numerical value that gets plugged into the two equations on the right. Each time the slider is moved, the equations need to be updated and reformatted. Thus with each change to the slider setting, MathJax has to wake up from its slumbers and run again to typeset the altered content.

The MathJax program running in the little demo above is the older version, 2.7. Cosmetically, the result is not ideal. With each change in the slider value, the two equations contract a bit, as if pinched between somebody’s fingers, and then snap back to their original size. They seem to wink at us.

The winking effect is caused by a MathJax feature called Fast Preview. The system does a quick-and-dirty rendering of the math content without calculating the correct final sizes for the various typographic elements. (Evidently that calculation takes a little time). You can turn off Fast Preview by right-clicking or control-clicking one of the equations and then navigating through the submenus shown at right. However, you’ll probably judge the result to be worse rather than better. Without Fast Preview, you’ll get a glimpse of the raw TeX commands. Instead of winking, the equations do jumping jacks.

I am delighted to report that all of this visual noise has been eliminated in the new MathJax. On changing a slider setting, the equations are updated in place, with no unnecessary visual fuss. And there’s no need for a progress indication, because the change is so quick it appears to be instantaneous. See for yourself:

Thus version 3 looks like a big win. There’s a caveat: Getting it to work did not go quite as smoothly as I had hoped. Nevertheless, this is a story with a happy ending.

If you have only static math content in your documents, making the switch to MathJax 3 is easy. In your HTML file you change a URL to load the new MathJax version, and convert any configuration options to a new format. As it happens, all the default options work for me, so I had nothing to convert. What’s most important about the upgrade path is what you *don’t* need to do. In most cases you should not have to alter any of the TeX commands present in the HTML files being processed by MathJax. (There are a few small exceptions.)
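For reference, the loader change is small. The sketch below follows the MathJax 3 documentation; the CDN URL is the standard one for the combined TeX-to-HTML component, and the configuration block is optional when the defaults suffice (as they did for me):

```
<!-- Optional configuration, defined before the library loads -->
<script>
MathJax = {
  tex: {inlineMath: [['\\(', '\\)']]}   // an example option; the defaults may be all you need
};
</script>
<!-- Load MathJax 3 from the CDN -->
<script id="MathJax-script" async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js"></script>
```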

With dynamic content, further steps are needed. Here is the JavaScript statement I used to reawaken the typesetting engine in MathJax version 2.7:

```
MathJax.Hub.Queue(["Typeset", MathJax.Hub, mathjax_demo_box]);
```

The statement enters a `Typeset` command into a queue of pending tasks. When the command reaches the front of the queue, MathJax will typeset any math found inside the HTML element designated by the identifier `mathjax_demo_box`, ignoring the rest of the document.

In MathJax 3, the documentation suggested I could simply replace this command with a slightly different and more direct one:

```
MathJax.typeset([mathjax_demo_box]);
```

I did that. It didn’t work. When I moved the slider, the displayed math reverted to raw TeX form, and the JavaScript console reported an error: a failed call to `appendChild`.

What has gone wrong here? JavaScript’s `appendChild` method adds a new node to the treelike structure of an HTML document. It’s like hanging an ornament from some specified branch of a Christmas tree. The error reported here indicates that the specified branch does not exist; it is `null`.

Let’s not tarry over my various false starts and wrong turns as I puzzled over the source of this bug. I eventually found the cause and the solution in the “issues” section of the MathJax repository on GitHub. Back in September of last year Mihai Borobocea had reported a similar problem, along with the interesting observation that the error occurs only when an existing TeX expression is being replaced in a document, not when a new expression is being added. Borobocea had also discovered that invoking the procedure `MathJax.typesetClear()` before `MathJax.typeset()` would prevent the error.
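In code, the workaround amounts to a two-line sequence. Here it is wrapped in a helper; `retypeset` is a hypothetical name of my own, not part of the MathJax API, and the MathJax object is passed in as a parameter so the logic can be exercised in isolation:

```
// Hypothetical helper wrapping the workaround. In a browser, "mj" would be
// the global MathJax object.
function retypeset(mj, container) {
  mj.typesetClear();        // discard bookkeeping for math that has been removed
  mj.typeset([container]);  // typeset the math inside the given container
}
```

In the browser one would call `retypeset(MathJax, mathjax_demo_box)` after swapping in the new TeX source.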

A comment by Davide Cervone, the lead developer of MathJax, explains much of what’s going on:

You are correct that you should use `MathJax.typesetClear()` if you have removed previously typeset math from the page. (In version 3, there is information stored about the math in a list of typeset expressions, and if you remove typeset math from the page and replace it with new math, that list will hold pointers to math that no longer exists in the page. That is what is causing the error you are seeing . . . )

I found that adding `MathJax.typesetClear()` did indeed eliminate the error. As a practical matter, that solved my problem. But Borobocea pointed out a remaining loose end. Whereas `MathJax.typeset([mathjax_demo_box])` operates only on the math inside a specific container, `MathJax.typesetClear()` destroys the list of math objects for the entire document, an act that might later have unwanted consequences. Thus it seemed best to reformat all the math in the document whenever any one expression changes. This is inefficient, but with the 20-some equations in the Dotster web page the typesetting is so fast there’s no perceptible delay.

In January a fix for this problem was merged into MathJax 3.0.1, which is now the shipping version. Cervone’s comment on this change says that it “prevents the error message,” which left me with the impression that it might suppress the message without curing the error itself. But as far as I can tell the entire issue has been cleared up. There’s no longer any need to invoke `MathJax.typesetClear()`.

In my first experiments with version 3.0 I stumbled onto another bit of weirdness, but it turned out to be a quirk of my own code, not something amiss in MathJax.

I was seeing occasional size variations in typeset math that seemed reminiscent of the winking problem in version 2.7. Sometimes the initial, automatic typesetting would leave the equations in a slightly smaller size; they would grow back to normal as soon as `MathJax.typeset()` was applied. In the image at right I have superimposed the two states, with the correct, larger image colored red. It looks like Fast Preview has come back to haunt us, but that can’t be right, because Fast Preview has been removed entirely from version 3.

My efforts to solve this mystery turned into quite a debugging debacle. I got a promising clue from an exchange on the MathJax wiki, discussing size anomalies when math is composed inside an HTML element temporarily flagged `display: none`, a style rule that makes the math invisible. In that circumstance MathJax has no information about the surrounding text, and so it leaves the typeset math in a default state. The same mechanism might account for what I was seeing—except that my page has no elements with a `display: none` style.

I first observed this problem in the Chrome browser, where it is intermittent; when I repeatedly reloaded the page, the small type would appear about one time out of five. What fun! It takes multiple trials just to know whether an attempted fix has had any effect. Thus I was pleased to discover that in Firefox the shrunken type appears consistently, every time the page is loaded. Testing became a great deal easier.

I soon found a cure, though not a diagnosis. While browsing again in the MathJax issues archive and in a MathJax user forum, I came across suggestions to try a different form of output, with mathematical expressions constructed not from text elements in HTML and style rules in CSS but from paths drawn in Scalable Vector Graphics, or SVG. I found that the SVG expressions were stable and consistent in size, and in other respects indistinguishable from their HTML siblings. Again my problem was solved, but I still wanted to know the underlying cause.
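Per the MathJax 3 documentation, switching output formats is a matter of loading a different combined component. A sketch of the SVG variant, following the same CDN pattern as the standard loader:

```
<!-- tex-svg renders math as SVG paths instead of HTML/CSS -->
<script id="MathJax-script" async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js"></script>
```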

Here’s where the troubleshooting report gets a little embarrassing. Thinking I might have a new bug to report, I set out to build a minimal exemplar—the smallest and simplest program that would trigger the bug. I failed. I was starting from a blank page and adding more and more elements of the original program—`div`s nested inside `div`s in the HTML, various stylesheet rules in the CSS, bigger collections of more complex equations—but none of these additions produced the slightest glitch in typesetting. So I tried working in the other direction, starting with the complex misbehaving program and stripping away elements until the problem disappeared. But it didn’t disappear, even when I reduced the page to a single equation in a plain white box.

As often happens, I found the answer not by banging my head against the problem but by going for a walk. Out in the fresh air, I finally noticed the one oddity that distinguished the failing program from all of the correctly working ones. Because the Dotster program began life embedded in a WordPress blog post, I could not include a link to the CSS stylesheet in the `head` section of the HTML file. Instead, a JavaScript function constructed the link and inserted it into the `head`. That happened *after* MathJax made its initial pass over the text. At the time of typesetting, the elements in which the equations were placed had no styles applied, and so MathJax had no way of determining appropriate sizes.

When Don Knuth unveiled TeX, circa 1980, I was amazed. Back then, typewriter-style word processing was impressive enough. TeX did much more: real typesetting, with multiple fonts (which Knuth also had to create from scratch), automatic hyphenation and justification, and beautiful mathematics.

Thirty years later, when Cervone created MathJax, I was amazed again—though perhaps not for the right reasons. I had supposed that the major programming challenge would be capturing all the finicky rules and heuristics for building up math expressions—placing and sizing superscripts, adjusting the height and width of parentheses or a radical sign to match the dimensions of the expression enclosed, spacing and aligning the elements of a matrix. Those are indeed nontrivial tasks, but they are just the beginning. My recent adventures have helped me see that another major challenge is making TeX work in an alien environment.

In classic TeX, the module that typesets equations has direct access to everything it might ever need to know about the surrounding text—type sizes, line spacing, column width, the amount of interword “glue” needed to justify a line of type. Sharing this information is easy because all the formatting is done by the same program. MathJax faces a different situation. Formatting duties are split, with MathJax handling mathematical content but the browser’s layout engine doing everything else. Indeed, the document is written in two different languages, TeX for the math and HTML/CSS for the rest. Coordinating actions in the two realms is not straightforward.

There are other complications of importing TeX into a web page. The classic TeX system runs in batch mode. It takes some inputs, produces its output, and then quits. Batch processing would not offer a pleasant experience in a web browser. The entire user interface (such as the buttons and sliders in my Dotster programs) would be frozen for the duration. To avoid this kind of rudeness to the user, MathJax is never allowed to monopolize JavaScript’s single thread of execution for more than a fraction of a second. To ensure this cooperative behavior, earlier versions relied on a hand-built scheme of queues (where procedures wait their turn to execute) and callbacks (which signal when a task is complete). Version 3 takes advantage of a new JavaScript construct called a *promise*. When a procedure cannot compute a result immediately, it hands out a promise, which it then redeems when the result becomes available.
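The promise pattern is easy to see in miniature. The sketch below is generic JavaScript, not MathJax internals; in MathJax 3 the same idea surfaces as `MathJax.typesetPromise()`, which returns a promise that resolves when typesetting is finished.

```
// A slow task hands back a Promise immediately...
function slowDouble(x) {
  return new Promise(resolve => {
    setTimeout(() => resolve(2 * x), 10);  // ...and redeems it 10 ms later
  });
}

// The caller is never blocked; it just schedules what to do with the result.
slowDouble(21).then(result => console.log(result));  // eventually logs 42
```

While the promise is pending, the single JavaScript thread is free to handle button clicks and slider movements.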

Wait, there’s more! MathJax is not just a TeX system. It also accepts input written in MathML, a dialect of XML specialized for mathematical notation. Indeed, the internal language of MathJax is based on MathML. And MathJax can also be configured to handle AsciiMath, a cute markup language that aims to make even the raw form of an expression readable. Think of it as math with emoticons: Type `oo` and you’ll get \(\infty\), or `-:` for \(\div\).

MathJax also provides an extensive suite of tools for accessibility. Visually impaired readers can have an equation read aloud. As I learned at the January Joint Math Meetings, there are even provisions for generating Braille output—but that’s a subject that deserves a post of its own.

When I first encountered MathJax, I saw it as a marvel, but I also considered it a workaround or stopgap. Reading a short document that includes a single equation entails downloading the entire MathJax program, which can be much larger than the document itself. And you need to download it all again for every other mathy document (unless your browser cache hangs onto a copy). What an appalling waste of bandwidth.

Several alternatives seemed more promising as a long-term solution. The best approach, it seemed to me then, was to have support for mathematical notation built into the browser. Modern browsers handle images, audio, video, SVG, animations—why not math? But it hasn’t happened. Firefox and Safari have limited support for MathML; none of the browsers I know are equipped to deal with TeX.

Another strategy that once seemed promising was the browser plugin. A plugin could offer the same capabilities as MathJax, but you would download and install it only once. This sounds like a good deal for readers, but it’s not so attractive for the author of web content. If there are multiple plugins in circulation, they are sure to have quirks, and you need to accommodate all of them. Furthermore, you need some sort of fallback plan for those who have not installed a plugin.

Still another option is to run MathJax on the server, rather than sending the whole program to the browser. The document arrives with TeX or MathML already converted to HTML/CSS or SVG for display. This is the preferred modus operandi for several large websites, most notably Wikipedia. I’ve considered it for bit-player, but it has a drawback: Running on the server, MathJax cannot provide the kind of on-demand typesetting seen in the demos above.

As the years go by, I am coming around to the view that MathJax is not just a useful stopgap while we wait for the right thing to come along; it’s quite a good approximation to the right thing. As the author of a web page, I get to write mathematics in a familiar and well-tested notation, and I can expect that any reader with an up-to-date browser will see output that’s much like what I see on my own screen. At the same time, the reader also has control over how the math is rendered, via the context menu. And the program offers accessibility features that I could never match on my own.

To top it off, the software is open-source—freely available to everyone. That is not just an economic advantage but also a social one. The project has a community that stands ready to fix bugs, listen to suggestions and complaints, offer help and advice. Without that resource, I would still be struggling with the hitches and hiccups described above.

Wandering around in these cavernous spaces always leaves me feeling a little disoriented and dislocated. It’s not just that I’m lost, although often enough I am—searching for Lobby D, or Meeting Room 407, or a toilet. I’m also dumbfounded by the very existence of these huge empty boxes, monuments to the human urge to congregate. If you build it, we will come.

It seems every city needs such a place, commensurate with its civic stature or ambitions. It’s no mystery why the cities make the investment. The JMM attracted more than 5,500 mathematicians (plus a few interlopers like me). I would guess we each spent on the order of $1,000 in payments to hotels, restaurants, taxis, and such, and perhaps as much again on airfare and registration fees. The revenue flowing to the city and its businesses and citizens must be well above $5 million. Furthermore, from the city’s point of view it’s all free money; the visitors do not send their children to the local schools or add to the burden on other city services, and they don’t vote in Denver.

However, this calculation tells only half the story. Although visitors to the Colorado Convention Center leave wads of cash in Denver, at the same time Denver residents are flying off to meetings elsewhere, withdrawing funds from the local economy and spreading the money around in Phoenix, Seattle, or Boston. If the convention-going traffic is symmetrical, the exchange will come out even for everyone. So why don’t we all save ourselves a lot of bother—not to mention millions of dollars—and just stay home? From inside the convention center, you may not be able to tell what city you’re in anyway.

While I was in Denver, I looked at the schedule of upcoming events for the convention center. A boat show was getting underway even as the mathematicians were still roaming the corridors, and tickets were also on sale for some sort of motorcycling event. The drillers and frackers were coming to town a few weeks later, and then in March the American Physical Society would hold its biggest annual gathering, with about twice as many participants as the JMM. The APS meeting was scheduled for this week, Monday through Friday (March 2–6). But late last Saturday night the organizers decided to cancel the entire conference because of the coronavirus threat. Some attendees were already in Denver or on their way.

I was taken aback by this decision, which is not to say I believe it was wrong. A year from now, if the world is still recovering from an epidemic that killed many thousands, the decisionmakers at the APS will be seen as prescient, prudent, and public-spirited. On the other hand, if Covid-19 sputters out in a few weeks, they may well be mocked as alarmists who succumbed to panic. But the latter judgment would be a little unfair. After all, the virus might be halted precisely *because* those 11,000 physicists stayed home.

I have not yet heard of other large scientific conferences shutting down, but a number of meetings in the tech industry have been called off or postponed, or have gone virtual, along with some sports and entertainment events. The American Chemical Society is “monitoring developments” in advance of their big annual meeting, scheduled for later this month in Philadelphia. [Update: On March 9 the ACS announced “we are cancelling (terminating) the ACS Spring 2020 National Meeting & Expo.”] Even if the events go on, some prospective participants will not be able to attend. I’ve just received an email from Harvard with stern warnings and restrictions on university-related travel.

Presumably, the Covid-19 threat will run its course and dissipate, and life will return to something called normal. But it’s also possible we have a new normal, that we have crossed some sort of demographic or epidemiological threshold, and novel pathogens will be showing up more frequently. Furthermore, the biohazard is not the only reason to question the future of megameetings; the ecohazard may be even more compelling.

All in all, it seems an apt moment to reflect on the human urge to come together in these large, temporary encampments, where we share ideas, opinions, news, gossip—and perhaps viruses—before packing up and going home until next year. Can the custom be sustained? If not, what might replace it?

Mathematicians and physicists have not always formed roving hordes to plunder defenseless cities. Until the 20th century there weren’t enough of them to make a respectable motorcycle gang. Furthermore, they had no motorcycles, or any other way to travel long distances in a reasonable time.

Before the airplane and the railroad, meetings between scientists were generally one-on-one. Consider the sad story of Niels Henrik Abel, a young Norwegian mathematician in the 1820s. Feeling cut off from his European colleagues, he undertook a two-year-long trek from Oslo to Berlin and Paris, traveling almost entirely on foot. In Paris he visited Legendre and Cauchy, who received him coolly and did not read his proof of the unsolvability of quintic equations. So Abel walked home again. Somewhere along the way he picked up a case of tuberculosis and died two years later, at age 27, impoverished and probably unaware that his work was finally beginning to be noticed. I like to think the outcome would have been happier if he’d been able to present his results in a contributed-paper session at the JMM.

For Abel, the take-a-hike model of scholarly communication proved ineffective; perhaps more important, it doesn’t scale well. If everyone must make individual *tête-à-tête* visits, then forming connections between \(n\) scientists would require \(n(n - 1) / 2\) trips. Having everyone converge at a central point reduces the number to \(n\). From this point of view, the modern mass meeting looks not like a travel extravagance but like a strategy for minimizing total air miles. Still, staying home would be even more frugal, whether the cost is measured in dollars, kelvins, or epidemiological risk.

Most of the big disciplinary conferences got their start toward the end of the 19th century, and by the 1930s and 40s had hundreds of participants. Writing about mathematical life in that era, Ralph Boas notes: “One reason for going to meetings was that photocopying hadn’t been invented; it was at meetings that one found out what was going on.” But now photocopying *has* been invented—and superseded. There’s no need for a cross-country trip to find out what’s new; on any weekday morning you can just check the arXiv. Yet attendance at these meetings is up by another order of magnitude.

Even in a world with faster channels of communication, there are still moments of high excitement in the big convention halls. At the 1987 March meeting of the APS, the recent discovery of high-temperature superconductivity in cuprate ceramics was presented and discussed in a lively session that lasted past 3 a.m. The event is known as the Woodstock of Physics. I missed it—as well as the original Woodstock. But I was at the JMM in 2014 when progress toward confirming the twin prime conjecture caused a big stir. The conjecture (still unproved) says there are infinitely many pairs of prime numbers, such as 11 and 13, separated by exactly 2. Yitang Zhang had just proved there are infinitely many primes separated by no more than 70 million. Several talks discussed this finding and followup work by others, and Zhang himself spoke to a packed room.

Boas emphasized the motive of *hearing* what’s new, but one must not ignore the equally important impulse to *tell* what’s new. At the recent JMM, with its 5,500 visitors, the book of abstracts listed 2,529 presentations. In other words, almost half the visitors came to *deliver* a talk, which is probably a stronger motivation than hearing what others have to say. (When I first saw those numbers, I had the thought: “So, on average every presentation had one speaker and one listener.” The truth is not quite as bad as that, but it’s still worth keeping in mind that a meeting of this kind is not like a rock concert or a football game, with only a dozen or so performers and thousands in the audience.)

At some gatherings, the aim is not so much to talk about math and science as to *do* it. Groups of three or four huddle around blackboards or whiteboards, collaborating. But this activity is commoner at small, narrowly focused meetings—maybe at Aspen for the physicists or Banff for the mathematicians. No doubt such things also happen at the bigger meetings, but they are not a major item on the agenda for most attendees.

For one subpopulation of meeting-goers the main motivation is very practical: getting a job. Again this is a matter of efficiency. Someone looking for a postdoc position can arrange a dozen interviews at a single meeting.

There are many reasons to make the pilgrimage to the Colorado Convention Center, but I think the most important factor is yet to be stated. Dennis Flanagan, who was my employer, friend, and mentor many years ago at *Scientific American*, wrote that “science is intensely social.”

In an active scientific discipline everyone knows everyone else, if not in person, then by their writings and reputation. Scientists attend at least as many meetings and conventions as salesmen. (*Flanagan’s Version*, 1988, p. 15.)

You might interpret this comment as saying that scientists—like salesmen—are a bunch of genial, gregarious party animals who like to go out on the town, drink to excess, and misbehave. But I’m pretty sure that’s not what Dennis had in mind. He was arguing that social interactions are essential to the *process* of science. Becoming a mathematician or a physicist is tantamount to joining a club, and you can’t do that in isolation. You have to absorb the customs, the tastes, the values of the culture. For example, you need to internalize the community standard for deciding what is true. (It’s rather different in physics and mathematics.) Even subtler is the standard for deciding what is *interesting*—what ideas are worth pursuing, what problems are worth solving.

Meetings and conferences are not the only way of inculcating culture; the apprenticeship system known as graduate school is clearly more important overall. Still, discipline-wide gatherings have a role. By their very nature they are more cosmopolitan than any one university department. They acquaint you with the norms of the population but also with the range of variance, and thereby improve the probability that you’ll figure out where you fit in.

The quintessential big-meeting event is running into someone in the hallway whom you see only once a year. You stop and shake hands, or even hug. (In future we’ll bump elbows.) You’re both in a hurry. If you chat too long, you’ll miss the opening sentences of the next talk, which may be the only sentences you’ll understand. So the exchange of words is brief and unlikely to be deep. As I and my cohort grow older, it often amounts to little more than, “Wow. I’m still alive and so are you!” But sometimes it’s worth traveling a thousand miles to get that human validation.

If we have to dispense with such gatherings, science and math will muddle through somehow. We’ll meet more in the sanitary realm of bits and pixels, less in this fraught environment of atoms. We’ll become more hierarchical, with greater emphasis on local meetings and less on national and international ones. The alternatives can be made to work, and the next generation will view them as perfectly natural, if not inevitable. But I’m going to miss the ugly carpet, the uncomfortable folding/stacking chairs, and the ballrooms where nobody dances.

In mathematics, abstraction serves as a kind of stairway to heaven—as well as a test of stamina for those who want to get there. The climb starts in childhood, when you first learn that a numeral is a symbol standing for a quantity.

Some years later you reach higher ground. The symbols representing particular numbers give way to the \(x\)s and \(y\)s that stand for quantities yet to be determined. They are symbols for symbols. Later still you come to realize that this algebra business is not just about “solving for \(x\),” for finding a specific number that corresponds to a specific letter. It’s a magical device that allows you to make blanket statements encompassing *all* numbers: \(x^2 - 1 = (x + 1)(x - 1)\) is true for any value of \(x\).

Continuing onward and upward, you learn to manipulate symbolic expressions in various other ways, such as differentiating and integrating them, or constructing functions of functions of functions. Keep climbing the stairs and eventually you’ll be introduced to areas of mathematics that openly boast of their abstractness. There’s *abstract algebra*, where you build your own collections of numberlike things: groups, fields, rings, vector spaces. And there’s *category theory*, where you’ll find a collection of ideas with the disarming label *abstract nonsense*.

Not everyone is filled with admiration for this Jenga tower of abstractions teetering atop more abstractions. Consider Andrew Wiles’s proof of Fermat’s last theorem, and its reception by the public. The theorem, first stated by Pierre de Fermat in the 1630s, makes a simple claim about powers of integers: If \(x, y, z, n\) are all integers greater than \(0\), then \(x^n + y^n = z^n\) has solutions only if \(n \le 2\). The proof of this claim, published in the 1990s, is not nearly so simple. Wiles (with contributions from Richard Taylor) went on a scavenger hunt through much of modern mathematics, collecting a truckload of tools and spare parts needed to make the proof work: elliptic curves, modular forms, Galois groups, functions on the complex plane, *L*-series. It is truly a *tour de force*.

The argument proceeds by contradiction: a counterexample to Fermat’s equation, if one existed, could be used to construct an elliptic curve *E* with certain properties. But the properties deduced on the left and right branches of the diagram turn out to be inconsistent, implying that *E* does not exist, nor does the counterexample that gave rise to it.

Is all that heavy machinery really needed to prove such an innocent-looking statement? Many people yearn for a simpler and more direct proof, ideally based on methods that would have been available to Fermat himself. Marilyn vos Savant, the *Parade* columnist, takes an even more extreme position, arguing that Wiles strayed so far from the subject matter of the theorem as to make his proof invalid. (For a critique of her critique, see Boston and Granville.)

Almost all of this grumbling about illegitimate methods and excess complexity comes from outside the community of research mathematicians. Insiders see the Wiles proof differently. For them, the wide-ranging nature of the proof is actually what’s most important. The main accomplishment, in this view, was cementing a connection between those far-flung areas of mathematics; resolving FLT was just a bonus.

Yet even mathematicians can have misgivings about the intricacy of mathematical arguments and the ever-taller skyscrapers of abstraction. Jeremy Gray, a historian of mathematics, believes anxiety over abstraction was already rising in the 19th century, when mathematics seemed to be “moving away from reality, into worlds of arbitrary dimension, for example, and into the habit of supplanting intuitive concepts (curves that touch, neighboring points, velocity) with an opaque language of mathematical analysis that bought rigor at a high cost in intelligibility.”

A more personal protest comes from the number theorist Piper Harron, whose doctoral thesis was reviewed in *MAA Focus* by Adriana Salerno. The thesis was to be published in book form last fall by Birkhäuser, but the book doesn’t seem to be available yet. Harron writes:

I like to imagine abstraction (abstractly ha ha ha) as pulling the strings on a marionette. The marionette, being “real life,” is easily accessible. Everyone understands the marionette whether it’s walking or dancing or fighting. We can see it and it makes sense. But watch instead the hands of the puppeteers. Can you look at the hand movements of the puppeteers and know what the marionette is doing?… Imagine it gets worse. Much, much worse. Imagine that the marionettes we see are controlled by marionettoids we don’t see which are in turn controlled by pre-puppeteers which are finally controlled by actual puppeteers.

Keep all those marionettoids in mind. I’ll be coming back to them, but first I want to shift my attention to computer science, where the towers of abstraction are just as tall and teetery, but somehow less scary.

Suppose your computer is about to add two numbers…. No, wait, there’s no need to suppose or imagine. In the orange panel below, type some numbers into the \(a\) and \(b\) boxes, then press the “+” button to get the sum in box \(c\). Now, please describe what’s happening inside the machine as that computation is performed.

[Interactive demo: input boxes *a* and *b*, a “+” button, and the sum in box *c*.]

You can probably guess that somewhere behind the curtains there’s a fragment of code that looks like `c = a + b`. And, indeed, that statement appears verbatim in the JavaScript program that’s triggered when you click on the plus button. But if you were to go poking around among the circuit boards under the keyboard of your laptop, you wouldn’t find anything resembling that sequence of symbols. The program statement is a high-level abstraction. If you really want to know what’s going on inside the computing engine, you need to dig deeper—down to something as tangible as a jelly bean.

How about an electron? Electrons are real physical objects, but no one could hope to explain the execution of `c = a + b` by tracing the motions of all the electrons (perhaps \(10^{23}\) of them) through all the transistors (perhaps \(10^{11}\)).

To understand how electrons are persuaded to do arithmetic for us, we need to introduce a whole sequence of abstractions.

- First, step back from the focus on individual electrons, and reformulate the problem in terms of continuous quantities: voltage, current, capacitance, inductance.
- Replace the physical transistors, in which voltages and currents change smoothly, with idealized devices that instantly switch from totally off to fully on.
- Interpret the two states of a transistor as logical values (*true* and *false*) or as numerical values (\(1\) and \(0\)).
- Organize groups of transistors into “gates” that carry out basic functions of Boolean logic, such as *and*, *or*, and *not*.
- Assemble the gates into larger functional units, including adders, multipliers, comparators, and other components for doing base-\(2\) arithmetic.
- Build higher-level modules that allow the adders and such to be operated under the control of a program. This is the conceptual level of the instruction-set architecture, defining the basic operation codes (*add*, *shift*, *jump*, etc.) recognized by the computer hardware.
- Graduating from hardware to software, design an operating system, a collection of services and interfaces for abstract objects such as files, input and output channels, and concurrent processes.
- Create a compiler or interpreter that knows how to translate programming-language statements such as `c = a + b` into sequences of machine instructions and operating-system requests.

From the point of view of most programmers, the abstractions listed above represent computational *infrastructure*: They lie beneath the level where you do most of your thinking—the level where you describe the algorithms and data structures that solve your problem. But computational abstractions are also a tool for building *superstructure*, for creating new functions beyond what the operating system and the programming language provide. For example, if your programming language handles only numbers drawn from the real number line, you can write procedures for doing arithmetic with complex numbers, such as \(3 + 5i\). (Go ahead, try it in the orange box above.) And, in analogy with the mathematical practice of defining functions of functions, we can build compiler compilers and schemes for metaprogramming—programs that act on other programs.
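As a concrete illustration of building superstructure, here is a minimal sketch of complex-number arithmetic layered on top of ordinary real arithmetic. The `{re, im}` representation and the names `complex`, `cAdd`, and `cMul` are my own illustrative choices, not part of any particular library.

```javascript
// A sketch of complex arithmetic built atop real arithmetic.
// The {re, im} representation and these names are illustrative choices.
const complex = (re, im) => ({ re, im });

const cAdd = (u, v) => complex(u.re + v.re, u.im + v.im);
const cMul = (u, v) => complex(
  u.re * v.re - u.im * v.im,   // real part
  u.re * v.im + u.im * v.re    // imaginary part
);

// (3 + 5i) + (1 - 2i) = 4 + 3i
console.log(cAdd(complex(3, 5), complex(1, -2)));  // { re: 4, im: 3 }

// (3 + 5i) * (3 + 5i) = -16 + 30i
console.log(cMul(complex(3, 5), complex(3, 5)));   // { re: -16, im: 30 }
```

Code that uses `cAdd` and `cMul` never needs to know that a complex number is secretly a pair of reals; that detail sits below the abstraction barrier.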

In both mathematics and computation, rising through the various levels of abstraction gives you a more elevated view of the landscape, with wider scope but less detail. Even if the process is essentially the same in the two fields, however, it doesn’t feel that way, at least to me. In mathematics, abstraction can be a source of anxiety; in computing, it is nothing to be afraid of. In math, you must take care not to tangle the puppet strings; in computing, abstractions are a defense against such confusion. For the mathematician, abstraction is an intellectual challenge; for the programmer, it is an aid to clear thinking.

Why the difference? How can abstraction have such a friendly face in computation and such a stern mien in math? One possible answer is that computation is just plain easier than mathematics.

Another possible explanation is that computer systems are engineered artifacts; we can build them to our own specifications. If a concept is just too hairy for the human mind to master, we can break it down into simpler pieces. Math is not so complaisant—not even for those who hold that mathematical objects are invented rather than discovered. We can’t just design number theory so that the Riemann hypothesis will be true.

But I think the crucial distinction between math abstractions and computer abstractions lies elsewhere. It’s not in the abstractions themselves but in the boundaries between them.

I first encountered the term *abstraction barrier* in Abelson and Sussman’s *Structure and Interpretation of Computer Programs*, circa 1986. The underlying idea is surely older; it’s implicit in the “structured programming” literature of the 1960s and 70s. But *SICP* still offers the clearest and most compelling introduction.

In computer science, *information hiding* is considered a virtue, not an impeachable offense. If a design has a layered structure, with abstractions piled one atop the other, the layers are separated by *abstraction barriers*. A high-level module can reach across the barrier to make use of procedures from lower levels, but it won’t know anything about the implementation of those procedures. When you are writing programs in Lisp or Python, you shouldn’t need to think about how the operating system carries out its chores; and when you’re writing routines for the operating system, you needn’t think about the physics of electrons meandering through the crystal lattice of a semiconductor. Each level of the hierarchy can be treated (almost) independently.

Mathematics also has its abstraction barriers, although I’ve never actually heard the term used by mathematicians. A notable example comes from Giuseppe Peano’s formulation of the foundations of arithmetic, circa 1900. Peano posits the existence of a number \(0\), and a function called *successor*, \(S(n)\), which takes a number \(n\) and returns the next number in the counting sequence. Thus the natural numbers begin \(0, S(0), S(S(0)), S(S(S(0)))\), and so on. Peano deliberately refrains from saying anything more about what these numbers look like or how they work. They might be implemented as sets, with \(0\) being the empty set and successor the operation of adjoining an element to a set. Or they could be unary lists: (), (|), (||), (|||), . . . The most direct approach is to use Church numerals, in which the successor function itself serves as a counting token, and the number \(n\) is represented by \(n\) nested applications of \(S\).

From these minimalist axioms we can define the rest of arithmetic, starting with addition. In calculating \(a + b\), if \(b\) happens to be \(0\), the problem is solved: \(a + 0 = a\). If \(b\) is *not* \(0\), then it must be the successor of some number, which we can call \(c\). Then \(a + S(c) = S(a + c)\). Notice that this definition doesn’t depend in any way on how the number \(0\) and the successor function are represented or implemented. Under the hood, we might be working with sets or lists or abacus beads; it makes no difference. An abstraction barrier separates the levels. From addition you can go on to define multiplication, and then exponentiation, and again abstraction barriers protect you from the lower-level details. There’s never any need to think about how the successor function works, just as the computer programmer doesn’t think about the flow of electrons.
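The addition rule above translates almost directly into code. In this sketch, one arbitrary implementation (unary lists) hides behind the barrier; `add` touches only `zero`, `succ`, `isZero`, and `pred`, so the representation could be swapped for sets or abacus beads without changing it.

```javascript
// Sketch of Peano addition, written so it never peeks below the
// abstraction barrier. The list-based representation is one
// arbitrary implementation among many.
const zero = [];
const succ = (n) => [n];          // S(n) wraps n in a one-element list
const isZero = (n) => n.length === 0;
const pred = (n) => n[0];         // recover c from S(c)

// a + 0 = a;  a + S(c) = S(a + c)
const add = (a, b) => isZero(b) ? a : succ(add(a, pred(b)));

// Convert to an ordinary integer, just for display
const toInt = (n) => isZero(n) ? 0 : 1 + toInt(pred(n));

const two = succ(succ(zero));
const three = succ(two);
console.log(toInt(add(two, three)));  // 5
```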

The importance of not thinking was stated eloquently by Alfred North Whitehead, more than a century ago:

> It is a profoundly erroneous truism, repeated by all copybooks and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilisation advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle—they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.

Alfred North Whitehead, *An Introduction to Mathematics*, 1911, pp. 45–46.

If all of mathematics were like the Peano axioms, we would have a watertight structure, compartmentalized by lots of leakproof abstraction barriers. And abstraction would probably not be considered “the hardest part about math.” But, of course, Peano described only the tiniest corner of mathematics. We also have the puppet strings.

In Piper Harron’s unsettling vision, the puppeteers high above the stage pull strings that control the pre-puppeteers, who in turn operate the marionettoids, who animate the marionettes. Each of these agents can be taken as representing a level of abstraction. The problem is, we want to follow the action at both the top and the bottom of the hierarchy, and possibly at the middle levels as well. The commands coming down from the puppeteers on high embody the abstract ideas that are needed to build theorems and proofs, but the propositions to be proved lie at the level of the marionettes. There’s no separating these levels; the puppet strings tie them together.

In the case of Fermat’s Last Theorem, you might choose to view the Wiles proof as nothing more than an elevated statement about elliptic curves and modular forms, but the proof is famous for something else—for what it tells us about the elementary equation \(x^n + y^n = z^n\). Thus the master puppeteers work at the level of algebraic geometry, but our eyes are on the dancing marionettes of simple number theory. What I’m suggesting, in other words, is that abstraction barriers in mathematics sometimes fail because events on both sides of the barrier make simultaneous claims on our interest.

In computer science, the programmer can ignore the trajectories of the electrons because those details really are of no consequence. Indeed, the electronic guts of the computing machinery could be ripped out and replaced by fluidic devices or fiber optics or hamsters in exercise wheels, and that brain transplant would have no effect on the outcome of the computation. Few areas of mathematics can be so cleanly floated away and rebuilt on a new foundation.

Can this notion of leaky abstraction barriers actually explain why higher mathematics looks so intimidating to most of the human population? It’s surely not the whole story, but maybe it has a role.

In closing I would like to point out an analogy with a few other areas of science, where problems that cross abstraction barriers seem to be particularly difficult. Physics, for example, deals with a vast range of spatial scales. At one end of the spectrum are the quarks and leptons, which rattle around comfortably inside a particle with a radius of \(10^{-15}\) meter; at the other end are galaxy clusters spanning \(10^{24}\) meters. In most cases, effective abstraction barriers separate these levels. When you’re studying celestial mechanics, you don’t have to think about the atomic composition of the planets. Conversely, if you are looking at the interactions of elementary particles, you are allowed to assume they will behave the same way anywhere in the universe. But there are a few areas where the barriers break down. For example, near a critical point where liquid and gas phases merge into an undifferentiated fluid, forces at all scales from molecular to macroscopic become equally important. Turbulent flow is similar, with whirls upon whirls upon whirls. It’s not a coincidence that critical phenomena and turbulence are notoriously difficult to describe.

Biology also covers a wide swath of territory, from molecules and single cells to whole organisms and ecosystems on a planetary scale. Again, abstraction barriers usually allow the biologist to focus on one realm at a time. To understand a predator-prey system you don’t need to know about the structure of cytochrome *c*. But the barriers don’t always hold. Evolution spans all these levels. It depends on molecular events (mutations in DNA), and determines the shape and fate of the entire tree of life. We can’t fully grasp what’s going on in the biosphere without keeping all these levels in mind at once.

*[Interactive demonstration omitted: a square gradually filling with disks of decreasing size.]*

The disks are scattered randomly, except that no disk is allowed to overlap another disk or extend beyond the boundary of the square. Once a disk has been placed, it never moves, so each later disk has to find a home somewhere in the nooks and crannies between the earlier arrivals. Can this go on forever?

The search for a vacant spot would seem to grow harder as the square gets more crowded, so you might expect the process to get stuck at some point, with no open site large enough to fit the next disk. On the other hand, because the disks get progressively smaller, later ones can squeeze into tighter quarters. In the specific filling protocol shown here, these two trends are in perfect balance. The process of adding disks, one after another, never seems to stall. Yet as the number of disks goes to infinity, they completely fill the box provided for them. There’s a place for every last dot, but there’s no blank space left over.

Or at least that’s the mathematical ideal. The computer program that fills the square above never attains this condition of perfect plenitude. It shuts down after placing just 5,000 disks, which cover about 94 percent of the square’s area. This early exit is a concession to the limits of computer precision and human patience, but we can still dream of how it would work in a world without such tiresome constraints.

This scheme for filling space with randomly placed objects is the invention of John Shier, a physicist who worked for many years in the semiconductor industry and who has also taught at Normandale Community College near Minneapolis. He explains the method and the mathematics behind it in a recent book, *Fractalize That! A Visual Essay on Statistical Geometry*. (For bibliographic details see the links and references at the end of this essay.) I learned of Shier’s work from my friend Barry Cipra.

Shier hints at the strangeness of these doings by imagining a set of 100 round tiles in graduated sizes, with a total area approaching one square meter. He would give the tiles to a craftsman with these instructions:

“Mark off an area of one square meter, either a circle or a square. Start with the largest tile, and attach it permanently anywhere you wish in the marked-off area. Continue to attach the tiles anywhere you wish, proceeding always from larger to smaller.

There will always be a place for every tile regardless of how you choose to place them.” How many experienced tile setters would believe this?

Shier’s own creations go way beyond squares and circles filled with simple shapes such as disks. He has shown that the algorithm also works with an assortment of more elaborate designs, including nonconvex figures and even objects composed of multiple disconnected pieces. We get snowflakes, nested rings, stars, butterflies, fish eating lesser fish, faces, letters of the alphabet, and visual salads bringing together multiple ingredients. Shier’s interest in these patterns is aesthetic as well as mathematical, and several of his works have appeared in art exhibits; one of them won a best-of-show award at the 2017 Joint Mathematics Meeting.

Shier and his colleagues have also shown that the algorithm can be made to work in three-dimensional space. The book’s cover is adorned with a jumble of randomly placed toruses filling the volume of a transparent cube. If you look closely, you’ll notice that some of the rings are linked; they cannot be disentangled without breaking at least one ring. (The 3D illustration was created by Paul Bourke, who has more examples online, including 3D-printed models.)

After reading Shier’s account of his adventures, and admiring the pictures, I had to try it for myself. The experiments I’m presenting in this essay have no high artistic ambitions. I stick with plain-vanilla circular disks in a square frame, all rendered with the same banal blue-to-red color scheme. My motive is merely to satisfy my curiosity—or perhaps to overcome my skepticism. When I first read the details of how these graphics are created, I couldn’t quite believe it would work. Writing my own programs and seeing them in action has helped persuade me. So has a proof by Christopher Ennis, which I’ll return to below.

Filling a region of the plane with disks is not in itself such a remarkable trick. One well-known way of doing it goes by the name Apollonian circles. Start with three disks that are all tangent to one another, leaving a spiky three-pointed vacancy between them. Draw a new disk in the empty patch, tangent to all three of the original disks; this is the largest disk that can possibly fit in the space. Adding the new disk creates three smaller triangular voids, where you can draw three more triply tangent disks. There’s nothing to stop you from going on in this way indefinitely, approaching a limiting configuration where the entire area is filled.

There are randomized versions of the Apollonian model. For example, you might place zero-diameter seed disks at random unoccupied positions and then allow them to grow until they touch one (or more) of their neighbors. This process, too, is space-filling in the limit. And it can never fail: Because the disks are custom-fitted to the space available, you can never get stuck with a disk that can’t find a home.

Shier’s algorithm is different. You are given disks one at a time in a predetermined order, starting with the largest, then the second-largest, and so on. To place a disk in the square, you choose a point at random and test to see if the disk will fit at that location without bumping into its neighbors or poking beyond the boundaries of the square. If the tests fail, you pick another random point and try again. It’s not obvious that this haphazard search will always succeed—and indeed it works only if the successive disks get smaller according to a specific mathematical rule. But if you follow that rule, you can keep adding disks forever. Furthermore, as the number of disks goes to infinity, the fraction of the area covered approaches \(1\). It’s convenient to have a name for series of disks that meet these two criteria; I have taken to calling them *fulfilling* series.
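The placement loop just described can be sketched in a few lines of JavaScript. This is my own simplified rendering, not Shier’s code; the exponent, box size, and retry limit are arbitrary illustrative choices, and the sizing rule is discussed in detail further on.

```javascript
// A minimal sketch of the random-placement loop: disk k has area
// 1/k^s, and each disk searches for a random spot where it fits.
function fillSquare(n, s = 1.5, side = 2, maxTries = 100000) {
  const disks = [];
  for (let k = 1; k <= n; k++) {
    const r = Math.sqrt((1 / Math.pow(k, s)) / Math.PI);  // radius from area
    let placed = false;
    for (let t = 0; t < maxTries && !placed; t++) {
      // random center that keeps the disk inside the box
      const x = r + Math.random() * (side - 2 * r);
      const y = r + Math.random() * (side - 2 * r);
      // reject the spot if it overlaps any earlier disk
      if (disks.every(d => Math.hypot(d.x - x, d.y - y) >= d.r + r)) {
        disks.push({ x, y, r });
        placed = true;
      }
    }
    if (!placed) break;  // jammed: no home found for disk k
  }
  return disks;
}

console.log(fillSquare(50).length);
```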

In exploring these ideas computationally, it makes sense to start with the simplest case: disks that are all the same size. This version of the process clearly *cannot* be fulfilling. No matter how the disks are arranged, their aggregate area will eventually exceed that of any finite container. Click in the gray square below to start filling it with equal-size disks. The square box has area \(A_{\square} = 4\). The slider in the control panel determines the area of the individual disks \(A_k\), in a range from \(0.0001\) to \(1.0\).

*[Interactive program omitted: click to fill the square with equal-size disks; a slider sets the disk area.]*

If you play with this program for a while, you’ll find that the dots bloom quickly at first, but the process invariably slows down and eventually ends in a state labeled “Jammed,” indicating that the program has been unable to find an open spot for the next disk.

The densest possible packing of equal-size disks places the centers on a triangular lattice with spacing equal to the disk diameter. The resulting density (for an infinite number of disks on an infinite plane) is \(\pi \sqrt{3}\, /\, 6 \approx 0.9069\), which means more than 90 percent of the area is covered. A random filling in a finite square is much looser. My first few trials all halted with a filling fraction fairly close to one-half, and so I wondered if that nice round number might be the expectation value of the probabilistic process. Further experiments suggested otherwise. Over a broad range of disk sizes, from \(0.0001\) up to about \(0.01\), the area covered varied from one run to the next, but the average was definitely above one-half—perhaps \(0.54\). After some rummaging through the voluminous literature on circle packing, I think I may have a clue to the exact expectation value: \(\pi / (3 + 2 \sqrt{2}) \approx 0.539012\). Where does that weird number come from? The answer has nothing to do with Shier’s algorithm, but I think it’s worth a digression.

Consider an adversarial process: Alice is filling a unit square with \(n\) equal-size disks and wants to cover as much of the area as possible. Bob, who wants to minimize the area covered, gets to choose \(n\). If Bob chooses \(n = 1\), Alice can produce a single disk that just fits inside the square and covers about \(79\) percent of the space. Can Bob do better? Yes, if Bob specifies \(n = 2\), Alice’s best option is to squeeze the two disks into diagonally opposite corners of the square as shown in the diagram at right. These disks are bounded by right isosceles triangles, which makes it easy to calculate their radii as \(r = 1 / (2 + \sqrt{2}) \approx 0.2929\). Their combined area works out to that peculiar number \(\pi / (3 + 2 \sqrt{2}) \approx 0.54\).
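The geometry of the two-disk configuration is easy to verify numerically. A short check, assuming a unit square with disks of radius \(1/(2 + \sqrt{2})\):

```javascript
// Two disks in opposite corners of a unit square:
// radius r = 1/(2 + sqrt 2), combined area pi / (3 + 2*sqrt 2).
const r = 1 / (2 + Math.SQRT2);
const twoDiskArea = 2 * Math.PI * r * r;

console.log(r);                            // ~0.2929
console.log(twoDiskArea);                  // ~0.539012
console.log(Math.PI / (3 + 2 * Math.SQRT2));  // same value, by algebra
```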

If two disks are better than one (from Bob’s point of view), could three be better still? Or four, or some larger number? Apparently not. In 2010, Erik Demaine, Sándor Fekete and Robert Lang conjectured that the two-disk configuration shown above represents the worst case for any number of equal-size disks. In 2017 Fekete, Sebastian Morr, and Christian Scheffer proved this result.

Is it just a coincidence that the worst-case density for packing disks into a square also appears to be the expected density when equal-size disks are placed randomly until no more will fit? Wish I knew.

Let us return to the questions raised in Shier’s *Fractalize That!* If we want to fit infinitely many disks into a finite square, our only hope is to work with disks that get smaller and smaller as the process goes on. The disk areas must come from some sequence of ever-diminishing numbers. Among such sequences, the one that first comes to mind is \(\frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\). These fractions have been known since antiquity as the harmonic numbers. (They are the wavelengths of the overtones of a plucked string.)

To see what happens when successive disks are sized according to the harmonic sequence, click in the square below.

*[Interactive program omitted: click to fill the square with disks sized by the harmonic sequence; a slider sets the starting term.]*

Again, the process halts when no open space is large enough to accommodate the next disk in the sequence. If you move the slider all the way to the right, you’ll see a sequence of disks with areas drawn from the start of the full harmonic sequence, \(\frac{1}{1} , \frac{1}{2}, \frac{1}{3}, \dots\); at this setting, you’ll seldom get beyond eight or nine disks. Moving the slider to the left omits the largest disks at the beginning of the sequence, leaving the infinite tail of smaller disks. For example, setting the slider to \(1/20\) skips all the disks from \(\frac{1}{1}\) through \(\frac{1}{19}\) and begins filling the square with disks of area \(\frac{1}{20}, \frac{1}{21}, \frac{1}{22}, \dots\) Such truncated series go on longer, but eventually they also end in a jammed configuration.

The slider goes no further than 1/50, but even if you omitted the first 500 disks, or the first 5 million, the result would be the same. This is a consequence of the most famous property of the harmonic numbers: Although the individual terms \(1/k\) dwindle away to zero as \(k\) goes to infinity, the sum of all the terms,

\[\sum_{k = 1}^{\infty}\frac{1}{k} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots,\]

does not converge to a finite value. As long as you keep adding terms, the sum will keep growing, though ever more slowly. This curious fact was proved in the 14th century by the French bishop and scholar Nicole Oresme. The proof is simple but ingenious. Oresme pointed out that the harmonic sequence

\[\frac{1}{1} + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots\]

is greater than

\[\frac{1}{1} + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \cdots\]

The latter series is equivalent to \(1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} \cdots\), and so it is clearly divergent. Since each grouped term of the harmonic series is even greater, its sum too must exceed any finite bound.

The divergence of the harmonic series implies that disks whose areas are generated by the series will eventually overflow any enclosing container. Dropping a finite prefix of the sequence, such as the first 50 disks, does not change this fact.
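Oresme’s conclusion is easy to watch numerically: the partial sums keep climbing, but only at a logarithmic crawl. (The comparison with \(\ln n + 0.5772\) is a standard approximation, not something from Shier’s book.)

```javascript
// Partial sums of the harmonic series grow without bound,
// but only logarithmically: H(n) ~ ln(n) + 0.5772...
function harmonicSum(n) {
  let sum = 0;
  for (let k = 1; k <= n; k++) sum += 1 / k;
  return sum;
}

for (const n of [10, 1000, 1000000]) {
  console.log(n, harmonicSum(n).toFixed(4), (Math.log(n) + 0.5772).toFixed(4));
}
```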

Let me note in passing that just as the filling fraction for fixed-size disks seems to converge to a specific constant, 0.5390, disks in harmonic series also seem to have a favored filling fraction, roughly 0.71. Can this be explained by some simple geometric argument? Again, I wish I knew.

Evidently we need to make the disks shrink faster than the harmonic numbers do. Here’s an idea: Square each element of the harmonic series, yielding this:

\[\sum_{k = 1}^{\infty}\frac{1}{k^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots.\]

Click below (or press the Start button) to see how this one turns out, again in a square of area 4.

*[Interactive program omitted: filling the square with disks of area \(1/k^2\).]*

At last we have a process that won’t get stuck in a situation where there’s no place to put another disk. It *could* run forever, but of course it doesn’t; it quits when the area of the next disk shrinks down to about a tenth of the size of a single pixel on a computer display. The stopped state is labeled “Exhausted” rather than “Jammed.” Yet even in principle this series of disks is not *fulfilling*. The disks are scattered sparsely in the square, leaving vast open spaces unoccupied. The configuration reminds me of deep-sky images made by large telescopes.

Why does this outcome look so different from the others? Unlike the harmonic numbers, the infinite series \(1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots\) converges to a finite sum. In the 18th century the task of establishing this fact (and determining the exact sum) was known as the Basel Problem, after the hometown of the Bernoulli family, who put much effort into the problem but never solved it. The answer came in 1735 from Leonhard Euler (another native of Basel, though he was working in St. Petersburg), who showed that the sum is equal to \(\pi^2 / 6\). This works out to about \(1.645\); since the area of the square we want to fill is \(4\), even an infinite series of disks would cover only about \(41\) percent of the territory.
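A quick numerical check of Euler’s result:

```javascript
// Partial sums of 1/k^2 converge toward Euler's value pi^2/6 ~ 1.6449.
function basel(n) {
  let sum = 0;
  for (let k = 1; k <= n; k++) sum += 1 / (k * k);
  return sum;
}

console.log(basel(100000));          // ~1.64492
console.log(Math.PI * Math.PI / 6);  // 1.64493...
```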

Given that the numbers \(\frac{1}{1^1}, \frac{1}{2^1}, \frac{1}{3^1}, \dots\) diminish too slowly, whereas \(\frac{1}{1^2}, \frac{1}{2^2}, \frac{1}{3^2}, \dots\) shrink too fast, it makes sense to try an exponent somewhere between \(1\) and \(2\) in the hope of finding a Goldilocks solution. The computation performed below in Program 4 is meant to facilitate the search for such a happy medium. Here the disk sizes are elements of the sequence \(\frac{1}{1^s}, \frac{1}{2^s}, \frac{1}{3^s}, \dots\), where the value of the exponent \(s\) is determined by the setting of the slider, with a range of \(1 \lt s \le 2\). We already know what happens at the extremes of this range. What is the behavior in the middle?

*[Program 4 omitted: interactive filling with disk areas \(1/k^s\); a slider sets the exponent \(s\).]*

If you try the default setting of \(s = 1.5\), you’ll find you are still in the regime where the disks dwindle away so quickly that the box never fills up; if you’re willing to wait long enough, the program will end in an exhausted state rather than a jammed one. Reducing the exponent to \(s = 1.25\) puts you on the other side of the balance point, where the disks remain too large and at some point one of them will not fit into any available space. By continuing to shuttle the slider back and forth, you could carry out a binary search, closing in, step by step, on the “just right” value of \(s\). This strategy can succeed, but it’s not quick. As you get closer to the critical value, the program will run longer and longer before halting. (After all, running forever is the behavior we’re seeking.) To save you some tedium, I offer a spoiler: the optimum setting is between 1.29 and 1.30.

At this point we have wandered into deeper mathematical waters. A rule of the form \(A_k = 1/k^s\) is called a power law, since each \(k\) is raised to the same power. And series of the form \(\sum 1/k^s\) are known as zeta functions, denoted \(\zeta(s)\). Zeta functions have quite a storied place in mathematics. The harmonic numbers correspond to \(\zeta(1) = \sum 1/k^1\), which does not converge.

Today, Riemann’s version of the zeta function is the engine (or enigma!) driving a major mathematical industry. Shier’s use of this apparatus in making fractal art is far removed from that heavy-duty research enterprise—but no less fascinating. Think of it as the zeta function on vacation.

If a collection of disks are to fill a square exactly, their aggregate area must equal the area of the square. This is a necessary condition though not a sufficient one. In all the examples I’ve presented so far, the containing square has an area of 4, so what’s needed is to find a value of \(s\) that satisfies the equation:

\[\zeta(s) = \sum_{k = 1}^{\infty}\frac{1}{k^s} = 4\]

Except for isolated values of \(s\) (such as \(s = 2\)), there is no closed-form expression for \(\zeta(s)\), but the equation can be solved numerically. The solution is \(s \approx 1.2939615\).

Having this result in hand solves one part of the square-filling problem. It tells us how to construct an infinite set of disks whose total area is just enough to cover a square of area \(4\), with adequate precision for graphical purposes. We assign each disk \(k\) (starting at \(k = 1\)) an area of \(1/k^{1.2939615}.\) This sequence begins 1.000, 0.408, 0.241, 0.166, 0.125, 0.098,…
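For anyone who wants to reproduce that exponent, here is one way to do it. Direct summation converges far too slowly near \(s = 1\), so this sketch (my own, not Shier’s method) adds the standard Euler–Maclaurin tail correction to a modest partial sum, then finds the root of \(\zeta(s) = 4\) by bisection.

```javascript
// Approximate zeta(s) for s > 1: partial sum to N, plus the
// Euler-Maclaurin tail correction N^(1-s)/(s-1) - N^(-s)/2.
function zeta(s, N = 10000) {
  let sum = 0;
  for (let k = 1; k <= N; k++) sum += Math.pow(k, -s);
  return sum + Math.pow(N, 1 - s) / (s - 1) - Math.pow(N, -s) / 2;
}

// Bisection search for the s satisfying zeta(s) = target.
function solveZeta(target, lo = 1.001, hi = 2) {
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (zeta(mid) > target) lo = mid;  // zeta decreases as s increases
    else hi = mid;
  }
  return (lo + hi) / 2;
}

console.log(solveZeta(4));  // ~1.2939615
```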

In the graph above, the maroon curve with \(s = 1.29396\) converges to a sum very close to 4. Admittedly, the rate of convergence is not quick. More than 3 million terms are needed to get within 1 percent of the target.

Our off-label use of the zeta function defines an infinite sequence of disks whose aggregate area is equal to \(4\). The disks in this unique collection will exactly fill our square box (assuming they can be properly arranged). It’s satisfying to have a way of reliably achieving this result, after our various earlier failures. On the other hand, there’s something irksome about that number \(4\) appearing in the equation. It’s so arbitrary! I don’t dispute that \(4\) is a perfectly fine and foursquare number, but there are many other sizes of squares we might want to fill with dots. Why give all our attention to the \(2 \times 2\) variety?

This is all my fault. When I set out to write some square-filling programs, I knew I couldn’t use the unit square—which seems like the obvious default choice—because of the awkward fact that \(\zeta(s) = 1\) has no finite solution. The unit square is also troublesome in the case of the harmonic numbers; the first disk, with area \(A_1 = 1\), is too large to fit. So I picked the next squared integer for the box size in those first programs. Having made my choice, I stuck with it, but now I feel hemmed in by that decision made with too little forethought.

We have all the tools we need to fill squares of other sizes (as long as the area is greater than \(1\)). Given a square of area \(A_{\square}\), we just solve \(\zeta(s) = A_{\square}\) for \(s\). A square of area \(8\) can be covered by disks sized according to the rule \(A_k = 1/k^s\) with \(s = \zeta^{-1}(8) \approx 1.1349\). For \(A_{\square} = 100\), the corresponding value is \(s = \zeta^{-1}(100) \approx 1.0101\). For any \(A_{\square} \gt 1\) there is an \(s\) that yields a fulfilling set of disks, and vice versa for any \(s \gt 1\).

This relation between the exponent \(s\) and the box area \(A_{\square}\) suggests a neat way to evade the whole bother of choosing a specific container size. We can just scale the disks to fit the box, or else scale the box to accommodate the disks. Shier adopts the former method. Each disk in the infinite set is assigned an area of

\[A_k = \frac{A_{\square}}{\zeta(s)} \frac{1}{k^s},\]

where the first factor is a scaling constant that adjusts the disk sizes to fit the container. In my first experiments with these programs I followed the same approach. Later, however, when I began writing this essay, it seemed easier to think about the scaling—and explain it—if I transformed the size of the box rather than the sizes of the disks. In this scheme, the area of disk \(k\) is simply \(1 / k^s\), and the area of the container is \(A_{\square} = \zeta(s)\). (The two scaling procedures are mathematically equivalent; it’s only the ratio of disk size to container size that matters.)
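A two-line check makes the equivalence concrete. The value of \(\zeta(1.5)\) is precomputed here for illustration:

```javascript
// Two equivalent scalings: shrink the disks to fit a fixed box,
// or grow the box to fit the raw disks. Only the ratio matters.
const zeta15 = 2.6123753;  // zeta(1.5), precomputed

// Shier's scheme: fixed box of area 4, disks scaled down
const boxA = 4;
const diskScaled = k => (boxA / zeta15) * Math.pow(k, -1.5);

// Alternative scheme: raw disks 1/k^s, box of area zeta(s)
const diskRaw = k => Math.pow(k, -1.5);

// The disk-to-box ratio is identical under either scheme:
console.log(diskScaled(3) / boxA, diskRaw(3) / zeta15);
```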

Program 5 offers an opportunity to play with such scaled zeta functions.

*[Program 5 omitted: interactive filling with scaled zeta-function disk sizes.]*

At the other end of the scale, if you push the value of \(s\) up beyond about \(1.40\), you’ll discover something else: The program more often than not halts after placing just a few disks. At \(s = 1.50\) or higher, it seldom gets beyond the first disk. This failure is similar to what we saw with the harmonic numbers, but more interesting. In the case of the harmonic numbers, the total area of the disks is unbounded, making an overflow inevitable. With this new scaled version of the zeta function, the total area of the disks is always equal to that of the enclosing square. In principle, all the disks could be made to fit, if you could find the right arrangement. I’ll return below to the question of why that doesn’t happen.

In *Fractalize That!* Shier introduces another device for taming space-filling sets. He not only scales the object sizes so that their total area matches the space available; he also adopts a variant zeta function that has two adjustable parameters rather than just one:

\[\zeta(s, a) = \sum_{k=0}^{\infty} \frac{1}{(a + k)^s}.\]

This is the Hurwitz zeta function, named for the German mathematician Adolf Hurwitz (1859–1919). Before looking into the details of the function, let’s play with the program and see what happens. Try a few settings of the \(s\) and \(a\) controls:


Different combinations of \(s\) and \(a\) produce populations of disks with different size distributions. The separate contributions of the two parameters are not always easy to disentangle, but in general decreasing \(s\) or increasing \(a\) leads to a pattern dominated by smaller disks. Here are snapshots of four outcomes:

Within the parameter range shown in these four panels, the filling process always continues to exhaustion, but at higher values of \(s\) it can jam, just as it does with the scaled Riemann zeta function.

Hurwitz wrote just one paper on the zeta function. It was published in 1882, when he was still quite young and just beginning his first academic appointment, at the University of Göttingen. (The paper is available from the Göttinger Digitalisierungszentrum; see pp. 86–101.)

Hurwitz modified the Riemann zeta function in two ways. First, a constant \(a\) is added to the index \(k\), turning \(1/k^s\) into \(1/(a + k)^s\). Second, the summation begins with \(k = 0\) rather than \(k = 1\). By letting \(a\) take on any positive real value we gain access to a continuum of zeta functions. The elements of the series are no longer just reciprocals of integers but reciprocals of real numbers. Suppose \(a = \frac{1}{3}\). Then \(\zeta(s, a)\) becomes:

\[\frac{1}{\left(\frac{1}{3} + 0\right)^s} + \frac{1}{\left(\frac{1}{3} + 1\right)^s} + \frac{1}{\left(\frac{1}{3} + 2\right)^s} + \cdots\ = \left(\frac{3}{1}\right)^s + \left(\frac{3}{4}\right)^s + \left(\frac{3}{7}\right)^s + \cdots\]

The Riemann zeta function and the Hurwitz zeta function differ substantially only for small values of \(k\) or large values of \(a\). When \(k\) is large, adding a small \(a\) to it makes little difference in the value of the function. Thus as \(k\) grows toward infinity, the two functions are asymptotically equal, as suggested in the graph at right. When the Hurwitz function is put to work packing disks into a square, a rule with \(a > 1\) causes the first several disks to be smaller than they would be with the Riemann rule. A value of \(a\) between \(0\) and \(1\) enlarges the early disks. In either case, the later disks in the sequence are hardly affected at all.

If \(a\) is a positive integer, the interpretation of \(\zeta(s, a)\) is even simpler. The case \(a = 1\) corresponds to the Riemann zeta sum. When \(a\) is a larger integer, the effect is to omit the first \(a - 1\) entries, leaving only the tail of the series. For example,

\[\zeta(s, 5) = \frac{1}{5^s} + \frac{1}{6^s} + \frac{1}{7^s} + \cdots.\]
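Both of these facts are easy to confirm numerically. The Python sketch below (my own naive truncated sums, adequate for a sanity check though slow to converge) compares \(\zeta(s, 1)\) with the Riemann value and \(\zeta(s, 5)\) with the tail of the Riemann series:

```python
import math

def hurwitz_zeta(s, a, terms=100_000):
    """Truncated Hurwitz zeta: sum over k = 0 .. terms-1 of 1/(a+k)^s."""
    return sum((a + k) ** -s for k in range(terms))

# a = 1 recovers the Riemann zeta function, checked here at s = 2.
print(hurwitz_zeta(2, 1), math.pi ** 2 / 6)

# An integer a = 5 just chops off the first four terms of the series.
s = 1.5
lhs = hurwitz_zeta(s, 5)
rhs = hurwitz_zeta(s, 1) - sum(k ** -s for k in (1, 2, 3, 4))
print(abs(lhs - rhs))        # tiny truncation mismatch
```

Because both sums are cut off at the same number of terms, the truncation errors nearly cancel in the second comparison.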

In his fractal artworks, Shier chooses various values of \(a\) as a way of controlling the size distribution of the placed objects, and thereby fine-tuning the appearance of the patterns. Having this adjustment knob available is very convenient, but in the interests of simplicity, I am going to revert to the Riemann function in the rest of this essay.

Before going on, however, I also have to confess that I don’t really understand the place of the Hurwitz zeta function in modern mathematical research, or what Hurwitz himself had in mind when he formulated it. Zeta functions have been an indispensable tool in the long struggle to understand how the prime numbers are sprinkled among the integers. The connection between these two realms was made by Euler, with his remarkable equation linking a sum of powers of integers with a product of powers of primes:

\[\sum_{k=1}^{\infty} \frac{1}{k^{s}} = \prod_{p\ \mathrm{prime}} \frac{1}{1 - p^{-s}}.\]

Riemann went further, showing that everything we might want to know about the distribution of primes is encoded in the undulations of the zeta function over the complex plane. Indeed, if we could simply pin down all the complex values of \(s\) for which \(\zeta(s) = 0\), we would have a master key to the primes. Hurwitz, in his 1882 paper, was clearly hoping to make some progress toward this goal, but I have not been able to figure out how his work fits into the larger story. The Hurwitz zeta function gets almost no attention in standard histories and reference works (in contrast to the Riemann version, which is everywhere). Wikipedia notes: “At rational arguments the Hurwitz zeta function may be expressed as a linear combination of Dirichlet *L*-functions and vice versa”—which sounds interesting, but I don’t know if it’s useful or important. A recent article by Nicola Oswald and Jörn Steuding puts Hurwitz’s work in historical context, but it does not answer these questions—at least not in a way I’m able to understand.

But again I digress. Back to dots in boxes.

If a set of circular disks and a square container have the same total area, can you always arrange the disks so that they completely fill the square without overflowing? Certainly not! Suppose the set consists of a single disk with area equal to that of the square; the disk’s diameter is greater than the side length of the square, so it will bulge through the sides while leaving the corners unfilled. A set of two disks won’t work either, no matter how you apportion the area between them. Indeed, when you are putting round pegs in a square hole, no finite set of disks can ever fill all the crevices.

Only an infinite set—a set with no smallest disk—can possibly fill the square completely. But even with an endless supply of ever-smaller disks, it seems like quite a delicate task to find just the right arrangement, so that every gap is filled and every disk has a place to call home. It’s all the more remarkable, then, that simply plunking down the disks at random locations seems to produce exactly the desired result. This behavior is what intrigued and troubled me when I first saw Shier’s pictures and read about his method for generating them. If a *random* arrangement works, it’s only a small step to the proposition that *any* arrangement works. Could that possibly be true?

Computational experiments offer strong hints on this point, but they can never be conclusive. What we need is a proof, and Ennis has supplied one. His article appeared in *Math Horizons*, a publication of the Mathematical Association of America, which keeps it behind a paywall. If you have no library access and won’t pay the $50 ransom, I can recommend a video of Ennis explaining his proof in a talk at St. Olaf College.

As a warm-up exercise, Ennis proves a one-dimensional version of the area-filling conjecture, where the geometry is simpler and some of the constraints are easier to satisfy. In one dimension a disk is merely a line segment; its area is its length, and its radius is half that length. As in the two-dimensional model, disks are placed in descending order of size at random positions, with the usual proviso that no disk can overlap another disk or extend beyond the end points of the containing interval. In Program 7 you can play with this scheme.
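Here is a bare-bones Python rendering of the one-dimensional procedure (a sketch of the scheme as described above, not Shier’s or Ennis’s actual code). It places segments of length \(1/k^s\), in descending order, at random positions in a container of length \(\zeta(s)\):

```python
import math
import random

def zeta(s, N=100_000):
    """Riemann zeta via partial sum plus an integral tail (adequate here)."""
    return sum(k ** -s for k in range(1, N + 1)) + N ** (1 - s) / (s - 1)

def fill_line(s, n_disks=50, max_tries=100_000, seed=1):
    """Place 1D 'disks' (segments) of length 1/k^s at random in a
    container of length zeta(s), rejecting any position that overlaps
    a placed disk or overhangs an end of the container."""
    rng = random.Random(seed)
    L = zeta(s)
    placed = []                      # list of (center, radius) pairs
    for k in range(1, n_disks + 1):
        r = 0.5 / k ** s             # radius is half the length 1/k^s
        for _ in range(max_tries):
            c = rng.uniform(r, L - r)
            if all(abs(c - c2) >= r + r2 for c2, r2 in placed):
                placed.append((c, r))
                break
        else:
            break                    # random search exhausted
    return placed

for s in (1.1, 1.5, 1.9):
    print(s, len(fill_line(s)))      # all 50 disks find a home
```

In keeping with the result discussed below, the loop runs to the requested number of disks for every tested \(s\) between \(1\) and \(2\).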


I have given the line segment some vertical thickness to make it visible. The resulting pattern of stripes may look like a supermarket barcode or an atomic spectrum, but please imagine it as one-dimensional.

If you adjust the slider in this program, you’ll notice a difference from the two-dimensional system. In 2D, the algorithm is fulfilling only if the exponent \(s\) is less than a critical value, somewhere in the neighborhood of 1.4. In one dimension, the process continues without impediment for all values of \(s\) throughout the range \(1 \lt s \lt 2\). Try as you might, you won’t find a setting that produces a jammed state. (In practice, the program halts after placing no more than 10,000 disks, but the reason is exhaustion rather than jamming.)

Ennis titles his *Math Horizons* article “(Always) room for one more.” He proves this assertion by keeping track of the set of points where the center of a new disk can legally be placed, and showing the set is never empty. Suppose \(n - 1\) disks have already been randomly scattered in the container. The next disk to be placed, disk \(n\), will have an area (or length) of \(A_n = 1 / n^s\). Since the geometry is one-dimensional, the corresponding disk radius is simply \(r_n = A_n / 2\). The center of this new disk cannot lie any closer than \(r_n\) to the perimeter of another disk. It must also be at a distance of at least \(r_n\) from the boundary of the containing segment. We can visualize these constraints by adding bumpers, or buffers, of thickness \(r_n\) to the outside of each existing disk and to the inner edges of the containing segment. A few stages of the process are illustrated below.

Placed disks are blue, the excluded buffer areas are orange, and open areas—the set of all points where the center of the next disk could be placed—are black. In the top line, before any disks have been placed, the entire containing segment is open except for the two buffers at the ends. Each of these buffers has a length equal to \(r_1\), the radius of the first disk to be placed; the center of that disk cannot lie in the orange regions because the disk would then overhang the end of the containing segment. After the first disk has been placed *(second line)*, the extent of the open area is reduced by the area of the disk itself and its appended buffers. On the other hand, all of the buffers have also shrunk; each buffer is now equal to the radius of disk \(2\), which is smaller than disk \(1\). The pattern continues as subsequent disks are added. Note that although the blue disks cannot overlap, the orange buffers can.

For another view of how this process evolves, click on the *Next* button in Program 8. Each click inserts one more disk into the array and adjusts the buffer and open areas accordingly.


Because the blue disks are never allowed to overlap, the total blue area must increase monotonically as disks are added. It follows that the orange and black areas, taken together, must steadily decrease. But there’s nothing steady about the process when you keep an eye on the separate area measures for the orange and black regions. Changes in the amount of buffer overlap cause erratic, seesawing tradeoffs between the two subtotals. If you keep clicking the *Next* button (especially with \(s\) set to a high value), you may see the black area falling below \(1\) percent. Can we be sure it will never vanish entirely, leaving no opening at all for the next disk?

Ennis answers this question through worst-case analysis. He considers only configurations in which no buffers overlap, thereby squeezing the black area to its smallest possible extent. If the black area is always positive under these conditions, it cannot be smaller when buffer overlaps are allowed.

The basic idea of the proof is to tally the areas of the three kinds of region in this worst case. Just before disk \(n\) is placed, we have

\[A_{\square} = \zeta(s), \quad A_{\color{blue}{\mathrm{blue}}} = \sum_{k=1}^{n - 1} \frac{1}{k^s}, \quad A_{\color{orange}{\mathrm{orange}}} = 2(n-1)r_{n}.\]

Then we need to prove that

\[A_{\square} - (A_{\color{blue}{\mathrm{blue}}} + A_{\color{orange}{\mathrm{orange}}}) \gt 0.\]

A direct proof of this statement would require an exact, closed-form expression for \(\zeta(s)\), which we already know is problematic. Ennis evades this difficulty by turning to calculus. He needs to evaluate the remaining tail of the zeta series, \(\sum_{k = n}^\infty 1/k^s\), but this discrete sum is intractable. On the other hand, by shifting from a sum to an integral, the problem becomes an exercise in undergraduate calculus. Exchanging the discrete variable \(k\) for a continuous variable \(x\), we want to find the area under the curve \(1/x^s\) in the interval from \(n\) to infinity; this will provide a lower bound on the corresponding discrete sum. Evaluating the integral yields:

\[\int_{x = n}^{\infty} \frac{1}{x^{s}} d x = \frac{1}{(s-1) n^{s-1}}.\]

Some further manipulation reveals that the area of the black regions is never smaller than

\[\frac{2 - s}{(s - 1)n^{s - 1}}.\]

If \(s\) lies strictly between \(1\) and \(2\), this expression must be greater than zero, since both the numerator and the denominator will be positive. Thus for all \(n\) there is at least one black point where the center of a new disk can be placed.
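These bounds can be spot-checked numerically. In the sketch below (my own bookkeeping, mirroring the worst-case formulas above), the tail of the zeta series is bounded below by a truncated sum plus an integral remainder, and the resulting worst-case black area is compared with the closed-form bound:

```python
def zeta_tail_lower(s, n, K=100_000):
    """Lower bound on sum_{k=n}^inf 1/k^s: truncated sum plus the
    integral of 1/x^s from K+1 to infinity."""
    return sum(k ** -s for k in range(n, K + 1)) \
        + (K + 1) ** (1 - s) / (s - 1)

def worst_case_black(s, n):
    """zeta(s) minus blue minus orange in the worst case (no buffer
    overlap) just before disk n is placed; r_n = (1/n^s)/2 in 1D."""
    r_n = 0.5 / n ** s
    return zeta_tail_lower(s, n) - 2 * (n - 1) * r_n

for s in (1.2, 1.5, 1.8):
    for n in (2, 10, 100):
        bound = (2 - s) / ((s - 1) * n ** (s - 1))
        assert worst_case_black(s, n) > bound > 0
print("worst-case black area exceeds the bound at all tested s, n")
```

The margin gets thin for large \(n\) and \(s\) near the ends of the interval, but it stays positive, as the proof promises.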

Ennis’s proof is a stronger one than I expected. When I first learned there was a proof, I guessed that it would take a probabilistic approach, showing that although a jammed configuration may exist, it has probability zero of turning up in a random placement of the disks. Instead, Ennis shows that no such arrangement exists at all. Even if you replaced the randomized algorithm with an adversarial one that tries its best to block every disk, the process would still run to fulfillment.

The proof for a two-dimensional system follows the same basic line of argument, but it gets more complicated for geometric reasons. In one dimension, as the successive disk areas get smaller, the disk radii diminish in simple proportion: \(r_k = A_k / 2\). In two dimensions, disk radius falls off only as the square root of the disk area: \(r_k = \sqrt{A_k / \pi}\). As a result, the buffer zone surrounding a disk excludes neighbors at a greater distance in two dimensions than it would in one dimension. There is still a range of \(s\) values where the process is provably unstoppable, but it does not extend across the full interval \(1 \lt s \lt 2\).

Program 9, running in the panel below, is one I find very helpful in gaining intuition into the behavior of Shier’s algorithm. As in the one-dimensional model of Program 8, each press of the *Next* button adds a single disk to the containing square, and shows the forbidden buffer zones surrounding the disks.


Move the \(s\) slider to a position somewhere near 1.40, then click *Next* repeatedly. If the process is going to jam, it almost always does so within the first dozen or so placements; runs that survive beyond that point tend to go on and on. Shier describes this phenomenon as “infant mortality”: If the placement process survives the high-risk early period, it is all but immortal.

There’s a certain whack-a-mole dynamic to the behavior of this system. Maybe the first disk covers all but one small corner of the black zone. It looks like the next disk will completely obliterate that open area. And so it does—but at the same time the shrinking of the orange buffer rings opens up another wedge of black elsewhere. The third disk blots out that spot, but again the narrowing of the buffers allows a black patch to peek out from still another corner. Later on, when there are dozens of disks, there are also dozens of tiny black spots where there’s room for another disk. You can often guess which of the openings will be filled next, because the random search process is likely to land in the largest of them. Again, however, as these biggest targets are buried, many smaller ones are born.

Ennis’s two-dimensional proof addresses the case of circular disks inside a circular boundary, rather than a square one. (The higher symmetry and the absence of corners streamlines certain calculations.) The proof strategy, again, is to show that after \(n - 1\) disks have been placed, there is still room for the \(n\)th disk, for any value of \(n \ge 1\). The argument follows the same logic as in one dimension, relying on an integral to provide a lower bound for the sum of a zeta series. But because of the \(\pi r^2\) area relation, the calculation now includes quadratic as well as linear terms. As a result, the proof covers only a part of the range of \(s\) values. The black area is provably nonempty if \(s\) is greater than \(1\) but less than roughly \(1.1\); outside that interval, the proof has nothing to say.

As mentioned above, Ennis’s proof applies only to circular disks in a circular enclosure. Nevertheless, in what follows I am going to assume the same ideas carry over to disks in a square frame, although the location of the boundary will doubtless be somewhat different. I have recently learned that Ennis has written a further paper on the subject, expected to be published in the *American Mathematical Monthly*. Perhaps he addresses this question there.

With Program 9, we can explore the entire spectrum of behavior for packing disks into a square. The possibilities are summarized in the candybar graph below.

- The leftmost band, in darker green, is the interval for which Ennis’s proof might hold. The question mark at the upper boundary line signifies that we don’t really know where it lies.
- In the lighter green region no proof is known, but in Shier’s extensive experiments the system never jams there.
- The transition zone sees the probability of jamming rise from \(0\) to \(1\) as \(s\) goes from about \(1.3\) to about \(1.5\).
- Beyond \(s \approx 1.5\), experiments suggest that the system *always* halts in a jammed configuration.
- At \(s \approx 1.6\) we enter a regime where the buffer zone surrounding the first disk invariably blocks the entire black region, leaving nowhere to place a second disk. Thus we have a simple proof that the system always jams.
- Still another barrier arises at \(s \approx 2.7\). Beyond this point, not even one disk will fit. The diameter of a disk with area \(1\) is greater than the side length of the enclosing square.

Can we pin down the exact locations of the various threshold points in the diagram above? This problem is tractable in those situations where the placement of the very first disk determines the outcome. At high values of \(s\) (and thus low values of \(\zeta(s)\)), the first disk can obliterate the black zone and thereby preclude placement of a second disk. What is the lowest value of \(s\) for which this can happen? As in the image at right, the disk must lie at the center of the square box, and the orange buffer zone surrounding it must extend just far enough out to cover the corners of the inner black square, which defines the locus of all points that could accommodate the center of the second disk. Finding the value of \(s\) that satisfies this condition is a messy but straightforward bit of geometry and algebra. With the help of SageMath I get the answer \(s = 1.282915\). This value—let’s call it \(\overline{s}\)—is an upper bound on the “never jammed” region. Above this limit there is always a nonzero probability that the filling process will end after placing a single disk.
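The same condition can also be solved numerically without SageMath. In the Python sketch below (my own reconstruction of the geometry: first disk centered, second disk blocked when the buffer reaches the corners of the inner black square), bisection recovers the threshold:

```python
import math

def zeta(s, N=10_000):
    """Riemann zeta for s > 1: partial sum plus Euler-Maclaurin correction."""
    return sum(k ** -s for k in range(1, N + 1)) \
        + N ** (1 - s) / (s - 1) - 0.5 * N ** -s

def center_block_excess(s):
    """Positive when a centered first disk's buffer reaches the corners
    of the inner black square, leaving no room for a second disk."""
    L = math.sqrt(zeta(s))                 # side of the scaled square
    r1 = math.sqrt(1 / math.pi)            # radius of disk 1 (area 1)
    r2 = math.sqrt(2 ** -s / math.pi)      # radius of disk 2 (area 1/2^s)
    corner = math.sqrt(2) * (L / 2 - r2)   # center to corner of black square
    return (r1 + r2) - corner

lo, hi = 1.1, 1.5                          # excess changes sign in this bracket
for _ in range(60):
    mid = (lo + hi) / 2
    if center_block_excess(mid) < 0:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)                       # close to the SageMath value 1.282915
```

The bracket works because the excess is negative at \(s = 1.1\), positive at \(s = 1.5\), and increases in between.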

The value of \(\overline{s}\) lies quite close to the experimentally observed boundary between the never-jammed range and the transition zone, where jamming first appears. Is it possible that \(\overline{s}\) actually marks the edge of the transition zone—that below this value of \(s\) the program can never fail? To prove that conjecture, you would have to show that when the first disk is successfully placed, the process never stalls on a subsequent disk. That’s certainly not true in higher ranges of \(s\). Yet the empirical evidence near the threshold is suggestive. In my experiments I have yet to see a jammed outcome at \(s \lt \overline{s}\), not even in a million trials just below the threshold, at \(s = 0.999 \overline{s}\). In contrast, at \(s = 1.001 \overline{s}\), a million trials produced 53 jammed results—all of them occurring immediately after the first disk was placed.

The same kind of analysis leads to a lower bound on the region where *every* run ends after the first disk *(medium pink in the diagram above)*. In this case the critical situation puts the first disk as close as possible to a corner of the square frame, rather than in the middle. If the disk and its orange penumbra are large enough to block the second disk in this extreme configuration, then they will also block it in any other position. Putting a number on this bound again requires some fiddly equation wrangling; the answer I get is \(\underline{s} = 1.593782\). No process with higher \(s\) can possibly live forever, since it will die with the second disk. In analogy with the lower-bound conjecture, one might propose that the probability of being jammed remains below \(1\) until \(s\) reaches \(\underline{s}\). If both conjectures were true, the transition region would extend from \(\overline{s}\) to \(\underline{s}\).

The final landmark, way out at \(s \approx 2.7\), marks the point where the first disk threatens to burst the bounds of the enclosing square. In this case the game is over before it begins. In Program 9, if you push the slider far to the right, you’ll find that the black square in the middle of the orange field shrinks away and eventually winks out of existence. This extinction event comes when the diameter of the disk equals the side length of the square. Given a disk of area \(1\), and thus radius \(1/\sqrt{\pi}\), we want to find the value of \(s\) that satisfies the equation

\[\frac{2}{\sqrt{\pi}} = \sqrt{\zeta(s)}.\]

Experiments with Program 9 show that the value is just a tad more than 2.7. That’s an interesting numerical neighborhood, no? A famous number lives nearby. Do you suppose?
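You can locate the landmark yourself by solving \(\zeta(s) = 4/\pi\) numerically. Here is a minimal bisection sketch (using a homemade truncated-sum approximation of \(\zeta\)):

```python
import math

def zeta(s, N=10_000):
    """Riemann zeta for s > 1: partial sum plus Euler-Maclaurin correction."""
    return sum(k ** -s for k in range(1, N + 1)) \
        + N ** (1 - s) / (s - 1) - 0.5 * N ** -s

target = 4 / math.pi            # zeta(s) when disk diameter equals square side
lo, hi = 2.0, 4.0               # zeta(2) > target > zeta(4), so a root lies here
for _ in range(60):
    mid = (lo + hi) / 2
    if zeta(mid) > target:
        lo = mid                # zeta too large: move s upward
    else:
        hi = mid
print((lo + hi) / 2)            # a tad more than 2.7, as the slider suggests
```

I leave it to you to print a few more decimal places and compare the root with the famous number in question.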

Another intriguing set of questions concerns the phenomenon that Shier calls infant mortality. If you scroll back up to Program 5 and set the slider to \(s = 1.45\), you’ll find that roughly half the trials jam. The vast majority of these failures come early in the process, after no more than a dozen disks have been placed. At \(s = 1.50\) death at an early age is even more common; three-fourths of all the trials end with the very first disk. On the other hand, if a sequence of disks does manage to dodge all the hazards of early childhood, it may well live on for a very long time—perhaps forever.

Should we be surprised by this behavior? I am. As Shier points out, the patterns formed by our graduated disks are fractals, and one of their characteristic properties is self-similarity, or scale invariance. If you had a fully populated square—one filled with infinitely many disks—you could zoom in on any region to any magnification, and the arrangement of disks would look the same as it does in the full-size square. By “look the same” I don’t mean the disks would be in the same positions, but they would have the same size distribution and the same average number of neighbors at the same distances. This is a statistical concept of identity. And since the pattern looks the same and has the same statistics, you would think that the challenge of finding a place for a new disk would also be the same at any scale. Slipping in a tiny disk late in the filling operation would be no different from plopping down a large disk early on. The probability of jamming ought to be constant from start to finish.

But there’s a rejoinder to this argument: Scale invariance is broken by the presence of the enclosing square. The largest disks are strongly constrained by the boundaries, whereas most of the smaller disks are nowhere near the edges and are little influenced by them. The experimental data offer some support for this view. The graph below summarizes the outcomes of \(20{,}000\) trials at \(s = 1.50\). The red bars show the absolute numbers of trials ending after placing \(n\) disks, for each \(n\) from \(0\) through \(35\). The blue lollipops indicate the proportion of trials reaching disk \(n\) that halted after placing disk \(n\). This ratio can be interpreted (if you’re a frequentist!) as the probability of stopping at \(n\).

It certainly looks like there’s something odd happening on the left side of this graph. More than three fourths of the trials end after a single disk, but none at all jam at the second or third disks, and very few (a total of \(23\)) at disks \(4\) and \(5\). Then, suddenly, \(1{,}400\) more fall by the wayside at disk \(6\), and serious attrition continues through disk \(11\).

Geometry can explain some of this weirdness. It has to do with the squareness of the container; other shapes would produce different results.

At \(s = 1.50\) we are between \(\overline{s}\) and \(\underline{s}\), in a regime where the first disk is large enough to block off the entire black zone but not so large that it *must* do so. This is enough to explain the tall red bar at \(n = 1\): When you place the first disk randomly, roughly \(75\) percent of the time it will block the entire black region, ending the parade of disks. If the first disk *doesn’t* foreclose all further action, it must be tucked into one of the four corners of the square, leaving enough room for a second disk in the diagonally opposite corner. The sequence of images below (made with Program 9) tells the rest of the story.

The placement of the second disk blocks off the open area in that corner, but the narrowing of the orange buffers also creates two tiny openings in the cross-diagonal corners. The third and fourth disks occupy these positions, and simultaneously allow the black background to peek through in two other spots. Finally the fifth and sixth disks close off the last black pixels, and the system jams.

This stereotyped sequence of disk placements accounts for the near absence of mortality at ages \(n = 2\) through \(n = 5\), and the sudden upsurge at age \(6\). The elevated levels at \(n = 7\) through \(11\) are part of the same pattern; depending on the exact positioning of the disks, it may take a few more to expunge the last remnants of black background.
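The roughly 75 percent figure for death at \(n = 1\) can be checked with a short Monte Carlo experiment. In this sketch (my own, with an arbitrary cap on the random search, so the estimate runs slightly high), a first disk of area \(1\) is dropped at random into a square of area \(\zeta(1.5)\), and we test whether any room remains for the second disk:

```python
import math
import random

def zeta(s, N=100_000):
    """Riemann zeta via partial sum plus an integral tail (adequate here)."""
    return sum(k ** -s for k in range(1, N + 1)) + N ** (1 - s) / (s - 1)

def first_disk_jams(s=1.5, trials=400, tries=3000, seed=7):
    """Fraction of trials in which a randomly placed first disk (area 1)
    leaves no detectable room for a second disk (area 1/2^s)."""
    L = math.sqrt(zeta(s))
    r1 = math.sqrt(1 / math.pi)
    r2 = math.sqrt(2 ** -s / math.pi)
    rng = random.Random(seed)
    jams = 0
    for _ in range(trials):
        x1 = rng.uniform(r1, L - r1)        # random legal center for disk 1
        y1 = rng.uniform(r1, L - r1)
        for _ in range(tries):              # random search for disk 2
            x2 = rng.uniform(r2, L - r2)
            y2 = rng.uniform(r2, L - r2)
            if (x2 - x1) ** 2 + (y2 - y1) ** 2 >= (r1 + r2) ** 2:
                break                       # found room for disk 2
        else:
            jams += 1
    return jams / trials

print(first_disk_jams())                    # roughly 0.75
```

The estimate wobbles with the seed and trial count, but it lands near the three-fourths figure quoted above.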

At still higher values of \(n\)—for the small subset of trials that get there—the system seems to shift to a different mode of behavior. Although numerical noise makes it hard to draw firm conclusions, it doesn’t appear that any of the \(n\) values beyond \(n = 12\) are more likely jamming points than others. Indeed, the data are consistent with the idea that the probability of jamming remains constant as each additional disk is added to the array, just as scale invariance would suggest.

A much larger data set would be needed to test this conjecture, and collecting such data is painfully slow. Furthermore, when it comes to rare events, I don’t have much trust in the empirical data. During one series of experiments, I noticed a program run that stalled after \(290\) disks—unusually late. The 290-disk configuration, produced at \(s = 1.47\), is shown at left below.

I wondered if it was *truly* jammed. My program gives up on finding a place for a disk after \(10^7\) random attempts. Perhaps if I had simply persisted, it would have gone on. So I reset the limit on random attempts to \(10^9\), and sat back to wait. After some minutes the program discovered a place where disk \(291\) would fit, and then another for disk \(292\), and kept going as far as 300 disks. The program had an afterlife! Could I revive it again? Upping the limit to \(10^{10}\) allowed another \(14\) disks to squeeze in. The final configuration is shown at right above (with the original \(290\) disks faded, in order to make the \(24\) posthumous additions more conspicuous).

Is it really finished now, or is there still room for one more? I have no reliable way to answer that question. Checking \(10\) billion random locations sounds like a lot, but it is still a very sparse sampling of the space inside the square box. Using 64-bit floating-point numbers to define the coordinate system allows for more than \(10^{30}\) distinguishable points. And to settle the question mathematically, we would need unbounded precision.

We know from Ennis’s proof that at values of \(s\) not too far above \(1.0\), the filling process can always go on forever. And we know that beyond \(s \approx 1.6\), every attempt to fill the square is doomed. There must be some kind of transition between these two conditions, but the details are murky. The experimental evidence gathered so far suggests a smooth transition along a sigmoid curve, with the probability of jamming gradually increasing from \(0\) to \(1\). As far as I can tell, however, nothing we know for certain rules out a single hard threshold, below which all disk sequences are immortal and above which all of them die. Thus the phase diagram would be reduced to this simple form:

The softer transition observed in computational experiments would be an artifact of our inability to perform infinite random searches or place infinite sequences of disks.

Here’s a different approach to understanding the random dots-in-a-box phenomenon. It calls for a mental reversal of figure and ground. Instead of placing disks on a square surface, we drill holes in a square metal plate. And the focus of attention is not the array of disks or holes but rather the spaces between them. Shier has a name for the perforated plate: the gasket.

Program 10 allows you to observe a gasket as it evolves from a solid black square to a delicate lace doily with less than 1 percent of its original substance.


The gasket is quite a remarkable object. When the number of holes becomes infinite, the gasket must disappear entirely; its area falls to zero. Up until that very moment, however, it retains its structural integrity.

As the gasket is etched away, can we measure the average thickness of the surviving wisps and tendrils? I can think of several methods that involve elaborate sampling schemes. Shier has a much simpler and more ingenious proposal: To find the average thickness of the gasket, divide its area by its perimeter. It was not immediately obvious to me why this number would serve as an appropriate measure of the width, but at least the units come out right: We are dividing a length squared by a length and so we get a length. And the operation does make basic sense: The area of the gasket represents the amount of substance in it, and the perimeter is the distance over which it is stretched. (The widths calculated in Program 10 differ slightly from those reported by Shier. The reason, I think, is that I include the outer boundary of the square in the perimeter, and he does not.)

Calculating the area and perimeter of a complicated shape such as a many-holed gasket looks like a formidable task, but it’s easy if we just keep track of these quantities as we go along. Initially (before any holes are drilled), the gasket area \(A_0^g\) is the area of the full square, \(A_\square\). The initial gasket perimeter \(P_0^g\) is four times the side length of the square, which is \(\sqrt{A_\square}\). Thereafter, as each hole is drilled, we subtract the new hole’s area from \(A^g\) and add its perimeter to \(P^g\). The quotient of these quantities is our measure of the average gasket width after drilling hole \(k\): \(\widehat{W}_k^g\). Since the gasket area is shrinking while the perimeter is growing, \(\widehat{W}_k^g\) must dwindle away as \(k\) increases.
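The bookkeeping is compact enough to fit in a few lines of Python. This sketch (mine, not Program 10 itself) follows the convention mentioned above of counting the outer boundary in the perimeter:

```python
import math

def zeta(s, N=100_000):
    """Riemann zeta via partial sum plus an integral tail (adequate here)."""
    return sum(k ** -s for k in range(1, N + 1)) + N ** (1 - s) / (s - 1)

def gasket_widths(s, n_holes):
    """Running average gasket width A/P as circular holes of area 1/k^s
    are drilled from a square of area zeta(s)."""
    A = zeta(s)                    # gasket area, initially the full square
    P = 4 * math.sqrt(A)           # perimeter, initially the outer boundary
    widths = []
    for k in range(1, n_holes + 1):
        a_k = 1 / k ** s
        r_k = math.sqrt(a_k / math.pi)
        A -= a_k                   # each hole removes substance...
        P += 2 * math.pi * r_k     # ...and adds edge
        widths.append(A / P)
    return widths

w = gasket_widths(1.3, 1000)
print(w[0], w[-1])                 # the width dwindles as holes accumulate
```

Since the area only shrinks and the perimeter only grows, the sequence of widths is strictly decreasing, as the text argues.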

The importance of \(\widehat{W}_k^g\) is that it provides a clue to how large a vacant space we’re likely to find for the next disk or hole. If we take the idea of “average” seriously, there must always be at least one spot in the gasket with a width equal to or greater than \(\widehat{W}_k^g\). From this observation Shier makes the leap to a whole new space-filling algorithm. Instead of choosing disk diameters according to a power law and then measuring the resulting average gasket width, he determines the radius of the next disk from the observed \(\widehat{W}_k^g\):

\[r_{k+1} = \gamma \widehat{W}_k^g = \gamma \frac{A_k^g}{P_k^g}.\]

Here \(\gamma\) is a fixed constant of proportionality that determines how tightly the new disks or holes fit into the available openings.

The area-perimeter algorithm has a recursive structure, in which each disk’s radius depends on the state produced by the previous disks. This raises the question of how to get started: What is the size of the first disk? Shier has found that it doesn’t matter very much. Initial disks in a fairly wide range of sizes yield jam-proof and aesthetically pleasing results.
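Here is the recursion in Python, a sketch under my own illustrative choices: a unit square, \(\gamma = 0.3\), and a first disk generated by the same rule as all the others.

```python
import math

def area_perimeter_fill(gamma=0.3, n=2000):
    """Area-perimeter rule for a unit square: each new disk radius is
    gamma times the current gasket width A/P.  (The spatial placement
    of the disks is omitted; only the size bookkeeping is modeled.)"""
    A, P = 1.0, 4.0                # gasket area and perimeter of unit square
    radii = []
    for _ in range(n):
        r = gamma * A / P          # next radius from the current width
        radii.append(r)
        A -= math.pi * r * r       # drill the hole
        P += 2 * math.pi * r       # add its edge
    return radii, A

radii, A_left = area_perimeter_fill()
print(radii[0], radii[-1], A_left)
```

By construction, the dimensionless ratio of gasket width to disk diameter, \(\widehat{W}_k^g / (2 r_{k+1})\), is pinned at \(1/(2\gamma)\) on every step, and the radii decrease monotonically.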

Graphics produced by the original power-law algorithm and by the new recursive one look very similar. One way to understand why is to rearrange the equation of the recursion:

\[\frac{1}{2\gamma} = \frac{\widehat{W}_k^g}{2\,r_{k+1}}.\]

On the right side of this equation we are dividing the average gasket width by the diameter of the next disk to be placed. The result is a dimensionless number—dividing a length by a length cancels the units. More important, the quotient is a constant, unchanging for all \(k\). If we calculate this same dimensionless gasket width when using the power-law algorithm, it also turns out to be nearly constant in the limit of large \(k\), showing that the two methods yield sequences with similar statistics.

Setting aside Shier’s recursive algorithm, all of the patterns we’ve been looking at are generated by a power law (or zeta function), with the crucial requirement that the series must converge to a finite sum. The world of mathematics offers many other convergent series in addition to power laws. Could some of them also create fulfilling patterns? The question is one that Ennis discusses briefly in his talk at St. Olaf and that Shier also mentions.

Among the obvious candidates are geometric series such as \(\frac{1}{1}, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots\) A geometric series is a close cousin of a power law, defined in a similar way but exchanging the roles of \(s\) and \(k\). That is, a geometric series is the sum:

\[\sum_{k=0}^{\infty} \frac{1}{s^k} = \frac{1}{s^0} + \frac{1}{s^1} + \frac{1}{s^2} + \frac{1}{s^3} + \cdots\]

For any \(s > 1\), the infinite geometric series has a finite sum, namely \(\frac{s}{s - 1}\). Thus our task is to construct an infinite set of disks with individual areas \(1/s^k\) that we can pack into a square of area \(\frac{s}{s - 1}\). Can we find a range of \(s\) for which the series is fulfilling? As it happens, this is where Shier began his adventures; his first attempts were not with power laws but with geometric series. They didn’t turn out well. You are welcome to try your own hand in Program 11.
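The closed-form sum is easy to check numerically; a few throwaway lines (my own, just for verification):

```
// Partial sum of the geometric series 1 + 1/s + 1/s^2 + ...,
// which should approach s / (s - 1) for any s > 1.
function geometricPartialSum(s, n) {
  let sum = 0;
  for (let k = 0; k < n; k++) {
    sum += Math.pow(s, -k);
  }
  return sum;
}
```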


There’s a curious pattern to the failures you’ll see in this program. No matter what value you assign to \(s\) (within the available range \(1 \lt s \le 2\)), the system jams when the number of disks reaches the neighborhood of \(A_\square = \frac{s}{s-1}\). For example, at \(s = 1.01\), \(\frac{s}{s - 1}\) is \(101\) and the program typically gets stuck somewhere between \(k = 95\) and \(k = 100\). At \(s = 1.001\), \(\frac{s}{s - 1}\) is \(1{,}001\) and there’s seldom progress beyond about \(k = 1{,}000\).

For a clue to what’s going wrong here, consider the graph at right, plotting the values of \(1 / k^s\) *(red)* and \(1 / s^k\) *(blue)* for \(s = 1.01\). These two series converge on nearly the same sum (roughly \(100\)), but they take very different trajectories in getting there. On this log-log plot, the power-law series \(1 / k^s\) is a straight line. The geometric series \(1 / s^k\) falls off much more slowly at first, but there’s a knee in the curve at about \(k = 100\) *(dashed mauve line)*, where it steepens dramatically. If only we could get beyond this turning point, it looks like the rest of the filling process would be smooth sledding, but in fact we never get there. Whereas the first \(100\) disks of the power-law series fill up only about \(5\) percent of the available area, they occupy \(63\) percent in the geometric case. This is where the filling process stalls.
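That 63 percent figure is easy to verify: the first \(n\) geometric disks account for a fraction \(1 - s^{-n}\) of the total area \(\frac{s}{s-1}\). A quick check (my own arithmetic, with invented names):

```
// Fraction of the total area s/(s-1) claimed by the first n disks
// of the geometric series; equals 1 - s^(-n) in closed form.
function geometricFillFraction(s, n) {
  const total = s / (s - 1);
  let partial = 0;
  for (let k = 0; k < n; k++) {
    partial += Math.pow(s, -k);
  }
  return partial / total;
}
```

For \(s = 1.01\) and \(n = 100\) the fraction comes out near \(0.63\), right where the filling process stalls.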

Even in one dimension, the geometric series quickly succumbs. (This is in sharp contrast to the one-dimensional power-law model, where any \(s\) between \(1\) and \(2\) yields a provably infinite progression of disks.)


And just in case you think I’m pulling a fast one here, let me demonstrate that those same one-dimensional disks will indeed fit in the available space, if packed efficiently. In Program 13 they are placed in order of size from left to right.


I have made casual attempts to find fulfillment with a few other convergent series, such as the reciprocals of the Fibonacci numbers (which converge to about \(3.36\)) and the reciprocals of the factorials (whose sum is \(e \approx 2.718\)). Both series jam after the first disk. There are plenty of other convergent series one might try, but I doubt this is a fruitful line of inquiry.
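Both of those sums are quick to confirm numerically (my own check, nothing from Shier's code):

```
// Sum of reciprocals of the first n Fibonacci numbers (1, 1, 2, 3, 5, ...);
// converges to the reciprocal Fibonacci constant, about 3.3599.
function reciprocalFibSum(n) {
  let a = 1, b = 1, sum = 0;
  for (let k = 0; k < n; k++) {
    sum += 1 / a;
    [a, b] = [b, a + b];
  }
  return sum;
}

// Sum of reciprocals of factorials, 1/0! + 1/1! + 1/2! + ...; converges to e.
function reciprocalFactorialSum(n) {
  let fact = 1, sum = 0;
  for (let k = 0; k < n; k++) {
    if (k > 0) fact *= k;
    sum += 1 / fact;
  }
  return sum;
}
```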

All the variations discussed above leave one important factor unchanged: The objects being fitted together are all circular. Exploring the wider universe of shapes has been a major theme of Shier’s work. He asks: What properties of a shape make it suitable for forming a statistical fractal pattern? And what shapes (if any) refuse to cooperate with this treatment? (The images in this section were created by John Shier and are reproduced here with his permission.)

Shier’s first experiments were with circular disks and axis-parallel squares; the filling algorithm worked splendidly in both cases. He also succeeded with axis-parallel rectangles of various aspect ratios, even when he mixed vertical and horizontal orientations in the same tableau. In collaboration with Paul Bourke he tried randomizing the orientation of squares as well as their positions. Again the outcome was positive, as the illustration above left shows.

Equilateral triangles were less cooperative, and at first Shier believed the algorithm would consistently fail with this shape. The triangles tended to form orderly arrays with the sharp point of one triangle pressed close against the broad side of another, leaving little “wiggle room.” Further efforts showed that the algorithm was not truly getting stuck but merely slowing down. With an appropriate choice of parameters in the Hurwitz zeta function, and with enough patience, the triangles did come together in boundlessly extendable space-filling patterns.

The casual exploration of diverse shapes eventually became a deliberate quest to stake out the limits of the space-filling process. Surely there must be *some* geometric forms that the algorithm would balk at, failing to pack an infinite number of objects into a finite area. Perhaps nonconvex shapes such as stars and snowflakes and flowers would expose a limitation—but no, the algorithm worked just fine with these figures, fitting smaller stars into the crevices between the points of larger stars. The next obvious test was “hollow” objects, such as annular rings, where an internal void is not part of the object and is therefore available to be filled with smaller copies. The image at right is my favorite example of this phenomenon. The bowls of the larger nines have smaller nines within them. It’s nines all the way down. When we let the process continue indefinitely, we have a whimsical visual proof of the proposition that \(.999\dots = 1\).

These successes with nonconvex forms and objects with holes led to an *Aha* moment, as Shier describes it. The search for a shape that would break the algorithm gave way to a suspicion that no such shape would be found, and then the suspicion gradually evolved into a conviction that any “reasonably compact” object is suitable for the *Fractalize That!* treatment. The phrase “reasonably compact” would presumably exclude shapes that are in fact dispersed sets of points, such as Cantor dust. But Shier has shown that shapes formed of disconnected pieces, such as the words in the pair of images below, present no special difficulty.

*Fractalize That!* is not all geometry and number theory. Shier is eager to explain the mathematics behind these curious patterns, but he also presents the algorithm as a tool for self-expression. MATH and ART both have their place.

Finally, I offer some notes on what’s needed to turn these algorithms into computer programs. Shier’s book includes a chapter for do-it-yourselfers that explains his strategy and provides some crucial snippets of code (written in C). My own source code (in JavaScript) is available on GitHub. And if you’d like to play with the programs without all the surrounding verbiage, try the GitHub Pages version.

The inner loop of a typical program looks something like this:

```
// try up to maxAttempts random positions for the new disk
let attempt = 1;
while (attempt <= maxAttempts) {
  // random center point within the square
  disk.x = randomCoord();
  disk.y = randomCoord();
  // keep the disk if it doesn't collide with any earlier disk
  if (isNotOverlapping(disk)) {
    return disk;
  }
  attempt++;
}
// no open spot found; signal failure
return false;
```

We generate a pair of random \(x\) and \(y\) coordinates, which mark the center point of the new disk, and check for overlaps with other disks already in place. If no overlaps are discovered, the disk stays put and the program moves on. Otherwise the disk is discarded and we jump back to the top of the loop to try a new \(xy\) pair.

The main computational challenge lies in testing for overlaps. For any two specific disks, the test is easy enough: They overlap if the sum of their radii is greater than the distance between their centers. The problem is that the test might have to be repeated many millions of times. My program makes \(10\) million attempts to place a disk before giving up. If it has to test for overlap with \(100{,}000\) other disks on each attempt, that’s a trillion tests. A trillion is too many for an interactive program where someone is staring at the screen waiting for things to happen. To speed things up a little I divide the square into a \(32 \times 32\) grid of smaller squares. The largest disks—those whose diameter is greater than the width of a grid cell—are set aside in a special list, and all new candidate disks are checked for overlap with them. Below this size threshold, each disk is allocated to the grid cell in which its center lies. A new candidate is checked against the disks in its own cell and in that cell’s eight neighbors. The net result is an improvement by two orders of magnitude—lowering the worst-case total from \(10^{12}\) overlap tests to about \(10^{10}\).
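A stripped-down sketch of this scheme, for a unit square and circular disks (the names and data layout here are invented for the illustration; the real program differs in detail):

```
const GRID = 32;        // the square is divided into GRID x GRID cells
const CELL = 1 / GRID;  // cell width, for a unit square

// Two disks overlap if their centers are closer than the sum of their radii.
function disksOverlap(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y) < a.r + b.r;
}

// Grid-cell coordinates of a disk's center, clamped to the grid.
function cellOf(d) {
  const clamp = (v) => Math.min(GRID - 1, Math.max(0, Math.floor(v / CELL)));
  return [clamp(d.x), clamp(d.y)];
}

// A candidate is checked against the oversize disks and against the
// disks in its own cell and the eight neighboring cells.
function clearToPlace(disk, bigDisks, grid) {
  if (bigDisks.some((b) => disksOverlap(disk, b))) return false;
  const [ci, cj] = cellOf(disk);
  for (let i = ci - 1; i <= ci + 1; i++) {
    for (let j = cj - 1; j <= cj + 1; j++) {
      if (i < 0 || j < 0 || i >= GRID || j >= GRID) continue;
      if (grid[i][j].some((d) => disksOverlap(disk, d))) return false;
    }
  }
  return true;
}
```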

All of this works smoothly with circular disks. Devising overlap tests for the variety of shapes that Shier has been working with is much harder.

From a theoretical point of view, the whole rigmarole of overlap testing is hideously wasteful and unnecessary. If the box is already 90 percent full, then we know that 90 percent of the random probes will fail. A smarter strategy would be to generate random points only in the “black zone” where new disks can legally be placed. If you could do that, you would never need to generate more than one point per disk, and there’d be no need to check for overlaps. But keeping track of the points that make up the black zone—scattered throughout multiple, oddly shaped, transient regions—would be a serious exercise in computational geometry.

For the actual drawing of the disks, Shier relies on the technology known as SVG, or scalable vector graphics. As the name suggests, these drawings retain full resolution at any size, and they are definitely the right choice if you want to create works of art. They are less suitable for the interactive programs embedded in this document, mainly because they consume too much memory. The images you see here rely on the HTML *canvas* element, which is simply a fixed-size pixel array.

Another point of possible interest is the evaluation of the zeta function. If we want to scale the disk sizes to match the box size (or vice versa), we need to compute a good approximation of the Riemann function \(\zeta(s)\) or the Hurwitz function \(\zeta(s, a)\). I didn’t know how to do that, and most of the methods I read about seemed overwhelming. Before I could get to zeta, I’d have to hack my way through thickets of polygamma functions and Stieltjes constants. For the Riemann zeta function I found a somewhat simpler algorithm published by Peter Borwein in 1995. It’s based on a polynomial approximation that yields ample precision and runs in less than a millisecond. For the Hurwitz zeta function I stayed with a straightforward translation of Shier’s code, which takes more of a brute-force approach. (There are alternatives for Hurwitz too, but I couldn’t understand them well enough to make them work.)
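For the curious, the brute-force approach amounts to nothing more than summing the series directly. That works tolerably well when \(s\) is comfortably larger than \(1\), though it becomes hopeless as \(s \to 1\). A sketch (this is plain direct summation, not Borwein's algorithm and not Shier's actual code):

```
// Direct partial-sum approximation of the Hurwitz zeta function
// zeta(s, a) = sum over k >= 0 of 1 / (k + a)^s.
function hurwitzZeta(s, a, terms = 1e6) {
  let sum = 0;
  for (let k = 0; k < terms; k++) {
    sum += 1 / Math.pow(k + a, s);
  }
  return sum;
}

// The Riemann zeta function is the special case a = 1.
const riemannZeta = (s) => hurwitzZeta(s, 1);
```

Where Borwein's method reaches full double precision with a few dozen terms, this direct sum needs a million terms to get about six digits at \(s = 2\).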

The JavaScript file in the GitHub repository has more discussion of implementation details.

Shier, John. 2018. *Fractalize That! A Visual Essay on Statistical Geometry*. Singapore: World Scientific. Publisher’s website.

Shier, John. Website: http://www.john-art.com/

Shier, John. 2011. The dimensionless gasket width \(b(c,n)\) in statistical geometry. http://www.john-art.com/gasket_width.pdf

Shier, John. 2012. Random fractal filling of a line segment.

Dunham, Douglas, and John Shier. 2014. The art of random fractals. In *Proceedings of Bridges 2014: Mathematics, Music, Art, Architecture, Culture* pp. 79–86. PDF.

Shier, John. 2015. A new recursion for space-filling geometric fractals.

Dunham, Douglas, and John Shier. 2015. An algorithm for creating aesthetic random fractal patterns. Talk delivered at the Joint Mathematics Meetings January 2015, San Antonio, Texas.

Dunham, Douglas, and John Shier. 2018. A property of area and perimeter. In *ICGG 2018: Proceedings of the 18th International Conference on Geometry and Graphics*, Milano, August 2018, pp. 228–237.

Dunham, Douglas, and John Shier. 2017. New kinds of fractal patterns. In *Proceedings of Bridges 2017: Mathematics, Art, Music, Architecture, Education, Culture*, pp. 111–116. Preprint.

Shier, John, and Paul Bourke. 2013. An algorithm for random fractal filling of space. *Computer Graphics Forum* 32(8):89–97. PDF. Preprint.

Ennis, Christopher. 2016. (Always) room for one more. *Math Horizons* 23(3):8–12. PDF (paywalled).

Dodds, Peter Sheridan, and Joshua S. Weitz. 2002. Packing-limited growth. *Physical Review E* 65:056108.

Lagarias, Jeffrey C., Colin L. Mallows, and Allan R. Wilks. 2001. Beyond the Descartes circle theorem. https://arxiv.org/abs/math/0101066. (Also published in *American Mathematical Monthly*, 2002, 109:338–361.)

Mackenzie, Dana. 2010. A tisket, a tasket, an Apollonian gasket. *American Scientist* 98:10–14. https://www.americanscientist.org/article/a-tisket-a-tasket-an-apollonian-gasket.

Manna, S. S. 1992. Space filling tiling by random packing of discs. *Physica A* 187:373–377.

Bailey, David H., and Jonathan M. Borwein. 2015. Crandall’s computation of the incomplete Gamma function and the Hurwitz zeta function, with applications to Dirichlet L-series. *Applied Mathematics and Computation*, 268, 462–477.

Borwein, Peter. 1995. An efficient algorithm for the Riemann zeta function. http://www.cecm.sfu.ca/personal/pborwein/PAPERS/P155.pdf

Coffey, Mark W. 2009. An efficient algorithm for the Hurwitz zeta and related functions. *Journal of Computational and Applied Mathematics* 225:338–346.

Hurwitz, Adolf. 1882. Einige Eigenschaften der Dirichletschen Funktionen \(F(s) = \sum \left(\frac{D}{n} \frac{1}{n^s}\right)\), die bei der Bestimmung der Klassenzahlen binärer quadratischer Formen auftreten. *Zeitschrift für Mathematik und Physik* 27:86–101. https://gdz.sub.uni-goettingen.de/id/PPN599415665_0027.

Oswald, Nicola, and Jörn Steuding. 2015. Aspects of zeta-function theory in the mathematical works of Adolf Hurwitz. https://arxiv.org/abs/1506.00856.

Xu, Andy. 2018. Approximating the Hurwitz zeta function. PDF.

Disclaimer: The investigations of the MAX 8 disasters are in an early stage, so much of what follows is based on secondary sources—in other words, on leaks and rumors and the speculations of people who may or may not know what they’re talking about. As for my own speculations: I’m not an aeronautical engineer, or an airframe mechanic, or a control theorist. I’m not even a pilot. Please keep that in mind if you choose to read on.

Early on the morning of October 29, 2018, Lion Air Flight 610 departed Jakarta, Indonesia, with 189 people on board. The airplane was a four-month-old 737 MAX 8—the latest model in a line of Boeing aircraft that goes back to the 1960s. Takeoff and climb were normal to about 1,600 feet, where the pilots retracted the flaps (wing extensions that increase lift at low speed). At that point the aircraft unexpectedly descended to 900 feet. In radio conversations with air traffic controllers, the pilots reported a “flight control problem” and asked about their altitude and speed as displayed on the controllers’ radar screens. Cockpit instruments were giving inconsistent readings. The pilots then redeployed the flaps and climbed to 5,000 feet, but when the flaps were stowed again, the nose dipped and the plane began to lose altitude. Over the next six or seven minutes the pilots engaged in a tug of war with their own aircraft, as they struggled to keep the nose level but the flight control system repeatedly pushed it down. In the end the machine won. The airplane plunged into the sea at high speed, killing everyone aboard.

The second crash happened March 8, when Ethiopian Airlines Flight 302 went down six minutes after taking off from Addis Ababa, killing 157. The aircraft was another MAX 8, just two months old. The pilots reported control problems, and data from a satellite tracking service showed sharp fluctuations in altitude. The similarities to the Lion Air crash set off alarm bells: If the same malfunction or design flaw caused both accidents, it might also cause more. Within days, the worldwide fleet of 737 MAX aircraft was grounded. Data recovered since then from the Flight 302 wreckage has reinforced the suspicion that the two accidents are closely related.

The grim fate of Lion Air 610 can be traced in brightly colored squiggles extracted from the flight data recorder. (The chart was published in November in a preliminary report from the Indonesian National Committee on Transportation Safety.)

The outline of the story is given in the altitude traces at the bottom of the chart. The initial climb is interrupted by a sharp dip; then a further climb is followed by a long, erratic roller coaster ride. At the end comes the dive, as the aircraft plunges 5,000 feet in a little more than 10 seconds. (Why are there two altitude curves, separated by a few hundred feet? I’ll come back to that question at the end of this long screed.)

All those ups and downs were caused by movements of the horizontal stabilizer, the small winglike control surface at the rear of the fuselage. The stabilizer controls the airplane’s pitch attitude—nose-up vs. nose-down. On the 737 it does so in two ways. A mechanism for pitch *trim* tilts the entire stabilizer, whereas pushing or pulling on the pilot’s control yoke moves the elevator, a hinged tab at the rear of the stabilizer. In either case, moving the trailing edge of the surface upward tends to force the nose of the airplane up, and vice versa. Here we’re mainly concerned with trim changes rather than elevator movements.

Commands to the pitch-trim system and their effect on the airplane are shown in three traces from the flight data, which I reproduce here for convenience:

The line labeled “trim manual” *(light blue)* reflects the pilots’ inputs, “trim automatic” *(orange)* shows commands from the airplane’s electronic systems, and “pitch trim position” *(dark blue)* represents the tilt of the stabilizer, with higher position on the scale denoting a nose-up command. This is where the tug of war between man and machine is clearly evident. In the latter half of the flight, the automatic trim system repeatedly commands nose down, at intervals of roughly 10 seconds. In the breaks between those automated commands, the pilots dial in nose-up trim, using buttons on the control yoke. In response to these conflicting commands, the position of the horizontal stabilizer oscillates with a period of 15 or 20 seconds. The see-sawing motion continues for at least 20 cycles, but toward the end the unrelenting automatic nose-down adjustments prevail over the briefer nose-up commands from the pilots. The stabilizer finally reaches its limiting nose-down deflection and stays there as the airplane plummets into the sea.

What’s to blame for the perverse behavior of the automatic pitch trim system? The accusatory finger is pointing at something called MCAS, a new feature of the 737 MAX series. MCAS stands for Maneuvering Characteristics Augmentation System—an impressively polysyllabic name that tells you nothing about what the thing is or what it does. As I understand it, MCAS is not a piece of hardware; there’s no box labeled MCAS in the airplane’s electronic equipment bays. MCAS consists entirely of software. It’s a program running on a computer.

MCAS has just one function. It is designed to help prevent an aerodynamic stall, a situation in which an airplane has its nose pointed up so high with respect to the surrounding airflow that the wings can’t keep it aloft. A stall is a little like what happens to a bicyclist climbing a hill that keeps getting steeper and steeper: Eventually the rider runs out of oomph, wobbles a bit, and then rolls back to the bottom. Pilots are taught to recover from stalls, but it’s not a skill they routinely practice with a planeful of passengers. In commercial aviation the emphasis is on *avoiding* stalls—forestalling them, so to speak. Airliners have mechanisms to detect an imminent stall and warn the pilot with lights and horns and a “stick shaker” that vibrates the control yoke. On Flight 610, the captain’s stick was shaking almost from start to finish.

Some aircraft go beyond mere warnings when a stall threatens. If the aircraft’s nose continues to pitch upward, an automated system intervenes to push it back down—if necessary overriding the manual control inputs of the pilot. MCAS is designed to do exactly this. It is armed and ready whenever two criteria are met: The flaps are up (generally true except during takeoff and landing) and the airplane is under manual control (not autopilot). Under these conditions the system is triggered whenever an aerodynamic quantity called angle of attack, or AoA, rises into a dangerous range.

Angle of attack is a concept subtle enough to merit a diagram:

The various angles at issue are rotations of the aircraft body around the pitch axis, a line parallel to the wings, perpendicular to the fuselage, and passing through the airplane’s center of gravity. If you’re sitting in an exit row, the pitch axis might run right under your seat. Rotation about the pitch axis tilts the nose up or down. *Pitch attitude* is defined as the angle of the fuselage with respect to a horizontal plane. The *flight-path angle* is measured between the horizontal plane and the aircraft’s velocity vector, thus showing how steeply it is climbing or descending. *Angle of attack* is the difference between pitch attitude and flight-path angle. It is the angle at which the aircraft is moving through the surrounding air (assuming the air itself is motionless, *i.e.*, no wind).

AoA affects both lift (the upward force opposing the downward tug of gravity) and drag (the dissipative force opposing forward motion and the thrust of the engines). As AoA increases from zero, lift is enhanced because of air impinging on the underside of the wings and fuselage. For the same reason, however, drag also increases. As the angle of attack grows even steeper, the flow of air over the wings becomes turbulent; beyond that point lift diminishes but drag continues increasing. That’s where the stall sets in. The critical angle for a stall depends on speed, weight, and other factors, but usually it’s no more than 15 degrees.

Neither the Lion Air nor the Ethiopian flight was ever in danger of stalling, so if MCAS was activated, it must have been by mistake. The working hypothesis mentioned in many press accounts is that the system received and acted upon erroneous input from a failed AoA sensor.

A sensor to measure angle of attack is conceptually simple. It’s essentially a weathervane poking out into the airstream. In the photo below, the angle-of-attack sensor is the small black vane just forward of the “737 MAX” legend. Hinged at the front, the vane rotates to align itself with the local airflow and generates an electrical signal that represents the vane’s angle with respect to the axis of the fuselage. The 737 MAX has two angle-of-attack vanes, one on each side of the nose. (The protruding devices above the AoA vane are pitot tubes, used to measure air speed. Another device below the word MAX is probably a temperature sensor.)

Angle of attack was not among the variables displayed to the pilots of the Lion Air 737, but the flight data recorder did capture signals derived from the two AoA sensors:

There’s something dreadfully wrong here. The left sensor is indicating an angle of attack about 20 degrees steeper than the right sensor. That’s a huge discrepancy. There’s no plausible way those disparate readings could reflect the true state of the airplane’s motion through the air, with the left side of the nose pointing sky-high and the right side near level. One of the measurements must be wrong, and the higher reading is the suspect one. If the true angle of attack ever reached 20 degrees, the airplane would already be in a deep stall. Unfortunately, on Flight 610 MCAS was taking data only from the left-side AoA sensor. It interpreted the nonsensical measurement as a valid indicator of aircraft attitude, and worked relentlessly to correct it, up to the very moment the airplane hit the sea.

The tragedies in Jakarta and Addis Ababa are being framed as a cautionary tale of automation run amok, with computers usurping the authority of pilots. The *Washington Post* editorialized:

A second fatal airplane accident involving a Boeing 737 MAX 8 may have been a case of man vs. machine…. The debacle shows that regulators should apply extra review to systems that take control away from humans when safety is at stake.

Tom Dieusaert, a Belgian journalist who writes often on aviation and computation, offered this opinion:

What can’t be denied is that the Boeing of Flight JT610 had serious computer problems. And in the hi-tech, fly-by-wire world of aircraft manufacturers, where pilots are reduced to button pushers and passive observers, these accidents are prone to happen more in the future.

The button-pushing pilots are particularly irate. Gregory Travis, who is both a pilot and software developer, summed up his feelings in this acerbic comment:

“Raise the nose, HAL.”

“I’m sorry, Dave, I can’t do that.”

Even Donald Trump tweeted on the issue:

Airplanes are becoming far too complex to fly. Pilots are no longer needed, but rather computer scientists from MIT. I see it all the time in many products. Always seeking to go one unnecessary step further, when often old and simpler is far better. Split second decisions are….

….needed, and the complexity creates danger. All of this for great cost yet very little gain. I don’t know about you, but I don’t want Albert Einstein to be my pilot. I want great flying professionals that are allowed to easily and quickly take control of a plane!

There’s considerable irony in the complaint that the 737 is too automated; in many respects the aircraft is in fact quaintly old-fashioned. The basic design goes back more than 50 years, and even in the latest MAX models quite a lot of 1960s technology survives. The primary flight controls are hydraulic, with a spider web of high-pressure tubing running directly from the control yokes in the cockpit to the ailerons, elevator, and rudder. If the hydraulic systems should fail, there’s a purely mechanical backup, with cables and pulleys to operate the various control surfaces. For stabilizer trim the primary actuator is an electric motor, but again there’s a mechanical fallback, with crank wheels near the pilots’ knees pulling on cables that run all the way back to the tail.

Other aircraft are much more dependent on computers and electronics. The 737’s principal competitor, the Airbus A320, is a thoroughgoing fly-by-wire vehicle. The pilot flies the computer, and the computer flies the airplane. Specifically, the pilot decides where to go—up, down, left, right—but the computer decides how to get there, choosing which control surfaces to deflect and by how much. Boeing’s own more recent designs, the 777 and 787, also rely on digital controls. Indeed, the latest models from both companies go a step beyond fly-by-wire to fly-by-network. Most of the communication from sensors to computers and onward to control surfaces consists of digital packets flowing through a variant of Ethernet. The airplane is a computer peripheral.

Thus if you want to gripe about the dangers and indignities of automation on the flight deck, the 737 is not the most obvious place to start. And a Luddite campaign to smash all the avionics and put pilots back in the seat of their pants would be a dangerously off-target response to the current predicament. There’s no question the 737 MAX has a critical problem. It’s a matter of life and death for those who would fly in it and possibly also for the Boeing Company. But the problem didn’t start with MCAS. It started with earlier decisions that made MCAS necessary. Furthermore, the problem may not end with the remedy that Boeing has proposed—a software update that will hobble MCAS and leave more to the discretion of pilots.

The 737 flew its first passengers in 1968. It was (and still is) the smallest member of the Boeing family of jet airliners, and it is also the most popular by far. More than 10,000 have been sold, and Boeing has orders for another 4,600. Of course there have been changes over the years, especially to engines and instruments. A 1980s update came to be known as 737 Classic, and a 1997 model was called 737 NG, for “next generation.” (Now, with the MAX, the NG has become the *previous* generation.) Through all these revisions, however, the basic structure of the airframe has hardly changed.

Ten years ago, it looked like the 737 had finally come to the end of its life. Boeing announced it would develop an all-new design as a replacement, with a hull built of lightweight composite materials rather than aluminum. Competitive pressures forced a change of course. Airbus had a head start on the A320neo, an update that would bring more efficient engines to their entry in the same market segment. The revised Airbus would be ready around 2015, whereas Boeing’s clean-slate project would take a decade. Customers were threatening to defect. In particular, American Airlines—long a Boeing loyalist—was negotiating a large order of A320neos.

In 2011 Boeing scrapped the plan for an all-new design and elected to do the same thing Airbus was doing: bolt new engines onto an old airframe. This would eliminate most of the up-front design work, as well as the need to build tooling and manufacturing facilities. Testing and certification by the FAA would also go quicker, so that the first deliveries might be made in five or six years, not too far behind Airbus.

*(left)* Bryan via Wikimedia, CC BY 2.0; *(right)* Steve Lynes via Wikimedia, CC BY 2.0.

The original 1960s 737 had two cigar-shaped engines, long and skinny, tucked up under the wings *(left photo above)*. Since then, jet engines have grown fat and stubby. They derive much of their thrust not from the jet exhaust coming out of the tailpipe but from “bypass” air moved by a large-diameter fan. Such engines would scrape on the ground if they were mounted under the wings of the 737; instead they are perched on pylons that extend forward from the leading edge of the wing. The engines on the MAX models *(right photo)* are the fattest yet, with a fan 69 inches in diameter. Compared with the NG series, the MAX engines are pushed a few inches farther forward and hang a few inches lower.

A New York Times article by David Gelles, Natalie Kitroeff, Jack Nicas, and Rebecca R. Ruiz describes the plane’s development as hurried and hectic.

Months behind Airbus, Boeing had to play catch-up. The pace of the work on the 737 Max was frenetic, according to current and former employees who spoke with The New York Times…. Engineers were pushed to submit technical drawings and designs at roughly double the normal pace, former employees said.

The *Times* article also notes: “Although the project had been hectic, current and former employees said they had finished it feeling confident in the safety of the plane.”

Sometime during the development of the MAX series, Boeing got an unpleasant surprise. The new engines were causing unwanted pitch-up movements under certain flight conditions. When I first read about this problem, soon after the Lion Air crash, I found the following explanation in an article by Sean Broderick and Guy Norris in *Aviation Week & Space Technology* (Nov. 26–Dec. 9, 2018, pp. 56–57):

Like all turbofan-powered airliners in which the thrust lines of the engines pass below the center of gravity (CG), any change in thrust on the 737 will result in a change of flight path angle caused by the vertical component of thrust.

In other words, the low-slung engines not only push the airplane forward but also tend to twirl it around the pitch axis. It’s like a motorcycle doing wheelies. Because the MAX engines are mounted farther below and in front of the center of gravity, they act through a longer lever arm and cause more severe pitch-up motions.
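
The lever-arm effect can be put in rough numbers. This is a toy calculation, not Boeing data: the thrust figure and both lever arms below are invented purely for illustration.

```python
# Toy model of thrust-pitch coupling: thrust acting along a line below the
# center of gravity produces a nose-up pitching moment equal to thrust
# times the lever arm. All numbers here are invented for illustration.

def pitch_up_moment(thrust_lbf, lever_arm_ft):
    """Nose-up moment (ft-lbf) from a thrust line offset below the CG."""
    return thrust_lbf * lever_arm_ft

# Hypothetical geometries: same thrust, slightly longer lever arm when the
# engines are moved down and forward.
old_moment = pitch_up_moment(50_000, 4.0)
new_moment = pitch_up_moment(50_000, 4.5)
print(old_moment, new_moment, new_moment / old_moment)
```

The moment scales linearly with the lever arm, which is why moving the engines even a few inches can have a noticeable effect on handling.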

I found more detail on this effect in an earlier *Aviation Week* article, a 2017 pilot report by Fred George, describing his first flight at the controls of the new MAX 8.

The aircraft has sufficient natural speed stability through much of its flight envelope. But with as much as 58,000 lb. of thrust available from engines mounted well below the center of gravity, there is pronounced thrust-versus-pitch coupling at low speeds, especially with aft center of gravity (CG) and at light gross weights. Boeing equips the aircraft with a speed-stability augmentation function that helps to compensate for the coupling by automatically trimming the horizontal stabilizer according to indicated speed, thrust lever position and CG. Pilots still must be aware of the effect of thrust changes on pitching moment and make purposeful control-wheel and pitch-trim inputs to counter it.

The reference to an “augmentation function” that works by “automatically trimming the horizontal stabilizer” sounded awfully familiar, but it turns out this is *not* MCAS. The system that compensates for thrust-pitch coupling is known as *speed-trim*. Like MCAS, it works “behind the pilot’s back,” making adjustments to control surfaces that were not directly commanded. There’s yet another system of this kind called *mach-trim* that silently corrects a different pitch anomaly when the aircraft reaches transonic speeds, at about mach 0.6. Neither of these systems is new to the MAX series of aircraft; they have been part of the control algorithm at least since the NG came out in 1997. MCAS runs on the same computer as speed-trim and mach-trim and is part of the same software system, but it is a distinct function. And according to what I’ve been reading in the past few weeks, it addresses a different problem—one that seems more sinister.

Most aircraft have the pleasant property of static stability. When an airplane is properly trimmed for level flight, you can let go of the controls—at least briefly—and it will continue on a stable path. Moreover, if you pull back on the control yoke to point the nose up, then let go again, the pitch angle should return to neutral. The layout of the airplane’s various airfoil surfaces accounts for this behavior. When the nose goes up, the tail goes down, pushing the underside of the horizontal stabilizer into the airstream. The pressure of the air against this tail surface provides a restoring force that brings the tail back up and the nose back down. (That’s why it’s called a *stabilizer*!) This negative feedback loop is built into the structure of the airplane, so that any departure from equilibrium creates a force that opposes the disturbance.

However, the tail surface, with its helpful stabilizing influence, is not the only structure that affects the balance of aerodynamic forces. Jet engines are not designed to contribute lift to the airplane, but at high angles of attack they can do so, as the airstream impinges on the lower surface of each engine’s outer covering, or nacelle. When the engines are well forward of the center of gravity, the lift creates a pitch-up turning moment. If this moment exceeds the counterbalancing force from the tail, the aircraft is unstable. A nose-up attitude generates forces that raise the nose still higher, and positive feedback takes over.

Is the 737 MAX vulnerable to such runaway pitch excursions? The possibility had not occurred to me until I read a commentary on MCAS on the Boeing 737 Technical Site, a web publication produced by Chris Brady, a former 737 pilot and flight instructor. He writes:

MCAS is a longitudinal stability enhancement. It is not for stall prevention or to make the MAX handle like the NG; it was introduced to counteract the non-linear lift of the LEAP-1B engine nacelles and give a steady increase in stick force as AoA increases. The LEAP engines are both larger and relocated slightly up and forward from the previous NG CFM56-7 engines to accommodate their larger fan diameter. This new location and size of the nacelle cause the vortex flow off the nacelle body to produce lift at high AoA; as the nacelle is ahead of the CofG this lift causes a slight pitch-up effect (ie a reducing stick force) which could lead the pilot to further increase the back pressure on the yoke and send the aircraft closer towards the stall. This non-linear/reducing stick force is not allowable under FAR §25.173, “Static longitudinal stability.” MCAS was therefore introduced to give an automatic nose down stabilizer input during steep turns with elevated load factors (high AoA) and during flaps up flight at airspeeds approaching stall.

(FAR = Federal Aviation Regulations; Part 25 sets airworthiness standards for transport category airplanes.)

Brady cites no sources for this statement, and as far as I know Boeing has neither confirmed nor denied. But *Aviation Week*, which earlier mentioned the thrust-pitch linkage, has more recently (issue of March 20) gotten behind the nacelle-lift instability hypothesis:

The MAX’s larger CFM Leap 1 engines create more lift at high AOA and give the aircraft a greater pitch-up moment than the CFM56-7-equipped NG. The MCAS was added as a certification requirement to minimize the handling difference between the MAX and NG.

Assuming the Brady account is correct, an interesting question is when Boeing noticed the instability. Were the designers aware of this hazard from the outset? Did it emerge during early computer simulations, or in wind tunnel testing of scale models? A story by Dominic Gates in the *Seattle Times* hints that Boeing may not have recognized the severity of the problem until flight tests of the first completed aircraft began in 2015.

According to Gates, the safety analysis that Boeing submitted to the FAA specified that MCAS would be allowed to move the horizontal stabilizer by no more than 0.6 degree. In the airplane ultimately released to the market, MCAS can go as far as 2.5 degrees, and it can act repeatedly until reaching the mechanical limit of motion at about 5 degrees. Gates writes:

That limit was later increased after flight tests showed that a more powerful movement of the tail was required to avert a high-speed stall, when the plane is in danger of losing lift and spiraling down.

The behavior of a plane in a high angle-of-attack stall is difficult to model in advance purely by analysis and so, as test pilots work through stall-recovery routines during flight tests on a new airplane, it’s not uncommon to tweak the control software to refine the jet’s performance.

The high-AoA instability of the MAX appears to be a property of the aerodynamic form of the entire aircraft, and so a direct way to suppress it would be to alter that form. For example, enlarging the tail surface might restore static stability. But such airframe modifications would have delayed the delivery of the airplane, especially if the need for them was discovered only after the first prototypes were already flying. Structural changes might also jeopardize inclusion of the new model under the old type certificate. Modifying software instead of aluminum must have looked like an attractive alternative. Someday, perhaps, we’ll learn how the decision was made.

By the way, according to Gates, the safety document filed with the FAA specifying a 0.6 degree limit has yet to be amended to reflect the true range of MCAS commands.

Instability is not necessarily the kiss of death in an airplane. There have been at least a few successful unstable designs, starting with the 1903 Wright Flyer. The Wright brothers deliberately put the horizontal stabilizer in front of the wing rather than behind it because their earlier experiments with kites and gliders had shown that what we call stability can also be described as sluggishness. The Flyer’s forward control surfaces (known as canards) tended to amplify any slight nose-up or nose-down motions. Maintaining a steady pitch attitude demanded high alertness from the pilot, but it also allowed the airplane to respond more quickly when the pilot *wanted* to pitch up or down. (The pros and cons of the design are reviewed in a 1984 paper by Fred E. C. Culick and Henry R. Jex.)

Another dramatically unstable aircraft was the Grumman X-29, a research platform designed in the 1980s. The X-29 had its wings on backwards; to make matters worse, the primary surfaces for pitch control were canards mounted in front of the wings, as in the Wright Flyer. The aim of this quirky project was to explore designs with exceptional agility, sacrificing static stability for tighter maneuvering. No unaided human pilot could have mastered such a twitchy vehicle. It required a digital fly-by-wire system that sampled the state of the airplane and adjusted the control surfaces up to 80 times per second. The controller was successful—perhaps too much so. It allowed the airplane to be flown safely, but in taming the instability it also left the plane with rather tame handling characteristics.

I have a glancing personal connection with the X-29 project. In the 1980s I briefly worked as an editor with members of the group at Honeywell who designed and built the X-29 control system. I helped prepare publications on the control laws and on their implementation in hardware and software. That experience taught me just enough to recognize something odd about MCAS: It is way too slow to be suppressing aerodynamic instability in a jet aircraft. Whereas the X-29 controller had a response time of 25 milliseconds, MCAS takes 10 seconds to move the 737 stabilizer through a 2.5-degree adjustment. At that pace, it cannot possibly keep up with forces that tend to flip the nose upward in a positive feedback loop.

There’s a simple explanation. MCAS is not meant to control an unstable aircraft. It is meant to restrain the aircraft from entering the regime where it becomes unstable. This is the same strategy used by other mechanisms of stall prevention—intervening before the angle of attack reaches the critical point. However, if Brady is correct about the instability of the 737 MAX, the task is more urgent for MCAS. Instability implies a steep and slippery slope. MCAS is a guard rail that bounces you back onto the road when you’re about to drive over the cliff.

Which brings up the question of Boeing’s announced plan to fix the MCAS problem. Reportedly, the revised system will not keep reactivating itself so persistently, and it will automatically disengage if it detects a large difference between the two AoA sensors. These changes should prevent a recurrence of the recent crashes. But do they provide adequate protection against the kind of mishap that MCAS was designed to prevent in the first place? With MCAS shut down, either manually or automatically, there’s nothing to stop an unwary or misguided pilot from wandering into the corner of the flight envelope where the MAX becomes unstable.

Without further information from Boeing, there’s no telling how severe the instability might be—if indeed it exists at all. The Brady article at the Boeing 737 Technical Site implies the problem is partly pilot-induced. Normally, to make the nose go higher and higher you have to pull harder and harder on the control yoke. In the unstable region, however, the resistance to pulling suddenly fades, and so the pilot may unwittingly pull the yoke to a more extreme position.

Is this human interaction a *necessary* part of the instability, or is it just an exacerbating factor? In other words, without the pilot in the loop, would there still be positive feedback causing runaway nose-up pitch? I have yet to find answers.

Another question: If the root of the problem is a deceptive change in the force resisting nose-up movement of the control yoke, why not address that issue directly?

Even after the spurious activation of MCAS on Lion Air 610, the crash and the casualties would have been avoided if the pilots had simply turned the damn thing off. Why didn’t they? Apparently because they had never heard of MCAS, and didn’t know it was installed on the airplane they were flying, and had not received any instruction on how to disable it. There’s no switch or knob in the cockpit labeled “MCAS ON/OFF.” The Flight Crew Operation Manual does not mention it (except in a list of abbreviations), and neither did the transitional training program the pilots had completed before switching from the 737 NG to the MAX. The training consisted of either one or two hours (reports differ) with an iPad app.

Boeing’s explanation of these omissions was captured in a *Wall Street Journal* story:

One high-ranking Boeing official said the company had decided against disclosing more details to cockpit crews due to concerns about inundating average pilots with too much information—and significantly more technical data—than they needed or could digest.

To call this statement disingenuous would be disingenuous. What it is is preposterous. In the first place, Boeing did not withhold “more details”; they failed to mention the very existence of MCAS. And the too-much-information argument is silly. I don’t have access to the Flight Crew Operation Manual for the MAX, but the NG edition runs to more than 1,300 pages, plus another 800 for the Quick Reference Handbook. A few paragraphs on MCAS would not have sunk any pilot who wasn’t already drowning in TMI. Moreover, the manual carefully documents the speed-trim and mach-trim features, which seem to fall in the same category as MCAS: They act autonomously, and offer the pilot no direct interface for monitoring or adjusting them.

In the aftermath of the Lion Air accident, Boeing stated that the procedure for disabling MCAS was spelled out in the manual, even though MCAS itself wasn’t mentioned. That procedure is given in a checklist for “runaway stabilizer trim.” It is not complicated: Hang onto the control yoke, switch off the autopilot and autothrottles if they’re on; then, if the problem persists, flip two switches labeled “STAB TRIM” to the “CUTOUT” position. Only the last step will actually matter in the case of an MCAS malfunction.

This checklist is considered a “memory item”; pilots must be able to execute the steps without looking it up in the handbook. The Lion Air crew should certainly have been familiar with it. But could they recognize that it was the right checklist to apply in an airplane whose behavior was unlike anything they had seen in their training or previous 737 flying experience? According to the handbook, the condition that triggers use of the runaway checklist is “Uncommanded stabilizer trim movement occurs continuously.” The MCAS commands were not continuous but repetitive, so some leap of inference would have been needed to make this diagnosis.

By the time of the Ethiopian crash, 737 pilots everywhere knew all about MCAS and the procedure for disabling it. A preliminary report issued last week by Ethiopian Airlines indicates that after a few minutes of wrestling with the control yoke, the pilots on Flight 302 did invoke the checklist procedure, and moved the STAB TRIM switches to CUTOUT. The stabilizer then stopped responding to MCAS nose-down commands, but the pilots were unable to regain control of the airplane.

It’s not entirely clear why they failed or what was going on in the cockpit in those last minutes. One factor may be that the cutout switch disables not only automatic pitch trim movements but also manual ones requested through the buttons on the control yoke. The switch cuts all power to the electric motor that moves the stabilizer. In this situation the only way to adjust the trim is to turn the hand crank wheels near the pilots’ knees. During the crisis on Flight 302 that mechanism may have been too slow to correct the trim in time, or the pilots may have been so fixated on pulling the control yoke back with maximum force that they did not try the manual wheels. It’s also possible that they flipped the switches back to the NORMAL setting, restoring power to the stabilizer motor. The report’s narrative doesn’t mention this possibility, but the graph from the flight data recorder suggests it *(see below)*.

There’s room for debate on whether the MCAS system is a good idea when it is operating correctly, but when it activates *mistakenly* and sends an airplane diving into the sea, no one would defend it. By all appearances, the rogue behavior in both the Lion Air and the Ethiopian accidents was triggered by a malfunction in a single sensor. That’s not supposed to happen in aviation. It’s unfathomable that any aircraft manufacturer would knowingly build a vehicle in which the failure of a single part would lead to a fatal accident.

Protection against single failures comes from redundancy, and the 737 is so committed to this principle that it almost amounts to two airplanes wrapped up in a single skin. There are two, and in some cases *three*, of everything: sensors, computers, and actuators.

There’s one asterisk in this roster of redundancy: A device called the flight control computer, or FCC, apparently gets special treatment. There are two FCCs, but according to the Boeing 737 Technical Site only one of them operates during any given flight. All the other duplicated components run in parallel, receiving independent inputs, doing independent computations, emitting independent control actions. But for each flight just one FCC does all the work, and the other is put on standby. The scheme for choosing the active computer seems strangely arbitrary. Each day when the airplane is powered up, the left side FCC gets control for the first flight, then the right side unit takes over for the second flight of the day, and the two sides alternate until the power is shut off. After a restart, the alternation begins again with the left FCC.
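
The alternation rule, as described, is simple enough to sketch in a few lines of Python. The class and method names here are my own, purely illustrative:

```python
# A minimal sketch of the alternation rule described above: the left FCC
# takes the first flight after power-up, then the two sides alternate
# until the next power cycle, which restarts the sequence on the left.

class FCCSelector:
    def __init__(self):
        self.flight_count = 0          # resets on power-up

    def power_cycle(self):
        self.flight_count = 0          # restart: left FCC goes first again

    def next_flight(self):
        side = "left" if self.flight_count % 2 == 0 else "right"
        self.flight_count += 1
        return side

sel = FCCSelector()
print([sel.next_flight() for _ in range(3)])   # ['left', 'right', 'left']
sel.power_cycle()
print(sel.next_flight())                       # 'left' again after restart
```

One consequence of this scheme is that the FCC in control of a given flight depends on the aircraft's recent power-up history, which matters when trying to reconstruct which sensor MCAS was listening to.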

Aspects of this scheme puzzle me. I don’t understand why redundant FCC units are treated differently from other components. If one FCC dies, does the other automatically take over? Can the pilots switch between them in flight? If so, would that be an effective way to combat MCAS misbehavior? I’ve tried to find answers in the manuals, but I don’t trust my interpretation of what I read.

I’ve also had a hard time learning anything about the FCC itself. I don’t know who makes it, or what it looks like, or how it is programmed. On a website called Closet Wonderfuls an item identified as a 737 flight control computer is on offer for $43.82, with free shipping.

In the context of the MAX crashes, the flight control computer is important for two reasons. First, it’s where MCAS lives; this is the computer on which the MCAS software runs. Second, the curious procedure for choosing a different FCC on alternating flights also winds up choosing which AoA sensor is providing input to MCAS. The left and right sensors are connected to the corresponding FCCs.

If the two FCCs are used in alternation, that raises an interesting question about the history of the aircraft that crashed in Indonesia. The preliminary crash report describes trouble with various instruments and controls on five flights over four days (including the fatal flight). All of the problems were on the left side of the aircraft or involved a disagreement between the left and right sides.

| date | route | trouble reports | maintenance |
|---|---|---|---|
| Oct 26 | Tianjin → Manado | left side: no airspeed or altitude indications | test left Stall Management and Yaw Damper computer; passed |
| ? | Manado → Denpasar | ? | ? |
| Oct 27 | Denpasar → Manado | left side: no airspeed or altitude indications; speed trim and mach trim warning lights | test left Stall Management and Yaw Damper computer; failed. reset left Air Data and Inertial Reference Unit. retest left Stall Management and Yaw Damper computer; passed. clean electrical connections |
| Oct 27 | Manado → Denpasar | left side: no airspeed or altitude indications; speed trim and mach trim warning lights; autothrottle disconnect | test left Stall Management and Yaw Damper computer; failed. reset left Air Data and Inertial Reference Unit. replace left AoA sensor |
| Oct 28 | Denpasar → Jakarta | left/right disagree warning on airspeed and altitude; stick shaker [MCAS activation] | flush left pitot tube and static port. clean electrical connectors on elevator “feel” computer |
| Oct 29 | Jakarta → Pangkal Pinang | stick shaker [MCAS activation] | |

Which of the five flights had the left-side FCC as active computer? The final two flights listed in the table, where MCAS activated, were both first-of-the-day flights and so presumably under control of the left FCC. For the rest it’s hard to tell, especially since maintenance operations may have entailed full shutdowns of the aircraft, which would have reset the alternation sequence.

The revised MCAS software will reportedly consult signals from both AoA sensors. What will it do with the additional information? Only one clue has been published so far: If the readings differ by more than 5.5 degrees, MCAS will shut down. What if the readings differ by 4 or 5 degrees?
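
Based solely on the published 5.5-degree figure, the disagreement check presumably looks something like the following sketch. This is hypothetical code; Boeing has released no details beyond the threshold itself:

```python
# Hypothetical sketch of the reported disagreement check: MCAS is
# inhibited when the two AoA sensors disagree by more than the threshold.
# Only the 5.5-degree figure comes from published reports.

DISAGREE_LIMIT_DEG = 5.5

def mcas_allowed(left_aoa_deg, right_aoa_deg):
    return abs(left_aoa_deg - right_aoa_deg) <= DISAGREE_LIMIT_DEG

print(mcas_allowed(10.0, 12.0))   # True: within tolerance
print(mcas_allowed(20.0, 2.0))    # False: sensors disagree; MCAS shuts down
print(mcas_allowed(6.0, 11.0))    # True: a 5-degree split still passes
```

Note the last case: a disagreement just under the threshold leaves MCAS armed, acting on sensor data that may still be several degrees wrong.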

The present MCAS system, with its alternating choice of left and right, has a 50 percent chance of disaster when a single random failure causes an AoA sensor to spew out falsely high data. With the same one-sided random failure, the updated MCAS will have a 100 percent chance of ignoring a pilot’s excursion into stall territory. Is that an improvement?
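
That argument can be made concrete with a small simulation. Everything here is a caricature of the two schemes, assuming a single sensor that has failed high by an amount well past the disagreement threshold:

```python
# Caricature of the two schemes under a single high-failed AoA sensor.
# Current scheme: the active FCC (and hence the sensor MCAS listens to)
# alternates, so roughly half of flights put the bad sensor in charge.
# Revised scheme: a large disagreement always disables MCAS.

import random

def current_scheme(bad_side):
    active = random.choice(["left", "right"])    # which FCC got this flight
    return "rogue MCAS" if active == bad_side else "normal"

def revised_scheme(left_deg, right_deg, limit=5.5):
    return "MCAS off" if abs(left_deg - right_deg) > limit else "MCAS armed"

random.seed(1)
trials = [current_scheme("left") for _ in range(10_000)]
print(trials.count("rogue MCAS") / len(trials))   # close to 0.5
print(revised_scheme(22.0, 2.0))                  # always "MCAS off"
```

The simulation just restates the dilemma: the old scheme sometimes follows the bad sensor; the new one reliably ignores both sensors, good and bad alike.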

Although a faulty sensor should not bring down an airplane, I would still like to know what went wrong with the AoA vane.

It’s no surprise that AoA sensors can fail. They are mechanical devices operating in a harsh environment: winds exceeding 500 miles per hour and temperatures below –40. A common failure mode is a stuck vane, often caused by ice (despite a built-in de-icing heater). But a seized vane would produce a constant output, regardless of the real angle of attack, which is not the symptom seen in Flight 610. The flight data recorder shows small fluctuations in the signals from both the left and the right instruments. Furthermore, the jiggles in the two curves are closely aligned, suggesting they are both tracking the same movements of the aircraft. In other words, the left-hand sensor appears to be functioning; it’s just giving measurements offset by a constant deviation of roughly 20 degrees.
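
The distinction between the two failure modes is easy to see in data. Here synthetic traces stand in for the flight recorder output; the numbers are invented:

```python
# Distinguishing a seized vane from an offset sensor, with synthetic data:
# a stuck vane gives a flat trace, while an offset sensor still tracks the
# airplane's motion, shifted by a constant.

right = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2]          # healthy sensor
left  = [x + 20.0 for x in right]                # offset fault: jiggles match
stuck = [25.0] * len(right)                      # seized vane: no jiggles

def variation(trace):
    return max(trace) - min(trace)

print(round(variation(left), 1), round(variation(stuck), 1))   # 0.5 0.0
offsets = [l - r for l, r in zip(left, right)]
print(all(abs(d - 20.0) < 1e-6 for d in offsets))              # True
```

The left trace jiggles just as the right one does, so the vane is moving with the airflow; only its reported angle is displaced.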

Is there some other failure mode that might produce the observed offset? Sure: Just bend the vane by 20 degrees. Maybe a catering truck or an airport jetway blundered into it. Another creative thought is that the sensor might have been installed wrong, with the entire unit rotated by 20 degrees. Several writers on a website called the Professional Pilots Rumour Network explored this possibility, but they ultimately concluded it was impossible. The manufacturer, doubtless aware of the risk, placed the mounting screws and locator pins asymmetrically, so the unit will only go into the hull opening one way.

You might get the same effect through an assembly error during the manufacture of the sensor. The vane could be incorrectly attached to the shaft, or else the internal transducer that converts angular position into an electrical signal might be mounted wrong. Did the designers also ensure that such mistakes are impossible? I don’t know; I haven’t been able to find any drawings or photographs of the sensor’s innards.

Looking for other ideas about what might have gone wrong, I made a quick, scattershot survey of FAA airworthiness directives that call for servicing or replacing AoA sensors. I found dozens of them, including several that discuss the same sensor installed on the 737 MAX (the Rosemount 0861). But none of the reports I read describes a malfunction that could cause a consistent 20-degree error.

For a while I thought that the fault might lie not in the sensor itself but farther along the data path. It could be something as simple as a bad cable or connector. Signals from the AoA sensor go to the Air Data and Inertial Reference Unit (ADIRU), where the sine and cosine components are combined and digitized to yield a number representing the measured angle of attack. The ADIRU also receives inputs from other sensors, including the pitot tubes for measuring airspeed and the static ports for air pressure. And it houses the gyroscopes and accelerometers of an inertial guidance system, which can keep track of aircraft motion without reference to external cues. (There’s a separate ADIRU for each side of the airplane.) Maybe there was a problem with the digitizer—a stuck bit rather than a stuck vane.
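
The resolver-to-angle step the ADIRU performs can be sketched with a two-argument arctangent. This is my reconstruction of the principle, not the actual avionics code:

```python
# The AoA resolver reports its angle as separate sine and cosine
# components; combining them with atan2 recovers the angle. A sketch of
# that digitization step, not the real ADIRU implementation.

import math

def digitize_aoa(sin_component, cos_component):
    """Recover the angle, in degrees, from resolver sine/cosine outputs."""
    return math.degrees(math.atan2(sin_component, cos_component))

true_angle = 5.0
s = math.sin(math.radians(true_angle))
c = math.cos(math.radians(true_angle))
print(round(digitize_aoa(s, c), 1))   # 5.0
```

Using both components, rather than the sine alone, resolves the angle unambiguously over a full circle, and a fault in this step (a stuck bit, say) would corrupt only the digitized angle, not the raw voltages.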

Further information has undermined this idea. For one thing, the AoA sensor removed by the Lion Air maintenance crew on October 27 is now in the hands of investigators. According to news reports, it was “deemed to be defective,” though I’ve heard no hint of what the defect might be. Also, it turns out that one element of the control system, the Stall Management and Yaw Damper (SMYD) computer, receives the raw sine and cosine voltages directly from the sensor, not a digitized angle calculated by the ADIRU. It is the SMYD that controls the stick-shaker function. On both the Lion Air and the Ethiopian flights the stick shaker was active almost continuously, so those undigitized sine and cosine voltages must have been indicating a high angle of attack. In other words the error already existed before the signals reached the ADIRU.

I’m still stumped by the fixed angular offset in the Lion Air data, but the question now seems a little less important. The release of the preliminary report on Ethiopian Flight 302 shows that the left-side AoA sensor on that aircraft also failed badly, but in a way that looks totally different. Here are the relevant traces from the flight data recorder:

The readings from the AoA sensors are the uppermost lines, red for the left sensor and blue for the right. At the left edge of the graph they differ somewhat when the airplane has just begun to move, but they fall into close coincidence once the roll down the runway has built up some speed. At takeoff, however, they suddenly diverge dramatically, as the left vane begins reading an utterly implausible 75 degrees nose up. Later it comes down a few degrees but otherwise shows no sign of the ripples that would suggest a response to airflow. At the very end of the flight there are some more unexplained excursions.

By the way, in this graph the light blue trace of automatic trim commands offers another clue to what might have happened in the last moments of Flight 302. Around the middle of the graph, the STAB TRIM switches were pulled, with the result that an automatic nose-down command had no effect on the stabilizer position. But at the far right, another automatic nose-down command does register in the trim-position trace, suggesting that the switches may have been flipped back to NORMAL, restoring power to the stabilizer motor.

There’s so much I still don’t understand.

Puzzle 1. If the Lion Air and Ethiopian accidents were both caused by faulty AoA sensors, then there were three parts with similar defects in brand new aircraft (including the replacement sensor installed by Lion Air on October 27). A recent news item says the replacement was not a new part but one that had been refurbished by a Florida shop called XTRA Aerospace. This fact offers us somewhere else to point the accusatory finger, but presumably the two sensors installed by Boeing were not retreads, so XTRA can’t be blamed for all of them.

There are roughly 400 MAX aircraft in service, with 800 AoA sensors. Is a failure rate of 3 out of 800 unusual or unacceptable? Does that judgment depend on whether or not it’s the same defect in all three cases?

Puzzle 2. Let’s look again at the traces for pitch trim and angle of attack in the Lion Air 610 data. The conflicting manual and automatic commands in the second half of the flight have gotten lots of attention, but I’m also baffled by what was going on in the first few minutes.

During the roll down the runway, the pitch trim system was set near its maximum pitch-up position *(dark blue line)*. Immediately after takeoff, the automatic trim system began calling for further pitch-up movement, and the stabilizer probably reached its mechanical limit. At that point the pilots manually trimmed it in the pitch-down direction, and the automatic system replied with a rapid sequence of up adjustments. In other words, there was already a tug-of-war underway, but the pilots and the automated controls were pulling in directions opposite to those they would choose later on. All this happened while the flaps were still deployed, which means that MCAS could not have been active. Some other element of the control system must have been issuing those automatic pitch-up orders. Deepening the mystery, the left side AoA sensor was already feeding its spurious high readings to the left-side flight control computer. If the FCC was acting on that data, it should not have been commanding nose-up trim.

Puzzle 3. The AoA readings are not the only peculiar data in the chart from the Lion Air preliminary report. Here are the altitude and speed traces:

The left-side altitude readings *(red)* are low by at least a few hundred feet. The error looks like it might be multiplicative rather than additive, perhaps 10 percent. The left and right computed airspeeds also disagree, although the chart is too squished to allow a quantitative comparison. It was these discrepancies that initially upset the pilots of Flight 610; they could see them on their instruments. (They had no angle of attack indicators in the cockpit, so that conflict was invisible to them.)

Altitude, airspeed, and angle of attack are all measured by different sensors. Could they all have gone haywire at the same time? Or is there some common point of failure that might explain all the weird behavior? In particular, is it possible a single wonky AoA sensor caused all of this havoc? My guess is yes. The sensors for altitude and airspeed and even temperature are influenced by angle of attack. The measured speed and pressure are therefore adjusted to compensate for this confounding variable, using the output of the AoA sensor. That output was wrong, and so the adjustments allowed one bad data stream to infect all of the air data measurements.
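
A cartoon of that infection mechanism, with an entirely invented correction function (real air-data corrections are far more elaborate):

```python
# Cartoon of how one bad AoA value contaminates the other air-data
# channels: pressure-derived quantities are corrected as a function of
# angle of attack, so a biased AoA biases the "corrected" outputs too.
# The correction formula here is invented purely for illustration.

def corrected_altitude(raw_altitude_ft, aoa_deg):
    # hypothetical AoA-dependent static-pressure correction
    return raw_altitude_ft * (1 - 0.005 * aoa_deg)

raw = 5000.0
good = corrected_altitude(raw, aoa_deg=2.0)    # plausible AoA
bad  = corrected_altitude(raw, aoa_deg=22.0)   # same AoA plus a 20-degree bias
print(round(good, 1), round(bad, 1))           # 4950.0 4450.0
```

Note that the error produced this way is multiplicative, consistent with the roughly 10 percent discrepancy visible in the altitude traces.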

Six months ago, I was writing about another disaster caused by an out-of-control control system. In that case the trouble spot was a natural gas distribution network in Massachusetts, where a misconfigured pressure-regulating station caused fires and explosions in more than 100 buildings, with one fatality and 20 serious injuries. I lamented: “The special pathos of technological tragedies is that the engines of our destruction are machines that we ourselves design and build.”

In a world where defective automatic controls are blowing up houses and dropping aircraft out of the sky, it’s hard to argue for *more* automation, for adding further layers of complexity to control systems, for endowing machines with greater autonomy. Public sentiment leans the other way. Like President Trump, most of us trust pilots more than we trust computer scientists. We don’t want MCAS on the flight deck. We want Chesley Sullenberger III, the hero of US Airways Flight 1549, who guided his crippled A320 to a dead-stick landing in the Hudson River and saved all 155 souls on board. No amount of cockpit automation could have pulled off that feat.

Nevertheless, a cold, analytical view of the statistics suggests a different reaction. The human touch doesn’t always save the day. On the contrary, pilot error is responsible for more fatal crashes than any other cause. One survey lists pilot error as the initiating event in 40 percent of fatal accidents, with equipment failure accounting for 23 percent. No one is (yet) advocating a pilotless cockpit, but at this point in the history of aviation technology that’s a nearer prospect than a computer-free cockpit.

The MCAS system of the 737 MAX represents a particularly awkward compromise between fully manual and fully automatic control. The software is given a large measure of responsibility for flight safety and is even allowed to override the decisions of the pilot. And yet when the system malfunctions, it’s entirely up to the pilot to figure out what went wrong and how to fix it—and the fix had better be quick, before MCAS can drive the plane into the ground.

Two lost aircraft and 346 deaths are strong evidence that this design was not a good idea. But what to do about it? Boeing’s plan is a retreat from automatic control, returning more responsibility and authority to the pilots:

- Flight control system will now compare inputs from both AOA sensors. If the sensors disagree by 5.5 degrees or more with the flaps retracted, MCAS will not activate. An indicator on the flight deck display will alert the pilots.
- If MCAS is activated in non-normal conditions, it will only provide one input for each elevated AOA event. There are no known or envisioned failure conditions where MCAS will provide multiple inputs.
- MCAS can never command more stabilizer input than can be counteracted by the flight crew pulling back on the column. The pilots will continue to always have the ability to override MCAS and manually control the airplane.

A statement from Dennis Muilenburg, Boeing’s CEO, says the software update “will ensure accidents like that of Lion Air Flight 610 and Ethiopian Airlines Flight 302 never happen again.” I hope that’s true, but what about the accidents that MCAS was designed to prevent? I also hope we will not be reading about a 737 MAX that stalled and crashed because the pilots, believing MCAS was misbehaving, kept hauling back on the control yokes.

If Boeing were to take the opposite approach—not curtailing MCAS but enhancing it with still more algorithms that fiddle with the flight controls—the plan would be greeted with hoots of outrage and derision. Indeed, it seems like a terrible idea. MCAS was installed to prevent pilots from wandering into hazardous territory. A new supervisory system would keep an eye on MCAS, stepping in if it began acting suspiciously. Wouldn’t we then need another custodian to guard the custodians, ad infinitum? Moreover, with each extra layer of complexity we get new side effects and unintended consequences and opportunities for something to break. The system becomes harder to test, and impossible to prove correct.

Those are serious objections, but the problem being addressed is also serious.

Suppose the 737 MAX didn’t have MCAS but did have a cockpit indicator of angle of attack. On the Lion Air flight, the captain would have felt the stick-shaker warning him of an incipient stall and would have seen an alarmingly high angle of attack on his instrument panel. His training would have impelled him to do the same thing MCAS did: Push the nose down to get the wings working again. Would he have continued pushing it down until the plane crashed? Surely not. He would have looked out the window, he would have cross-checked the instruments on the other side of the cockpit, and after some scary moments he would have realized it was a false alarm. (In darkness or low visibility, where the pilot can lose track of the horizon, the outcome might be worse.)

I see two lessons in this hypothetical exercise. First, erroneous sensor data is dangerous, whether the airplane is being flown by a computer or by Chesley Sullenberger. A prudently designed instrument and control system would take steps to detect (and ideally correct) such errors. At the moment, redundancy is the only defense against these failures—and in the unpatched version of MCAS even that protection is compromised. It’s not enough. One key to the superiority of human pilots is that they exercise judgment and sometimes skepticism about what the instruments tell them. That kind of reasoning is not beyond the reach of automated systems. There’s plenty of information to be exploited. For example, inconsistencies between AoA sensors, pitot tubes, static pressure ports, and air temperature probes not only signal that something’s wrong but can offer clues about *which* sensor has failed. The inertial reference unit provides an independent check on aircraft attitude; even GPS signals might be brought to bear. Admittedly, making sense of all this data and drawing a valid conclusion from it—a problem known as sensor fusion—is a major challenge.
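A minimal version of such a cross-check can be sketched in a few lines of Python. This is not any real avionics algorithm—the sensor names, values, and threshold are invented—but it shows the basic idea of flagging the one data source that disagrees with an independent consensus:

```python
# Sketch of a consistency check across redundant or dissimilar sensors.
# A real sensor-fusion system would use filtering and physical models;
# this toy version just flags whichever source strays from the median.

from statistics import median

def flag_outliers(readings, tolerance):
    """Return names of sensors deviating from the median by more
    than `tolerance` (all values in the same units, e.g. degrees)."""
    m = median(readings.values())
    return [name for name, value in readings.items()
            if abs(value - m) > tolerance]

# Hypothetical pitch-related estimates, each derived independently:
readings = {
    "aoa_left": 21.0,       # the faulty vane
    "aoa_right": 2.5,
    "inertial_pitch": 3.0,  # attitude from the inertial reference unit
}
print(flag_outliers(readings, tolerance=5.0))  # → ['aoa_left']
```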

Second, a closed-loop controller has yet another source of information: an implicit model of the system being controlled. If you change the angle of the horizontal stabilizer, the state of the airplane is expected to change in known ways—in angle of attack, pitch angle, airspeed, altitude, and in the rate of change in all these parameters. If the result of the control action is not consistent with the model, something’s not right. To persist in issuing the same commands when they don’t produce the expected results is not reasonable behavior. Autopilots include rules to deal with such situations; the lower-level control laws that run in manual-mode flight could incorporate such sanity checks as well.
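The shape of such a sanity check might look like the following Python sketch. The "model" here is a stand-in—a single invented gain relating trim commands to pitch-rate response, with invented thresholds—not any actual flight control law; the point is only the structure: predict, compare, and stop trusting the command channel after persistent disagreement.

```python
# Sketch of a model-based sanity check: compare the response predicted
# by a simple model with the measured response, and stop issuing
# commands once model and measurements persistently disagree.

def sanity_check(commands, measured_pitch_rates,
                 gain=2.0, threshold=1.0, max_strikes=3):
    """gain: assumed pitch-rate change (deg/s) per unit of trim command
    (a made-up model). Returns False after max_strikes consecutive
    mismatches between prediction and measurement."""
    strikes = 0
    for cmd, rate in zip(commands, measured_pitch_rates):
        predicted = gain * cmd
        if abs(predicted - rate) > threshold:
            strikes += 1
            if strikes >= max_strikes:
                return False  # persistent mismatch: distrust the channel
        else:
            strikes = 0
    return True

# Nose-down commands that keep failing to produce the predicted response:
print(sanity_check([-1.0] * 5, [0.1] * 5))  # → False
```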

I don’t claim to have the answer to the MCAS problem. And I don’t want to fly in an airplane I designed. (Neither do you.) But there’s a general principle here that I believe should be taken to heart: If an autonomous system makes life-or-death decisions based on sensor data, it ought to verify the validity of the data.

Boeing continues to insist that MCAS is “not a stall-protection function and not a stall-prevention function. It is a handling-qualities function. There’s a misconception it is something other than that.” This statement comes from Mike Sinnett, who is vice president of product development and future airplane development at Boeing; it appears in an *Aviation Week* article by Guy Norris published online April 9.

I don’t know exactly what “handling qualities” means in this context. To me the phrase connotes something that might affect comfort or aesthetics or pleasure more than safety. An airplane with different handling qualities would feel different to the pilot but could still be flown without risk of serious mishap. Is Sinnett implying something along those lines? If so—if MCAS is not critical to the safety of flight—I’m surprised that Boeing wouldn’t simply disable it temporarily, as a way of getting the fleet back in the air while they work out a permanent solution.

The Norris article also quotes Sinnett as saying: “The thing you are trying to avoid is a situation where you are pulling back and all of a sudden it gets easier, and you wind up overshooting and making the nose higher than you want it to be.” That situation, with the nose higher than you want it to be, sounds to me like an airplane that might be approaching a stall.

A story by Jack Nicas, David Gelles, and James Glanz in today’s *New York Times* offers a quite different account, suggesting that “handling qualities” may have motivated the first version of MCAS, but stall risks were part of the rationale for later beefing it up.

The system was initially designed to engage only in rare circumstances, namely high-speed maneuvers, in order to make the plane handle more smoothly and predictably for pilots used to flying older 737s, according to two former Boeing employees who spoke on the condition of anonymity because of the open investigations.

For those situations, MCAS was limited to moving the stabilizer—the part of the plane that changes the vertical direction of the jet—about 0.6 degrees in about 10 seconds.

It was around that design stage that the F.A.A. reviewed the initial MCAS design. The planes hadn’t yet gone through their first test flights.

After the test flights began in early 2016, Boeing pilots found that just before a stall at various speeds, the Max handled less predictably than they wanted. So they suggested using MCAS for those scenarios, too, according to one former employee with direct knowledge of the conversations.

Finally, another *Aviation Week* story by Guy Norris, published yesterday, gives a convincing account of what happened to the angle of attack sensor on Ethiopian Airlines Flight 302. According to Norris’s sources, the AoA vane was sheared off moments after takeoff, probably by a bird strike. This hypothesis is consistent with the traces extracted from the flight data recorder, including the strange-looking wiggles at the very end of the flight. I wonder if there’s hope of finding the lost vane, which shouldn’t be far from the end of the runway.

The moment I saw it, I had to stop in my tracks, grab a scratch pad, and check out the formula. The result made sense in a rough-and-ready sort of way. Since the multiplicative version of \(n!\) goes to infinity as \(n\) increases, the “divisive” version should go to zero. And \(\frac{n^2}{n!}\) does exactly that; the polynomial function \(n^2\) grows more slowly than the factorial function \(n!\) for large enough \(n\):

\[\frac{1}{1}, \frac{4}{2}, \frac{9}{6}, \frac{16}{24}, \frac{25}{120}, \frac{36}{720}, \frac{49}{5040}, \frac{64}{40320}, \frac{81}{362880}, \frac{100}{3628800}.\]

But why does the quotient take the particular form \(\frac{n^2}{n!}\)? Where does the \(n^2\) come from?

To answer that question, I had to revisit the long-ago trauma of learning to divide fractions, but I pushed through the pain. Proceeding from left to right through the formula in the tweet, we first get \(\frac{n}{n-1}\). Then, dividing that quantity by \(n-2\) yields

\[\cfrac{\frac{n}{n-1}}{n-2} = \frac{n}{(n-1)(n-2)}.\]

Continuing in the same way, we ultimately arrive at:

\[n \mathbin{/} (n-1) \mathbin{/} (n-2) \mathbin{/} (n-3) \mathbin{/} \cdots \mathbin{/} 1 = \frac{n}{(n-1) (n-2) (n-3) \cdots 1} = \frac{n}{(n-1)!}\]

To recover the tweet’s stated result of \(\frac{n^2}{n!}\), just multiply numerator and denominator by \(n\). (To my taste, however, \(\frac{n}{(n-1)!}\) is the more perspicuous expression.)

I am a card-carrying factorial fanboy. You can keep your fancy Fibonaccis; *this* is my favorite function. Every time I try out a new programming language, my first exercise is to write a few routines for calculating factorials. Over the years I have pondered several variations on the theme, such as replacing \(\times\) with \(+\) in the definition (which produces triangular numbers). But I don’t think I’ve ever before considered substituting \(\mathbin{/}\) for \(\times\). It’s messy. Because multiplication is commutative and associative, you can define \(n!\) simply as the product of all the integers from \(1\) through \(n\), without worrying about the order of the operations. With division, order can’t be ignored. In general, \(x \mathbin{/} y \ne y \mathbin{/}x\), and \((x \mathbin{/} y) \mathbin{/} z \ne x \mathbin{/} (y \mathbin{/} z)\).

The Fermat’s Library tweet puts the factors in descending order: \(n, n-1, n-2, \ldots, 1\). The most obvious alternative is the ascending sequence \(1, 2, 3, \ldots, n\). What happens if we define the divisive factorial as \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\)? Another visit to the schoolroom algorithm for dividing fractions yields this simple answer:

\[1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n = \frac{1}{2 \times 3 \times 4 \times \cdots \times n} = \frac{1}{n!}.\]

In other words, when we repeatedly divide while counting up from \(1\) to \(n\), the final quotient is the reciprocal of \(n!\). (I wish I could put an exclamation point at the end of that sentence!) If you’re looking for a canonical answer to the question, “What do you get if you divide instead of multiplying in \(n!\)?” I would argue that \(\frac{1}{n!}\) is a better candidate than \(\frac{n}{(n - 1)!}\). Why not embrace the symmetry between \(n!\) and its inverse?

Of course there are many other ways to arrange the *n* integers in the set \(\{1 \ldots n\}\). How many ways? As it happens, \(n!\) of them! Thus it would seem there are \(n!\) distinct ways to define the divisive \(n!\) function. However, looking at the answers for the two permutations discussed above suggests there’s a simpler pattern at work. Whatever element of the sequence happens to come first winds up in the numerator of a big fraction, and the denominator is the product of all the other elements. As a result, there are really only \(n\) different outcomes—assuming we stick to performing the division operations from left to right. For any integer \(k\) between \(1\) and \(n\), putting \(k\) at the head of the queue creates a divisive \(n!\) equal to \(k\) divided by all the other factors. We can write this out as:

\[\cfrac{k}{\frac{n!}{k}}, \text{ which can be rearranged as } \frac{k^2}{n!}.\]

And thus we also solve the minor mystery of how \(\frac{n}{(n-1)!}\) became \(\frac{n^2}{n!}\) in the tweet.
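That claim—the left-to-right quotient depends only on which factor \(k\) comes first, and always equals \(\frac{k^2}{n!}\)—is easy to confirm by brute force. Here's a quick check in Python (the rest of this article uses Julia):

```python
# Verify that a left-to-right chain of divisions over any permutation
# of {1..n} equals k^2/n!, where k is the first element.

from fractions import Fraction
from itertools import permutations
from math import factorial

def chain_divide(seq):
    """Compute seq[0] / seq[1] / seq[2] / ... left to right, exactly."""
    q = Fraction(seq[0])
    for x in seq[1:]:
        q /= x
    return q

n = 5
for perm in permutations(range(1, n + 1)):
    k = perm[0]
    assert chain_divide(perm) == Fraction(k * k, factorial(n))
print("verified for all", factorial(n), "permutations of {1..5}")
```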

It’s worth noting that all of these functions converge to zero as \(n\) goes to infinity. Asymptotically speaking, \(\frac{1^2}{n!}, \frac{2^2}{n!}, \ldots, \frac{n^2}{n!}\) are all alike.

Ta dah! Mission accomplished. Problem solved. Done and dusted. Now we know everything there is to know about divisive factorials, right?

Well, maybe there’s one more question. What does the computer say? If you take your favorite factorial algorithm and do as the tweet suggests, replacing any appearance of the \(\times\) (or `*`) operator with `/`, what happens? Which of the \(n\) variants of divisive \(n!\) does the program produce?

Here’s *my* favorite algorithm for computing factorials, in the form of a Julia program:

```
function mul!(n)
    if n == 1
        return 1
    else
        return n * mul!(n - 1)
    end
end
```

This is the algorithm that has introduced generations of nerds to the concept of recursion. In narrative form it says: If \(n\) is \(1\), then \(mul!(n)\) is \(1\). Otherwise, evaluate the function \(mul!(n-1)\), then multiply the result by \(n\).

The function can be written more tersely using Julia’s one-liner style of definition:

`mul!(n) = n == 1 ? 1 : n * mul!(n - 1)`

The right side of the assignment statement is a conditional expression, or ternary operator, which has the form `a ? b : c`. Here `a` is a boolean test clause, which must return a value of either `true` or `false`. If `a` is `true`, clause `b` is evaluated, and the result becomes the value of the entire expression. Otherwise clause `c` is evaluated.

Just to be sure I’ve got this right, here are the first 10 factorials, as calculated by this program:

```
[mul!(n) for n in 1:10]
10-element Array{Int64,1}:
1
2
6
24
120
720
5040
40320
362880
3628800
```

Now let’s edit that definition and convert the single occurrence of `*` to a `/`, leaving everything else (except the name of the function) unchanged.

`div!(n) = n == 1 ? 1 : n / div!(n - 1)`

And here’s what comes back when we run the program for values of \(n\) from \(1\) through \(20\):

```
[div!(n) for n in 1:20]
20-element Array{Real,1}:
1
2.0
1.5
2.6666666666666665
1.875
3.2
2.1875
3.657142857142857
2.4609375
4.063492063492063
2.70703125
4.432900432900433
2.9326171875
4.773892773892774
3.14208984375
5.092152292152292
3.338470458984375
5.391690662278897
3.523941040039063
5.675463855030418
```

Huh? That sure doesn’t look like it’s converging to zero—not as \(\frac{1}{n!}\) or as \(\frac{n}{(n-1)!}\). As a matter of fact, it doesn’t look like it’s going to converge at all. The graph below suggests the sequence is made up of two alternating components, both of which appear to be slowly growing toward infinity as well as diverging from one another.

In trying to make sense of what we’re seeing here, it helps to change the output type of the `div!` function. Instead of applying the division operator `/`, which returns the quotient as a floating-point number, we can substitute the `//` operator, which returns an exact rational quotient, reduced to lowest terms.

`div!(n) = n == 1 ? 1 : n // div!(n - 1)`

Here’s the sequence of values for `n in 1:20`:

```
20-element Array{Real,1}:
1
2//1
3//2
8//3
15//8
16//5
35//16
128//35
315//128
256//63
693//256
1024//231
3003//1024
2048//429
6435//2048
32768//6435
109395//32768
65536//12155
230945//65536
262144//46189
```

The list is full of curious patterns. It’s a double helix, with even numbers and odd numbers zigzagging in complementary strands. The even numbers are not just even; they are all powers of \(2\). Also, they appear in pairs—first in the numerator, then in the denominator—and their sequence is nondecreasing. But there are gaps; not all powers of \(2\) are present. The odd strand looks even more complicated, with various small prime factors flitting in and out of the numbers. (The primes *have* to be small—smaller than \(n\), anyway.)

This outcome took me by surprise. I had really expected to see a much tamer sequence, like those I worked out with pencil and paper. All those jagged, jitterbuggy ups and downs made no sense. Nor did the overall trend of unbounded growth in the ratio. How could you keep dividing and dividing, and wind up with bigger and bigger numbers?

At this point you may want to pause before reading on, and try to work out your own theory of where these zigzag numbers are coming from. If you need a hint, you can get a strong one—almost a spoiler—by looking up the sequence of numerators or the sequence of denominators in the Online Encyclopedia of Integer Sequences.

Here’s another hint. A small edit to the `div!` program completely transforms the output. Just flip the final clause, changing `n // div!(n - 1)` into `div!(n - 1) // n`.

`div!(n) = n == 1 ? 1 : div!(n - 1) // n`

Now the results look like this:

```
10-element Array{Real,1}:
1
1//2
1//6
1//24
1//120
1//720
1//5040
1//40320
1//362880
1//3628800
```

This is the inverse factorial function we’ve already seen, the series of quotients generated when you march left to right through an ascending sequence of divisors \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\).

It’s no surprise that flipping the final clause in the procedure alters the outcome. After all, we know that division is not commutative or associative. What’s not so easy to see is why the sequence of quotients generated by the original program takes that weird zigzag form. What mechanism is giving rise to those paired powers of 2 and the alternation of odd and even?

I have found that it’s easier to explain what’s going on in the zigzag sequence when I describe an iterative version of the procedure, rather than the recursive one. (This is an embarrassing admission for someone who has argued that recursive definitions are easier to reason about, but there you have it.) Here’s the program:

```
function div!_iter(n)
    q = 1
    for i in 1:n
        q = i // q
    end
    return q
end
```

I submit that this looping procedure is operationally identical to the recursive function, in the sense that if `div!(n)` and `div!_iter(n)` both return a result for some positive integer `n`, it will always be the same result. Here’s my evidence:

```
[div!(n) for n in 1:20] [div!_iter(n) for n in 1:20]
1 1//1
2//1 2//1
3//2 3//2
8//3 8//3
15//8 15//8
16//5 16//5
35//16 35//16
128//35 128//35
315//128 315//128
256//63 256//63
693//256 693//256
1024//231 1024//231
3003//1024 3003//1024
2048//429 2048//429
6435//2048 6435//2048
32768//6435 32768//6435
109395//32768 109395//32768
65536//12155 65536//12155
230945//65536 230945//65536
262144//46189 262144//46189
```

To understand the process that gives rise to these numbers, consider the successive values of the variables \(i\) and \(q\) each time the loop is executed. Initially, \(i\) and \(q\) are both set to \(1\); hence, after the first passage through the loop, the statement `q = i // q` gives \(q\) the value \(\frac{1}{1}\). Next time around, \(i = 2\) and \(q = \frac{1}{1}\), so \(q\)’s new value is \(\frac{2}{1}\). On the third iteration, \(i = 3\) and \(q = \frac{2}{1}\), yielding \(\frac{i}{q} \rightarrow \frac{3}{2}\). If this is still confusing, try thinking of \(\frac{i}{q}\) as \(i \times \frac{1}{q}\). The crucial observation is that on every passage through the loop, \(q\) is inverted, becoming \(\frac{1}{q}\).

If you unwind these operations, and look at the multiplications and divisions that go into each element of the series, a pattern emerges:

\[\frac{1}{1}, \quad \frac{2}{1}, \quad \frac{1 \cdot 3}{2}, \quad \frac{2 \cdot 4}{1 \cdot 3}, \quad \frac{1 \cdot 3 \cdot 5}{2 \cdot 4}, \quad \frac{2 \cdot 4 \cdot 6}{1 \cdot 3 \cdot 5}\]

The general form is:

\[\frac{1 \cdot 3 \cdot 5 \cdot \cdots \cdot n}{2 \cdot 4 \cdot \cdots \cdot (n-1)} \quad (\text{odd } n) \qquad \frac{2 \cdot 4 \cdot 6 \cdot \cdots \cdot n}{1 \cdot 3 \cdot 5 \cdot \cdots \cdot (n-1)} \quad (\text{even } n).\]

The functions \(1 \cdot 3 \cdot 5 \cdot \cdots \cdot n\) for odd \(n\) and \(2 \cdot 4 \cdot 6 \cdot \cdots \cdot n\) for even \(n\) have a name! They are known as double factorials, with the notation \(n!!\). The double factorial \(n!!\) is defined as the product of \(n\) and all smaller positive integers of the same parity. Thus our peculiar sequence of zigzag quotients is simply \(\frac{n!!}{(n-1)!!}\).
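The identification is easy to confirm numerically. Here's a quick check in Python (the article's own code is in Julia):

```python
# Verify that the zigzag quotients produced by the q = i/q iteration
# are exactly n!!/(n-1)!!, a ratio of consecutive double factorials.

from fractions import Fraction

def double_factorial(n):
    """Product of n and all smaller positive integers of the same parity."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def zigzag(n):
    """Left-to-right divisive factorial: n / (n-1) / (n-2) / ... via
    the iterative q = i/q loop, computed with exact rationals."""
    q = Fraction(1)
    for i in range(1, n + 1):
        q = Fraction(i) / q
    return q

for n in range(1, 21):
    assert zigzag(n) == Fraction(double_factorial(n), double_factorial(n - 1))
print(zigzag(8))  # → 128/35, matching the table above
```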

A 2012 article by Henry W. Gould and Jocelyn Quaintance (behind a paywall, regrettably) surveys the applications of double factorials. They turn up more often than you might guess. In the middle of the 17th century John Wallis came up with this identity:

\[\frac{\pi}{2} = \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots} = \lim_{n \rightarrow \infty} \frac{((2n)!!)^2}{(2n + 1)!!(2n - 1)!!}\]

An even weirder series, involving the cube of a quotient of double factorials, sums to \(\frac{2}{\pi}\). That one was discovered by (who else?) Srinivasa Ramanujan.

Gould and Quaintance also discuss the double factorial counterpart of binomial coefficients. The standard binomial coefficient is defined as:

\[\binom{n}{k} = \frac{n!}{k! (n-k)!}.\]

The double version is:

\[\left(\!\binom{n}{k}\!\right) = \frac{n!!}{k!! (n-k)!!}.\]

Note that our zigzag numbers fit this description and therefore qualify as double factorial binomial coefficients. Specifically, they are the numbers:

\[\left(\!\binom{n}{1}\!\right) = \left(\!\binom{n}{n - 1}\!\right) = \frac{n!!}{1!! (n-1)!!}.\]

The regular binomial \(\binom{n}{1}\) is not very interesting; it is simply equal to \(n\). But the doubled version \(\left(\!\binom{n}{1}\!\right)\), as we’ve seen, dances a livelier jig. And, unlike the single binomial, it is not always an integer. (The only integer values are \(1\) and \(2\).)

Seeing the zigzag numbers as ratios of double factorials explains quite a few of their properties, starting with the alternation of evens and odds. We can also see why all the even numbers in the sequence are powers of 2. Consider the case of \(n = 6\). The numerator of this fraction is \(2 \cdot 4 \cdot 6 = 48\), which acquires a factor of \(3\) from the \(6\). But the denominator is \(1 \cdot 3 \cdot 5 = 15\). The \(3\)s above and below cancel, leaving \(\frac{16}{5}\). Such cancelations will happen in every case. Whenever an odd factor \(m\) enters the even sequence, it must do so in the form \(2 \cdot m\), but at that point \(m\) itself must already be present in the odd sequence.

Is the sequence of zigzag numbers a reasonable answer to the question, “What happens when you divide instead of multiply in \(n!\)?” Or is the computer program that generates them just a buggy algorithm? My personal judgment is that \(\frac{1}{n!}\) is a more intuitive answer, but \(\frac{n!!}{(n - 1)!!}\) is more interesting.

Furthermore, the mere existence of the zigzag sequence broadens our horizons. As noted above, if you insist that the division algorithm must always chug along the list of \(n\) factors in order, at each stop dividing the number on the left by the number on the right, then there are only \(n\) possible outcomes, and they all look much alike. But the zigzag solution suggests wilder possibilities. We can formulate the task as follows. Take the set of factors \(\{1 \dots n\}\), select a subset, and invert all the elements of that subset; now multiply all the factors, both the inverted and the upright ones. If the inverted subset is empty, the result is the ordinary factorial \(n!\). If *all* of the factors are inverted, we get the inverse \(\frac{1}{n!}\). And if every second factor is inverted, starting with \(n - 1\), the result is an element of the zigzag sequence.
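The subset-inversion formulation can be stated compactly in code. Here's a Python sketch covering the three special cases just described:

```python
# Divisive factorial as subset inversion: multiply all factors in
# {1..n}, after replacing every member of `inverted` by its reciprocal.

from fractions import Fraction
from math import factorial

def subset_factorial(n, inverted):
    q = Fraction(1)
    for i in range(1, n + 1):
        q *= Fraction(1, i) if i in inverted else Fraction(i)
    return q

n = 8
# Empty subset: the ordinary factorial n!
assert subset_factorial(n, set()) == factorial(n)
# Everything inverted: the reciprocal 1/n!
assert subset_factorial(n, set(range(1, n + 1))) == Fraction(1, factorial(n))
# Every second factor inverted, starting with n-1: a zigzag number.
print(subset_factorial(n, {7, 5, 3, 1}))  # → 128/35
```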

These are only a few among the many possible choices; in total there are \(2^n\) subsets of \(n\) items. For example, you might invert every number that is prime or a power of a prime \((2, 3, 4, 5, 7, 8, 9, 11, \dots)\). For small \(n\), the result jumps around but remains consistently less than \(1\):

If I were to continue this plot to larger \(n\), however, it would take off for the stratosphere. Prime powers get sparse farther out on the number line.

Here’s a question. We’ve seen factorial variants that go to zero as \(n\) goes to infinity, such as \(1/n!\). We’ve seen other variants grow without bound as \(n\) increases, including \(n!\) itself, and the zigzag numbers. Are there any versions of the factorial process that converge to a finite bound other than zero?

My first thought was this algorithm:

```
function greedy_balance(n)
    q = 1
    while n > 0
        q = q > 1 ? q / n : q * n
        n -= 1
    end
    return q
end
```
```

We loop through the integers from \(n\) down to \(1\), calculating the running product/quotient \(q\) as we go. At each step, if the current value of \(q\) is greater than \(1\), we divide by the next factor; otherwise, we multiply. This scheme implements a kind of feedback control or target-seeking behavior. If \(q\) gets too large, we reduce it; too small and we increase it. I conjectured that as \(n\) goes to infinity, \(q\) would settle into an ever-narrower range of values near \(1\).

Running the experiment gave me another surprise:

That sawtooth wave is not quite what I expected. One minor peculiarity is that the curve is not symmetric around \(1\); the excursions above have higher amplitude than those below. But this distortion is more visual than mathematical. Because \(q\) is a ratio, the distance from \(1\) to \(10\) is the same as the distance from \(1\) to \(\frac{1}{10}\), but it doesn’t look that way on a linear scale. The remedy is to plot the log of the ratio:

Now the graph is symmetric, or at least approximately so, centered on \(0\), which is the logarithm of \(1\). But a larger mystery remains. The sawtooth waveform is very regular, with a period of \(4\), and it shows no obvious signs of shrinking toward the expected limiting value of \(\log q = 0\). Numerical evidence suggests that as \(n\) goes to infinity the peaks of this curve converge on a value just above \(q = \frac{5}{3}\), and the troughs approach a value just below \(q = \frac{3}{5}\). (The corresponding base-\(10\) logarithms are roughly \(\pm0.222\).) I have not worked out why this should be so. Perhaps someone will explain it to me.

The failure of this greedy algorithm doesn’t mean we can’t find a divisive factorial that converges to \(q = 1\). Instead of deciding on the fly, we can choose, for each \(n\), the partition of the factors into multipliers and divisors whose overall product comes closest to \(1\).

I have computed the optimal partitionings up to \(n = 30\), where there are a billion possibilities to choose from.

The graph is clearly flatlining. You could use the same method to force convergence to any other value between \(0\) and \(n!\).
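The article doesn't show code for that search, but the idea can be sketched: among all \(2^n\) choices of inverted subset, pick the one whose product is closest to \(1\), i.e., whose log is closest to \(0\). The brute-force version below (in Python; function name and structure are my own, not the author's) is practical only for modest \(n\):

```python
# Brute-force search for the inverted subset of {1..n} whose divisive
# product comes closest to 1 (log closest to 0). With 2^n candidates,
# this is feasible only for small n; n = 30 needs a cleverer search.

from math import log

def best_balance(n):
    log_factors = [log(i) for i in range(2, n + 1)]  # log(1) = 0, skip it
    total = sum(log_factors)                          # log(n!)
    best = None
    # Each bitmask selects which factors stay upright (multipliers);
    # the rest are inverted (divisors).
    for mask in range(2 ** len(log_factors)):
        s = -total  # start with every factor inverted...
        for bit, lf in enumerate(log_factors):
            if mask & (1 << bit):
                s += 2 * lf  # ...then un-invert the selected ones
        if best is None or abs(s) < abs(best):
            best = s
    return best  # log of the achievable product closest to 1

print(best_balance(12))
```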

And thus we have yet another answer to the question in the tweet that launched this adventure. What happens when you divide instead of multiply in \(n!\)? Anything you want.

On my visit to Baltimore for the Joint Mathematics Meetings a couple of weeks ago, I managed to score a hotel room with a spectacular scenic view. My seventh-floor perch overlooked the Greene Street substation of the Baltimore Gas and Electric Company, just around the corner from the Camden Yards baseball stadium.

Some years ago, writing about such technological landscapes, I argued that you can understand what you’re looking at if you’re willing to invest a little effort:

At first glance, a substation is a bewildering array of hulking steel machines whose function is far from obvious. Ponderous tanklike or boxlike objects are lined up in rows. Some of them have cooling fins or fans; many have fluted porcelain insulators poking out in all directions…. If you look closer, you will find there is a logic to this mélange of equipment. You can make sense of it. The substation has inputs and outputs, and with a little study you can trace the pathways between them.

If I were writing that passage now, I would hedge or soften my claim that an electrical substation will yield its secrets to casual observation. Each morning in Baltimore I spent a few minutes peering into the Greene Street enclosure. I was able to identify all the major pieces of equipment in the open-air part of the station, and I know their basic functions. But making sense of the circuitry, finding the logic in the arrangement of devices, tracing the pathways from inputs to outputs—I have to confess, with a generous measure of chagrin, that I failed to solve the puzzle. I think I have the answers now, but finding them took more than eyeballing the hardware.

Basics first. A substation is not a generating plant. BGE does not “make” electricity here. The substation receives electric power in bulk from distant plants and repackages it for retail delivery. At Greene Street the incoming supply is at 115,000 volts (or 115 kV). The output voltage is about a tenth of that: 13.8 kV. How do I know the voltages? Not through some ingenious calculation based on the size of the insulators or the spacing between conductors. In an enlargement of one of my photos I found an identifying plate with the blurry and partially obscured but still legible notation “115/13.8 KV.”

The biggest hunks of machinery in the yard are the transformers *(photo below)*, which do the voltage conversion. Each transformer is housed in a steel tank filled with oil, which serves as both insulator and coolant. Immersed in the oil bath are coils of wire wrapped around a massive iron core. Stacks of radiator panels, with fans mounted underneath, help cool the oil when the system is under heavy load. A bed of crushed stone under the transformer is meant to soak up any oil leaks and reduce fire hazards.

Electricity enters and leaves the transformer through the ribbed gray posts, called bushings, mounted atop the casing. A bushing is an insulator with a conducting path through the middle. It works like the rubber grommet that protects the power cord of an appliance where it passes through the steel chassis. The high-voltage inputs attach to the three tallest bushings, with red caps; the low-voltage bushings, with dark gray caps, are shorter and more closely spaced. Notice that each high-voltage input travels over a single slender wire, whereas each low-voltage output has three stout conductors. That’s because reducing the voltage to one-tenth increases the current tenfold.
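The arithmetic behind that observation is just conservation of power: with input and output power (nearly) equal, current scales inversely with voltage. A quick Python check, using the station's 115/13.8 kV ratio and an invented 20 MW load (not an actual BGE figure):

```python
# Transformer power balance: P = V * I on both sides (ignoring losses),
# so stepping voltage down by ~8.3x steps current up by the same factor.
# The 20 MW load is a made-up example, not an actual BGE rating.

load_watts = 20e6
v_in, v_out = 115e3, 13.8e3   # volts, from the station's nameplate

i_in = load_watts / v_in      # current on the high-voltage side
i_out = load_watts / v_out    # current on the low-voltage side
print(round(i_in), round(i_out), round(i_out / i_in, 2))
# → 174 1449 8.33
```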

What about the three slender gray posts just to the left of the high-voltage bushings? They are lightning arresters, shunting sudden voltage surges into the earth to protect the transformer from damage.

Perhaps the most distinctive feature of this particular substation is what’s *not* to be seen. There are no tall towers carrying high-voltage transmission lines to the station. Clearing a right of way for overhead lines would be difficult and destructive in an urban center, so the high-voltage “feeders” run underground. In the photo at right, near the bottom left corner, a bundle of three metal-sheathed cables emerges from the earth. Each cable, about as thick as a human forearm, has a copper or aluminum conductor running down the middle, surrounded by insulation. I suspect these cables are insulated with layers of paper impregnated with oil under pressure; some of the other feeders entering the station may be of a newer design, with solid plastic insulation. Each cable plugs into the bottom of a ceramic bushing, which carries the current to a copper wire at the top. (You can tell the wire is copper because of the green patina.)

Connecting the feeder input to the transformer is a set of three hollow aluminum conductors called bus bars, held high overhead on steel stanchions and ceramic insulators. At both ends of the bus bars are mechanical switches that open like hinged doors to break the circuit. I don’t know whether these switches can be opened when the system is under power or whether they are just used to isolate components for maintenance after a feeder has been shut down. Beyond the bus bars, and hidden behind a concrete barrier, we can glimpse the bushings atop a different kind of switch, which I’ll return to below.

At this point you might be asking, why does everything come in sets of three—the bus bars, the feeder cables, the terminals on the transformer? It’s because electric power is distributed as three-phase alternating current. Each conductor carries a voltage oscillating at 60 Hertz, with the three waves offset by one-third of a cycle. If you recorded the voltage between each of the three pairs of conductors *(AB, AC, BC)*, you’d see a waveform like the one above at left.
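The three-phase offset is easy to sketch numerically. In this little demonstration of my own (not anything from BGE), the three voltages are unit-amplitude 60 Hz sine waves displaced by one-third of a cycle; at every instant the three sum to zero, which is why a balanced three-phase system needs no separate return conductor:

```python
import math

# Three voltages oscillating at 60 Hz, offset by 2*pi/3 radians (120 degrees).
def phase_voltages(t, amplitude=1.0, freq=60.0):
    omega = 2 * math.pi * freq
    return [amplitude * math.sin(omega * t - k * 2 * math.pi / 3)
            for k in range(3)]

for t in [0.0, 0.004, 0.008]:          # a few instants within one cycle
    a, b, c = phase_voltages(t)
    print(f"t={t:.3f}s  sum={a + b + c:+.2e}")   # ~0 up to float rounding
```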

At the other end of the conducting pathway, connected to three more bus bars on the low-voltage side of the transformer, is an odd-looking stack of three large drums. These are current-limiting reactors (no connection with nuclear reactors). They are coils of thick conductors wound on a stout concrete armature. Under normal operating conditions they have little effect on the transmission of power, but in the milliseconds following a short circuit, the coils’ inductance opposes the sudden rush of current, limiting the fault current and preventing damage to other equipment.

So those are the main elements of the substation I was able to spot from my hotel window. They all made sense individually, and yet I realized over the course of a few days that I didn’t really understand how it all works together. My doubts are easiest to explain with the help of a bird’s eye view of the substation layout, cribbed from Google Maps:

My window vista was from off to the right, beyond the eastern edge of the compound. In the Google Maps view, the underground 115 kV feeders enter at the bottom or southern edge, and power flows northward through the transformers and the reactor coils, finally entering the building that occupies the northeast corner of the lot. Neither Google nor I can see inside this windowless building, but I know what’s in there, in a general way. That’s where the low-voltage (13.8 kV) distribution lines go underground and fan out to their various destinations in the neighborhood.

Let’s look more closely at the outdoor equipment. There are four high-voltage feeders, four transformers, and four sets of reactor coils. Apart from minor differences in geometry (and one newer-looking, less rusty transformer), these four parallel pathways all look alike. It’s a symmetric four-lane highway. Thus my first hypothesis was that four independent 115 kV feeders supply power to the station, presumably bringing it from larger substations and higher-voltage transmission lines outside the city.

However, something about the layout continued to bother me. If we label the four lanes of the highway from left to right, then on the high-voltage side, toward the bottom of the map view, it looks like there’s something connecting lanes 1 and 2, and there’s a similar link between lanes 3 and 4. From my hotel window the view of this device is blocked by a concrete barricade, and unfortunately the Google Maps image does not show it clearly either. (If you zoom in for a closer view, the goofy Google compression algorithm will turn the scene into a dreamscape where all the components have been draped in Saran Wrap.) Nevertheless, I’m quite sure of what I’m looking at. The device connecting the pairs of feeders is a high-voltage three-phase switch, or circuit breaker, something like the ones seen in the image at right (photographed at another substation, in Missouri). The function of this device is essentially the same as that of a circuit breaker in your home electrical panel. You can turn it off manually to shut down a circuit, but it may also “trip” automatically in response to an overload or a short circuit. The concrete barriers flanking the two high-voltage breakers at Greene Street hint at one of the problems with such switches. Interrupting a current of hundreds of amperes at more than 100,000 volts is like stopping a runaway truck: It requires absorbing a lot of energy. The switch does not always survive the experience.

When I first looked into the Greene Street substation, I was puzzled by the *absence* of breakers at the input end of each main circuit. I expected to see them there to protect the transformers and other components from overloads or lightning strikes. I think there are breakers on the low-voltage side, tucked in just behind the transformers and thus not clearly visible from my window. But there’s nothing on the high side. I could only guess that such protection is provided by breakers near the output of the next substation upstream, the one that sends the 115 kV feeders into Greene Street.

That leaves the question of why pairs of circuits within the substation are cross-linked by breakers. I drew a simplified diagram of how things are wired up:

Two adjacent 115 kV circuits run from bottom to top; the breaker between them connects corresponding conductors—left to left, middle to middle, right to right. But what’s the point of doing so?

I had some ideas. If one transformer were out of commission, the pathway through the breaker could allow power to be rerouted through the remaining transformer (assuming it could handle the extra load). Indeed, maybe the entire design simply reflects a high level of redundancy. There are four incoming feeders and four transformers, but perhaps only two are expected to operate at any given time. The breaker provides a means of switching between them, so that you could lose a circuit (or maybe two) and still keep all the lights on. After all, this is a substation supplying power to many large facilities—the convention center (where the math meetings were held), a major hospital, large hotels, the ball park, theaters, museums, high-rise office buildings. Reliability is important here.

After further thought, however, this scheme seemed highly implausible. There are other substation layouts that would let any of the four feeders power any of the four transformers, allowing much greater flexibility in handling failures and making more efficient use of all the equipment. Linking the incoming feeders in pairs made no sense.

I would love to be able to say that I solved this puzzle on my own, just by dint of analysis and deduction, but it’s not true. When I got home and began looking at the photographs, I was still baffled. The answer eventually came via Google, though it wasn’t easy to find. Before revealing where I went wrong, I’ll give a couple of hints, which might be enough for you to guess the answer.

Hint 1. I was led astray by a biased sample. I am much more familiar with substations out in the suburbs or the countryside, partly because they’re easier to see into. Most of them are surrounded by a chain-link fence rather than a brick wall. But country infrastructure differs from the urban stuff.

Hint 2. I was also fooled by geometry when I should have been thinking about topology. To understand what you’re seeing in the Greene Street compound, you have to get beyond individual components and think about how it’s all connected to the rest of the network.

The web offers marvelous resources for the student of infrastructure, but finding them can be a challenge. You might suppose that the BGE website would have a list of the company’s facilities, and maybe a basic tutorial on where Baltimore’s electricity comes from. There’s nothing of the sort (although the utility’s parent company does offer thumbnail descriptions of some of their generating plants). Baltimore City websites were a little more helpful—not that they explained any details of substation operation, but they did report various legal and regulatory filings concerned with proposals for new or updated facilities. From those reports I learned the names of several BGE installations, which I could take back to Google to use as search terms.

One avenue I pursued was figuring out where the high-voltage feeders entering Greene Street come from. I discovered a substation called Pumphrey about five miles south of the city, near BWI airport, which seemed to be a major nexus of transmission lines. In particular, four 115 kV feeders travel north from Pumphrey to a substation in the Westport neighborhood, which is about a mile south of downtown. The Pumphrey-Westport feeders are overhead lines, and I had seen them already. Their right of way parallels the light rail route I had taken into town from the airport. Beyond the Westport substation, which is next to a light rail stop of the same name, the towers disappear. An obvious hypothesis is that the four feeders dive underground at Westport and come up at Greene Street. This guess was partly correct: Power does reach Greene Street from Westport, but not exclusively.

At Westport BGE has recently built a small, gas-fired generating plant, to help meet peak demands. The substation is also near the Baltimore RESCO waste-to-energy power plant *(photo above)*, which has become a local landmark. (It’s the only garbage burner I know that turns up on postcards sold in tourist shops.) Power from both of these sources could also make its way to the Greene Street substation, via Westport.

I finally began to make sense of the city’s wiring diagram when I stumbled upon some documents published by the PJM Interconnection, the administrator and coordinator of the power “pool” in the mid-Atlantic region. PJM stands for Pennsylvania–New Jersey–Maryland, but it covers a broader territory, including Delaware, Ohio, West Virginia, most of Virginia, and parts of Kentucky, Indiana, Michigan, and Illinois. Connecting to such a pool has important advantages for a utility. If an equipment failure means you can’t meet your customers’ demands for electricity, you can import power from elsewhere in the pool to make up the shortage; conversely, if you have excess generation, you can sell the power to another utility. The PJM supervises the market for such exchanges.

The idea behind power pooling is that neighbors can prop each other up in times of trouble; however, they can also knock each other down. As a condition of membership in the pool, utilities have to maintain various standards for engineering and reliability. PJM committees review plans for changes or additions to a utility’s network. It was a set of PowerPoint slides prepared for one such committee that first alerted me to my fundamental misconception. One of the slides included the map below, tracing the routes of 115 kV feeders *(green lines)* in and around downtown Baltimore.

I had been assuming—even though I should have known better—that the distribution network is essentially treelike, with lines radiating from each node to other nodes but never coming back together. For low-voltage distribution lines in sparsely settled areas, this assumption is generally correct. If you live in the suburbs or in a small town, there is one power line that runs from the local substation to your neighborhood; if a tree falls on it, you’re in the dark until the problem is fixed. There is no alternative route of supply. But that is *not* the topology of higher-voltage circuits. The Baltimore network consists of rings, where power can reach most nodes by following either of two pathways.

In the map we can see the four 115 kV feeders linking Pumphrey to Westport. From Westport, two lines run due north to Greene Street, then make a right turn to another station named Concord Street.

This double-ring architecture calls for a total reinterpretation of how the Greene Street substation works. I had imagined the four 115 kV inputs as four lanes of one-way traffic, all pouring into the substation and dead-ending in the four transformers. In reality we have just two roadways, both of which enter the substation and then leave again, continuing on to further destinations. And they are not one-way; they can both carry traffic in either direction. The transformers are like exit ramps that siphon off a portion of the traffic while the main stream passes by.

At Greene Street, two of the underground lines entering the compound come from Westport, but the other two proceed to Concord Street, the next station around the ring. What about the breakers that sit between the incoming and outgoing branches of each circuit? They open up the ring to isolate any section that experiences a serious failure. For example, a short circuit in one of the cables running between Greene Street and Concord Street would cause breakers at both of those stations to open up, but both stations would continue to receive power coming around the other branch of the loop.
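The fault-tolerance argument can be captured in a toy graph model. In this sketch the stations sit on a single loop (the fourth name is a placeholder of mine, and the real network is more elaborate); cutting any one section and opening its breakers still leaves every station reachable from the source:

```python
# Toy model of a ring of substations. Opening the breakers around one
# faulted cable section leaves power a path the other way around the loop.
# "OtherSt" is a made-up name; the topology is deliberately simplified.
def reachable(adj, start):
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

ring = ["Westport", "GreeneSt", "ConcordSt", "OtherSt"]
adj = {s: set() for s in ring}
for a, b in zip(ring, ring[1:] + ring[:1]):   # close the loop
    adj[a].add(b)
    adj[b].add(a)

# Fault on the Greene Street - Concord Street cable: open both breakers.
adj["GreeneSt"].discard("ConcordSt")
adj["ConcordSt"].discard("GreeneSt")
print(reachable(adj, "Westport") == set(ring))   # True: all stations still fed
```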

This revised interpretation was confirmed by another document made available by PJM, this one written by BGE engineers as an account of their engineering practices for transmission lines and substations. It includes a schematic diagram of a typical downtown Baltimore substation. The diagram makes no attempt to reproduce the geometric layout of the components; it rearranges them to make the topology clearer.

The two 115 kV feeders that run through the substation are shown as horizontal lines; the solid black squares in the middle are the breakers that join the pairs of feeders and thereby close the two rings that run through all the downtown substations. The transformers are the W-shaped symbols at the ends of the branch lines.

A mystery remains. One symbol in the diagram represents a disconnect switch, a rather simple mechanical device that generally cannot be operated when the power line is under load. A second symbol is identified in the BGE document as a *circuit switcher*, a more elaborate device capable of interrupting a heavy current. In the Greene Street photos, however, the switches at the two ends of the high-voltage bus bars appear almost identical. I’m not seeing any circuit switchers there. But, as should be obvious by now, I’m capable of misinterpreting what I see.

Before digging into the dynamics, however, let us pause for a few words about the man himself, drawn largely from the obituaries in the *New York Times* and the *Harvard Crimson*.

Glauber was a member of the first class to graduate from the Bronx High School of Science, in 1941. From there he went to Harvard, but left in his sophomore year, at age 18, to work in the theory division at Los Alamos, where he helped calculate the critical mass of fissile material needed for a bomb. After the war he finished his degree at Harvard and went on to complete a PhD under Julian Schwinger. After a few brief adventures in Princeton and Pasadena, he was back at Harvard in 1952 and never left. A poignant aspect of his life is mentioned briefly in a 2009 interview, where Glauber discusses the challenge of sustaining an academic career while raising two children as a single parent.

Here’s a glimpse of Glauber dynamics in action. Click the *Go* button, then try fiddling with the slider.

In the computer program that drives this animation, the slider controls a variable representing temperature. At high temperature (slide the control all the way to the right), you’ll see a roiling, seething mass of colored squares, switching rapidly and randomly between light and dark shades. There are no large-scale or long-lived structures.

What we’re looking at here is a simulation of a model of a ferromagnet—the kind of magnet that sticks to the refrigerator. The model was introduced almost 100 years ago by Wilhelm Lenz and his student Ernst Ising. They were trying to understand the thermal behavior of ferromagnetic materials such as iron. If you heat a block of magnetized iron above a certain temperature, called the Curie point, it loses all traces of magnetization. Slow cooling below the Curie point allows it to spontaneously magnetize again, perhaps with the poles in a different orientation. The onset of ferromagnetism at the Curie point is an abrupt phase transition.

Lenz and Ising created a stripped-down model of a ferromagnet. In the two-dimensional version shown here, each of the small squares represents the spin vector of an unpaired electron in an iron atom. The vector can point in either of two directions, conventionally called *up* and *down*, which for graphic convenience are represented by two contrasting colors. There are \(100 \times 100 = 10{,}000\) spins in the array. This would be a minute sample of a real ferromagnet. On the other hand, the system has \(2^{10{,}000}\) possible states—quite an enormous number.

The essence of ferromagnetism is that adjacent spins “prefer” to point in the same direction. To put that more formally: The energy of neighboring spins is lower when they are parallel, rather than antiparallel. For the array as a whole, the energy is minimized if all the spins point the same way, either up or down. Each spin contributes a tiny magnetic moment. When the spins are parallel, all the moments add up and the system is fully magnetized.
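The energy bookkeeping can be made concrete in a few lines. This is a sketch with the coupling constant set to 1 and no external field: each nearest-neighbor pair contributes \(-1\) to the energy if the spins are parallel and \(+1\) if they are antiparallel, so the fully aligned lattice sits at the minimum.

```python
import random

# Ising interaction energy on an n-by-n lattice with periodic boundaries
# (coupling J = 1, no external field): each bond contributes -1 if its
# two spins are parallel, +1 if antiparallel.
def lattice_energy(spins):
    n = len(spins)
    e = 0.0
    for x in range(n):
        for y in range(n):
            s = spins[x][y]
            e += -s * spins[(x + 1) % n][y]   # bond to right neighbor
            e += -s * spins[x][(y + 1) % n]   # bond to neighbor below
    return e

n = 10
aligned = [[+1] * n for _ in range(n)]
scrambled = [[random.choice([-1, +1]) for _ in range(n)] for _ in range(n)]
print(lattice_energy(aligned))     # -200.0: all 2*n*n bonds satisfied
print(lattice_energy(scrambled))   # almost surely higher
```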

If energy were the only consideration, the Ising model would always settle into a magnetized configuration, but there is a countervailing influence: Heat tends to randomize the spin directions. At infinite temperature, thermal fluctuations completely overwhelm the spins’ tendency to align, and all states are equally likely. Because the vast majority of those \(2^{10{,}000}\) configurations have nearly equal numbers of *up* and *down* spins, the magnetization is negligible. At zero temperature, nothing prevents the system from condensing into the fully magnetized state. The interval between these limits is a battleground where energy and entropy contend for supremacy. Clearly, there must be a transition of some kind. For Lenz and Ising in the 1920s, the crucial question was whether the transition comes at a sharply defined critical temperature, as it does in real ferromagnets. A more gradual progression from one regime to the other would signal the model’s failure to capture important aspects of ferromagnet physics.

In his doctoral dissertation Ising investigated the one-dimensional version of the model—a chain or ring of spins, each one holding hands with its two nearest neighbors. The result was a disappointment: He found no abrupt phase transition. And he speculated that the negative result would also hold in higher dimensions. The Ising model seemed to be dead on arrival.

It was revived a decade later by Rudolf Peierls, who gave suggestive evidence for a sharp transition in the two-dimensional lattice. Then in 1944 Lars Onsager “solved” the two-dimensional model, showing that the phase transition does exist. The phase diagram looks like this:

As the system cools, the salt-and-pepper chaos of infinite temperature evolves into a structure with larger blobs of color, but the *up* and *down* spins remain balanced on average (implying zero magnetization) down to the critical temperature \(T_C\). At that point there is a sudden bifurcation, and the system will follow one branch or the other to full magnetization at zero temperature.

If a model is classified as *solved*, is there anything more to say about it? In this case, I believe the answer is yes. The solution to the two-dimensional Ising model gives us a prescription for calculating the probability of seeing any given configuration at any given temperature. That’s a major accomplishment, and yet it leaves much of the model’s behavior unspecified. The solution defines the probability distribution at equilibrium—after the system has had time to settle into a statistically stable configuration. It doesn’t tell us anything about how the lattice of spins reaches that equilibrium when it starts from an arbitrary initial state, or how the system evolves when the temperature changes rapidly.

It’s not just the solution to the model that has a few vague spots. When you look at the finer details of how spins interact, the model itself leaves much to the imagination. When a spin reacts to the influence of its nearest neighbors, and those neighbors are also reacting to one another, does everything happen all at once? Suppose two antiparallel spins both decide to flip at the same time; they will be left in a configuration that is still antiparallel. It’s hard to see how they’ll escape repeating the same dance over and over, like people who meet head-on in a corridor and keep making mirror-image evasive maneuvers. This kind of standoff can be avoided if the spins act sequentially rather than simultaneously. But if they take turns, how do they decide who goes first?

Within the intellectual traditions of physics and mathematics, these questions can be dismissed as foolish or misguided. After all, when we look at the procession of the planets orbiting the sun, or at the colliding molecules in a gas, we don’t ask who takes the first step; the bodies are all in continuous and simultaneous motion. Newton gave us a tool, calculus, for understanding such situations. If you make the steps small enough, you don’t have to worry so much about the sequence of marching orders.

However, if you want to write a computer program simulating a ferromagnet (or simulating planetary motions, for that matter), questions of sequence and synchrony cannot be swept aside. With conventional computer hardware, “let everything happen at once” is not an option. The program must consider each spin, one at a time, survey the surrounding neighborhood, apply an update rule that’s based on both the state of the neighbors and the temperature, and then decide whether or not to flip. Thus the program must choose a sequence in which to visit the lattice sites, as well as a sequence in which to visit the neighbors of each site, and those choices can make a difference in the outcome of the simulation. So can other details of implementation. Do we look at all the sites, calculate their new spin states, and then update all those that need to be flipped? Or do we update each spin as we go along, so that spins later in the sequence will see an array already modified by earlier actions? The original definition of the Ising model is silent on such matters, but the programmer must make a commitment one way or another.

This is where Glauber dynamics enters the story. Glauber presented a version of the Ising model that’s somewhat more explicit about how spins interact with one another and with the “heat bath” that represents the influence of temperature. It’s a theory of Ising *dynamics* because he describes the spin system not just at equilibrium but also during transitional stages. I don’t know if Glauber was the first to offer an account of Ising dynamics, but the notion was certainly not commonplace in 1963.

There’s no evidence Glauber was thinking of his method as an algorithm suitable for computer implementation. The subject of simulation doesn’t come up in his 1963 paper, where his primary aim is to find analytic expressions for the distribution of *up* and *down* spins as a function of time. (He did this only for the one-dimensional model.) Nevertheless, Glauber dynamics offers an elegant approach to programming an interactive version of the Ising model. Assume we have a lattice of \(N\) spins. Each spin \(\sigma\) is indexed by its coordinates \(x, y\) and takes on one of the two values \(+1\) and \(-1\). Thus flipping a spin is a matter of multiplying \(\sigma\) by \(-1\). The algorithm for updating the lattice looks like this:

Repeat \(N\) times:

1. Choose a spin \(\sigma_{x, y}\) at random.
2. Sum the values of the four neighboring spins, \(S = \sigma_{x+1, y} + \sigma_{x-1, y} + \sigma_{x, y+1} + \sigma_{x, y-1}\). The possible values of \(S\) are \(\{-4, -2, 0, +2, +4\}\).
3. Calculate \(\Delta E = 2 \, \sigma_{x, y} \, S\), the change in interaction energy if \(\sigma_{x, y}\) were to flip.
4. If \(\Delta E \lt 0\), set \(\sigma_{x, y} = -\sigma_{x, y}\).
5. Otherwise, set \(\sigma_{x, y} = -\sigma_{x, y}\) with probability \(\exp(-\Delta E/T)\), where \(T\) is the temperature.

Display the updated lattice.

Step 4 says: If flipping a spin will reduce the overall energy of the system, flip it. Step 5 says: Even if flipping a spin raises the energy, go ahead and flip it in a randomly selected fraction of the cases. The probability of such spin flips is the Boltzmann factor \(\exp(-\Delta E/T)\). This quantity goes to \(0\) as the temperature \(T\) falls to \(0\), so that energetically unfavorable flips are unlikely in a cold lattice. The probability approaches \(1\) as \(T\) goes to infinity, which is why the model is such a seething mass of fluctuations at high temperature.

(If you’d like to take a look at real code rather than pseudocode—namely the JavaScript program running the simulation above—it’s on GitHub.)

Glauber dynamics belongs to a family of methods called Markov chain Monte Carlo algorithms (MCMC). The idea of Markov chains was an innovation in probability theory in the early years of the 20th century, extending classical probability to situations where the next event depends on the current state of the system. Monte Carlo algorithms emerged at post-war Los Alamos, not long after Glauber left there to resume his undergraduate curriculum. He clearly kept up with the work of Stanislaw Ulam and other former colleagues in the Manhattan Project.

Within the MCMC family, the distinctive feature of Glauber dynamics is choosing spins at random. The obvious alternative is to march methodically through the lattice by columns and rows, examining every spin in turn. That procedure can certainly be made to work, but it requires care in implementation. At low temperature the Ising process is very nearly deterministic, since unfavorable flips are extremely rare. When you combine a deterministic flip rule with a deterministic path through the lattice, it’s easy to get trapped in recurrent patterns. For example, a subtle bug yields the same configuration of spins on every step, shifted left by a single lattice site, so that the pattern seems to slide across the screen. Another spectacular failure gives rise to a blinking checkerboard, where every spin is surrounded by four opposite spins and flips on every time step. Avoiding these errors requires much fussy attention to algorithmic details. (My personal experience is that the first attempt is never right.)
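The blinking checkerboard is easy to reproduce on purpose. One way to get it (a deliberate demonstration of my own) is a fully synchronous update at zero temperature: compute every spin's fate from the old lattice, then apply all the flips at once. A perfect checkerboard is then a two-cycle, since every spin sees four opposite neighbors and flips on every step:

```python
# Synchronous zero-temperature update: flip a spin whenever flipping
# lowers the energy, with all decisions based on the *old* lattice.
def synchronous_step(spins):
    n = len(spins)
    new = [row[:] for row in spins]
    for x in range(n):
        for y in range(n):
            S = (spins[(x + 1) % n][y] + spins[(x - 1) % n][y]
                 + spins[x][(y + 1) % n] + spins[x][(y - 1) % n])
            if 2 * spins[x][y] * S < 0:      # flipping lowers the energy
                new[x][y] = -spins[x][y]
    return new

n = 6
board = [[(-1) ** (x + y) for y in range(n)] for x in range(n)]
flipped = synchronous_step(board)
# Every spin has flipped; another step flips them all back, forever.
print(all(flipped[x][y] == -board[x][y] for x in range(n) for y in range(n)))  # True
```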

Choosing spins by throwing random darts at the lattice turns out to be less susceptible to clumsy mistakes. Yet, at first glance, the random procedure seems to have hazards of its own. In particular, choosing 10,000 spins at random from a lattice of 10,000 sites does *not* guarantee that every site will be visited once. On the contrary, a few sites will be sampled six or seven times, and you can expect that 3,679 sites (that’s \(1/e \times 10{,}000\)) will not be visited at all. Doesn’t that bias distort the outcome of the simulation? No, it doesn’t. After many iterations, all the sites will get equal attention.
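The \(1/e\) figure is easy to check. Drawing \(N\) times with replacement from \(N\) sites, the chance a given site is never chosen is \((1 - 1/N)^N\), which converges to \(1/e\) as \(N\) grows. A quick sketch, both exact and simulated:

```python
import math
import random

# Expected number of sites never visited in N random draws from N sites.
N = 10_000
expected_missed = N * (1 - 1 / N) ** N       # ~ N/e
print(round(expected_missed))                # 3679

# A direct simulation for comparison.
hit = [False] * N
for _ in range(N):
    hit[random.randrange(N)] = True
print(hit.count(False))                      # close to 3,679 on most runs
```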

The nasty bit in all Ising simulation algorithms is updating pairs of adjacent sites, where each spin is the neighbor of the other. Which one goes first, or do you try to handle them simultaneously? The column-and-row ordering maximizes exposure to this problem: Every spin is a member of such a pair. Other sequential algorithms—for example, visiting all the black squares of a checkerboard followed by all the white squares—avoid these confrontations altogether, never considering two adjacent spins in succession. Glauber dynamics is the Goldilocks solution. Pairs of adjacent spins do turn up as successive elements in the random sequence, but they are rare events. Decisions about how to handle them have no discernible influence on the outcome.

Years ago, I had several opportunities to meet Roy Glauber. Regrettably, I failed to take advantage of them. Glauber’s office at Harvard was in the Lyman Laboratory of Physics, a small isthmus building connecting two larger halls. In the 1970s I was a frequent visitor there, pestering people to write articles for *Scientific American*. It was fertile territory; for a few years, the magazine found more authors per square meter in Lyman Lab than anywhere else in the world. But I never knocked on Glauber’s door. Perhaps it’s just as well. I was not yet equipped to appreciate what he had to say.

Now I can let him have the last word. This is from the introduction to the paper that introduced Glauber dynamics:

If the mathematical problems of equilibrium statistical mechanics are great, they are at least relatively well-defined. The situation is quite otherwise in dealing with systems which undergo large-scale changes with time. The principles of nonequilibrium statistical mechanics remain in largest measure unformulated. While this lack persists, it may be useful to have in hand whatever precise statements can be made about the time-dependent behavior of statistical systems, however simple they may be.

Last spring a pair of robins built two-and-a-half nests on a sheltered beam just outside my office door. They raised two chicks that fledged by the end of June, and then two more in August. Both clutches of eggs were incubated in the same nest *(middle photo below)*, which was pretty grimy by the end of the season. A second nest *(upper photo)* served as a hangout for the nonbrooding parent. I came to think of it as the man-cave, although I’m not at all sure about the sex of those birds. As for the half nest, I don’t know why that project was abandoned, or why it was started in the first place.

Elsewhere, a light fixture in the carport has served as a nesting platform for a phoebe each summer we’ve lived here. Is it the same bird every year? I like to think so, but if I can’t even identify a bird’s sex I have little hope of recognizing individuals. This year, after the tenant decamped, I discovered an egg that failed to hatch.

We also had house wrens in residence—noisy neighbors, constantly partying or quarreling, I can never tell the difference. It was like living next to a frat house. I have no photo of their dwelling: It fell apart in my hands.

Under the eaves above our front door we hosted several small colonies of paper wasps. All summer I watched the slow growth of these structures with their appealing symmetries and their equally interesting imperfections. (Skilled labor shortage? Experiments in non-Euclidean geometry?) I waited until after the first frost to cut down the nests, thinking they were abandoned, but I discovered a dozen moribund wasps still huddling behind the largest apartment block. They were probably fertile females looking for a place to overwinter. If they survive, they’ll likely come back to the same spot next year—or so I’ve learned from Howard E. Evans, my go-to source of wasp wisdom.

Another mysterious dwelling unit clung to the side of a rafter in the carport. It was a smooth, fist-size hunk of mud with no visible entrances or exits. When I cracked it open, I found several hollow chambers, some empty, some occupied by decomposing larvae or prey. Last year in the same place we had a few delicate tubes built by mud-dauber wasps, but this one is an industrial-strength creation I can’t identify. Any ideas?

The friends I’ll miss most are not builders but squatters. All summer we have shared our back deck with a population of minifrogs—often six or eight at a time—who took up residence in tunnel-like spaces under flowerpots. In nice weather they would join us for lunch alfresco.

As of today two frogs are still hanging on, and I worry they will freeze in place. I should move the flowerpots, I think, but it seems so inhospitable.

May everyone return next year.

I had believed such a catastrophe was all but impossible. The natural gas industry has many troubles, including chronic leaks that release millions of tons of methane into the atmosphere, but I had thought that pressure regulation was a solved problem. Even if someone turned the wrong valve, failsafe mechanisms would protect the public. Evidently not. (I am not an expert on natural gas. While working on my book *Infrastructure*, I did some research on the industry and the technology, toured a pipeline terminal, and spent a day with a utility crew installing new gas mains in my own neighborhood. The pages of the book that discuss natural gas are online here.)

The hazards of gas service were already well known in the 19th century, when many cities built their first gas distribution systems. Gas in those days was not “natural” gas; it was a product manufactured by roasting coal, or sometimes the tarry residue of petroleum refining, in an atmosphere depleted of oxygen. The result was a mixture of gases, including methane and other hydrocarbons but also a significant amount of carbon monoxide. Because of the CO content, leaks could be deadly even if the gas didn’t catch fire.

Every city needed its own gasworks, because there were no long-distance pipelines. The output of the plant was accumulated in a gasholder, a gigantic tank that confined the gas at low pressure—less than one pound per square inch above atmospheric pressure (a unit of measure known as pounds per square inch gauge, or psig). The gas was gently wafted through pipes laid under the street to reach homes at a pressure of 1/4 or 1/2 psig. Overpressure accidents were unlikely because the entire system worked at the same modest pressure. As a matter of fact, the greater risk was underpressure. If the flow of gas was interrupted even briefly, thousands of pilot lights would go out; then, when the flow resumed, unburned toxic gas would seep into homes. Utility companies worked hard to ensure that would never happen.

Gas technology has evolved a great deal since the gaslight era. Long-distance pipelines carry natural gas across continents at pressures of 1,000 psig or more. At the destination, the gas is stored in underground cavities or as a cryogenic liquid. It enters the distribution network at pressures in the neighborhood of 100 psig. The higher pressures allow smaller diameter pipes to serve larger territories. But the pressure must still be reduced to less than 1 psig before the gas is delivered to the customer. Having multiple pressure levels complicates the distribution system and requires new safeguards against the risk of high-pressure gas going where it doesn’t belong. Apparently those safeguards didn’t work last month in the Merrimack valley.

The gas system in that part of Massachusetts is operated by Columbia Gas, a subsidiary of a company called NiSource, with headquarters in Indiana. At the time of the conflagration, contractors for Columbia were upgrading distribution lines in the city of Lawrence and in two neighboring towns, Andover and North Andover. The two-tier system had older low-pressure mains—including some cast-iron pipes dating back to the early 1900s—fed by a network of newer lines operating at 75 psig. Fourteen regulator stations handled the transfer of gas between systems, maintaining a pressure of 1/2 psig on the low side.

The NTSB preliminary report gives this account of what happened around 4 p.m. on September 13:

The contracted crew was working on a tie-in project of a new plastic distribution main and the abandonment of a cast-iron distribution main. The distribution main that was abandoned still had the regulator sensing lines that were used to detect pressure in the distribution system and provide input to the regulators to control the system pressure. Once the contractor crews disconnected the distribution main that was going to be abandoned, the section containing the sensing lines began losing pressure.

As the pressure in the abandoned distribution main dropped about 0.25 inches of water column (about 0.01 psig), the regulators responded by opening further, increasing pressure in the distribution system. Since the regulators no longer sensed system pressure they fully opened allowing the full flow of high-pressure gas to be released into the distribution system supplying the neighborhood, exceeding the maximum allowable pressure.

When I read those words, I groaned. The cause of the accident was not a leak or an equipment failure or a design flaw or a worker turning the wrong valve. The pressure didn’t just creep up beyond safe limits while no one was paying attention; the pressure was *driven* up by the automatic control system meant to keep it in bounds. The pressure regulators were “trying” to do the right thing. Sensor readings told them the pressure was falling, and so the controllers took corrective action to keep the gas flowing to customers. But the feedback loop the regulators relied on was not in fact a loop. They were measuring pressure in one pipe and pumping gas into another.
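The runaway dynamic can be sketched in a few lines of Python. Everything here—pressures, gains, step sizes—is invented for illustration; this is a toy caricature of an open feedback loop, not a model of the actual regulator stations.

```python
# Toy simulation of the broken loop: the regulator senses pressure in the
# abandoned main (which only bleeds down) while feeding the live main
# (which only climbs). All quantities are hypothetical.

def runaway(steps=10, set_point=0.5):
    sensed = 0.5      # psig in the abandoned main, losing pressure
    delivered = 0.5   # psig in the live distribution main
    opening = 0.1     # valve position, 0 (closed) to 1 (fully open)
    for _ in range(steps):
        sensed = max(0.0, sensed - 0.1)        # abandoned main bleeds down
        if sensed < set_point:                 # regulator sees "low" pressure...
            opening = min(1.0, opening + 0.2)  # ...and opens further
        delivered += opening * 2.0             # live main pressure climbs
    return opening, delivered

opening, delivered = runaway()
# The valve ends fully open; the delivered pressure far exceeds 0.5 psig.
```

Because the sensed pressure can never respond to the valve, the "correction" only ratchets upward—which is the whole story of the accident in miniature.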

The NTSB’s preliminary report offers no conclusions or recommendations, but it does note that the contractor in Lawrence was following a “work package” prepared by Columbia Gas, which did not mention moving or replacing the pressure sensors. Thus if you’re looking for someone to blame, there’s a hint about where to point your finger. The clue is less useful, however, if you’re hoping to understand the disaster and prevent a recurrence. “Make sure all the parts are connected” is doubtless a good idea, but better still is building a failsafe system that will not burn the city down when somebody goofs.

Suppose you’re taking a shower, and the water feels too warm. You nudge the mixing valve toward cold, but the water gets hotter still. When you twist the valve a little further in the same direction, the temperature rises again, and the room fills with steam. In this situation, you would surely not continue turning the knob until you were scalded. At some point you would get out of the shower, shut off the water, and investigate. Maybe the controls are mislabeled. Maybe the plumber transposed the pipes.

Since you do so well controlling the shower, let’s put you in charge of regulating the municipal gas service. You sit in a small, windowless room, with your eyes on a pressure gauge and your hand on a valve. The gauge has a pointer indicating the measured pressure in the system, and a red dot (called a bug) showing the desired pressure, or set point. If the pointer falls below the bug, you open the valve a little to let in more gas; if the pointer drifts up too high, you close the valve to reduce the flow. (Of course there’s more to it than just open and close. For a given deviation from the set point, *how far* should you twist the valve handle? Control theory answers this question.)
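The simplest answer control theory gives to that question is a proportional rule: twist the valve by an amount that scales with the pointer's deviation from the bug. A minimal sketch, with a hypothetical function name and gain:

```python
# Minimal proportional control law for the gauge-and-valve job described
# above. The gain constant is invented; tuning it is where the real
# engineering lives.

def proportional_control(measured_psig, set_point_psig, gain=0.5):
    """Return a valve adjustment: positive opens, negative closes."""
    error = set_point_psig - measured_psig  # pointer below the bug -> positive
    return gain * error

# Gauge reads 0.4 psig, bug sits at 0.5 psig: open the valve a little.
adjustment = proportional_control(0.4, 0.5)
```

Practical controllers usually add integral and derivative terms to this rule (the classic PID controller), but the proportional term alone captures the logic of the windowless room.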

It’s worth noting that you could do this job without any knowledge of what’s going on outside the windowless room. You needn’t give a thought to the nature of the “plant,” the system under control. What you’re controlling is the position of the needle on the gauge; the whole gas distribution network is just an elaborate mechanism for linking the valve you turn with the gauge you watch. Many automatic control systems operate in exactly this mindless mode. And they work fine—until they don’t.

As a sentient being, you *do* in fact have a mental model of what’s happening outside. Just as the control law tells you how to respond to changes in the state of the plant, your model of the world tells you how the plant should respond to your control actions. For example, when you open the valve to increase the inflow of gas, you expect the pressure to increase. (Or, in some circumstances, to decrease more slowly. In any event, the sign of the second derivative should be positive.) If that doesn’t happen, the control law would call for making an even stronger correction, opening the valve further and forcing still more gas into the pipeline. But you, in your wisdom, might pause to consider the possible causes of this anomaly. Perhaps pressure is falling because a backhoe just ruptured a gas main. Or, as in Lawrence last month, maybe the pressure isn’t actually falling at all; you’re looking at sensors plugged into the wrong pipes. Opening the valve further could make matters worse.
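One way to give a controller a sliver of that wisdom is a watchdog that counts how often the plant ignores its corrections, and bails out rather than pressing harder. The sketch below is an illustration of the idea, not an industry practice; the names and the three-strikes threshold are invented.

```python
# Sanity check sketched from the reasoning above: if opening the valve
# repeatedly coincides with falling pressure, suspect a broken feedback
# loop and shut down instead of correcting harder.

def monitor(pressure_readings, valve_openings, strikes_allowed=3):
    """Return 'shutdown' if the plant persistently defies the control law."""
    strikes = 0
    for i in range(1, len(pressure_readings)):
        opened_more = valve_openings[i] > valve_openings[i - 1]
        pressure_fell = pressure_readings[i] < pressure_readings[i - 1]
        if opened_more and pressure_fell:
            strikes += 1           # the plant ignored our correction
            if strikes >= strikes_allowed:
                return "shutdown"  # feedback loop may be broken: bail out
        else:
            strikes = 0            # plausible response; reset the count
    return "ok"
```

A real safeguard would need to tolerate lag and noise—pressure does not answer a valve instantly—but even this crude rule would refuse to drive the system to its maximum.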

Could we build an automatic control system with this kind of situational awareness? Control theory offers many options beyond the simple feedback loop. We might add a supervisory loop that essentially controls the controller and sets the set point. And there is an extensive literature on *predictive control*, where the controller has a built-in mathematical model of the plant, and uses it to find the best trajectory from the current state to the desired state. But neither of these techniques is commonly used for the kind of last-ditch safety measures that might have saved those homes in the Merrimack Valley. More often, when events get too weird, the controller is designed to give up, bail out, and leave it to the humans. That’s what happened in Lawrence.

Minutes before the fires and explosions occurred, the Columbia Gas monitoring center in Columbus, Ohio [probably a windowless room], received two high-pressure alarms for the South Lawrence gas pressure system: one at 4:04 p.m. and the other at 4:05 p.m. The monitoring center had no control capability to close or open valves; its only capability was to monitor pressures on the distribution system and advise field technicians accordingly. Following company protocol, at 4:06 p.m., the Columbia Gas controller reported the high-pressure event to the Meters and Regulations group in Lawrence. A local resident made the first 9-1-1 call to Lawrence emergency services at 4:11 p.m.

Columbia Gas shut down the regulator at issue by about 4:30 p.m.

I admit to a morbid fascination with stories of technological disaster. I read NTSB accident reports the way some people consume murder mysteries. The narratives belong to the genre of tragedy. In using that word I don’t mean just that the loss of life and property is very sad. These are stories of people with the best intentions and with great skill and courage, who are nonetheless overcome by forces they cannot master. The special pathos of *technological* tragedies is that the engines of our destruction are machines that we ourselves design and build.

Looking on the sunnier side, I suspect that technological tragedies are more likely than *Oedipus Rex* or *Hamlet* to suggest a practical lesson that might guide our future plans. Let me add two more examples that seem to have plot elements in common with the Lawrence gas disaster.

First, the meltdown at the Three Mile Island nuclear power plant in 1979. In that event, a maintenance mishap was detected by the automatic control system, which promptly shut down the reactor, just as it was supposed to do, and started emergency pumps to keep the uranium fuel rods covered with cooling water. But in the following minutes and hours, confusion reigned in the control room. Because of misleading sensor readings, the crowd of operators and engineers believed the water level in the reactor was too high, and they struggled mightily to lower it. Later they realized the reactor had been running dry all along.

Second, the crash of Air France 447, an overnight flight from Rio de Janeiro to Paris, in 2009. In this case the trouble began when ice at high altitude clogged pitot tubes, the sensors that measure airspeed. With inconsistent and implausible speed inputs, the autopilot and flight-management systems disengaged and sounded an alarm, basically telling the pilots “You’re on your own here.” Unfortunately, the pilots also found the instrument data confusing, and formed the erroneous opinion that they needed to pull the nose up and climb steeply. The aircraft entered an aerodynamic stall and fell tail-first into the ocean with the loss of all on board.

In these events no mechanical or physical fault made an accident inevitable. In Lawrence the pipes and valves functioned normally, as far as I can tell from press reports and the NTSB report. Even the sensors were working; they were just in the wrong place. At Three Mile Island there were multiple violations of safety codes and operating protocols; nevertheless, if either the automatic or the human controllers had correctly diagnosed the problem, the reactor would have survived. And the Air France aircraft over the Atlantic was airworthy to the end. It could have flown on to Paris if only there had been the means to level the wings and point it in the right direction.

All of these events feel like unnecessary disasters—if we were just a little smarter, we could have avoided them—but the fires in Lawrence are particularly tormenting in this respect. With an aircraft 35,000 feet over the ocean, you can’t simply press *Pause* when things don’t go right. Likewise a nuclear reactor has no safe-harbor state; even after you shut down the fission chain reaction, the core of the reactor generates enough heat to destroy itself. But Columbia Gas faced no such constraints in Lawrence. Even if the pressure-regulating system is not quite as simple as I have imagined it, there is always an escape route available when parameters refuse to respond to control inputs. You can just shut it all down. Safeguards built into the automatic control system could do that a lot more quickly than phone calls from Ohio. The service interruption would be costly for the company and inconvenient for the customers, but no one would lose their home or their life.

Control theory and control engineering are now embarking on their greatest adventure ever: the design of self-driving cars and trucks. Next year we may see the first models without a steering wheel or a brake pedal—there goes the option of asking the driver (passenger?) to take over. I am rooting for this bold undertaking to succeed. I am also reminded of a term that turns up frequently in discussions of Athenian tragedy: hubris.
