Archive for November, 2007

Pulling the goalie

Thursday, November 29th, 2007

Don Elgee, a retired teacher of mathematics and computer science from Ottawa, sends the following inquiry:

In hockey, when a team is down by a goal with about one minute to go, the goalie is pulled in favor of another offensive player. This illustrates a key point in the strategy of most games.

The object is not to maximize points for, and minimize points against; rather, it is to win or tie the game. As the game nears its conclusion the winning team plays more conservatively and the losing team more radically. But when—and to what extent—should a team change its style of play? 

Ottawa Senators goal; photo by C. P. Storm; http://images.google.com/imgres?imgurl=http://upload.wikimedia.org/wikipedia/commons/thumb/3/36/Ice_hockey_goal.jpg/501px-Ice_hockey_goal.jpg&imgrefurl=http://commons.wikimedia.org/wiki/Image:Ice_hockey_goal.jpg&h=600&w=501&sz=73&hl=en&start=127&sig2=rm9lycYyPZku7E-qGdcpZg&tbnid=Ox2HSAN9YSjfAM:&tbnh=135&tbnw=113&ei=-0JPR6CfM5HqhwLny6ShDg&prev=/images%3Fq%3Dhockey%2Bempty%2Bnet%26start%3D120%26gbv%3D2%26ndsp%3D20%26svnum%3D10%26hl%3Den%26safe%3Doff%26sa%3DN

To clarify the problem I tried to conceive the simplest possible example. It turned out to be much more complex than I expected.

Consider the following simple game: Two players toss a die on alternate turns. The first one to reach 100 is the winner. Players have a choice of two dice—one is normal (6, 5, 4, 3, 2, 1) and the other has faces with 6, 6, 5, 1, 1, 1. When should the losing player switch to the second die?

Is there a mathematical solution? Could you determine this by experimental computer simulation? Does the best strategy depend on the strategy the opponent is likely to choose next?

This simple problem is difficult enough, but the sports situation could better be simulated by adding a third die with 4, 4, 3, 3, 3, 3.

Has this issue been subject to mathematical study? With all the statistics used in professional sport has anyone tried to include this factor? Has it ever been incorporated in computer simulations?

Hockey is not my sport, but these are interesting questions. Inspired by Elgee’s letter, I’ve run a few of the obvious simulations. For the time being, however, I’m not going to say much about the results. What I’ve learned so far does not lead to any simple, pithy rule of thumb that a hockey coach could use to decide when to pull the goalie. Nor do I have a clear grasp of how best to play the dice game. I’m hoping someone else will offer a sharper analysis. Toward that end I have some further notes and observations.

Under the stated rules—alternating turns, with victory going to the first player who reaches 100—the privilege of moving first brings a substantial advantage. In a 100-point game with standard dice, the first player can expect to win about 55 percent of the time. If two players are tied at 99, the first to roll is definitely going to win, regardless of which die is chosen. Because of this bias, we probably need to consider different strategies for the first and second players (like white and black in a chess game). An alternative is to add a further element of randomness to the game: In each round, the player to move is determined by the toss of a fair coin.

Elgee’s two nonstandard dice have faces that sum to 20, less than the standard die’s 21. As a model of hockey tactics, this bias makes sense: It reflects the risk of leaving the goal untended. But if we ignore hockey and just consider the dice game on its own merits, it might be more interesting to explore a symmetrical contest with dice marked 6, 6, 6, 1, 1, 1 and 4, 4, 4, 3, 3, 3. Then we’d be looking at three distributions that have the same mean but different shapes.

Generalizing further, we could allow dice marked with any six non-negative integers that sum to 21. What would happen to the game if we allowed dice such as 100, -16, -16, -16, -16, -15? Or, we could allow the players to choose continuous probability density functions.

It’s easy to imagine circumstances where a “high-risk” die should be advantageous. If the score is 99 to 94, then the trailing player should be willing to accept any trade-off that improves the odds of rolling a 6; even a 6, 6, 0, 0, 0, 0 die is preferable to the standard die. Conversely, if you’re ahead near the end of the game, the “low-risk” die seems favorable. All this suggests there ought to be some fairly simple guidelines for choosing the best die in any situation. But, as Elgee noted, it’s more complicated than you might expect.

In my simulations I found that with the score tied at 94, switching to the 6, 6, 5, 1, 1, 1 die (while your opponent continues to play the standard die) yields an advantage of 2.9 percent. But when the score is 96 to 96, making the same switch produces a deficit of 3.7 percent. Interestingly, choosing the 4, 4, 3, 3, 3, 3 die brings similar results: a gain at 94 to 94 but a loss at 96 to 96. (The simulations used the coin-flip protocol, but the same kind of anomalies appear under other rules.)

It gets even more complicated. The results cited above are for players who always choose the same die, no matter what the circumstances. But Elgee’s rules allow a player to choose a different die on each roll. Presumably, players who adapt or learn will do better than any player with a fixed strategy. I find that a very simple adaptive strategy—use 6, 6, 5, 1, 1, 1 when behind, 4, 4, 3, 3, 3, 3 when ahead, and 6, 5, 4, 3, 2, 1 when the score is even—gives mostly but uniformly good results in end-game situations.

I echo Elgee’s questions about whether this problem has been analyzed previously. As it happens, I’ve been able to find a bit of literature on the original hockey issue. Tom Benjamin’s NHL Weblog has pointers to three papers published in the 1980s in Interfaces, an operations research journal. In those papers several authors—fans of rival teams—derive a formula for the optimal time to pull the goalie. (They find that coaches almost always wait too long.)

I would have thought that Elgee’s nonstandard dice would also have been investigated before. Some years ago Bradley Efron invented a wonderful set of nontransitive dice, which Martin Gardner wrote about (Scientific American, December 1970, pp. 110–114). Gardner also had a column on various kinds of rigged and trick dice (Scientific American, November 1968, pp. 140–146). A game called Piggy has certain elements reminiscent of Elgee’s game, but those elements don’t include dice that differ in variance.

Perhaps I’ve just failed to find the right search term on Google or MathSciNet, and a reader will supply a reference.

Update: Minutes after posting the above, I discovered an article by B. De Schuymera, H. De Meyera and B. De Baetsb, “Optimal strategies for equal-sum dice games” (Discrete Applied Mathematics 154 (2006) 2565–2576), that covers some of the territory. They analyze a game in which players can choose any die whose n faces have the same sum σ. But they consider only games in which players bet independently on each throw of the dice. Under these conditions, they show that with six-sided dice summing to 21, the optimal die is the standard one, with faces marked 6, 5, 4, 3, 2, 1. Again, however, if I understand correctly, this result applies only to individual throws of the die.

Last name first

Tuesday, November 20th, 2007

Saturday’s New York Times had a story by Sam Roberts about a newly released Census Bureau study of the frequency of surnames in the U.S. The Times story was mainly about the names at the top of the list, and especially the increasing prominence of Hispanic names (Garcia and Rodriguez have made it into the top ten). But what caught my attention was a passing comment about the bottom of the frequency distribution:

Altogether, the census found six million surnames in the United States. Among those, 151,000 were shared by a hundred or more Americans. Four million were held by only one person.

I was not surprised to learn that the distribution of name frequencies is steeply skewed, with a few common names and a great many rare ones. But could it be true that two-thirds of the names occur just once in the population—that four million people in the U.S. have a unique family name they share with no one else?

Looking through the lens of personal experience, I found it hard to believe those numbers. Over the years I’ve met some people whose family names are surely rare, but I am not aware of a single acquaintance who is the holder of a unique name—if only because everyone I know shares a name with parents or children or siblings or a spouse. After all, family names tend to run in families! To have a unique name, you’ve got to be the first of your line or the last of your line or both.

The study of name distributions has a long history. In the 1870s Francis Galton and Henry William Watson looked into the longevity of family names, concluding:

All the surnames, therefore, tend to extinction in an indefinite time, and this result might have been anticipated generally, for a surname once lost can never be recovered, and there is an additional chance of loss in every successive generation.

The argument sounds good, but it’s not quite as broadly applicable as Galton and Watson thought it was. Extinction is inevitable only in a static or shrinking population. If the population is growing, names and families can become all but immortal. In the 1920s Alfred Lotka calculated that American family names had about an 18 percent chance of surviving indefinitely. More recently, Susanna C. Manrubia, Bernard Derrida and Damián H. Zanette have developed a more refined computer model of name evolution (see arXiv preprint 1 and 2; there’s also a splendid American Scientist article, but annoyingly it’s only accessible to subscribers). Manrubia, Derrida and Zanette describe an equilibrium state where the distribution of names follows a power law. If we define a “clan” as the set of all people who have a surname in common (whether or not they are actually related), then the predicted number of clans of size m is proportional to m–β. Manrubia, Derrida and Zanette argue that β = 2. Thus, for example, clans 10 times larger should be 100 times rarer.

How do the new Census Bureau findings stack up against these predictions? Here is the frequency table included in the summary report (.pdf):

Table of frequencies of last names

For this data set the cumulative numbers are easier to work with because of the nonuniform bin sizes. Here’s how they look in a graph:

graph of cumulative name frequencies

Graphs of this kind can be confusing. I find it helpful to keep in mind that a point at coordinates x,y indicates there are y clans with x members or more.

If clan frequencies were governed by a strict power law, the graph would trace a straight line on these log-log scales. Overall, the curve is indeed fairly straight, tending to support the power-law model. But a few features of the curve seem to depart from the predictions. For one thing, the slope of the line gives an exponent closer to β = 1 than β = 2, as Manrubia, Derrida and Zanette would lead us to expect. I can’t explain that. A steepening of the curve at the large-clan end could be an artifact of finite sample size. Most interesting of all is the sudden uptick at the opposite end of the curve, where clans of size 1 are much more abundant than the power law predicts. On a logarithmic scale it’s easy to misjudge the magnitude of such a trivial-looking excursion: If the two leftmost data points (for clans of size 1 and size 2 through 4) were restored to the trend line of the data from clan sizes of 10 through 1,000, the total number of names in the survey would be about three million instead of six million, and there would be only one million unique names instead of four million.

I’ll not keep you in suspense any longer about the cause of this anomaly. When I downloaded the Census Bureau report, I found that the authors (David L. Word, Charles D. Coleman, Robert Nunziata and Robert Kominski) are also skeptical about those four million solo monikers. They explain that the data came from census forms on which respondents were asked to print the first, middle and last names of all household residents; the forms were then electronically scanned, and the answers were extracted by optical character recognition. Errors at any point in the process could turn a common name into a unique (but fictitious) one—making a MLLLER out of a MILLER, say. Some of these errors were corrected in later processing, but others apparently slipped through. One particularly troublesome problem arose whenever a respondent printed an entire name in the space intended for the surname. The OCR software simply concatenated all the parts of such a response, leading to spurious surnames such as PETERJDAVIS. The report states that “many” of the four million unique names are products of such data-entry errors, but there is no attempt to quantify the effect.

For privacy reasons, the Census Bureau has released only the 151,671 names (.zip) occurring at least 100 times, so there’s no way to get a look at the unique names. You might think, though, that if three-fourths of them are malformed in some way, that fact would stand out prominently and would have been noticed even before this study was undertaken. You might even think that if 1 percent of respondents are entering names incorrectly, the Census Bureau would have discovered that fact in preliminary testing and would have redesigned the form before circulating it to 300 million people.

Still, I suppose the Bureau’s explanation must be true. There’s spotty suggestive evidence even in the list of names appearing 100 times or more. For example, the list includes surnames such as VANBURKLEO and JOHNSONWILLIAM. And either there are 160 people in the U.S. whose surname is JOHNOSN, or there are 160 JOHNSONs who all made the same transposition error when entering their name on a census form. (Or some combination of the above.)

Even if there are only a million unique names, that still seems like a lot—one out of every 300 people. Galton and Watson looked upon such lonely surnames as dying embers, the last hope of families on the brink of extinction. But some of the rare names are surely newborns rather than expiring elders. Immigration brings names that are new to the U.S. even if they are far from unique globally. And processes akin to mutation and recombination are creating new names all the time. In particular, recombination has become more important now that the purely patrilineal model of name transmission is no longer universal; surnames have broken free from their linkage to the Y chromosome. As a matter of fact, now that I think of it, I was wrong when I said that I have never known a person with a unique surname. I have friends who named their daughter Nina Auslander-Padgham, and her surname surely has a good chance at uniqueness. Or at least it did until Nina’s brother Milo was born.

Out of curiosity, I opened up the Boston-Cambridge phone book, selected a few pages at random, and counted up unique names as a proportion of all names. In a sample of 458 surnames, 254 were listed for one person only, or about 55 percent. This result isn’t too far from the two-thirds ratio in the Census Bureau report, but I’m not sure how to interpret it. The geographic area covered by the Boston directory includes a population of roughly a million, or about 1/300th of the national population. When you select a small sample of this kind—supposing it to be a random sample—what does the selection process do to the frequency distribution of names? If a name occurs 300 times nationally, it could well be unique in Boston, thereby apparently boosting the number of unique names. On the other hand, for every 300 names that truly are unique nationally, only one is likely to be represented in Boston, so in this way the number of unique names is greatly diminished. The question I leave you with is this: How best can we estimate the national (or global) proportion of unique names from a small random sample?

Until NEXPTIME

Sunday, November 18th, 2007

I have a few questions for the complexity theorists among us.

Have you ever tried to explain to your grandmother why NP is named NP? Does she get it when you say that problems labeled NP-complete are the hardest problems in NP, but NP-hard problems might be harder, and not in NP?

My concern here is not with the difficulty of the concepts—there’s not much we can do about that—but with the notation and terminology. Do locutions like P#P and NISZK and (NP ∩ coNP)/poly roll trillingly off your tongue? How about EXP, EEXP, NEXP, PEXP and SUBEXP? And while we’re on the subject of EXP and friends, I’ve been wondering how to pronounce NEXPTIME. (I’m kind of hoping the “P” is silent, as in Pterodactyl.)

Presumably, the argot of complexity theory works well enough for communication among members of the club—those of you, say, who know that “IP” is neither “internet protocol” nor “intellectual property” but “interactive proof.” If you’d ever like to talk to the rest of the world, however, I think there might be room for improvement.

Let’s start with the term “complexity” itself. The root meaning of “complex” is “made up of many interconnected parts”; for example, the innards of a mechanical clock are complex. Is this the best word to describe the property of being hard to compute? To me, it seems the difficulty of an exponential algorithm is not the complexity of the task but the mere mind-numbing repetition of the same operations. An exponential heap of manure is no more complex than a polynomial heap of manure, it just takes longer to shovel it.

Another gripe: WHAT’S WITH ALL THE CAPITAL LETTERS? IS IT REALLY NECESSARY TO SHOUT ALL THE TIME?

But the thing that worries me most is a certain opacity in the names of complexity classes. According to the Complexity Zoo, there are currently 465 named classes, which is quite a menagerie. It’s enough that you can’t really expect people to keep track of meanings and relations without some internal cues from the structure of the names themselves.

There are hints of structure in the naming scheme, with terms assembled from stems, prefixes and suffixes, and combined according to quasi-grammatical rules. For example, the superscript in P#P indicates the action of an “oracle”: The class encompasses problems that can be solved in polynomial time with the help of a #P (”sharp-P” or “counting-P”) oracle. The prefix “co-” in “coNP” designates the complement of NP: If a “Does there exist…?” question is in NP, then the corresponding “Does there not exist…?” question is in coNP. These devices are applied consistently; as far as I can tell, superscripting and the “co-” prefix always mean the same thing. Unfortunately, other elements of the names are not so easy to parse. The letter P generally stands for “polynomial” (except where it’s “probabilistic”). N usually denotes “nondeterministic” (but NC is “Nick’s Class”). Likewise the prefix D is for “deterministic” (except that it’s usually omitted, and sometimes it means “difference” or “dynamical” instead). B stands for “bounded-error” (except that BH is “Boolean hierarchy” and “BPd(P)” is “Polynomial Size d-Times-Only Branching Program”). Q is for “quantum” (except “QH” is the “query hierarchy” and “QP” is “quasi-polynomial time”).

The sad truth is, the naming conventions for furniture at Ikea make for a more consistent language than those of complexity theory.

How did we get into this fix? Back in 1974, when complexity theory was still young and impressionable, Don Knuth published “A Terminological Proposal” in SIGACT News, the quarterly newsletter of what was then the ACM Special Interest Group on Automata and Computability Theory. (By the way, the issue in which Knuth’s letter appears, Vol. 6, No. 1, seems to be missing from the ACM Digital Library. Can anyone supply a copy for scanning?) Knuth had polled 30 of his friends, asking for advice on the best adjective to describe problems “at least as difficult to solve in polynomial time as those of the Cook-Karp class NP.” His own suggestions were “Herculean,” “formidable” and “arduous,” but none of those got strong support. The people he consulted had proposals of their own: “impractical,” “intractable,” “prodigious,” “exorbitant,” “Sisyphean” and several others. The idea I like best came from Shen Lin, who suggested the abbreviation PET:

PET stands for “probably exponential time”. But if this gets proved, PET stands for “provably exponential time”. And if the proof goes the other way, it stands for “previously exponential time”.

On the whole, I think it’s just as well that we are not now speaking of Herculean or Sisyphean or PET problems, but I haven’t much enthusiasm for the solution that was adopted, either:

The “winning” write-in vote is the term NP-hard, which was put forward mainly by several people at Bell Labs, after what I understand was considerable discussion. Similar if not identical proposals were made by Steve Cook, by Ron Rivest, and in earlier publications by Sakti Sahni. This term is intended for use together with another new one, NP-complete, which abbreviates ‘polynomial complete’ and is at the same time more exact.

Knuth endorses “NP-hard” and “NP-complete” primarily because they have precise definitions and thus meet the needs of the insiders, the “automata-theorists.” But he also goes on to address the issue of communicating with a wider audience, and his thoughts are worth quoting at some length:

I don’t consider it a major goal to invent completely descriptive terminology for every conceivably interesting question of the type considered here; the major goal is to have a good term to use for the masses, in the one case which experience shows is almost omnipresent. To say NP-hard actually smacks of being a little too technical for a mass audience, but it’s not so bad as to be unusable; fortunately the term does easily generalize to a lot of other cases considered by automata theorists, so it appears to be a good compromise. When more technicalities are introduced, there is less need for a universally accepted term or notation. Although it’s very interesting to consider, e.g., log-space reductions, I don’t think it’s necessary to come up with a special short name for them.

If I understand this passage correctly, Knuth is arguing that although “NP-complete” and “NP-hard” may be a bit abstruse for casual coffeeshop chatter, they are the only terms that nonspecialists will ever need to learn, and so they won’t be too much of a burden. But that’s not quite the way it has worked out.

I suppose that if I’m going to complain about somebody else’s notation, I ought to propose an alternative of my own that I believe to be better. I can’t do that, but I would like to point out that certain other fields of study offer models that might be emulated. The most obvious model is organic chemistry, where the naming problem is orders of magnitude larger than the collection of 465 items in the complexity zoo. The sesquipedalian names of organic molecules are not exactly pretty, but the notational system has the wonderful property that if you know the name of a thing, you also know its composition and structure. Wouldn’t it be grand to have the same kind of perspicuous scheme for complexity classes, so that just naming a class would tell you where it lies in the inclusion diagram?

A saving grace of complexity theorists everywhere is that they are not without a sense of humor when it comes to the obscurity of their own jargon. For example, there is a complexity class named PINC, which the Zoo translates as “Polynomial Ignorance of Names of Classes.” And you don’t want to miss Alan T. Sherman’s discourse (or descant) on superpolylogarithmic subexponential functions (.ps).

Finally, I should mention a discovery I’ve made at the end of Knuth’s terminological proposal. Everyone knows that settling the P = NP question will earn you a prize of $1 million from the Clay Mathematics Institute. I suspect that many are unaware of another prize offer, made more than 30 years ago. Knuth writes:

I’m willing to give a prize of one live turkey to the first person who proves that P = NP.

Note that this appears to be a one-way-only offer. Proving that P ≠ NP won’t win you the turkey. (You have three days left until Thanksgiving.)

A New Yorker theorem

Tuesday, November 6th, 2007

Barry Cipra, my friend and former neighbor, and a frequent bit-player commentator, is the Talk of the Town this week. A story in The New Yorker by Lizzie Widdicombe highlights Barry’s work on the mathematics of the following calendrical problem.

Doing all arithmetic modulo 100, if you were born in the year X, you reach age X in the year 2X. For example, if you were born in ’54, you’ll be age 54 next year, in (2 × 54) mod 100 = ’08. But another group will also be reaching their birth-year age in ’08, namely the children born in ’04. So here’s the question Barry tackles: In any given year, which cohorts have already reached their birth-year age, and which are still looking forward to this event?

For an account of the human side of this problem, explaining why it might interest New Yorker readers, see Widdicombe’s article. For more on the mathematics, see Barry’s paper. (As far as I know, this is the only mathematics paper you’ll find with a URL beginning “http://www.newyorker.com/”.)

Boidland

Friday, November 2nd, 2007

starlings_2059.JPG

Above: A throbbing, wheeling mob of several thousand restless starlings, near a strip mall in Clayton, North Carolina, 27 October 2007. Below: Snow geese on maneuvers near Ashburn, Missouri, 12 November 2004.

geese_9869.JPG

In the 1930s, Edmund Selous argued that flocking behavior could be explained only through some form of animal ESP: “thought transference” was the only way that birds could communicate their intentions quickly enough to coordinate their movements. Others suggested there must be some designated leader along the birds, a conductor or drill sergeant whose cues the rest of the flock followed. By the 1980s a much simpler view emerged. Flocking birds (and also schooling fish and swarming insects) could maintain their formations without any need for leaders or a synchronizing central authority if each individual followed a few simple rules.

This idea of a leaderless flock with distributed intelligence was proposed by several authors at about the same time, but the version that made the biggest splash came from Craig J. Reynolds, whose famous “boids” made their premier just 20 years ago, at the SIGGRAPH meeting in 1987. The simulation was described in a paper in the conference proceedings, and the source code is also available online. The program comes to about 3,000 lines of Lisp, written in the Zetalisp dialect of the Symbolics Lisp machine. These days, you can achieve the same effect with much less effort in multiagent programming environments such as StarLogo, NetLogo and breve.

All of the boids in a flock were guided by the same three rules:

  • Separation: Avoid collisions, and try to equalize distance to nearest neighbors.
  • Alignment: Turn to match the average heading of nearby boids.
  • Cohesion: Move toward the mean position (or center of mass) of the flock.

When I first saw boids in action, I found them marvelously lifelike. Now, after spending a fair amount of time playing with models of this kind, I find the concept so familiar and so thoroughly internalized that I have a hard time seeing flocks in any other context. As I stand at the edge of an autumn field with clouds of starlings swirling around me, I think: How boidlike they are! Just as earlier observers were sure they saw leaders in the flock, or signs of spooky avian brainwaves, I can’t help seeing Separation, Alignment, Cohesion.

This weekend I am off to Florida to observe the self-organizing rituals of another flock. I’m attending the annual meeting of Sigma Xi, the society that publishes American Scientist. I’m at the meeting mainly to cheer friends who will be inducted as honorary members of the society. But it’s also interesting to observe how any such flock orients itself and chooses a direction for the future.