JMM pixel dump

January 7th, 2008

There’s a lot of mathematics going on here in San Diego, and I’m taking notes. But for now, what I have to offer is a bucket of pixels:

sailshall1358.jpg

It’s hard to get excited about the architecture of convention centers. They’re like airports: You’re pleasantly surprised if they’re merely functional. In many respects the San Diego convention center is no better than any other. It squats on an enormous tract of bayside property, cutting off a downtown neighborhood from the waterfront. But one tent-like wing (above) has genuine visual interest. (As the trash cans and “Wet Floor” signs at right indicate, it also seems to have a leaky roof.) The adjacent stairway (below) offers the public (arduous) access to the waterfront. No one seems to be using it, but it does make a rather handsome abstraction.

stairs1348.jpg

Much as we despise airports, we all love railroad terminals. Below is a bit of the old Santa Fe Depot, still in use by Amtrak. It is seen from (and reflected in) the trolley terminal next door.

The Santa Fe Depot, seen from and reflected in the adjacent trolley terminal

I often gripe about a meeting schedule that has talks starting at 7:45. On the other hand, that agenda gets you out and about in time to see aspects of city life you might otherwise miss.

pressurewash1321.jpg

San Diego has some of the most distinctive street plumbing in the country (below). Note the Dalek fire hydrant. The chromed pipe in the foreground is labeled “suction connection.” Like the hydrant, it’s for use in firefighting, but whereas the hydrant supplies water under pressure, the suction connection merely provides access to a water reservoir and needs a pumper truck to supply pressure.

plumbing1334.jpg

Of course that silvery cylinder (bent into a segment of a torus) is an irresistible target for experiments in projective geometry.

fisheye1338.jpg

In the exhibit hall (below), legendary hacker hunter and astronomer Cliff Stoll is hawking the Acme Klein Bottle. The sales pitch is a high-energy performance that hasn’t been equalled since the glory days of the Vegematic.

cliffstoll1366.jpg

More to come.

JMMing

January 6th, 2008

The fellow across the aisle is a stranger to me, but I know we share a destination. He’s scanning the index of the meeting program, presumably looking for friends or colleagues delivering a paper. When I unfold myself from seat 22D and stroll toward the back of the plane, I spot a few more of us. Someone is working on foils in LaTeX. Someone else is doodling on squared paper, making lots of little stairstep diagrams—they could be Young tableaux or they could be some kind of self-avoiding walks. The group of three college kids playing cards in row 32 might be on their way to a football game, but when I overhear a snippet of conversation that includes the phrase “convex hull,” I conclude they are not just Chargers fans.

floatedmanhole200px1308.jpg

It’s time again for the Joint Mathematics Meeting, held this year in sunny San Diego—which is actually quite soggy at the moment. (In front of my hotel this morning I found a manhole cover that had floated free of its foundation sometime during last night’s downpour.)

We’ll muddle through (and try not to fall in). Sometime over the next few days, maybe I’ll even learn what all those little stairstep diagrams were all about.

The new toy

December 25th, 2007

The XO laptop computer, showing screen and keyboard

In this season of giving and getting, you’ve got to admire the marketing savvy of the One Laptop Per Child project. They named their introductory sales campaign “Give One, Get One.” The computers cost $200 apiece. For a donation of $400, you send one machine to a child somewhere far away (Afghanistan, Cambodia, Haiti, Mongolia and Rwanda are mentioned as possible destinations), and you get another for your own gadget-happy life.

The question that everyone seems to ask about OLPC is whether laptop computers are really what the children of Afghanistan, Cambodia, etc., need most urgently. Well, reason not the need! What I find most appealing (and most subversive) about the whole idea is that it leapfrogs over matters of brute survival. If children lack adequate food and shelter, is that grounds for denying them intellectual diversion and enrichment as well?

In connection with an earlier attempt to bring computers to children, Alan Kay wrote (pdf):

[A computer] can be like a piano: (a product of technology, yes), but one which can be a tool, a toy, a medium of expression, a source of unending pleasure and delight… and, as with most gadgets in unenlightened hands, a terrible drudge!!

So let’s not deny the world’s children either delight or drudgery. Once OLPC achieves its goal, I say we start a One Piano Per Village program.

But enough about giving. What about getting? What’s the new computer like?

They call it the XO. I’ve had mine for only a few days, and I’ve hardly begun to play with it, so all I can report here are first impressions. Basically, I think it’s brilliant, but I also wish the system software had been built on a somewhat different model.

The XO closed up.

The hardware is undeniably cute. No one will mistake it for a Sony Vaio or a Dell Inspiron. I wouldn’t be surprised if the XO becomes something of a fashion statement, the laptop to be seen with at Starbucks, or for liveblogging NANOG or SODA. I’ve heard complaints that the keyboard is too small for adult fingers, but typing on these little green keys can hardly be worse than thumbing a Treo or Blackberry. (Disclaimer: I type with two fingers, so I’m not much of a judge of keyboard ergonomics.)

If the styling has a whiff of Fisher-Price about it, there’s also some thoughtful ingenuity at work here, and designers of machines for grownups might learn something from it. The screen is gorgeous—crisp and bright in its standard color mode, with a secondary black-and-white mode that’s legible in full sunlight. The screen also swivels 360 degrees and can be laid face-up over the keyboard, so that the XO can be used as a tablet for reading e-books and the like.

There’s no disk drive; the machine has 256 megabytes of RAM and a gigabyte of flash memory for nonvolatile storage. Three USB ports and a slot for SD memory cards offer opportunities for supplemental storage. One consequence of the no-moving-parts architecture is that the computer is absolutely silent in use—no clicking or whirring from a disk drive, no fan noise.

The wifi transceiver is amazing. I never knew I had so many well-connected neighbors—people named linksys and netgear, for example. No other computer I’ve had in the house has ever detected any of these networks. Why is the wireless card in my other laptop (which cost an order of magnitude more, by the way) so wimpy in comparison? The XO can also form ad hoc “mesh” networks with other XOs, but evidently I’m the first XOwner in my neighborhood, so I haven’t been able to experiment with that capability.

When it comes to software, I can’t be quite so wholeheartedly enthusiastic. There are some bright spots. A set of four programs called TamTam provide first-rate tools for creating and playing music; the calculator is very elegantly done; there’s a cool built-in oscilloscope, and a “Write” program that seems just right for the intended audience. Also a Python interpreter named Pippi. And the software called Etoys is a highly ambitious system, willed into existence by Alan Kay and a bunch of his friends, which looks like a plaything but has a complete Smalltalk development environment inside.

Some other spots are not so bright. For one thing, the software is just not finished yet. Some basic capabilities (printing, a sleep mode) are not yet implemented, and there are various buttons that don’t yet have functions. The web browser is primitive (no tabs, very limited facilities for bookmarks). There’s an RSS reader that doesn’t seem to work. An update planned for early next year is supposed to fix some of these problems.

Anyway, software is always behind schedule; I can wait. As a matter of fact, I’d be happier if they had taken longer and built the software in a different way.

I have opined elsewhere that computing is in a horrible rut, unable to escape the grip of precedents established in the distant past. A case in point is the Unix operating system, which is now almost 40 years old but is still with us in the form of Linux and OS X. The longevity of Unix argues that it must be quite a solid product, and I won’t argue with that. On the other hand, it’s a sad commentary on progress in computer science if no one has had a better idea since 1969.

OLPC offered a rare opportunity. It was a chance to start fresh, to throw away the past, to build a computer that has no legacy code to run; there was no need to worry about backward compatibility. But that’s not the way the project evolved. The XO software turns out to be a sugar-coated version of the Linux operating system. I can understand the reasons for choosing this path. Building an entirely new software system from scratch would have slowed the project down, and it would have raised the risk of catastrophic failure. Adopting Linux provided a major head start and allowed the project to draw on a global community of experienced programmers. But the choice also had an opportunity cost. A multiuser operating system designed in the era of timesharing and dumb terminals, and tuned to the needs of professional programmers, is not an ideal match for a laptop built for kids.

When I say the XO software is sugar-coated Linux, that’s just what I mean: The user-interface layer is named Sugar. It’s an interesting program, which tries to escape the ubiquitous desktop metaphor by emphasizing community and interaction. Instead of launching a program or application, the user starts an “activity.” Many activities can be shared with other XOwners; for example, two XOs can be set up to use acoustic signals to measure the distance between them. One intriguing “feature” of Sugar is the absence of a conventional file system. You don’t create documents and then decide where to put them and what to name them. Pointers to all activities are saved automatically in reverse chronological order in a linear list called the journal.

The sugar layer is nifty, but it’s also perilously thin. Underneath lies the Unix command line, and at present there are many things that can be done only by reciting the proper command-line incantations (many of them preceded by a magic ’sudo’). Will the kids master that kind of wizardry?

Maybe I shouldn’t worry; I’m sure they’re better equipped and quicker learners than I am. In any case, it’s their judgment that counts, not mine.

Googling for graphs

December 12th, 2007

The latest cute trick from Google is a service for producing quantitative graphics on demand. For example, here’s a bar chart of the traffic to this web site over the past year:

bar chart from Google Charts

The cute part is how you get Google to produce the graphic. The whole specification—including data, scales, labels and stylistic preferences—gets packed into a URL. In this case, the URL reads as follows (with line breaks added for clarity):

http://chart.apis.google.com/chart?
chs=450x300
&cht=bvs
&chbh=27,6
&chtt=bit-player.org+hits+per+month
&chco=ee2233
&chf=bg,s,efedea
&chxt=x,y
&chxl=0:|J|F|M|A|M|J|J|A|S|O|N|D|
      1:|0|20,000|40,000|60,000|80,000|100,000
&chd=t:54.9,59.1,61.3,69.5,83.6,77.6,60.3,60.5,69.2,77.3,86.0,-1

(If you want the key to decode all this curious bric-a-brac, see the Google Chart API Developer’s Guide.)

There’s something vaguely unsettling about this whole idea. When the web was young, a URL was a resource locator—in other words, the address of a document stored on a server somewhere. Times have changed, and most web servers now assemble pages by issuing queries to a database rather than by grabbing documents from a simple file system; the “back end” server fetches some text here, an image there, an ad from a third source. Still, the basic paradigm remains information retrieval; the vast majority of web sites are in the business of serving up some kind of pre-cooked static content, even when it’s embellished with dynamic sauce. But I really don’t think that Google Charts works that way. I don’t think they’ve produced an archive of all possible graphs and were just waiting for me to send the URL that retrieved the one you see above.

The charts service raises anew a question that comes up all the time in computing: What should we compute just once and store away in case it’s needed again later, and what should we recompute every time? Years ago we had precomputed tables of logarithms and trig functions, but now it’s easier to just calculate those numbers on the fly. Evidently graphs have now entered that realm as well—things so cheap to compute it’s not worth hanging onto them. Note that the graph you’re looking at above is not one that I produced when I wrote and published this page; instead, when you opened the page, your web browser (or RSS reader) sent a new request to Google’s servers, which then regenerated the graph from scratch. If a thousand people read the page, the graph will be recreated a thousand times. It’s monstrously profligate and inefficient, but I guess that’s what makes computers so wonderful.

One constraint on this mechanism is the length of the URL. Is there a limit? I didn’t know the answer, and so I turned—where else?—to Google. I quickly found a helpful web page that says the standards are silent on this issue, but some web browsers and servers impose a limit of as few as 2,000 characters. The Google Charts designers probably had this limit in mind when they created their scheme for cramming data into a URL. There are actually three encodings available. Above I chose a method in which each data point is a number between 0.0 and 100.0 and values are separated by commas; this allows for a few hundred data points before you start bumping into the 2,000-character limit. The most compact encoding packs each data item into a single ASCII character in the ranges from A=0 to Z=25, a=26 to z=51 and 0=52 to 9=61. Not the most intuitive encoding, but it allows for a further cute trick: graphing words and phrases.

http://chart.apis.google.com/chart?chs=200x200&cht=lc&chd=s:Agraphisworth1000words
bar chart from Google Charts

Copy the URL into the location bar of a browser window, change the text at the end of the line, and you can see a graph of your name and address.

To P or NP, that is the question

December 10th, 2007

It’s time for my bimonthly self-promotional horn toot. The new issue of American Scientist is now available online, and paper copies should soon be stuffing mailboxes everywhere. My “Computing Science” column is on a new class of “holographic” algorithms invented a few years ago by Leslie G. Valiant of Harvard. The ideas have been further developed by Jin-Yi Cai of the University of Wisconsin (who is currently visiting Harvard as a Radcliffe Fellow) and several students and colleagues both at Wisconsin and in China.

If you want to know what holographic algorithms are all about, you’ll have to go read the column. I’m not going to rehash it here. Instead, I’m going to recycle a few paragraphs of a version of the column that might have been but never will be. This was a false start, an attempt at a lead that wound up leading me in the wrong direction:

It was the final day of a workshop, and I joined a group for dinner at a farmhouse restaurant in the village of Muggia, around the corner from Trieste, where Italy meets Istria. As the evening was winding down, we sat drinking the last of the wine, watching the sun sink into the Adriatic, and debating the P = NP question. The debate was somewhat lopsided: No one at the table was willing to defend the proposition that P is indeed equal to NP. But there was plenty of room for argument between hard-liners who insisted that P is most certainly not equal to NP and those who took a more agnostic view.

What are P and NP? They are classes of computational tasks or problems. Roughly speaking (I’ll try to speak less roughly in a moment), the class P includes all the things we know how to compute quickly. NP is a larger class, encompassing all the problems known to be in P and many others besides; for these additional tasks we don’t have efficient algorithms, and yet no one has proved that such algorithms don’t exist. If it should turn out that P = NP, then thousands of seemingly hard problems actually have some quick shortcut solution, if only we could find the secret.

A mathematician at my end of the table took an inflexible position: P = NP is not only wrong but unthinkable—a notion akin to time travel or perpetual motion. I had been lurking silently through most of this conversation, but at this point I spoke up. Time travel seems implausible, I said, because it would undermine causality, and perpetual motion would violate the conservation of energy. What bedrock principle would be overturned by the discovery that P = NP? The mathematician replied: “Do you believe in miracles? No? Well, wouldn’t it be miraculous if we lived in a universe where every last one of those NP problems has an easy solution?”

This was good enough for me on that mild summer night in Muggia. The fact is, however, the miracle argument cuts both ways. There’s a subclass of NP problems, designated NP-complete, with a special property: If there’s an efficient algorithm for any one of these tasks, the same method can be used to solve all NP problems efficiently. Thus for P to be different from NP requires the “miracle” that not even one out of thousands of NP-complete problems has a quick solution….

One more note on the column. This is difficult subject matter, and my grasp of it is tenuous. I could not have written the piece without a lot of help and hand-holding from Valiant and Cai, who were reading drafts and answering phone calls up until the very moment the issue went to press. I have a reason for thanking them here rather than in the column itself: If I’ve goofed in spite their efforts, I don’t want them implicated.

Measure twice, average once

December 7th, 2007

plywood panel with seven measurements in crayon or magic marker

Whenever Norm Abram tells me to “measure twice, cut once,” I wonder what I’m supposed to do if the two measurements disagree. Perhaps I should measure a third time, in hope of settling the question by majority rule; but then I might well wind up with three discrepant values.

Strolling by a construction site the other day, I came upon the plywood panel shown above. There was no one around to help me interpret these curious scrawled measurements, but I could easily enough imagine the scene. A carpenter—Skilsaw at the ready—is surrounded by a group of statisticians and decision theorists eager to advise him on where to make the cut.

“Obviously,” says the first consultant, “we take the average—the arithmetic mean. Gauss proved 200 years ago that the sample mean is always the best estimator for a measurement subject to normally distributed random errors.”

“Actually, he proved just the opposite,” says another hardhatted and hardheaded savant. “He started by assuming that the mean is the most probable value, and then he invented the normal distribution as a way of ensuring that this rule will hold.”

“Whatever. But we’ve come a long ways since 1805. We know that the mean is an admissible estimator. Even without assuming a normal distribution, the sample mean is the estimator that minimizes the sum of the squared errors.”

“But who says the sum of the squared errors is the function we want to optimize? It’s just one of many possibilities. And it gives undue influence to the extremes of the distribution. In this case, the presence of that peculiar-looking eight-and-an-eighth value pulls the mean down to 55.875. Is that really where we should saw the board?”

“That 8.125 is obviously an outlier. Somebody was reading the wrong end of the tape measure. Excluding that bogus value, the mean is 63.833.”

“If you’re going to be picking and choosing which data points to trust, what about the one at the upper right? I’m not even sure I can read it: 64 and seven-eighths? And somebody seems to have crossed it out. Maybe we should drop that one, too.”

“And 64 is the only other item that isn’t circled. That must mean something.”

Another direction is suggested: “Instead of Gauss’s sum of the squared errors, we could adopt the criterion of Laplace, the sum of the absolute errors. With this choice, the favored estimator is the median rather than the mean. The median of our data is 63.625. And the median is much less sensitive to outliers and strangely shaped distributions. Whether we include or exclude the eight-and-an-eighth measurement makes only a minor difference.”

“What makes you all so sure we’re seeing several attempts to measure the same quantity? I think we actually have three distinct sets of measurements here, which just happen to be scribbled on the same piece of wood. The eight-and-an-eighth is clearly on its own. The two uncircled measurements form another set. And then we have four circled values all clustering around 63-and-something. If we want to simultaneously optimize the least-squares error for all three sets, we should be using a James-Stein estimator, which shrinks the average of each set toward the overall average.”

At this point a Bayesian is heard from. Others mention maximum likelihood, Pitman’s measure of closeness, minimum variance, the method of moments….

The conversation does not end here, but the rest is lost in the whine of the power saw. The carpenter has cut off the plank somewhere out beyond 64 inches and explains this choice as follows: Cutting long may mean cutting twice, but cutting short means buying twice.

*       *       *

One lesson you might draw from this little farce and fable is that if you have a hard decision to make, you should call a carpenter rather than a statistician. But that’s not the conclusion I intended.

You sometimes get the impression that statistics is a dry and lifeless discipline, where all the interesting questions were answered long ago, and all that remains now is to memorize some formulas and learn when to apply them. I think not!

Problems in statistics don’t get much simpler than this one. It concerns a small set of observations, with one variable in one dimension and one parameter to be estimated. It’s a problem that would have been perfectly familiar to Gauss and Laplace, Legendre and Adrain. And yet there’s still room for doubt and controversy about how best to approach such questions.

I found the plywood puzzle challenging enough that I was led to do some reading. Most of it is well above my grade level, and so I can’t claim to have absorbed everything the authors have to say. But I’ll offer a few pointers in case anyone else wants to follow along:

  • Colin R. Blyth (1951) directly confronts the Norm Abram question: How do you decide when to stop measuring and start cutting? I gather that this paper was a major landmark in estimation theory. R. H. Farrell (1964) follows up on related themes. (A number of other papers could be mentioned in the same context; I draw attention to these two because they are freely available online through Cornell’s Project Euclid.)
  • There’s an “Introduction to Estimation Theory” by Don Johnson of Rice at the Connexions web site. The context is signal processing, but there’s plenty of use to carpenters.
  • For the history of statistics, Stephen Stigler is always the place to start. His article on “Gauss and the Invention of Least Squares” is chapter 17 in Statistics on the Table (Harvard University Press, 1999). The original 1981 version from Annals of Statistics is online here through Project Euclid.
  • For a gentle introduction to the James-Stein estimator, I recommend a Scientific American article by Bradley Efron and Carl Morris, “Stein’s Paradox in Statistics” (Vol. 236 No. 5, May 1977, pp. 119–127). (Disclaimer: I was the editor of that article.)
  • Finally, at the moment I’m halfway through Pitman’s Measure of Closeness: A Comparison of Statistical Estimators, by Jerome P. Keating, Robert L. Mason and Pranab K. Sen (SIAM, 1993). I really don’t yet know what to make of this, but it has opened up a world I knew nothing about.

Pulling the goalie

November 29th, 2007

Don Elgee, a retired teacher of mathematics and computer science from Ottawa, sends the following inquiry:

In hockey, when a team is down by a goal with about one minute to go, the goalie is pulled in favor of another offensive player. This illustrates a key point in the strategy of most games.

The object is not to maximize points for, and minimize points against; rather, it is to win or tie the game. As the game nears its conclusion the winning team plays more conservatively and the losing team more radically. But when—and to what extent—should a team change its style of play? 

Ottawa Senators goal; photo by C. P. Storm; http://images.google.com/imgres?imgurl=http://upload.wikimedia.org/wikipedia/commons/thumb/3/36/Ice_hockey_goal.jpg/501px-Ice_hockey_goal.jpg&imgrefurl=http://commons.wikimedia.org/wiki/Image:Ice_hockey_goal.jpg&h=600&w=501&sz=73&hl=en&start=127&sig2=rm9lycYyPZku7E-qGdcpZg&tbnid=Ox2HSAN9YSjfAM:&tbnh=135&tbnw=113&ei=-0JPR6CfM5HqhwLny6ShDg&prev=/images%3Fq%3Dhockey%2Bempty%2Bnet%26start%3D120%26gbv%3D2%26ndsp%3D20%26svnum%3D10%26hl%3Den%26safe%3Doff%26sa%3DN

To clarify the problem I tried to conceive the simplest possible example. It turned out to be much more complex than I expected.

Consider the following simple game: Two players toss a die on alternate turns. The first one to reach 100 is the winner. Players have a choice of two dice—one is normal (6, 5, 4, 3, 2, 1) and the other has faces with 6, 6, 5, 1, 1, 1. When should the losing player switch to the second die?

Is there a mathematical solution? Could you determine this by experimental computer simulation? Does the best strategy depend on the strategy the opponent is likely to choose next?

This simple problem is difficult enough, but the sports situation could better be simulated by adding a third die with 4, 4, 3, 3, 3, 3.

Has this issue been subject to mathematical study? With all the statistics used in professional sport has anyone tried to include this factor? Has it ever been incorporated in computer simulations?

Hockey is not my sport, but these are interesting questions. Inspired by Elgee’s letter, I’ve run a few of the obvious simulations. For the time being, however, I’m not going to say much about the results. What I’ve learned so far does not lead to any simple, pithy rule of thumb that a hockey coach could use to decide when to pull the goalie. Nor do I have a clear grasp of how best to play the dice game. I’m hoping someone else will offer a sharper analysis. Toward that end I have some further notes and observations.

Under the stated rules—alternating turns, with victory going to the first player who reaches 100—the privilege of moving first brings a substantial advantage. In a 100-point game with standard dice, the first player can expect to win about 55 percent of the time. If two players are tied at 99, the first to roll is definitely going to win, regardless of which die is chosen. Because of this bias, we probably need to consider different strategies for the first and second players (like white and black in a chess game). An alternative is to add a further element of randomness to the game: In each round, the player to move is determined by the toss of a fair coin.

Elgee’s two nonstandard dice have faces that sum to 20, less than the standard die’s 21. As a model of hockey tactics, this bias makes sense: It reflects the risk of leaving the goal untended. But if we ignore hockey and just consider the dice game on its own merits, it might be more interesting to explore a symmetrical contest with dice marked 6, 6, 6, 1, 1, 1 and 4, 4, 4, 3, 3, 3. Then we’d be looking at three distributions that have the same mean but different shapes.

Generalizing further, we could allow dice marked with any six non-negative integers that sum to 21. What would happen to the game if we allowed dice such as 100, -16, -16, -16, -16, -15? Or, we could allow the players to choose continuous probability density functions.

It’s easy to imagine circumstances where a “high-risk” die should be advantageous. If the score is 99 to 94, then the trailing player should be willing to accept any trade-off that improves the odds of rolling a 6; even a 6, 6, 0, 0, 0, 0 die is preferable to the standard die. Conversely, if you’re ahead near the end of the game, the “low-risk” die seems favorable. All this suggests there ought to be some fairly simple guidelines for choosing the best die in any situation. But, as Elgee noted, it’s more complicated than you might expect.

In my simulations I found that with the score tied at 94, switching to the 6, 6, 5, 1, 1, 1 die (while your opponent continues to play the standard die) yields an advantage of 2.9 percent. But when the score is 96 to 96, making the same switch produces a deficit of 3.7 percent. Interestingly, choosing the 4, 4, 3, 3, 3, 3 die brings similar results: a gain at 94 to 94 but a loss at 96 to 96. (The simulations used the coin-flip protocol, but the same kind of anomalies appear under other rules.)

It gets even more complicated. The results cited above are for players who always choose the same die, no matter what the circumstances. But Elgee’s rules allow a player to choose a different die on each roll. Presumably, players who adapt or learn will do better than any player with a fixed strategy. I find that a very simple adaptive strategy—use 6, 6, 5, 1, 1, 1 when behind, 4, 4, 3, 3, 3, 3 when ahead, and 6, 5, 4, 3, 2, 1 when the score is even—gives mostly but uniformly good results in end-game situations.

I echo Elgee’s questions about whether this problem has been analyzed previously. As it happens, I’ve been able to find a bit of literature on the original hockey issue. Tom Benjamin’s NHL Weblog has pointers to three papers published in the 1980s in Interfaces, an operations research journal. In those papers several authors—fans of rival teams—derive a formula for the optimal time to pull the goalie. (They find that coaches almost always wait too long.)

I would have thought that Elgee’s nonstandard dice would also have been investigated before. Some years ago Bradley Efron invented a wonderful set of nontransitive dice, which Martin Gardner wrote about (Scientific American, December 1970, pp. 110–114). Gardner also had a column on various kinds of rigged and trick dice (Scientific American, November 1968, pp. 140–146). A game called Piggy has certain elements reminiscent of Elgee’s game, but those elements don’t include dice that differ in variance.

Perhaps I’ve just failed to find the right search term on Google or MathSciNet, and a reader will supply a reference.

Update: Minutes after posting the above, I discovered an article by B. De Schuymera, H. De Meyera and B. De Baetsb, “Optimal strategies for equal-sum dice games” (Discrete Applied Mathematics 154 (2006) 2565–2576), that covers some of the territory. They analyze a game in which players can choose any die whose n faces have the same sum σ. But they consider only games in which players bet independently on each throw of the dice. Under these conditions, they show that with six-sided dice summing to 21, the optimal die is the standard one, with faces marked 6, 5, 4, 3, 2, 1. Again, however, if I understand correctly, this result applies only to individual throws of the die.

Last name first

November 20th, 2007

Saturday’s New York Times had a story by Sam Roberts about a newly released Census Bureau study of the frequency of surnames in the U.S. The Times story was mainly about the names at the top of the list, and especially the increasing prominence of Hispanic names (Garcia and Rodriguez have made it into the top ten). But what caught my attention was a passing comment about the bottom of the frequency distribution:

Altogether, the census found six million surnames in the United States. Among those, 151,000 were shared by a hundred or more Americans. Four million were held by only one person.

I was not surprised to learn that the distribution of name frequencies is steeply skewed, with a few common names and a great many rare ones. But could it be true that two-thirds of the names occur just once in the population—that four million people in the U.S. have a unique family name they share with no one else?

Looking through the lens of personal experience, I found it hard to believe those numbers. Over the years I’ve met some people whose family names are surely rare, but I am not aware of a single acquaintance who is the holder of a unique name—if only because everyone I know shares a name with parents or children or siblings or a spouse. After all, family names tend to run in families! To have a unique name, you’ve got to be the first of your line or the last of your line or both.

The study of name distributions has a long history. In the 1870s Francis Galton and Henry William Watson looked into the longevity of family names, concluding:

All the surnames, therefore, tend to extinction in an indefinite time, and this result might have been anticipated generally, for a surname once lost can never be recovered, and there is an additional chance of loss in every successive generation.

The argument sounds good, but it’s not quite as broadly applicable as Galton and Watson thought it was. Extinction is inevitable only in a static or shrinking population. If the population is growing, names and families can become all but immortal. In the 1920s Alfred Lotka calculated that American family names had about an 18 percent chance of surviving indefinitely. More recently, Susanna C. Manrubia, Bernard Derrida and Damián H. Zanette have developed a more refined computer model of name evolution (see arXiv preprint 1 and 2; there’s also a splendid American Scientist article, but annoyingly it’s only accessible to subscribers). Manrubia, Derrida and Zanette describe an equilibrium state where the distribution of names follows a power law. If we define a “clan” as the set of all people who have a surname in common (whether or not they are actually related), then the predicted number of clans of size m is proportional to m–β. Manrubia, Derrida and Zanette argue that β = 2. Thus, for example, clans 10 times larger should be 100 times rarer.

How do the new Census Bureau findings stack up against these predictions? Here is the frequency table included in the summary report (.pdf):

Table of frequencies of last names

For this data set the cumulative numbers are easier to work with because of the nonuniform bin sizes. Here’s how they look in a graph:

graph of cumulative name frequencies

Graphs of this kind can be confusing. I find it helpful to keep in mind that a point at coordinates x,y indicates there are y clans with x members or more.

If clan frequencies were governed by a strict power law, the graph would trace a straight line on these log-log scales. Overall, the curve is indeed fairly straight, tending to support the power-law model. But a few features of the curve seem to depart from the predictions. For one thing, the slope of the line gives an exponent closer to β = 1 than β = 2, as Manrubia, Derrida and Zanette would lead us to expect. I can’t explain that. A steepening of the curve at the large-clan end could be an artifact of finite sample size. Most interesting of all is the sudden uptick at the opposite end of the curve, where clans of size 1 are much more abundant than the power law predicts. On a logarithmic scale it’s easy to misjudge the magnitude of such a trivial-looking excursion: If the two leftmost data points (for clans of size 1 and size 2 through 4) were restored to the trend line of the data from clan sizes of 10 through 1,000, the total number of names in the survey would be about three million instead of six million, and there would be only one million unique names instead of four million.

I’ll not keep you in suspense any longer about the cause of this anomaly. When I downloaded the Census Bureau report, I found that the authors (David L. Word, Charles D. Coleman, Robert Nunziata and Robert Kominski) are also skeptical about those four million solo monikers. They explain that the data came from census forms on which respondents were asked to print the first, middle and last names of all household residents; the forms were then electronically scanned, and the answers were extracted by optical character recognition. Errors at any point in the process could turn a common name into a unique (but fictitious) one—making a MLLLER out of a MILLER, say. Some of these errors were corrected in later processing, but others apparently slipped through. One particularly troublesome problem arose whenever a respondent printed an entire name in the space intended for the surname. The OCR software simply concatenated all the parts of such a response, leading to spurious surnames such as PETERJDAVIS. The report states that “many” of the four million unique names are products of such data-entry errors, but there is no attempt to quantify the effect.

For privacy reasons, the Census Bureau has released only the 151,671 names (.zip) occurring at least 100 times, so there’s no way to get a look at the unique names. You might think, though, that if three-fourths of them are malformed in some way, that fact would stand out prominently and would have been noticed even before this study was undertaken. You might even think that if 1 percent of respondents are entering names incorrectly, the Census Bureau would have discovered that fact in preliminary testing and would have redesigned the form before circulating it to 300 million people.

Still, I suppose the Bureau’s explanation must be true. There’s spotty suggestive evidence even in the list of names appearing 100 times or more. For example, the list includes surnames such as VANBURKLEO and JOHNSONWILLIAM. And either there are 160 people in the U.S. whose surname is JOHNOSN, or there are 160 JOHNSONs who all made the same transposition error when entering their name on a census form. (Or some combination of the above.)

Even if there are only a million unique names, that still seems like a lot—one out of every 300 people. Galton and Watson looked upon such lonely surnames as dying embers, the last hope of families on the brink of extinction. But some of the rare names are surely newborns rather than expiring elders. Immigration brings names that are new to the U.S. even if they are far from unique globally. And processes akin to mutation and recombination are creating new names all the time. In particular, recombination has become more important now that the purely patrilineal model of name transmission is no longer universal; surnames have broken free from their linkage to the Y chromosome. As a matter of fact, now that I think of it, I was wrong when I said that I have never known a person with a unique surname. I have friends who named their daughter Nina Auslander-Padgham, and her surname surely has a good chance at uniqueness. Or at least it did until Nina’s brother Milo was born.

Out of curiosity, I opened up the Boston-Cambridge phone book, selected a few pages at random, and counted up unique names as a proportion of all names. In a sample of 458 surnames, 254 were listed for one person only, or about 55 percent. This result isn’t too far from the two-thirds ratio in the Census Bureau report, but I’m not sure how to interpret it. The geographic area covered by the Boston directory includes a population of roughly a million, or about 1/300th of the national population. When you select a small sample of this kind—supposing it to be a random sample—what does the selection process do to the frequency distribution of names? If a name occurs 300 times nationally, it could well be unique in Boston, thereby apparently boosting the number of unique names. On the other hand, for every 300 names that truly are unique nationally, only one is likely to be represented in Boston, so in this way the number of unique names is greatly diminished. The question I leave you with is this: How best can we estimate the national (or global) proportion of unique names from a small random sample?

Until NEXPTIME

November 18th, 2007

I have a few questions for the complexity theorists among us.

Have you ever tried to explain to your grandmother why NP is named NP? Does she get it when you say that problems labeled NP-complete are the hardest problems in NP, but NP-hard problems might be harder, and not in NP?

My concern here is not with the difficulty of the concepts—there’s not much we can do about that—but with the notation and terminology. Do locutions like P#P and NISZK and (NP ∩ coNP)/poly roll trillingly off your tongue? How about EXP, EEXP, NEXP, PEXP and SUBEXP? And while we’re on the subject of EXP and friends, I’ve been wondering how to pronounce NEXPTIME. (I’m kind of hoping the “P” is silent, as in Pterodactyl.)

Presumably, the argot of complexity theory works well enough for communication among members of the club—those of you, say, who know that “IP” is neither “internet protocol” nor “intellectual property” but “interactive proof.” If you’d ever like to talk to the rest of the world, however, I think there might be room for improvement.

Let’s start with the term “complexity” itself. The root meaning of “complex” is “made up of many interconnected parts”; for example, the innards of a mechanical clock are complex. Is this the best word to describe the property of being hard to compute? To me, it seems the difficulty of an exponential algorithm is not the complexity of the task but the mere mind-numbing repetition of the same operations. An exponential heap of manure is no more complex than a polynomial heap of manure, it just takes longer to shovel it.

Another gripe: WHAT’S WITH ALL THE CAPITAL LETTERS? IS IT REALLY NECESSARY TO SHOUT ALL THE TIME?

But the thing that worries me most is a certain opacity in the names of complexity classes. According to the Complexity Zoo, there are currently 465 named classes, which is quite a menagerie. It’s enough that you can’t really expect people to keep track of meanings and relations without some internal cues from the structure of the names themselves.

There are hints of structure in the naming scheme, with terms assembled from stems, prefixes and suffixes, and combined according to quasi-grammatical rules. For example, the superscript in P#P indicates the action of an “oracle”: The class encompasses problems that can be solved in polynomial time with the help of a #P (”sharp-P” or “counting-P”) oracle. The prefix “co-” in “coNP” designates the complement of NP: If a “Does there exist…?” question is in NP, then the corresponding “Does there not exist…?” question is in coNP. These devices are applied consistently; as far as I can tell, superscripting and the “co-” prefix always mean the same thing. Unfortunately, other elements of the names are not so easy to parse. The letter P generally stands for “polynomial” (except where it’s “probabilistic”). N usually denotes “nondeterministic” (but NC is “Nick’s Class”). Likewise the prefix D is for “deterministic” (except that it’s usually omitted, and sometimes it means “difference” or “dynamical” instead). B stands for “bounded-error” (except that BH is “Boolean hierarchy” and “BPd(P)” is “Polynomial Size d-Times-Only Branching Program”). Q is for “quantum” (except “QH” is the “query hierarchy” and “QP” is “quasi-polynomial time”).

The sad truth is, the naming conventions for furniture at Ikea make for a more consistent language than those of complexity theory.

How did we get into this fix? Back in 1974, when complexity theory was still young and impressionable, Don Knuth published “A Terminological Proposal” in SIGACT News, the quarterly newsletter of what was then the ACM Special Interest Group on Automata and Computability Theory. (By the way, the issue in which Knuth’s letter appears, Vol. 6, No. 1, seems to be missing from the ACM Digital Library. Can anyone supply a copy for scanning?) Knuth had polled 30 of his friends, asking for advice on the best adjective to describe problems “at least as difficult to solve in polynomial time as those of the Cook-Karp class NP.” His own suggestions were “Herculean,” “formidable” and “arduous,” but none of those got strong support. The people he consulted had proposals of their own: “impractical,” “intractable,” “prodigious,” “exorbitant,” “Sisyphean” and several others. The idea I like best came from Shen Lin, who suggested the abbreviation PET:

PET stands for “probably exponential time”. But if this gets proved, PET stands for “provably exponential time”. And if the proof goes the other way, it stands for “previously exponential time”.

On the whole, I think it’s just as well that we are not now speaking of Herculean or Sisyphean or PET problems, but I haven’t much enthusiasm for the solution that was adopted, either:

The “winning” write-in vote is the term NP-hard, which was put forward mainly by several people at Bell Labs, after what I understand was considerable discussion. Similar if not identical proposals were made by Steve Cook, by Ron Rivest, and in earlier publications by Sakti Sahni. This term is intended for use together with another new one, NP-complete, which abbreviates ‘polynomial complete’ and is at the same time more exact.

Knuth endorses “NP-hard” and “NP-complete” primarily because they have precise definitions and thus meet the needs of the insiders, the “automata-theorists.” But he also goes on to address the issue of communicating with a wider audience, and his thoughts are worth quoting at some length:

I don’t consider it a major goal to invent completely descriptive terminology for every conceivably interesting question of the type considered here; the major goal is to have a good term to use for the masses, in the one case which experience shows is almost omnipresent. To say NP-hard actually smacks of being a little too technical for a mass audience, but it’s not so bad as to be unusable; fortunately the term does easily generalize to a lot of other cases considered by automata theorists, so it appears to be a good compromise. When more technicalities are introduced, there is less need for a universally accepted term or notation. Although it’s very interesting to consider, e.g., log-space reductions, I don’t think it’s necessary to come up with a special short name for them.

If I understand this passage correctly, Knuth is arguing that although “NP-complete” and “NP-hard” may be a bit abstruse for casual coffeeshop chatter, they are the only terms that nonspecialists will ever need to learn, and so they won’t be too much of a burden. But that’s not quite the way it has worked out.

I suppose that if I’m going to complain about somebody else’s notation, I ought to propose an alternative of my own that I believe to be better. I can’t do that, but I would like to point out that certain other fields of study offer models that might be emulated. The most obvious model is organic chemistry, where the naming problem is orders of magnitude larger than the collection of 465 items in the complexity zoo. The sesquipedalian names of organic molecules are not exactly pretty, but the notational system has the wonderful property that if you know the name of a thing, you also know its composition and structure. Wouldn’t it be grand to have the same kind of perspicuous scheme for complexity classes, so that just naming a class would tell you where it lies in the inclusion diagram?

A saving grace of complexity theorists everywhere is that they are not without a sense of humor when it comes to the obscurity of their own jargon. For example, there is a complexity class named PINC, which the Zoo translates as “Polynomial Ignorance of Names of Classes.” And you don’t want to miss Alan T. Sherman’s discourse (or descant) on superpolylogarithmic subexponential functions (.ps).

Finally, I should mention a discovery I’ve made at the end of Knuth’s terminological proposal. Everyone knows that settling the P = NP question will earn you a prize of $1 million from the Clay Mathematics Institute. I suspect that many are unaware of another prize offer, made more than 30 years ago. Knuth writes:

I’m willing to give a prize of one live turkey to the first person who proves that P = NP.

Note that this appears to be a one-way-only offer. Proving that P ≠ NP won’t win you the turkey. (You have three days left until Thanksgiving.)

A New Yorker theorem

November 6th, 2007

Barry Cipra, my friend and former neighbor, and a frequent bit-player commentator, is the Talk of the Town this week. A story in The New Yorker by Lizzie Widdicombe highlights Barry’s work on the mathematics of the following calendrical problem.

Doing all arithmetic modulo 100, if you were born in the year X, you reach age X in the year 2X. For example, if you were born in ’54, you’ll be age 54 next year, in (2 × 54) mod 100 = ’08. But another group will also be reaching their birth-year age in ’08, namely the children born in ’04. So here’s the question Barry tackles: In any given year, which cohorts have already reached their birth-year age, and which are still looking forward to this event?

For an account of the human side of this problem, explaining why it might interest New Yorker readers, see Widdicombe’s article. For more on the mathematics, see Barry’s paper. (As far as I know, this is the only mathematics paper you’ll find with a URL beginning “http://www.newyorker.com/”.)