Archive for the ‘computing’ Category

In the zone

Tuesday, August 24th, 2010

Mt. Shasta, from a nearby hilltop owned by Coca Cola

Before leaving on a trip to the West Coast, I copied my return flight information onto my Google calendar: SFO to BOS, 12:50 p.m. to 9:30 p.m. Now that I’m nearing the end of my visit to the Mythical State of Jefferson (see local landmark above), I’ve just checked the departure details by calling up the calendar on my cell phone. It tells me the flight departs at 9:50 a.m. and arrives at 6:30 p.m.

It could be worse, of course. As an eastbound traveler all that I risk is wasting three hours at the airport. If I were westbound, I might well miss my flight.

The developers of Google calendar would doubtless argue that automatic time-zone conversion is a feature, not a bug. If the event in question had been a conference call scheduled for 12:50 p.m. Eastern Daylight Time, then 9:50 a.m. Pacific Daylight Time would indeed be the moment to dial in. Or, if I had a series of pills to be taken at fixed intervals, it might be helpful to have the program remind me at the correct times as I wander across continents. But in the case of today’s flight home, the result is just plain wrong.

It’s not only a Google calendar problem. A few weeks ago a friend was entering a schedule of talks into a Drupal web page for a conference that begins November 7, 2010. All the times were mysteriously shifted by an hour. A 1:00 p.m. talk on the input form became a 2:00 p.m. talk on the displayed web page. The key to solving this mystery is knowing that November 7 is the date daylight saving time ends in (most of) the U.S.

I am certainly not the first to encounter such problems. Peter Neumann’s RISKS Digest reports hundreds of computational mishaps involving time zones or daylight saving time, going back over the past 25 years. The issue is known to Google. But what is the right fix?

Some other calendar software offers the option of specifying a time zone for an event. To handle the airline case correctly, the program needs to allow for different zones for the start and the end times. And the Drupal problem suggests we may also need some means of indicating whether or not to adjust for daylight saving time. It gets very messy. Note that an event scheduled for 1:30 a.m. on November 7, 2010, will happen twice. An event at 2:30 a.m. on March 13, 2011, will never take place.

For years my own makeshift solution to these complexities was to live in a single time zone, no matter where I was. I carried a laptop, but I never changed its time-zone setting, even when I was away from home for weeks or months. When conversion was needed, it happened in my head. Even now the Google calendar display on my laptop gives the correct departure time from SFO, because the laptop doesn’t know that it ever left Boston. But in our new world of location-aware devices, pretending to stay home is no longer an option.

I begin to wonder if the whole railroad-age concept of time zones hasn’t outlived its usefulness. But I haven’t time just now to consider the alternatives. Right now it’s time to leave for the airport. I think.

The ormat game

Monday, August 16th, 2010

Here’s the deal. I’m going to give you a square grid, with some of the cells colored and others possibly left blank. We’ll call this a template. Perhaps the grid will be one of these 3×3 templates:

colored 3x3 ormat grids

You have a supply of transparent plastic overlays that match the grid in size and shape and that also bear patterns of black dots:

dot patterns for the six 3x3 permutation matrices

Note that each of these patterns has exactly three dots, with one dot in each row and each column. The six overlays shown are the only 3×3 grids that have this property.

Your task is to assemble a subset of the overlays and lay them on the template in such a way that dots cover all the colored squares but none of the blank squares. You are welcome to superimpose multiple dots on any colored square, but overall you want to use as few overlays as possible. To make things interesting, I’ll suggest a wager. I’ll pay you $3 for a correct covering of a 3×3 template, but you have to pay me $1 for each overlay you use. Is this a good bet?

Before going further, I should mention that not every conceivable template can be covered under these rules. To take an obvious example, no 3×3 template with fewer than three colored squares can possibly be covered by any combination of the six overlays. But I promise to submit only templates that can be covered by some combination of the given dot patterns; if I err about this, I forfeit the bet.

How does the game play out? If I give you the template marked “1″ above, you can easily win; just choose permutations a and b, which together cover all the colored squares and no others. You pay $2 and get $3. Template 2, with all nine squares colored, looks like it might be the toughest challenge. Clearly, it cannot be covered with fewer than three overlays, since we need a total of nine dots; and it turns out that exactly three overlays are required. Indeed, there are two ways of covering the template with three overlays: a + d + e and b + c + f. Thus this template is a breakeven proposition: You earn $3 and pay $3.

Now we come to template 3, which has eight colored squares and one blank. Surely if you can cover the full nine squares with just three overlays, then you should also be able to cover eight squares—no? I invite you to try it. In fact the only covering that works requires four overlays: b + d + e + f. Thus you shouldn’t take my bet, since I can always give you a template with just one blank, and you’ll have a net loss of $1.

Some background. I’ll return to the gaming table momentarily, but first let me explain what this is all about and where it came from. A few weeks ago, I was writing about “ranges of rankings,” which led me into the topic of permutation matrices. To recapitulate:

  • A permutation matrix is a square matrix with a single 1 in each column and each row, and all the rest of the elements 0.
  • An ormat is a superposition of permutation matrices, formed by applying the Boolean OR function to corresponding elements of the permutation matrices. For example: matrix-or-sum.png
  • Not all square matrices with (0,1) entries can be formed by OR-ing permutation matrices, but there’s an efficient algorithm for deciding whether or not a given matrix is an ormat. (I thank some helpful commenters for enlightening me on this point.)
  • Given an ormat, the total number of distinct permutation paths that can be threaded through the 1 entries of the matrix is equal to the permanent of the matrix. Calculating the permanent is known to be a hard computational problem.

In a comment, Barry Cipra posed the following query:

The permanent tells us the maximum number of different permutations that can be OR-summed to produce a given ormat, but what is the corresponding minimum number? Also, in how many different ways can the minimum be achieved?

The connection between ormats and my little game is probably apparent by now. The template of colored and blank squares is an ormat; the dotted overlays represent permutation matrices; to maximize your payoff in the game (or to minimize your loss), you need to answer Barry’s first question, finding the minimum number of permutations that can be combined to yield the given ormat.

For 3×3 matrices, we can solve this problem by exhaustive search, calculating the OR-sums of all possible combinations of the six 3×3 permutation matrices taken 1, 2, 3, …, 6 at a time. I did this with pencil and paper on a recent airplane trip. Here is a summary of the results:

number of ormats generated by various combinations of permutation matrices

Some of the numbers on this card are easy to explain. The six ormats with just three 1 entries are the permutation matrices themselves. There are six of them because there are 3! = 6 permutations of three things. There are no ormats with four 1 entries for a reason that bears thinking about: There can be no permutations that differ from one another in just one element. When you superimpose any of the six overlays shown above, you can wind up with three, five or six dots, but never four.

At the other end of the scale, it’s no surprise that there’s exactly one ormat with nine 1 entries, and that it takes three permutations to produce it. And then there are the nine ormats with eight 1 entries, which each require four permutations to be OR-ed. These are the single-blank patterns like template 3 above.

Based on these results, I began speculating about what I would see in a tabulation of all 4×4 ormats.

guesses about stats for 4x4 ormats

There would have to be 4! = 24 patterns with four 1 entries, and just one pattern with all 1s, generated by OR-ing four permutations. And there should be 16 ormats that require five permutations, namely the 16 matrices with a single 0 element. This last prediction seemed a little less self-evident than the others.

Pocket change and Cheerios. My thoughts about the single-zero (or single-blank) case went something like this. To cover 15 squares with sets of four dots each, we need at least four sets, or else we simply won’t have enough dots. So a useful starting point is one of the optimal arrangements that cover all 16 squares without gaps or overlaps. By this time I had grown tired of drawing zillions of dots, and so I started working with sets of coins.

initial configuration of four permutations of coins

In this arrangement each coin denomination forms a permutation, with no two pennies, nickels, dimes or quarters in the same row or the same column. We have successfully covered all the colored squares, but unfortunately we’ve also covered the blank at the lower right. Thus this pattern of coins is not an acceptable solution, but maybe we can fix it up somehow?

adjusted configuration after one coin is moved

Moving the penny from the blank square to another square in the same column solves one problem but creates another: Now the arrangement of pennies is no longer a permutation. There are two pennies in the third row.

coins after second adjustment to restore permutation

So now we have to shift another penny to restore the one-per-row-and-column property. Inevitably, this leaves a colored square uncovered. The only way we can cover that exposed square is to introduce a fifth permutation. Since I had run out of coin denominations, I chose a popular brand of breakfast toroids. Voila:

coins and cheerios -- the five-permutation solution

There’s nothing special about the particular moves I chose in this sequence. If you try some alternatives, you should be able to persuade yourself that moving the penny that covers the blank to any other square in the fourth column (or in the fourth row) would lead to essentially the same situation. Likewise the game would come out the same if the single blank square were placed anywhere else in the grid. And you could also start with a different set of initial permutations (provided they cover all the squares).

This coin-shuffling exercise demonstrates that we can cover any 4×4 template that has a single blank by combining no more than five permutations, but how do we know that five are actually needed? Maybe there’s some totally different arrangement that would do the job with just four permutations? Well, think about what such an arrangement would look like. It would have to differ at exactly one position from some other layout of permutations that covers the full 16-square grid. But no two permutations can differ at one and only one place. Thus the reason there can be no four-permutation cover of 15 squares is essentially the same as the reason no 4×4 ormat pattern can cover just five squares.

This argument generalizes to k×k matrices: For any integer k, there must be at least k ormat patterns that cannot be covered with fewer than k+1 permutations. But then comes the bigger speculative leap: Perhaps k+1 is an upper bound. Perhaps part of the answer to Barry’s question is that no k×k ormat pattern requires more than k+1 permutations. At one point I even had a “proof” of this conjecture. Then I wrote a program to check it, doing much the same thing I did with the dots on the airplane.

Out of bounds. My program found the expected 16 ormat patterns that require five permutations—and it found many more as well. In all it identified 2,032 4×4 ormats that can’t be composed from fewer than five permutations. And then came a bigger surprise: The program also found 480 patterns that require six permutations. So much for my proposed upper bound.

One of those problematic 480 ormats takes this form:

((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 0 0))

Looking over this pattern, I thought I understood where my earlier reasoning had gone awry. This matrix is just like the single-zero pattern, but with two zeros! (I do mean for that statement to make sense. Bear with me.) Suppose we start again with a set of four permutations that completely cover the grid, including the two blanks.

starting configuration of 16 coins on 4x4 template with two blanks

Then we can uncover each blank just as we did in the coin-shuffling procedure above, although we have to be careful the two sets of movements don’t interfere with each other. (Not much point in removing the penny from a blank square, then putting the nickel there.) Here is a strategy for clearing both blank squares while maintaining the one-per-column-and-row permutation property:

four coins and two blanks: first solution

Inevitably, when we uncover the two blank squares, we also remove coins from two colored squares, which now have to be filled in again. The key point is that no single permutation can repair that damage, because the two open colored squares are in the same row. To cover both of those squares we need two additional permutations.

Other ways of reshuffling the coins avoid putting the two open squares in the same row or column, but they still foil all attempts to complete the covering with just five permutations. Try adding four Cheerios to the diagram below. If you cover both of the open blue squares, then either you also cover one of the blank squares or you wind up with two Cheerios in the same row.

four coins and two blanks: second solution

So now it’s clear we need as many as six permutations to cover a 4×4 ormat. Does that suggest that the general upper bound might be k+2 rather than k+1? Or perhaps the appropriate formula is 2k–2? In support of this latter possibility I offer these two ormats, which require 8 and 10 permutations respectively:

2k-2ormats.png

Another wager. Having fooled myself several times about the upper bound on minimal ormat coverings, I feel I should build in a little margin for error before I invite you to make a further wager. We already have direct evidence that covering a k×k ormat can take as many as 2k–2 permutations. So I’ll be generous and offer a full $2k for a proper covering, while charging $1 per permutation. If k=3 or k=4, you can definitely make money on this deal. But is it a good bet for larger k? (Hint: I’d be willing to play the game on these terms for real money.)

•     •     •

Update 2010-08-19: No takers for my bet, eh? Too bad; I had already spent my winnings.

Barry Cipra, who raised the question about minimal ormat covers in the first place, sends this illuminating letter:

I’m going to tiptoe a short ways out on a long long limb and conjecture (really just guess) that the “worst case” behavior, in terms of the minimum number of permutations it takes to produce a given ormat, occurs for ormats of the following form, shown here for k=7:

((1 1 1 1 1 1 1) (1 1 1 1 1 1 1) (0 1 1 1 1 1 1) (0 0 1 1 1 1 1) (0 0 0 1 1 1 1) (0 0 0 0 1 1 1) (0 0 0 0 0 1 1))

So as not to abuse existing matrix terminology, I’ll call any (square) matrix of this type—i.e., whose entries below the main subdiagonal are all 0—”uppity triangular.” I can (and will!) show that this uppity triangular ormat for k=7 requires (at least) 16 permutations—and the number appears to grow concavely upwards from that, so I, for one, will definitely not take you up on your $2k wager.

The trick, I realized, is to view each ormat as the “shadow” of what I’ll call an “addmat.” If you let P1, P2, …, Pr be k×k permutation matrices, their addmat is simply the ordinary result of addition: S = P1 + P2 + … + Pr, whose entries are positive integers wherever one or more of the constituent permutations has a 1 and otherwise 0. The associated ormat is obtained by changing each of these entries to a 1, while leaving the 0’s alone. In this sense, the ormat’s 1’s are the “shadows” of the addmat’s positive entries.

What’s crucial is that addmats have a lovely little property not shared with their shadows: the row and columns sums of the entries of an addmat all equal the number of permutations that produce them, r.

Come now, let us reason together…. The uppity triangular ormat example above (for k=7) must come from an addmat of the form

{{

where a, b, and all the *’s are positive integers. In particular, each * is at least 1. Since all row and column sums must be equal, the sum a+b must equal the sum of b and all 6 *’s above it. Hence a is at least 6. Likewise a+b must equal the sum of a and all 6 *’s above it, so b is also at least 6. Hence a+b is at least 12, which means the OR-sum that produced the given ormat involves at least 12 permutations.

This clearly generalizes to arbitrary k, which is more than “direct evidence” that covering a k×k ormat can take as many as 2k–2 permutations, it’s rigorous proof! But we can immediately do better, at least on a case-by-case basis. If we try to get by with just 12 permutations for this uppity triangular ormat, we quickly run into trouble. We obviously must have a = b = 6, and it follows that all the *’s above them are 1’s (to make those column sums 12). That is, we have the addmat

{{

where I now wish to call your attention to the entry labeled “@”. To make its row-sum equal 12, we need @ = 10. But that means its column sum (with the 5 *’s above it) is at least 15, which cannot be! So we are forced to try larger values of a and/or b—which is to say, we need more permutation matrices to produce this addmat.

It turns out you can’t satisfy the row and column sum condition until you get to a = b = 8. I won’t take you through all the steps, but just give you a taste with the penultimate possibility, a = 7, b = 8. The best you can hope for in this case is

{{

Note that I put as much of the “weight” in the last two columns as close to the 7 and 8 as possible, so that I could use the smallest possible value (10) as the entry with 5 positive entries above it. This makes the last three rows, and the right three columns all have the same sum, 15, but now we see a problem in the 12’s column: Its sum is at least 16. So once again, we’re screwed. It’s only with the next attempt that we avoid contradiction:

{{8, 3, 1, 1, 1, 1, 1}, {8, 3, 1, 1, 1, 1, 1}, {0, 10, 2, 1, 1, 1,<br />
  1}, {0, 0, 12, 1, 1, 1, 1}, {0, 0, 0, 12, 2, 1, 1}, {0, 0, 0, 0, 10, 3, 3}, {0, 0, 0, 0, 0, 8, 8}}

This matrix finally has all its row and column sums equal. Please note, this may or may not be an actual addmat of a set of 16 permutation matrices—I suspect it probably is, but I haven’t bothered to check. All we know is that it satisfies a necessary condition of being an addmat, namely that its row and column sums are all equal. (It’d be nice if that were also a sufficient condition, but something tells me it isn’t.)

This example, which can clearly be played out for larger values of k, suggests that not only are you safe with a $2k wager, but with a $(2k+2) wager and higher I’ve played around with this a bit, and persuaded myself that the number of permutation matrices will go to 2k + a lot—for k=10, if I did things correctly, you need 24 permutations (or possibly more, if the uppity triangular matrix the analysis leads to is not an actual addmat). I am entirely convinced that some additional careful thought can streamline the analysis into a nice, slick proof. I’m just not sure I haven’t already made a mistake, and built an elaborate house of cards….

Does any of this jibe with what you’ve already found to be the case?

It does indeed jibe.

First of all, to answer a small question Barry left open, here is a set of 16 permutations that will successfully cover his 7×7 “uppity triangular” matrix:

{1,2,3,4,5,6,7} {2,1,4,3,6,7,5} {1,3,2,5,4,7,6} {1,3,4,2,6,5,7}
{2,3,1,5,6,4,7} {1,2,3,5,6,7,4} {1,2,4,5,3,6,7} {1,2,4,5,6,3,7}
{1,2,4,5,6,7,3} {1,3,4,5,2,6,7} {1,3,4,5,6,2,7} {1,3,4,5,6,7,2}
{2,3,4,1,5,6,7} {2,3,4,5,1,6,7} {2,3,4,5,6,1,7} {2,3,4,5,6,7,1}

This was found with a simple greedy search.

My own attempts to find an upper bound have focused not on uppity triangular matrices but on matrices I’ve been calling “flags,” like this 7×7 case:

{{0, 0, 0, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1, 1}}

This matrix also requires 16 permutations for a proper covering. To see why, try threading permutations through the columns of the matrix, starting at the left edge and in each column choosing a 1 element (never a 0) from a different row. Because of the block of zeros at the upper left, the first three elements of every permutation must lie in rows 4 through 7. Thus each permutation “uses up” three of the last four rows in the first three columns, and the rest of the permutation can revisit this range of rows only once. It follows that each permutation can touch only one element in the 4×4 block of 1s in the lower right corner of the matrix, and at least 16 permutations are needed to cover all the 1s in the matrix. Showing that 16 are sufficient is not hard.

This kind of analysis works for any odd k, and thus we know that such matrices can require as many as

{\biggl\lceil\frac{k}{2}\biggr\rceil}^2

permutations. (For even k the situation is a little less symmetrical, and I haven’t worked out the exact details.)

These results give us a lower bound on the upper bound on the number of permutations that may be needed to cover a k×k ormat. But we haven’t proved it’s the true upper bound. Are there other ormats that require even more permutations? My guess is no, but keep in mind that almost all my conjectures along these lines have turned out to be wrong.

The state of the spamosphere

Sunday, August 8th, 2010

BP has finally stoppered the Macondo well with a plug of mud and cement, but the gusher of spam continues to pollute inboxes everywhere. Maybe we need a relief well?

spam statistics Aug 2008 through July 2010

It’s been six months since my last spam update. The good news, I suppose, is that last summer’s huge spurt of spam has subsided. But I’m still getting 2,000 inanities a month. That’s six or seven times the rate I was seeing when I first started monitoring my spam intake in 2003. For the two years covered by the graph above, the cumulative total is 111,945 unwanted emails received.

Needless to say, most of those messages are utterly unremarkable. But as I reviewed the most recent batch of dreck, one series of emails caught my eye. In the past 24 hours I’ve received five copies of a spam with the subject line: “Solve this if you could…!!!!”:

For all Math’s champs, Accounting Experts & Number Numbssss… (including future champs too)

Find the 6th Number

1, 2, 6, 42, 1806, ____?

6th number is the password of the attachment.

The attachment mentioned in the last line is a zip archive that unpacks to reveal an .exe file, which I would not run even if I could. But I confess that I did stop to solve the little puzzle sequence. It’s not difficult, although it is a little harder than any of the series I use in the spambot filter on the bit-player comment form. Maybe I should add it to my repertory.

Four questions about fuzzy rankings

Saturday, July 24th, 2010

The National Research Council is getting ready to release a new assessment of graduate-education programs in the U.S. The previous study, published in 1995, gave each Ph.D.-granting department a numerical score between 0 and 5, then listed all the programs in each discipline in rank order. For example, here’s the top-10 list for doctoral programs in mathematics (as presented by H. J. Newton of Texas A&M University):

 rank    school                 score   
    1    Princeton               4.94
    2    Cal Berkeley            4.94
    3    MIT                     4.92
    4    Harvard                 4.90
    5    Chicago                 4.69
    6    Stanford                4.68
    7    Yale                    4.55
    8    NYU                     4.49
    9    Michigan                4.23
   10    Columbia                4.23

Note that the scores of the first two schools are identical (to two decimal places), and the first four scores differ by less than 1 percent. Given the uncertainties in the data, it seems reasonable to suppose that the ranking could have turned out differently. If the whole survey had been repeated, the first few schools might have appeared in a different order. Doctoral candidates in mathematics are presumably sophisticated enough to understand this point. Nevertheless, the spot at top of the list still carries undeniable prestige, even when you know that the distinction could be merely an artifact of statistical noise.

The committee appointed by the NRC to conduct the new graduate-school study wants to avoid this “spurious precision problem.” They’ve adopted some jazzy statistical methods—mainly a technique called resampling—to model the uncertainty in the data, and they’ve also decreed that the results will be presented differently. There will be no sorted master list showing overall ranks in descending order. Instead the programs in each discipline will be listed alphabetically, and each program will be given a range of possible ranks. For example, a program might be estimated to rank between fifth place and ninth place. Let’s call such a range of ranks a rank-interval, and denote it {5, 6, 7, 8, 9} or {5–9}.

For a hypothetical set of 10 institutions, A through J, here’s what a set of rank-intervals might look like.

bar graph showing ranges of rankings for schools A through J.png

Acknowledging the uncertainty in your findings is commendable. But let’s be realistic. If you actually want to make use of these results—for example, if you’re a student choosing a grad-school program—the first thing you’re going to do is sort those bars into some sort of rank order, trying to figure out which school is best and how they all stack up against one another. In other words, you’re going to undo all the elaborate efforts the NRC committee has put into obscuring that information.

Below is one possible ordering of the bars. I have sorted first on the top of the rank-intervals, then, if two columns have the same top rank, I’ve sorted on the bottom rank. Other sorting rules give similar but not identical results. For example, sorting on the midpoints of the intervals would interchange columns B and F.

bar graphs showing rank-ranges sorted into one canonical order.png

Question 1. Does sorting a set of rank-intervals by one of these simple rules yield a consistent and meaningful total ordering of the data? To put it another way, can you trust this attempt to reconstruct a ranking?

I hasten to add that this is not really a practical question about finding the best grad school. If you’re facing such a choice in real life, the NRC rank-intervals are not the only available source of information. But, for the sake of the mathematical puzzle, let’s pretend that all we know about schools A through J is embodied in those ranges of rankings.

It turns out that rank-intervals have some fairly peculiar behavior. Ranges of ratings are not a problem. If the NRC merely gave each school a fuzzy rating on the 0-to-5 scale, no one would have much trouble interpreting the results. But when you turn fuzzy ratings into fuzzy rankings, there are hidden constraints. For example, not all sets of rank-intervals are well-formed.

two impossible sets of rank-ranges

The set at left is impossible because there’s no one in last place. (We can’t all be above average.) The example at right is also nonsensical because D has no ranking at all. For a set of rank-intervals to be valid, there has to be at least one entry in each row and each column.

That’s a necessary condition, but not a sufficient one, as the two graphs below illustrate.

two more impossible rank-intervals

Do you see the problem with the example at left? Column B has a rank-interval of {1–2}, but in fact B can never rank first because A has no alternative to being first. The case at right is conceptually similar but a little subtler: If B is ranked third, then either first place or second place will have to remain vacant.

The underlying issue here is the presence of constraints or linkages within a set of rankings. Suppose you have calculated ratings and rankings of several schools, and then some new information turns up about one school. You can change the rating of that school without any need to adjust other ratings, but not so the ranking. If a school goes from third place to fourth place, the old fourth-place school has to move to some other rung of the ladder, and somebody has to fill the vacancy in third place. These interdependencies are obvious in a non-fuzzy ranking, but they also exist in the fuzzy case. You can’t just assign arbitrary rank-intervals to the items in a set and assume they’ll all fit together. This observation leads to a second question:

Question 2. What are the admissible sets of rank-intervals? How do we characterize them?

I have a partial answer to this question. It goes like this. Any ranking of k things must be a permutation of the integers from 1 through k. A permutation can be embodied in a permutation matrix—a square k × k matrix in which every row has a single 1, every column has a single 1, and all the other entries are 0. For example, here are the six possible 3 × 3 permutation matrices:

3x3-permutation-matrices.png

They correspond to the rankings (1, 2, 3), (1, 3, 2), (2, 1, 3), (3, 1, 2), (2, 3, 1) and (3, 2, 1).

Since a permutation matrix represents a specific (non-fuzzy) ranking, we can build up a set of rank-intervals by taking the OR-sum of two or more permutation matrices. What do I mean by an OR-sum? It’s just the element-by-element sum of the matrices using the boolean OR operator, ∨, instead of ordinary addition. OR has the following addition table:

                      0 ∨ 0 = 0
                      0 ∨ 1 = 1
                      1 ∨ 0 = 1
                      1 ∨ 1 = 1

For the first two 3 × 3 matrices shown above the arithmetic sum is:

matrix-addition.png

whereas the OR-sum looks like this:

matrix-or-sum.png

Every valid set of rank-intervals must correspond to an OR-sum of permutation matrices, simply because a set of rank-intervals is in fact a collection of permutations. The converse also holds: Any OR-sum of permutation matrices yields an admissible set of rank-intervals. Thus the OR-sums of permutation matrices—let’s call them ormats for brevity—are in one-to-one correspondence with the admissible sets of rank-intervals. (There’s just one catch when applying this idea to the NRC study. The columns of an ormat may well have “gaps,” as in the column pattern (0 1 1 0 0 1 1), which corresponds to the rank-interval {2–3, 6–7}. Will the NRC allow such discontinuous ranges in their grad-school assessments? Perhaps the issue will never come up in practice. In any case, I’m ignoring it here.)

Arithmetic sums of permutation matrices form an open-ended, infinite series; in contrast, there are only finitely many distinguishable OR-sums. The reason is easy to see: Ormats have k2 entries, each of which can take on only two possible values, and so there can’t be more than \(2^{k^{2}}\) distinct matrices. Because of the various constraints on the arrangement of the entries, the actual number of ormats is smaller. For example, at k = 3 the \(2^{k^{2}}\) upper bound allows for 512 ormats, but there are only 49:

the-49-3-by-3-or-sums.png

Thus we come to the next question.

Question 3. For each k ≥ 1, how many distinct ormats can we build by OR-ing subsets of k × k permutation matrices? Is there a closed-form expression for this number?

I have answers only for puny values of k.

   k       upper bound        # of ormats  
   1                 1                  1
   2                16                  3
   3               512                 49
   4            65,536              7,443
   5        33,554,432          6,092,721
   6    68,719,476,736                  ?

The tallies of ormats were calculated by direct enumeration, which is not a promising approach for larger k. (I note—to spare folks the bother of looking—that the sequence 1, 3, 49, 7443, 6092721 does not yet appear in the OEIS.)

To extend this series, we might try to exploit the internal structure and symmetries of the ormats. By sorting the columns and rows of the matrices, we can reduce the 49 3×3 ormats to just six equivalence classes, with the following exemplars:

exemplars of six ormat equivalence classes

Enumerating just these reduced sets of matrices should make it possible to reach larger values of k, but I have not pursued this idea. (Furthermore, the two-dimensional sorting of matrices looks to be a curiously challenging task in itself.)

By the way, I think the number of ormats will approach the \(2^{k^{2}}\) upper bound asymptotically as k increases. Many of the features that disqualify a matrix from ormathood—such as all-zero rows or columns—become rarer when k is large. I have tested this conjecture by generating random (0,1) matrices and then counting how many of them turn out to be ormats.

fraction-of-ormats.png

For k = 1 through 5 the results are in close agreement with the actual counts of ormats, and up to k = 10 the trend is clearly upward. But continuing this inquiry to larger values of k will depend on a positive answer to the next question.

Question 4. Given a square matrix with (0,1) entries, is there an efficient algorithm for deciding whether or not it is an OR-sum of permutation matrices, and thus an admissible set of rank-intervals?

The question asks for a recognition predicate—a procedure that will return true if a matrix is an ormat and otherwise false. If efficiency doesn’t matter, there’s no question such an algorithm exists. At worst, we can generate all the k × k ormats and see if a given matrix is among them. But that’s like saying we can factor integers by producing a complete multiplication table. It just won’t do in practice. Isn’t there a quick and easy shortcut, some distinctive property of ormats that will let us recognize them at a glance?

If we could replace the OR-sum with the ordinary arithmetic sum, the answer would be yes. Permutation matrices have the handy property that all rows and columns sum to 1. An arithmetic sum of r permutation matrices has rows and columns that all sum to r. (It is a semi-magic square.) The converse is also true (though harder to prove): If a matrix of nonnegative integers has rows and columns that all sum to r, it is a sum of r permutation matrices. This fact yields a simple test: Sum the rows and the columns and check for equality.

Unfortunately, the trick won’t work for ormats, because the boolean OR operation throws away even more information than summing does. Because 0 ∨ 1 = 1 ∨ 0 = 1 ∨ 1, infinitely many sets of operands map into the same result, and there’s no obvious way to recover the operands or even to determine how many permutation matrices entered into the OR-sum.

Maybe there’s some other clever trick for recognizing ormats, but I haven’t found it. Let me make the question more concrete. Below are three (0,1) square matrices. Two of them are ormats but the third is not. Can you tell the difference?

three-puzzle-matrices.png

If it’s so hard to recognize an ormat, how did I count the ormats among a bunch of randomly generated (o,1) matrices? By hard work: I reconstructed the set of permutations allowed by each matrix. Visualize a permutation as a path threading its way through the matrix from left to right, connecting only non-zero elements and touching each column and each row just once. When you have drawn all possible permutation paths, check to see if every non-zero element is included in at least one path; if so, then the matrix is an ormat. Note that this is not an efficient recognition procedure. In the worst case (namely, an all-ones matrix), there are k! permutations, so this method has exponential running time. But k! is better than \(2^{k^2}\); and, besides, for sparse matrices the number of permutations is much smaller than k!. The 10 × 10 matrix presented as an example at the start of this post gives rise to 580 permutations, a manageable number. Here’s what they look like, plotted as a spider web of red paths across the bar chart.

ranges-with-paths

Every nonzero site is visited by at least one permutation path, so this set of rank-intervals is indeed valid.

This process of lacing permutations through a matrix finally brings me back to Question 1, about how to make sense of the NRC’s fuzzy ranking scheme. Let’s take a small example:

probability-example-1.png

Examining the graph above shows that A must rank either first or second—but which is more likely? In the absence of more-detailed information, it seems reasonable to assume the two cases are equally likely; we assign them each a probability of 1/2. Similarly, B has the rank-interval {1–3}, and so we might suppose that each of these three cases has probability 1/3. Continuing in the same way, we assign probabilities to every element of the matrix.

probability-example-2.png

But wait! This can’t be right; our probabilities have sprung a leak. Any proper set of probabilities has to sum to 1. Our procedure assures that each column obeys this rule, but there is no such guarantee for the rows. In row 1, we’re missing one-sixth of our probability, and in row 2 we have an excess of 1/2; row 4 comes up short by 1/3.

Is there any self-consistent assignment of probabilities for the elements of this matrix? Sure. As a matter of fact, there are infinitely many such assignments, including this one:

probability-example-3.png

I’ll return in a moment to the question of how I plucked those particular numbers out of the air, but note first what they imply about the ranking of items A through D. For item A, with the rank-interval {1–2}, the odds are two-to-one that it ranks first rather than second. B has the behavior we expected from the outset, with probability uniformly distributed over the three cases. But if you pick either C or D, each with the rank-interval {2–4}, your chance of getting second place is only 1/6, and half the time you’ll be in last place.

Where do these numbers come from? Instead of starting with the assumption that probability is uniformly distributed over each rank-interval, assume that each possible permutation of the ranks is equiprobable. For this matrix there are six allowed permutations: (1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (2, 1, 3, 4) and (2, 1, 4, 3). Observe that four of the six ordering put A first, and only two permutations place A second. We can also tally up such “occupation numbers” for all the other matrix elements:

probability-example-4.png

Dividing these numbers by the total number of permutations, 6, yields the probabilities given above.

We can do the same computation for the 10 × 10 example matrix, which turns out to allow 580 permutations:

ranges-with-path-weights.png

If you care to check, you’ll find that each column and each row sums to 580; dividing all the entries by this number yields a probability matrix with columns and rows that sum to 1 (also known as a doubly stochastic matrix).

This process of tabulating permutation paths recovers some of the information we would have gotten from the arithmetic sum of the permutation matrices—information that was lost in the OR-ing operation. But we get back only some of the information because we have to assume that each permutation included in the OR-sum appears only once. (This is just another way of saying that the allowed permutations are equiprobable.) There’s no particularly good reason to make this assumption, but at least it leads to a feasible probability matrix.

Is there any way of calculating the entries in the doubly stochastic matrix without explicitly tracing out all the permutation paths? I’m sure there is. I think the construction of the matrix can be approached as an integer-programming problem, and perhaps through other kinds of optimization technology. What seems less likely is that there’s some simple and efficient shortcut algorithm. But I could be wrong about that; there’s a lot of mathematics connected with this subject that I don’t understand well enough to write about (e.g., the Birkhoff polytope). I hope others will fill in the gaps.

Getting back to the assessment of grad schools—have we finally found the right way to understand those rank-intervals that the NRC promises to publish any day now? My sense is that a semi-magic square (or, equivalently, a doubly stochastic matrix) will give a less-misleading impression than a simple eyeball sorting on the spans or midpoints of the rank-intervals. But what a lot of bother to get to that point! How many prospective grad students are going to repeat this analysis?

Acknowledgment: Thanks to Geoff Davis of PhDs.org for introducing me to this story. PhDs.org will have the new ratings as soon as the NRC releases them, and may even find a way to make them intelligible! Disclaimer: I’ve done paid work for the PhDs.org web site (but this is not a paid endorsement).

Update 2010-07-27: If you’ve gotten this far, please read the comments as well. A number of commenters have provided important insights and context, which have helped me understand what’s going on in the matrices I’ve been calling ormats. But I’m still a bit murky about the best way to recognize and count them. I’m not sure that publishing my still-murky thoughts is terribly helpful, but maybe someone else will read what follows and give us a dazzling, gemlike synthesis.

For the ormat-recognition problem (Question 4 above), three basic approaches have been mentioned: enumerating the permutation paths through the matrix, examining matrix minors, and looking for perfect matchings in a bipartite graph defined by the matrix. It seems to me that all of these methods are doing the same thing.

Start with Barry Cipra’s method of minors. The basic operation is to choose a nonzero matrix element, then delete the row and the column in which that element occurs. You then apply the same operation to the remaining, smaller matrix.

In tracing permutation paths, we’re looking for sequences of nonzero elements, drawing one element from each column and each row. A way of organizing this search is to choose a nonzero element and then, after recording its location, delete the corresponding column and row, so that no other elements can be chosen from that column or row.

In the method based on Hall’s theorem, as explained by John R., we view the ormat as the adjacency matrix of a bipartite graph, where every nonzero element designates an edge connecting a row vertex to a column vertex. To find a matching, we delete an edge, along with the two vertices it connects (and also all the other edges incident on those vertices). Then we recurse on the smaller remaining graph. (See further update below.) If you translate this operation on the graph back into the language of matrices, deleting an edge and its endpoints amounts to deleting a row and a column of the adjacency matrix.

I am not asserting that these three algorithms are all identical, but they all rely on the same underlying operation. To say more, we would need to consider the control structure of the algorithms—how the basic operations are organized, how the recursion works, all the details of the bookkeeping. I don’t trust myself to make those comparisons without trying to implement the three methods, which I have not yet done. However, at this point I just don’t see how any method can guarantee correct results without something resembling backtracking (or else exhaustive search through an exponential space). After all, we’re not looking for just one matching in the graph, or one decomposition into matrix minors, or one permutation path; we have to examine them all.

Here’s a further hand-wavy argument for the essential difficulty of the task. For a (0,1) matrix, the number of permutation paths that avoid all zero entries is equal to the permanent of the matrix. Computing the permanent of such a matrix is known to be #P-complete.

Update 2010-07-31: With lots of help from my friends, I think I finally get it. Although there could be as many as k! permutation paths in a k × k matrix, you don’t need to examine all of the paths to decide whether or not the matrix is an ormat. It’s enough to establish that one such path passes through each nonzero element. This is what the algorithm based on Hall’s theorem does. As Frans points out in a comment below, I misunderstood the essential nature of that algorithm (in spite of having it explained to me several times). There is no recursive deconstruction into progressively smaller matrix minors; instead, we just loop over all the nonzero elements of the matrix, find the minor associated with each such element, then check for a perfect matching in the minor. (Still more refinements are possible—but already we have a polynomial algorithm.)

With this efficient recognizer predicate, it’s easy to measure the proportion of ormats in random matrices at larger values of k:

fraction-of-ormats-k25.png

As expected, the fraction of ormats approaches 1 beyond about k = 20.

So much for identifying ormats. I am still unable to extend the series of exact counts beyond k = 5. The tabulations for random (0,1) matrices suggest that for k = 6 there should be about 20 billion ormats, and counting that high is just too painful. I need to work out the symmetries of the problem.

As far as I can tell, assigning exact probabilities to the nonzero matrix elements requires a full enumeration of all the permutation paths, and thus a calculation equivalent to the permanent. There may be a useful approximation.

Barry Cipra asks a really good question: The permanent tells us the maximum number of permutations that could possibly be included in a given ormat, but what is the minimum number? A naive upper bound is the number of 1s in the matrix, but I don’t see an easy path to an exact count. But enough for now.

Disentangling Gaussians

Thursday, June 10th, 2010

The printed program for the recent STOC meeting in Cambridge announced the following talk:

Talk by Kalai, Moitra, Valiant and Erdos on printed program, STOC 2010

Unfortunately, the fourth author could not be present (and he is not listed as an author on the paper itself), so the talk was given by Gregory Valiant.

Here is a motivating example. Take a tape measure, and go record the heights of a few thousand adults chosen at random. You’ll come back with a distribution that looks something like this:

combined-distribution.png

How come the curve is so lumpy and lopsided? Isn’t height a variable with an approximately normal distribution? Of course it is. The problem is that we have mixed up two subpopulations—men and women—with different height distributions:

three-distributions.png

Question: Given the lumpy combined distribution, and the knowledge that it represents a mixture of exactly two normal distributions, can we somehow recover the component distributions? Specifically, can we determine the means and standard deviations and the Gaussian curves, as well as their relative weights, or contributions to the total?

The answer is yes. In a series of papers published in the 1950s and 60s, Henry Teicher (then at Purdue, now emeritus at Rutgers) proved a kind of unique-factorization theorem for distributions. He showed that no mixed distribution can be decomposed into two or more different sets of normal distributions, just as no composite whole number can be formed as the product of two or more distinct sets of primes. (Teicher’s theorem generalizes to many distributions other than normal ones. There are also a few caveats; for example, none of the component distributions can be pointlike, with a standard deviation of zero.) Teicher’s work implies that a mixed distribution is identifiable: If you can break it down into a set of primitive distributions, then that decomposition is unique.

If STOC were a mathematics meeting, that might be the end of the story. But STOC is the Symposium on the Theory of Computing, and the question here is not “Does a solution exist?” but rather “Can you solve it in polynomial time?” Teicher’s result offers no such guarantee. It turns out that his proof of identifiability depends on the behavior of the distributions far out in the tails. Measuring data frequencies in these sparsely populated outlying regions requires an exponentially large number of samples. But other approaches to the problem don’t run into this snag; Valiant and his colleagues show that “robust polynomial identifiability” is indeed possible.

A key idea in their proof goes back at least as far as the 1890s, to work done by the indefatigable statisticians W. F. R. Weldon and Karl Pearson. In 1892 Weldon spent a summer on the Bay of Naples, measuring various features on the carapaces of crabs. One such feature yielded a distinctively lumpy distribution much like the height curve shown above. Weldon thought that the asymmetry might signal the incipient splitting of the crab population into two races or species, each of which if taken individually would have a normal distribution. For help with the analysis of his data he turned to Pearson, who was able to identify two component Gaussian curves that sum up to the observed distribution. (The figure comes from Weldon’s 1893 paper; see below for references.)

Figure 3 distribution from Weldon 1893.

Pearson’s method was based on calculating the first six statistical moments of the given distribution. (The nth moment of a distribution is the expectation value of \((x-\bar{x})^n\), where \(x\) is a random value drawn from the distribution and \(\bar{x}\) is the mean value.) The process called for solving a ninth-degree polynomial, which was a heroic feat in 1893.

Valiant et al. show that a procedure like Pearson’s will always succeed in the following sense: If two mixed distributions can be distinguished at all, then they can be distinguished by examining their first six moments, and those moments characterize the component Gaussians. Calculating the moments requires only a polynomial number of samples, and the running time of the overall algorithm is a polynomial function of various parameters such as the required accuracy. Moreover, the process can be extended from one-dimensional distributions to multidimensional Gaussians.

Although the algorithm has polynomial running time, that’s not a guarantee of practicality. (After all, this is a theory conference.) One stage of the process is essentially a brute-force search through a very large (though polynomially bounded) space of parameter values for the six moments.

Sources:

The STOC proceedings with the Kalai-Moitra-Valiant paper are now available online, but only by subscription. A PDF preprint is posted on Valiant’s web site; also an expanded version.

For Henry Teicher’s work see: “Identifiability of Mixtures,” The Annals of Mathematical Statistics 32(1):244–248 (1961) and “On the Mixture of Distributions,”  The Annals of Mathematical Statistics 31(1):55–73 (1960).

For Weldon and Pearson see: W. F. R. Weldon: “On Certain Correlated Variations in Carcinus maenas,“ Proceedings of the Royal Society of London 54:318–329 (1893) and Karl Pearson: “Contributions to the Mathematical Theory of Evolution,” Philosophical Transactions of the Royal Society of London A 185:71- 110 (1894).

My thanks to Virginia Gold and Irene Frawley at ACM, Lance Fortnow, Paul Oka, Jennifer Chayes and Christian Borgs, all of whom helped make it possible for me to attend some of the STOC sessions.

Which Steve invented the iPad?

Tuesday, March 30th, 2010

UIUC-tablet-winners.png

Maybe Wolfram? Or Omohundro and Skiena?

Back in 1988, a contest sponsored by a major computer company asked for visions of a future personal computer. The winning team came from the University of Illinois at Urbana-Champaign. The faculty advisers were Stephen Wolfram and Stephen Omohundro (at far left in the photo above); the student members (from left to right) were Arch Robison, Steven Skiena, Bartlett Mel, Luke Young and Kurt Thearling. Their proposal of a device called the TABLET had eerie anticipations of the contraption that goes on sale this weekend.

Some quotes from the description of the TABLET:

This rectangular slab will weigh but a few pounds, and have no buttons or knobs to play with. The front surface will be a touch-sensitive display screen and will blink to life upon touching two corners.

The front surface of our computer is a high-resolution touchscreen that yields slightly to the touch. With this single input device, we can get a tremendous range of flexibility and options. We can use it to create an entirely soft interface.

A high-resolution color display can do more than just imitate a notebook page. It will be fast enough to support video…. It takes only a little more courage to predict a Global Positioning System (GPS) receiver on our machine, either as a clip-on or a built-in component.

If we can take our computer anywhere, we need to be able to use it anywhere. This brings us to communications capabilities. Through our national telephone network, we can access any person or machine within seconds.

It seems the UIUC prophets even saw Facebook coming over the horizon:

In addition to communicating with peripherals… we can also talk to other computers. Each machine can continually broadcast personal facts that users may wish the world to know: perhaps their name, image, interests, and marital status for openers.

And YouTube:

One interesting problem is who will appreciate all this new art? Some form of “shareware video” might arise.

Of course they didn’t get everything right. There’s the matter of the stylus and handwriting recognition (suggesting a foreknowledge of the Newton rather than the iPad). And it was all supposed to happen in the year 2000, so we’re a decade behind schedule.

Which computer company sponsored this contest? It was Apple, of course. The judges were Ray Bradbury, Alan Kay, Diane Ravitch, Alvin Toffler and Stephen Wozniak. (Steve Jobs was in exile at the time.) All of the prizes were paid in Macintosh merchandise.

tablet-teardown.pngThe story of the 1988 predecessor of the iPad is told in Communications of the ACM, Vol. 31, No. 6, pp. 639–646. Included there is the first teardown photograph of an iPad (right). Note the wafer-scale integration. For those who can’t get past the ACM pay-wall, a slightly different telling of the story appears on Kurt Thearling’s web site. There’s also a brief account from the UIUC computer science department.

[Just to be clear: I am aware that there's a considerable difference between imagining an innovative device and actually building it.]

Home-baked graphics

Tuesday, March 9th, 2010

A couple of commenters have asked what software package I use to create the graphs that appear in bit-player posts–illustrations like the one below, which is a slightly improved version of something I posted last week. Let’s call it Figure 1.

rms-graph2-revised.png

Prompted by these inquiries, I immodestly ask myself: Why do my graphs look so darn good? I immodestly answer: It’s not because of any packaged software! I don’t need a cake mix, or even a recipe. These are home-baked graphs, made from scratch out of locally grown organic pixels.

I have strong opinions about the aesthetics of scientific illustrations, and I could certainly spout off about the design elements of Figure 1, such as that putty-colored background, just dark enough to allow drop-out white grid lines, yet neutral enough to avoid competing with the data curves, which also have a distinctive color scheme on which I could discourse at length. Yes, I can talk the Tufte talk. But I think the commenters were really asking how I create the graphs rather than why they’re so elegant, and so I’m going to focus here on the practical programming problem.

Most of my experience in drawing pictures with a computer comes from the world of print publishing, where the final product is ink on paper rather than pixels on a screen. Compared with the online environment, print has some advantages, notably higher resolution (up to 1,000 dots per centimeter) and precise control over typography and color. But print also has obvious limitations: On a magazine page, there are no mouseovers or clickable buttons, and you can’t make a square knot twirl in 3D.

Thirty years ago, the big challenge for computer-generated illustrations was not how to draw the picture but how to get it out of the computer and onto the printing press. You couldn’t just export a PDF and place it in a Quark or InDesign document; none of those things existed. The only practical option was to print out the artwork, photograph it, and “strip” the negative into the page-size film that would be used to make the press plate. Because of this emphasis on printouts, most of the effort went into programming the printer rather than the computer.

The figure below is the first published computer-generated illustration I had a hand in creating. It appeared in Scientific American in 1983.

epson-freq-table.png

The array of 282 tiny bar graphs was produced with an Epson MX-80 dot-matrix printer, using escape codes to fire combinations of the eight pins in the printhead. Of course the MX-80 was a black-and-white device. The two-color illustration was created from two separate printouts. Also, the Epson letterforms were replaced with typeset characters.

The world of computer-generated illustrations changed dramatically with the arrival of PostScript, the “page description language” created by John Warnock and his colleagues at Adobe Systems (based in part on earlier work at Evans and Sutherland and Xerox PARC). PostScript was designed as a complete programming language rather than just a file format or a set of drawing commands. And something else set it apart as well: attention to details of graphic design. With most earlier software (such as programs based on the Apple Quickdraw library), trying to create publishable figures was an exercise in frustration. For example, the apparent weight of a line would vary depending on its orientation: lighter when vertical or horizontal, heavier when diagonal. PostScript allows very precise control over such niceties of presentation. To take another example, where lines meet the edge of a graph, you don’t want to have to choose between falling short and overshooting; PostScript provides the tools needed to make it look right.

edge-effects.png

(The version in the rightmost panel is created by allowing the colored lines to extend outside the background box, and then applying a clipping mask that cuts off all objects at the boundary of the box.)

Obsessing over minute details like these may seem comically fussy, but I believe that neatness counts in these matters. To some extent, illustration is an art of illusion. Graphs and diagrams work best when you can look through them rather than at them. The viewer should be seeing the underlying information or abstraction–the array of correlation coefficients, the function y = f(x), or whatever–rather than noticing the mechanics of how the drawing was constructed. A ragged edge is the kind of distraction that destroys the illusion.

Although PostScript was a giant step forward from the MX-80 command set, in the early years it was still just another printer language, not a computer language. The only way I could execute a PostScript program was to send it to a laser printer and wait to see what came out. Sometimes it was a long wait. I had no way of running a PostScript program on the computer itself. (Ghostscript came later.)

ChernoffFaces.pngMy first PostScript illustrations were created as hand-written PostScript programs; the same language was used both for doing the computations and for presenting the results. The faces at right were created in this way. (They were inspired by the work of Herman Chernoff and drawn to illustrate an American Scientist article by Robert Levine in 1990.) The dual role of the language caused me a moment of disorientation just now when I went looking for my records of this project. I found an EPS (encapsulated PostScript) file, which I knew was the finished illustration, but where was the source code? And then I remembered: It’s the same file! Open it up in Ghostscript or Adobe Illustrator and you see those silly faces smiling or scowling at you; open the same file in a text editor, and you see procedures for drawing elements of the faces:

   /draweyes
     { newpath
       dx dy eyewidth eyeheight 0 360 ellipse stroke
       ex ey eyewidth eyeheight 0 360 ellipse stroke
     } bind def
   /drawpupils
     { fx fy pupilsize pupilsize 0 360 ellipse fill
       gx gy pupilsize pupilsize 0 360 ellipse fill
     } bind def

Bill Casselman, the graphics editor of the Notices of the American Mathematical Society, still favors this direct-to-PostScript methodology. He has written an excellent guidebook, taking you from the basics of PostScript through an elaborate library for rendering three-dimensional objects.

But here I part company from Casselman; I’d rather not do all my computing in PostScript. It’s not that I have anything against the language itself, but the development environment is not to my taste. I therefore adopted the modus operandi of writing a program in my language of choice (usually some flavor of Lisp) and having that program write a PostScript program as its output. After doing this on an ad hoc basis a few times, it became clear that I should abstract out all the graphics-generating routines into a separate module. The result was a program I named lips (for Lisp-to-PostScript).

Most of what lips does is trivial syntactic translation, converting the parenthesized prefix notation of Lisp to the bracketless postfix of PostScript. Thus when I write (lineto x y) in Lisp, it comes out x y lineto in PostScript. The lips routines also take care of chores such as opening and closing files and writing the header and trailer lines required of a well-formed PostScript program.

But the lips interface is low-level, confined to drawing individual dots, line segments, rectangles and the like. Assembling a complete graph out of these primitives is tedious. For example, the grid of white lines in Figure 1 would have to be drawn one line at a time, with each line specified by a sequence of commands such as

    (newpath)
    (moveto u v)
    (lineto x y)
    (stroke)

Before you can issue those commands, you have to calculate u, v, x and y. Clearly, a higher-level front end is needed; like everyone else, I call mine plot.

At the core of any plotting program is a simple operation: mapping points from an abstract user space to coordinates in a rectangular pane, the page space. In Figure 1, the y axis runs from 0 to 5000; values in this range have to be scaled to the dimensions of the graph, which is about 300 PostScript points, or 11 centimeters. Mathematically, the transformation is straightforward. Indeed, if I wished I could leave all the arithmetic to the PostScript interpreter, simply passing in the appropriate matrix elements for scaling and translation. This is an attractive option; it would allow plot to work entirely in user space. But a few niggling details get in the way. Consider the tick marks along the y axis in Figure 1. Their vertical positions are conveniently expressed in user coordinates: one tick every 500 units. But what about the length of the ticks–their horizontal extent? This dimension is purely concerned with the appearance of the graph and has nothing to do with the content; it ought to be expressed in unscaled units of points or pixels.

Here’s a possible solution: Let everything inside the rectangular frame of the graph–the area with the putty-colored background in Figure 1–go through the scaling engine, but define everything outside the frame, including the tick marks and the axis labels, directly in page coordinates. If you think this is the final answer, take a look at Figure 2:

figure2.png

In this nonsensical graph (constructed just for this occasion), data points are indicated by stars, crosses and diamonds. The positions of those glyphs ought to be defined in user space, but the drawing commands that create the shapes are properly defined in page coordinates. If we tried to draw the glyphs in user space, their size and shape would vary with position in the graph.

What’s the best way to deal with this messy situation? Is there some tidy solution that will reconcile the two coordinate systems and allow all dimensions to be treated uniformly? I don’t believe so; it’s just in the nature of graphs to mix up elements from these two disparate realms. We look through a window into a world of data or mathematical abstractions, but we also draw our own little doodles on the window itself.

Of course there are solutions; they’re just not as pretty as I would like. My own strategy for coping is to attach extra information to each geometric point, indicating whether or not the x and y coordinates are to go through the scaling transformation. This is less troublesome than it might seem; from the user’s point of view, it’s almost always invisible.

In writing the lips and plot programs, I walk a path that is already worn smooth by many earlier footsteps. I don’t know who wrote the first computer program for plotting data, but it probably came soon after the first program for producing data. Today we have hundreds of clever, comprehensive, well-designed and well-maintained programs for plotting and graphing. Gnuplot is very capable; Grace is one I’ve never used but I’ve heard good things about it; Mathematica, Sage, R, MATLAB, Octave and the like all have elaborate graphics facilities built in; the Python world, as usual, has an overabundance of options; there are a few libraries for my beloved Lisp; you can even do dataviz online.

All of which raises the question of why I bother to roll my own. I’ll never keep up–or even catch up–with the efforts of major software companies or the huge community of open-source developers. In my own program, if I want something new–treemaps? vector fields? the third dimension?–nobody is going to code it for me. And, conversely, anything useful I might come up with will never benefit anyone but me.

The trouble is, every time I try working with an external graphics package, I run into a terrible impedance mismatch that gives me a headache. Getting what I want out of other people’s code turns out to be more work than writing my own. No doubt this reveals a character flaw: Does not play well with others.

In any case, the time for change is coming. My way of working is woefully out of date and out of fashion. PostScript is a technology that even Adobe seems to regard as outmoded. And making ultraprecise PostScript graphs is quite silly when their destination is the web; before I can put them online, I have to convert them to low-res PNG images. Furthermore, a PostScript-based workflow loses out on all the interactive richness of the web. These are deathly still images. How can I expect to earn any web cred when my work is not even clickable, much less multitouch-enabled?

If I continue in my stubborn, do-it-yourself mode, I could replace the PostScript back end with one that generates SVG. This wouldn’t be a major undertaking. But is SVG the right answer? It’s been around for more than a decade and you still don’t see much of it in the wild. And there are horrid browser incompatibilities. I suspect that Javascript (and JQuery) has a brighter future. And if I can get over my abreaction to libraries, there are plenty of options. Advice anyone?

Update 2010-03-13: Many thanks for all the thoughtful comments. Herewith a few comments on comments:

Ron Renaud’s graphs using Javascript in a canvas element are really very pretty, and they give me renewed hope that web graphics can measure up to a print standard. But is the world quite ready for the canvas? This is a blog, after all. Lots of people get to it with an RSS reader, not a web browser.

Zvika requests a link to a higher-resolution PNG. I don’t know how to do that. I can make a larger PNG, but the resolution–the dots per centimeter–is really determined by the screen you’re looking at. Which is not to say that larger illustrations wouldn’t be a good idea. When I redesign these pages, I want to allow more room for bigger pictures.

Gary Reuben suggests Python Matplotlib and the ggplot2 package for R. The latter is new to me, and very impressive. I want to go read more about it.

Several other readers favor SVG. I’m okay with that. It looks easy to change my current software to generate SVG output instead of (or in addition to) PostScript. The question remaining for me is whether SVG on the web is something that browsers (and, again, RSS readers) can swallow without choking.

John Haugland mentions PDF as the successor to PostScript. I couldn’t possibly survive without PDF these days, but I don’t see it as an ideal medium for illustrations embedded in web pages. Reading PDFs within a browser requires a plugin, which some people refuse to install. (I’m one of those people.) Furthermore, because PDF is a binary format based on a directory of offsets to tables, it’s more trouble to write PDF files than either PostScript of SVG.

Nate mentions Processing, the graphics and animation language created by Casey Reas and Ben Fry. I’ve written about this before at bit-player and elsewhere. I’m a big fan, but Processing is essentially a front end to Java, and I have reservations about embedding Java applets in web pages. John Ressig’s reimplementation of the language in Javascript overcomes that problem, and one of these days I’ll get around to doing something serious with it.

Finally, Marc asks for a look at my Lisp code. I’m always shy about sharing such unfinished things, but here it is.

Update 2010-03-28: When commenter Zvika asked for higher-resolution graphics, my reply was unhelpful; let me now try to say something more useful.

Even though I can’t do anything to improve screen resolution, I can, as Zvika notes, provide a larger version of each image. The Wikipedia way of doing this, which Zvika mentions, is to link to a separate HTML page, which replaces everything on the page you’re currently reading. As a test, I’ve done this with Figure 1 above; click on that figure and you’ll be whisked away to a supersized version of the graph. To get back here, you’ll have to use your browser’s “back” button.

Another approach, adopted by, for example, the New York Times, is a pop-up window. I’ve done this with Figure 2 above; click on it and a new window with a bigger version will open in front of this page.

Personally, I dislike both of these strategies. I want to see the words and the pictures at the same time in the same context. I suppose I could implement (or swipe) some glitzy Ajaxian solution that expands the artwork box within the window. But I still feel the right solution is to redesign the pages so that there’s room for larger illustrations in the first place. (Admittedly I’ve been meaning to do that for more than a year.)

There’s also a question about how best to make use of a more spacious picture box. In the supersized Figure 1, the transformation is much like an optical enlargement, projecting the same information at larger scale. The overall dimensions are doubled, and so is the width of each line, the size of the type, and so on. This is probably the answer for those troubled by visual deficiencies (including my own presbymyopic eyes).

In Figure 2 I have enlarged the illustration by 150 percent but kept all the individual graphic elements (line weights, labels, the glyphs that mark data points) the same size. This kind of rescaling allows information to be read from the graph with greater precision.

Perhaps a middle way is sensible here. Indeed, maybe type should scale as the square root of graph size?

Herbert R. J. Grosch, 1918-2010

Friday, February 26th, 2010

Today I learned of the death of Herb Grosch, proud provocateur and mischief-maker of the computing industry. Anybody who ever knew Herb, however slightly or briefly, has a story to tell, so here’s mine.

Grosch held senior positions at major household-name companies (IBM, General Electric) as well as in the U.S. government; he also spent time at MIT and Columbia and was editor of Computerworld. But through it all he cultivated the role of the outsider, or even the outcast; he saw himself as a lone wolf; at IBM they labeled him a “wild duck.” He boasted that he was the second scientist ever hired at IBM (after Wallace Eckert) and, more important, was the first employee with facial hair. Even when he was an insider he was an outsider. Grosch was elected president of the Association for Computing Machinery–but as a dissident candidate, opposing the slate annointed by the nominating committee.

The free-spirited maverick is a celebrated figure in American culture, but not always a well-rewarded one. Wild ducks don’t get tenure, or a pension.

My brief encounter with Herb came ten years ago, when I was working on an article (PDF) about the uses of ternary notation in computing. I read somewhere that Grosch had proposed a ternary architecture for the Whirlwind, Jay Forrester’s enormous early electronic computer. The idea of a base-three machine probably seems weirder today than it did in the early 1950s, but all the same the proposal was not adopted, and most histories of the Whirlwind project say little or nothing about the ternary option. I had no luck finding a copy of Grosch’s memo on the subject, so I tried getting in touch with him directly. With the help of friends I tracked him down, though not in the first place I looked. He was in Riga, Latvia.

Grosch couldn’t supply a copy of the memo; most of his papers, he said, were buried in a landfill in Switzerland. The best he could do was point me to a passage in his autobiography, summarizing a conversation with Forrester:

I said what might be genuinely gainful would be to store a ternary digit in each core, and calculate in base-three rather than binary fashion. There were materials–some kinds of permalloy, as I remember–that had north, south and neutral stable magnetic states. I told him I had taught my Poughkeepsie evening classes at IBM about a special kind of base-three arithmetic I called “signed ternary,” in which zero was in the middle of the number range. In this curious system there was no need for algebraic signs, no problem about the sign of zero, and you rounded perfectly by dropping digits.

Jay being a stiff type, I refrained from calling the ternary digits “tits,” a name which had been the source of much boyish amusement in the Poughkeepsie classes.

By the way, I’m still looking for a copy of that memo. Here’s the reference: Grosch, H. J. R. 1952. Signed ternary arithmetic. Memorandum M-1496, Digital Computer Laboratory, MIT.

Herb Grosch in Barcelona, circa 2001

Of course I had to ask Herb how and why he’d found his way to Latvia. It was a long story, involving a small NSF grant, archives of Datamation magazine, a collection of Soviet-era computer hardware in an attic at Latvia University, and a romance that Herb hoped would develop into his fifth marriage. But the grant ran out, the marriage plans faltered, and Herb was heading back to U.S. in dire need of employment. His hopes, he said, were “not much above fast-foodery: editing, or tech writing, or even clerking at Barnes and Noble.” He was 82 at the time.

In the end, he did not have to stoop quite so far as editing or tech writing, or Barnes and Noble; he became Adjunct Distinguished Professor of Computer Science at the University of Nevada Las Vegas. But that posting didn’t last long. A few years later he turned up in Toronto, teaching the history of computing, but by then I’d lost touch with him.

In a “Dear Everybody” message some years ago, Grosch wrote:

I begin this letter on the fourth of six recent Binary Days, 01.11.01…. The next Binary Day will be nine years from now, 01.01.10, and I will write again then if I survive. Shorter recipient list, I assume!

Herb did make it until 01.01.10. He died January 25th. For more on his life and works, see the ACM obituary and a web site at Columbia maintained by Frank da Cruz.

Gruenberger’s prime path

Tuesday, February 16th, 2010

Fred Gruenberger may well have been the first blogger on computational topics. When he was writing, back in the 1970s, there was no RSS, and so he distributed his musings in a monthly newsletter called Popular Computing. A typical issue was 16 or 20 typewritten pages–stapled, folded, stamped and delivered by mail. It was always worth reading.

Gruenberger had been working and playing with computers since the 1940s. For a long stretch he was at the RAND Corporation, the famous think tank in Santa Monica. Later he taught at Cal State Northridge. In addition to Popular Computing he was involved in the startup of Datamation magazine and published at least a dozen books. I haven’t been able to learn much about his later years; he died in 1998.

A slogan that appeared in some issues of Popular Computing proclaimed: “The way to learn computing is to compute.” I took this advice to heart, although I was hampered by a total lack of hardware. Later on I acquired a programmable calculator, which helped on some of the problems and exercises.

Problem 149, from Popular Computing Vol. 4 No. 12, December 1976

The problem reproduced above appeared in the December 1976 issue of Popular Computing (Vol. 4, No. 12). At the time, I made no attempt to work this one out, but evidently the problem seemed interesting enough to be worth filing away. When I came upon the old clipping recently, I gave it a closer look and realized I have no idea how to answer Gruenberger’s question, though the impediment now is not lack of hardware.

Gruenberger asks us to trace a planar path whose steps are indexed by the odd integers starting at 3. For each number N we turn right 90 degrees before taking a step if N is a prime congruent to 1 mod 6; we turn left 90 degrees before moving one unit if N is a prime congruent to −1 mod 6; otherwise we continue straight ahead in whatever direction we happen to be facing.

In his typewriter graphics, Gruenberger plotted the trajectory from N=3 through 97. Below I continue the path through N=199.

trail201b.png

But something’s amiss here. Gruenberger wrote:

Eventually the path will cross itself, so that the cell containing 111 will also contain 147. Similarly, one cell will contain both 91 and 179.

Those two self-intersections are nowhere to be found in the diagram. When I first noticed this discrepancy, I assumed I must have made a mistake somewhere. (This eagerness to blame myself is not mere knee-jerk humility; I have years of experience to back it up.) Eventually, though, I concluded that it was Gruenberger who had made the wrong turn. I believe he mistakenly went left at 127, as shown in the brown trail below:

trail201b-error.png

The brown continuation of the red path includes the two coincidences mentioned in Gruenberger’s problem statement. But the left turn at N=127 is incorrect, because 127 is a prime equal to (6×21)+1, and thus it should specify a right turn. The error is of no great consequence, but it does reveal something interesting: Gruenberger must have been plotting these paths by hand. Most likely he wrote a program to compute the series of residue classes, then traced out the trajectory on squared paper.

Setting aside this anomaly, Gruenberger was quite right that the path does intersect itself. Here’s the trail continued through N=1,001:

trail1001b.png

And if that’s not tangled enough, here’s what it looks like at N=10,001:

trail10001c.png

Gruenberger asks for “a list of the contents of those cells containing more than one number, arranged in the order of the smallest number in the cell.” It’s not hard to identify some cells that belong on such a list. The table below includes all multiply-occupied cells discovered when tracing the path up to N=1,001, sorted as Gruenberger requests:

                   x    y    values of N
                 -11   28    (137 337)
                 -15   27    (147 683)
                 -16   27    (149 349 685)
                 -18   26    (155 355)
                 -19   27    (159 691)
                 -19   28    (161 693)
                 -19   29    (163 695)
                 -17   31    (171 319)
                 -18   32    (175 315)
                 -19   32    (177 701)
                 -20   32    (179 703)
                 -22   31    (185 769)
                 -23   31    (187 771)
                 -24   31    (189 773)
                 -30   41    (245 269)
                 -30   42    (247 271)
                 -27   40    (281 733)
                 -26   40    (283 735)
                 -26   37    (289 725)
                 -23   35    (299 715)
                 -22   35    (301 761)
                 -21   35    (303 759)
                 -20   35    (305 757)
                 -17   27    (351 687)
                 -18   27    (353 689)
                 -17   24    (361 673)
                 -16   24    (363 675)
                 -15   24    (365 677)
                 -17   21    (379 667)
                 -17   22    (381 669)
                 -17   23    (383 671)
                 -20   22    (391 631)
                 -20   21    (393 633)
                 -20   20    (395 635)
                 -20   19    (397 637)
                 -22   19    (401 593)
                 -22   18    (403 591)
                 -22   17    (405 589)
                 -22   16    (407 587)
                 -27   15    (419 575)
                 -27   14    (421 573)
                 -28   14    (423 819)
                 -29   14    (425 549)
                 -32   14    (431 539)
                 -32   13    (433 537)
                 -26   10    (563 831)
                 -26   11    (565 829)
                 -27   13    (571 823)
                 -28   18    (607 811)
                 -22   32    (707 767)
                   4   -6    (923 971)
                   4   -7    (925 969)
                   4   -8    (927 967)
                   4   -9    (929 989)
                   5   -9    (931 991)

Is this list the answer to Gruenberger’s question? No, it’s not, because there’s no reason to stop at an arbitrary limit such as N=1,001. Indeed, the list above is not even a prefix of the complete answer. The smallest value of N appearing in the list is 137, but the trail will eventually revisit cells occupied by smaller values of N. For example, continuing the experiment to N=10,001 reveals a bunch of intersections quite close to the beginning of the path, including a site that’s visited five times:

                   x    y    values of N
                   1    0    (5 1621)
                   1    1    (7 1623)
                   2    1    (9 4725)
                   3    1    (11 1263)
                   3    2    (13 1265)
                   5    3    (19 1635)
                   6    3    (21 1637)
                   7    4    (25 7537)
                   7    5    (27 7319 7539)
                   7    6    (29 7505 7541)
                   6    6    (31 1643 7323 7503 7543)
                   6    7    (33 1645 7325)
                   6    8    (35 1647 7327)
                   6    9    (37 1649 7329)

One point still missing from this list is the origin–the site at x=0, y=0, N=3. Does the path ever revisit its starting point? If so, at what value (or values) of N does it come back home? Since I don’t know the answer to this question, I guess I’ll have to leave it as an exercise for the reader.

I suspect that the problem Gruenberger meant to pose (or thought he was posing) was to generate a list of self-intersection sites arranged in their natural order of occurrence–that is, the order in which the crossings are created when you construct the path starting from the origin. This natural-order list is not at all the same as a list “arranged in the order of the smallest number in the cell.” The natural-order list is easy to generate step by step. All you need to do is obey the left/right/straight rules, plot the resulting sequence of positions on the xy lattice, and leave behind a trail of breadcrumbs so you can check at each step to see if the site has been visited before. This task is a matter of straightforward computation–just the kind of assignment that Gruenberger favored. The natural-order list begins:

                   x    y    values of N
                 -30   41    (269 245)
                 -30   42    (271 247)
                 -18   32    (315 175)
                 -17   31    (319 171)
                 -11   28    (337 137)
                 -16   27    (349 149)
                 -18   26    (355 155)
                 -32   13    (537 433)
                 -32   14    (539 431)
                 -29   14    (549 425)
                 -27   14    (573 421)

Thus the prime path first crosses itself when N=269, a value that shares the same coordinates as N=245, namely x=−30, y=41. There are 56 such crossings up to N=1,001, and 112,988 self-intersections up to N=106.

*     *     *

There is a wilder, conjectural answer to Gruenberger’s challenge–which I’m pretty sure he did not have in mind. It goes like this: Maybe the complete list of revisited values of N is simply the list of all N. In other words, maybe the Gruenberger prime path fills up the entire lattice of integers, crossing over itself everywhere many times.

In 1921 George Pólya published a celebrated proof that a random walk on the lattice of integers is recurrent in one or two dimensions, though not in higher dimensions. Recurrent means that the walk returns to each point along its length with probability 1, and indeed visits every point in its domain infinitely often. Is it possible that the prime path is also recurrent?

Pólya’s theorem is one of those mind-expanding results that seem impossible on first acquaintance, and then inevitable, and finally just so amazing that you want to go kiss a mathematician. I have to confess that I’ve never gotten all the way through Pólya’s original paper (it’s not long, but it’s in German). On the other hand, I can highly recommend a little book by Peter Doyle and Laurie Snell, Random Walks and Electric Networks, which gives several alternative proofs of the theorem; it was published in the MAA’s Carus Monograph series, and there’s a postprint available on the arXiv.

The key insight underlying Pólya’s result, as I understand it, is this: If you never revisit a former home, then you must be spending eternity somewhere else, and you can do that only if your universe has enough somewhere elses that you’ll never run out of new territories to visit. Suppose that, some eons after starting your journey, you find yourself at distance r from the origin. If you’re living in a one-dimensional universe, then there are just two places you could be at that moment, namely at +r or −r. It doesn’t matter how far you run; there are still just two points at any given distance from the origin. In two dimensions, a fugitive at distance r has a little more room to maneuver; the number of available points grows in proportion to r, forming a circle of radius r. But this is still not enough room to get lost in. Only in three dimensions or more is there a nonzero probability of escape. In three dimensions, the space available at radius r is proportional to r2. In this three-dimensional world, the volume of empty space grows faster than a random walker’s expected distance from home.

What does all this have to do with Gruenberger’s prime path? Well, it’s no secret that the distribution of prime numbers looks convincingly random–if you look at it in just the right way. And in particular the distribution of primes in various residue classes, such as 6K+1 and 6K−1, seems to behave at least approximately like a random variable. All this suggests we might consider viewing the Gruenberger prime trail as if it were a random walk through the two-dimensional lattice of integers. Because the space is two-dimensional, it’s a good guess that the walk should be recurrent.

The original recurrence results of Pólya refer to a simple random walk, where at each step the walker chooses randomly among the available directions and then moves one unit in that direction. For example, in the two-dimensional lattice of integers there are four possible directions: north, south, east, west. The simple random walk is not the best model of the Gruenberger process, which is more like a nonreversing random walk–a path where on each step the walker can turn left or turn right or go straight ahead but can never make a 180-degree about-face. We can further refine the random-walk model of the Gruenberger process by biasing the choice made at each step to reflect the changing abundance of prime numbers. Primes grow scarcer as their magnitude increases; in the vicinity of a given value of N, the probability that a randomly chosen number is prime is approximately 1/log N. Since the Gruenberger path goes straight for all composite numbers and turns only when N is prime, the trail will have longer and longer straight segments, and rarer turns, as N increases. A random walk can mimic this behavior by choosing an action at each step according to this logic:

    if random(1.0) > 2/log(N)
       then go straight
       elseif randomboolean()
           then turn left
           else turn right

(The proportion of primes is given as 2/log(N) rather than 1/log(N) because the Gruenberger process is defined on odd numbers only, which immediately eliminates half of the composites.)

One way to compare the various kinds of random walks is to measure the root-mean-square displacement–the distance from the origin to the final position of the walker, averaged over many realizations of the random process. For a simple random walk, the RMS displacement for an N-step walk converges to \(\sqrt{N}\); for the nonreversing random walk the average displacement is \(\sqrt{2N}\). The biased random walk based on the distribution of primes also appears to yield an RMS distance proportional to the square root of the number of steps; numerically the curve looks something like \(\sqrt{10N}\). I’m not entirely sure that’s the true form of the curve, but the geometric details don’t really make much difference. If I understand correctly, all three of these random processes should be recurrent in the sense of Pólya.

rms-graph5.png

Does the same reasoning apply to the Gruenberger prime path? There are two sides to this question.

The naysayer points out that Pólya’s theorem applies to random walks, but there’s nothing truly random about the sequence of primes. After all, we have a straightforward, deterministic algorithm for generating primes, as well as an efficient algorithm for testing whether any given integer is prime or composite. The essence of a random process is that every time you run it you get a different result, but there’s only one sequence of prime numbers, and so the Gruenberger prime path will come out exactly the same every time. According to this view of things, the kind of probabilistic reasoning that goes into the proof of Pólya’s recurrence theorem is out of bounds here. For randomness to make any sense, you need to average over some ensemble of independent instances. For example, you could average over the 50 salmon-pink paths in the graph below, which represent 50 independent realizations of a biased random walk; you can’t average over the prime path itself (green), because there’s only that one path.

random-prime-paths-51.png

The yeasayer retorts that a single path is all you need–if the path is infinitely long. Indeed, the salmon-colored trails above could be interpreted not as 50 distinct runs of a random process but as 50 segments of a single long path, which repeatedly loops around through the point at x=0, y=0, wanders off in various directions, and then comes back home yet again. In essence, everything that could possibly happen in an infinite set of random paths happens somewhere within a single infinite path; all possible variety is already present there.

I’m not sure how to settle this dispute between Dr. Yea and Professor Nay. When an argument hinges on the nature of randomness, the meaning of infinity and patterns in the distribution of the primes, I known I’m in over my head.

So I’ll leave that deep question unresolved and say a final word about a lesser curiosity. In the Gruenberger process, we’re using the congruence classes of prime numbers mod 6 as a kind of coin flip to decide which way to turn. Is it a fair coin flip? For small values of N, it certainly doesn’t look fair:

                               6x+1      6x−1
       primes <     100         11        12
       primes <    1000         80        86
       primes <   10000        611       616
       primes <  100000       4784      4806
       primes < 1000000      39231     39265

There’s a persistent excess of −1 primes, and the imbalance seems to be getting steadily larger. As a result, the prime path has a “winding number” that reaches 8.5 at N=106; that is, the path makes eight and a half net counterclockwise revolutions. Does the windup continue with still larger N? I gather that the definitive answer is “Yes and No.” For more see the masterful paper by Andrew Granville and Greg Martin cited above.

[Correction 2010-02-19: reflected the accent on Pólya.]

17 x 17: A nonprogress report

Tuesday, December 22nd, 2009

The question again: Is there a four-coloring of the 17-by-17 grid in which none of the 18,496 rectangles have the same color at all four corners? As I said last time, Bill Gasarch would not have put a bounty on this problem if it had an easy solution. Over the past couple of weeks I’ve invested some 1014 CPU cycles in the search, and a few neural cycles too. I have nothing to show for the effort, except maybe a slightly clearer intuition about the nature of the problem.

If you generate a bunch of four-colored n-by-n grids at random, the average number of monochromatic rectangles per grid increases quite smoothly with n:

random-grid-stats.png

This gradual progression might lead you to suspect that the difficulty of finding or producing an n-by-n grid that is totally devoid of monochrome rectangles would also be a smooth function of n. The truth is quite different.

success-graph.png

Finding a solution is easy for square grids of any size up to 15 by 15. The task suddenly becomes very hard at size 16 by 16. As for 17 by 17, it’s much harder still–and indeed is not yet known not to be impossible. (Details on the data behind this graph: For each size class from n=8 to n=17 I started with 1,000 randomly four-colored n-by-n grids. Then I applied a simple heuristic search (the first of the algorithms listed below) to each grid, running the program for 1,000 × n2 steps. The graph records the number of times this procedure succeeded–i.e., produced a grid with no monochrome rectangles–at each grid size. Up to n=14, the search never failed; at n=16 and beyond, it never succeeded.)

This kind of sudden transition from easy to hard is a familiar feature in the realm of constraint satisfaction. Well-known intractable problems such as graph coloring and boolean satisfiability have the same structure. That doesn’t bode well for any of the simple-minded computational methods I’ve tried. Here’s a brief catalog of my failures. These are algorithms that I’m pretty sure are never going to pay off.

  • Biased random walk. Start with a randomly colored grid. Repeatedly choose a site at random, then try changing its color; accept the move if it reduces the overall number of monochrome rectangles. This is the simplest of all the algorithms. None of the more-elaborate schemes is decisively better.
  • Whack-a-mole. Find all the monochrome rectangles in the grid; choose one of them and alter the color of one corner, thereby eliminating the rectangle. In the simplest version of this algorithm, you choose the rectangle and the corner and the new color at random; in more sophisticated versions, you might evaluate the alternatives and take the one that offers the greatest benefit.
  • Steepest descent. Examine all possible moves (for the 17-by-17 grid there are 867 of them) and choose one that minimizes the rectangle count.
  • Lookahead steepest descent. Examine all possible moves, and then all possible sequels to each such move (for the 17-by-17 grid there are 751,689 two-move sequences); choose a sequence that minimizes the rectangle count. In principle this method could be extended to chains of three or more moves, but the cost soon gets out of hand. (The lookahead technique is the mirror image of backtrack search; it explores the tree of possible moves breadth-first instead of depth-first.)
  • Color-balanced search. Allow only moves that maintain the overall balance of colors in the grid. For example, in a 16-by-16 four-colored grid, color balance implies 16 sites in each color. One way to maintain balance is to make moves that swap the colors of two sites. (There is no reason to think that a rectangle-free grid will have exact color balance; on the other hand, a solution for a large grid cannot depart too far from perfect balance. Thus a color-balanced search might be an effective trick for finding a neighborhood where solutions are more common.)
  • Row-and-column-balanced search. Allow only moves that maintain the balance of colors within each row and each column of the grid. In a 16-by-16 grid, each row and column should have four sites in each of four colors. A simple way to maintain this detailed color balance is to search for “harlequin rectangles” with the color pattern \(\begin{array}{cc}a & b\\b & a\end{array}\) and permute them to \(\begin{array}{cc}b & a\\a & b\end{array}\).

Most of these techniques are greedy: At each stage the algorithm chooses an action that maximizes some measure of progress. On hard instances, a pure greedy strategy almost always fails; the search gets stuck in some local optimum. Thus it’s usually best to temper the greediness to some extent, occasionally choosing a move other than the one that yields the best immediate return. (The family of methods known as simulated annealing are more elaborate variations on this idea, based on insights from thermal physics.)

greediness-traces.png

Here we see traces of three runs of the algorithm identified above as steepest descent, with differing values of a greediness parameter m. (The grid size is 15 × 15.) At m=0 (no greediness at all), all moves are equally likely to be chosen, and the algorithm executes a random walk on the space of grid colorings. At m=1 (maximum greediness), the program always chooses the highest-ranked move, which works well until the system stumbles into a state where no move can reduce the rectangle count. A value of m=0.3 seems to be a good compromise. (I’ll say a little more below on the greediness parameter; indeed, I have a question about how best to define and implement it.)

After all this fussing with a dozen variations on local-search algorithms, I’m afraid the outlook for success is not promising. With a little patience and some tuning of parameters, any one of these algorithms can solve grids up to 15 × 15. With a lot more patience and tuning, they’ll eventually yield answers for 16 × 16. But none of the algorithms come even close to cracking the 17 × 17 barrier. Solving that one is going to require a fundamentally new idea. Perhaps someone will find an analytic approach to constructing a solution, rather than blindly searching for one. Or perhaps someone will prove that no solution exists.

On the computational front, I suspect the best hope is a family of algorithms known in various contexts as belief propagation, survey propagation and the cavity method. I’ve been hoping that friends who are expert in these techniques might swoop in and solve the problem for me, but if not I may have to give it a try myself.

In the meantime, here’s the thing about greediness (an apt subject for this time of year?). We want to define a function greedy whose arguments are a vector of alternative moves ranked from best to worst and a number m such that 0 ≤ m ≤ 1. If the greediness parameter m is 0, the function returns a random element of the vector. If m = 1, the returned value is always the first (highest-ranked) move. Otherwise, we must somehow interpolate between these behaviors. One attractive notion is to return the first element of the vector with probability m, the second choice with probability m(1 − m), and so on. Thus for m = 1/2 the series of probabilities would begin 1/2, 1/4, 1/8…. For m = 1/3 the first few values are 1/3, 2/9, 4/27….

This scheme works just fine for a vector of infinite length, but there’s a problem with shorter vectors. Consider what happens with the procedure call greedy(v=[1, 2, 3, 4], m=0.5). We have the following table of probabilities:

     1 --> 1/2
     2 --> 1/4
     3 --> 1/8
     4 --> 1/16

But on adding up those values, we come up 1/16th short of 1. What happens to the missing probability? I took an easy way out, distributing the “extra” probability equally over all the elements of the vector. The code looks something like this:

function greedy(v, m)
  for i=0 to length(v)
     if (i==length(v))
        return v[random(length(v))]
     elseif (random(1.0) < m)
        return v[i]

This procedure seems to give sensible results, but I wonder if there might be a better or more natural definition of greedy probabilities. Also, the running time for my code is logarithmic in the length of the vector (assuming m < 1). Is there a constant-time algorithm that gives the same results? (We don’t know the length of the vector in advance, so merely precomputing the table of probabilities is not an option.)