Archive for April, 2008

The problem of describing trees

Wednesday, April 30th, 2008

When I finished writing about the Zeno wagering game recently, I had some trees left over, so I thought I would try planting them here.

In the Zeno article I was trying to understand and explain the structure of this peculiar-looking tree, which I’ll call the tangled tree:

zenotree.jpg

As a warmup exercise, I started out with this simpler example, the combed tree:

binarytree.jpg

These are both binary trees: Each node has exactly two descendants. And the trees also share a basic internal symmetry. In both structures, if a node has horizontal coordinate x, the two descendant nodes are symmetrically arrayed at x-d and x+d. The difference between the two trees is all in how the displacement distance d is calculated. In the combed tree, d has the initial value 1/4 at the root node, and it is reduced by half at each lower level of the tree. In the tangled tree, the formula is d = 1/2 min(x, 1-x). In other words, d is half the distance to the nearer boundary of the interval [0,1].

There was a third tree I didn’t have room to include in the article. I had stumbled upon it when I made a programming error. This rogue structure looks like this, and I’m calling it the braided tree:

crosstree.jpg

The nature of my programming error is not hard to see. Intending to generate the tangled tree, I had meant to calculate the positions of the child nodes as follows:

      left = x - 1/2 min(x, 1 - x)
      right = x + 1/2 min(x, 1 - x)

But instead I carelessly wrote:

      left = x - 1/2 x
      right = x + 1/2 (1 - x)

Stupid goof, but the result is visually intriguing. And although the braided tree is a botched version of the tangled tree, it actually has a lot in common with the combed tree. In particular, both the combed and the braided trees have exactly the same set of nodes; the difference is that those nodes are wired differently.

One way to understand the difference is to look at the paths leading from the root to any given node. Each such path is a sequence of left and right turns. For example, in the combed tree you can get from the root to the node at x = 3/16 by turning left at the root, then left again at 1/4 and finally right at 1/8. In the braided tree the path to 3/16 goes right to 3/4 then left to 3/8 and left to 3/16. Paths of this kind can be encoded as binary numerals, letting a 0 stand for each left turn and a 1 for each right turn. In this scheme the two paths described above are 001 and 100.

In essence, the braided tree is a permutation of the combed tree. If you list all the nodes at some fixed level in the combed tree (say those three levels below the root, where all denominators are 16), then the corresponding paths are in numerical order:

    1/16    000    (0)
    3/16    001    (1)
    5/16    010    (2)
    7/16    011    (3)
    9/16    100    (4)
   11/16    101    (5)
   13/16    110    (6)
   15/16    111    (7)

Here’s the corresponding tabulation for the braided tree:

    1/16    000    (0)
    3/16    100    (4)
    5/16    010    (2)
    7/16    110    (6)
    9/16    001    (1)
   11/16    101    (5)
   13/16    011    (3)
   15/16    111    (7)

The order is “counting from the left,” incrementing the most significant bit first. I wonder what other permutations give interesting-looking trees.

Turning back to the tangled tree that began this whole investigation, I would argue that it looks tangled and unruly not so much because lots of edges cross (there are even more crossings in the braided tree) but because of many near collisions and coincidences. For example, just below the root of the tangled tree, the mirror-image paths <left-right-right> and <right-left-left> almost converge on the same value, but they miss by just a bit. We can fix this! All it takes is an adjustment in the formula for the displacement d. In the tangled tree that displacement is 1/2 min(x, 1-x). The constant 1/2 in this expression is quite arbitrary, and we can substitute any other value a, with 0 < a < 1.

So what is the magic value of a in the following tree, which I am inclined to call the crocheted tree?

phitree.jpg

Note: If the title of this post fails to ring a bell, see the poem by Robert Hass:

The gene pool threw up a wobbly stem
And the tree danced. No.
The tree capitalized.
No. There are limits to saying,
In language, what the tree did.

The temblor forecast

Tuesday, April 15th, 2008

From the Associated Press, via the New York Times:

LOS ANGELES (AP) — California faces an almost certain risk of being rocked by a strong earthquake by 2037, scientists said in the first statewide temblor forecast.

New calculations reveal there is a 99.7 percent chance a magnitude 6.7 quake or larger will strike in the next 30 years. The odds of such an event are higher in Southern California than Northern California, 97 percent versus 93 percent.

caquake.jpg

I read this report with a certain sense of wonder. What impressed me was not the prediction itself; it’s not the first time I’ve heard that the Big One is coming. What took me by surprise was the level of mathematical sophistication that we can now take for granted in readers of the morning newspaper. No more do we have to worry that people will add up 97 percent and 93 percent to get 190 percent. Evidently, we’ve reached a state of universal numeracy, where everyone knows how to combine probabilities, and there’s no need to explain the calculation. We don’t even need to remind anyone that when we compute 1 – (1 – p)(1 – q), or p + qpq, we are assuming that p and q represent probabilities of statistically independent events; everybody knows that. And everybody understands that in this context “a chance of a quake” really means “a chance of at least one quake.”

I guess the only place where we might still stumble is in actually doing the arithmetic. My calculator tells me the number is 99.8 percent, not 99.7.

A further note: The original report on which the news item is based leaves me even more perplexed. The probability model adopted in the forecast is explained as follows:

The simplest assumption is that earthquakes occur randomly in time at a constant rate; i.e., they obey Poisson statistics. This model, which is used in constructing the national seismic hazard maps, is “time independent” in the sense that the probability of each earthquake rupture is completely independent of the timing of all others. Here we depart from the… conventions by considering “time-dependent” earthquake rupture forecasts that condition the event probabilities… on the date of the last major rupture. Such models… are motivated by the elastic rebound theory of the earthquake cycle…; they are based on stress-renewal models, in which probabilities drop immediately after a large earthquake releases tectonic stress on a fault and rise as the stress re-accumulates due to constant tectonic loading of the fault.

In other words, it doesn’t sound as though the assumption of independence is even approximately satisfied. I must be missing something. The 99.7 percent combined probability is mentioned in the executive summary of the report, but I found no explanation of how that number was calculated.

Perhaps I shouldn’t worry so much. I live thousands of kilometers away in a zone of seismic serenity.

Update, several hours later: After reading a little more carefully, I think the report does assume that all possible earthquake sites are independent. At each site the probability of an event is a function of time, but it is independent of probabilities at other sites. Thus calculating a joint probability for the northern and southern parts of the state does seem to be a valid operation. And the distinction between “exactly one” and “at least one” doesn’t really enter into the matter either. That’s because the model is only valid until the next major earthquake occurs; after that, all bets are off, since the time-dependent probabilities have to be recalculated.

If this interpretation of the model is correct, I think the way the result is expressed is somewhat misleading. To say there’s a 97 percent chance in Socal and a 93-percent chance in Nocal implies there’s a high probability (90.2 percent) of seeing both events in the course of the 30-year period. But the model is no longer valid after the first quake.

I wonder if there isn’t a better way to express the concept at the heart of this story. Qualitatively, it’s easy enough to grasp: In the next 30 years there will almost certainly be a major earthquake somewhere in California, and the event is more likely to happen in the southern part of the state than in the northern part. Putting this into numbers is somewhat tricky—or at least I’ve had a lot of trouble with it. Having finally surrendered to the computer and performed a Monte Carlo simulation, I come up with this statement: There’s a 99.8 percent chance that the next major California earthquake will happen by 2037. If indeed such a quake occurs, the odds are about 57 to 43 it will hit in Southern California.

In Zeno’s footsteps

Thursday, April 10th, 2008

The latest issue of American Scientist is just out, both on the newsstand and on the web. My “Computing Science” column is titled “Wagering with Zeno”; it returns to a subject mentioned briefly in an earlier column, “Follow the Money.”

Consider a random walk on the interval (0,1), where the walker moves according to these rules:

  • The walker begins at x = 1/2.
  • At each step the walker moves either left (toward 0) or right (toward 1) with equal probability.
  • The length of each step is one-half the distance to 0 or to 1, whichever is nearer. In other words, the distance moved is 1/2 min(x, 1–x).

If you want to know more about this process and why I bothered to write about it, please see the column. Here I just want to say a few words about visual aids that might be helpful in understanding how the random walk evolves and where the walker is likely to wind up.

Here is one of the illustrations that appears in the column:

four trajectories of the Zeno walk, each plotted for 10 steps

We see four trajectories, each lasting 10 steps. Although the walk is actually one-dimensional—moving back and forth along an interval of the x axis—the trajectories are plotted in two dimensions for clarity, moving down the page so that the illustration becomes a kind of discrete spacetime diagram. Even with this device, though, some of the paths overlap for part of their course. If I tried to crowd more trajectories into the figure, the overlaps would make it hard to sort out one walk from another.

In making pictures like this, the issue is not just how you explain an experiment to other people; it’s also a matter of how you come to understand the process yourself. While I was trying to make sense of the Zeno walk, I plotted lots of trajectories like these, initially by hand and then with a program that generated Postscript files. Some important features were quickly apparent. In particular, I noticed that the walker takes much bigger strides in the middle of the space than near the edges. (Of course this fact follows directly from the definition of the walk, but it doesn’t hurt to see a picture.) I also observed that most of the walks seem to spend most of their time hugging one edge or the other, seldom venturing back across the midline of the space. Even after looking at a few hundred examples, however, I didn’t feel like I had a really secure sense of how a “typical” walk would proceed.

As with many processes that evolve over time, the illustrative possibilities are much richer if we can escape the static bounds of ink on paper. And thus I was led to try creating a more dynamic and interactive display.

screen image of zenowalk applet

What you see above is not dynamic; it’s merely static pixels on glass. But you can see it all in action in this Java applet. Feel free to click on the link and go play, but do come back here afterward so we can continue the conversation.

Hello again.

Does this program succeed in conveying an intuitive sense of how the Zeno walker walks? Personally, I’ve found it helpful, although it still comes up far short of ideal. Adjusting the number of steps in each walk gives a sense of how the walker’s apparent attraction to the edges gets more extreme over time. And adjusting the rate at which paths fade into the background allows for a degree of balance between clutter and evanescence. But there may well be a better way to go about visualizing these concepts. I’d be interested in hearing suggestions.

A note on the implementation: The applet was created with the programming language called Processing, invented by Ben Fry and Casey Reas. I’ve just reviewed two books on Processing, and so this project was undertaken partly to try out the language and its programming environment. I found it very slick, and mostly trouble-free. The source code of the zenowalk program is available through a link on the applet page. (Most of the code for the buttons and sliders was cribbed from the book by Fry and Reas.)

Of course there’s always something lacking in any programming language. In this case what I missed most acutely was the exact rational arithmetic I rely on in Lisp. Processing has only single-precision floating-point numbers. In the Zeno walk, these numbers begin running out of significant digits after about 150 steps (which is why I limited the program to 145 steps). As far as I can tell, no one has yet written a Processing library for bignums and exact rationals. I guess that’s my job.

Hard covers

Thursday, April 3rd, 2008

My new book has come out this week: Group Theory in the Bedroom, and Other Mathematical Diversions, Hill and Wang, xi+269 pages, $25. ISBN-10: 0-8090-5219-9, ISBN-13: 978-0-8090-5219-9, Library of Congress Call Number: T185.H39 2008.

GTiBcover200.jpgThis is a collection of essays on themes that will be familiar to many readers of bit-player.org. Indeed, the essays themselves may be familiar! Eleven of the twelve essays appeared earlier as “Computing Science” columns in American Scientist; the twelfth was published in The Sciences. But there is some new material: Appended to each chapter is an “Afterthoughts” section where I confess to errors, castigate critics, bring outmoded notions up to date, and generally try to offer readers some excuse for selling them a $25 book, when they could find most of the content free on the web.

The book has its own handsome web site (I was lucky to snap up the domain name “grouptheoryinthebedroom.com” before some speculator squatted on it). I cordially invite all of you to go over there and have a look around. Meanwhile, back here at bit-player HQ, I’m going to throw myself a little party. See you in a day or two.

’Tis pleasant, sure, to see one’s name in print; A book’s a book, although there’s nothing in’t. —Lord Byron