Archive for August, 2008

Life Curves

Sunday, August 24th, 2008

J. John Sepkoski, Jr., was a fossil-hunter who did most of his digging in the library, sifting through the literature of paleontology to build a detailed, quantitative timeline of life on earth. Focusing on marine animals, he recorded the earliest and the latest known appearances of thousands of ancient organisms. The final edition of his compendium, published in 2002 (three years after his death at age 50), lists dates for more than 36,000 genera.

A few years ago I had a chance to get closely acquainted with Sepkoski’s compendium, when I needed a machine-readable version of the timeline. The listings were published on CD-ROM (remember those?), but the files were merely unstructured plain text. I needed something I could compute with, and so I spent a week or two reformatting the records and importing them into a database. (Others have done the same thing. Shanan Peters of the University of Wisconsin–Madison maintains an online version.)

Here is the summary graph that was the goal of my data-conversion project; it shows the number of extant genera as a function of time, according to Sepkoski’s tally of comings and goings:

Spekoski.png
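
For readers who want to see how a curve of this kind can be tallied, here is a minimal sketch in Python. It assumes each genus has been reduced to a pair of dates (first and last known appearance, in millions of years before the present); the record layout, the function name and the toy ranges are all hypothetical and are not taken from Sepkoski's files or from my own conversion.

```python
from typing import Iterable, List, Tuple

# Each record is (first_appearance_ma, last_appearance_ma), in millions of
# years before present, so first >= last. This two-field layout is an
# assumption made for illustration only.
Range = Tuple[float, float]


def standing_diversity(ranges: Iterable[Range], times_ma: Iterable[float]) -> List[int]:
    """Count the genera whose known range spans each time in times_ma."""
    ranges = list(ranges)
    return [
        sum(1 for first, last in ranges if first >= t >= last)
        for t in times_ma
    ]


# Made-up ranges, purely to exercise the function.
toy_ranges = [(500.0, 300.0), (450.0, 440.0), (320.0, 0.0), (100.0, 0.0)]
toy_curve = standing_diversity(toy_ranges, [475.0, 445.0, 310.0, 50.0])
print(toy_curve)
```

A genus counts as extant at every moment between its two dates, whether or not any fossil has been found in between.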

My brief hands-on experience with Sepkoski’s compilation gave me a sense of how much care went into its preparation. Getting any large data collection into a computer tends to be a fiddly process. Irregularities that a human reader would hardly notice are sand in the gears of automated text processing. Sepkoski’s data files caused less trouble than I expected. The problems I encountered were mainly trivial typographic anomalies—missing punctuation, erratic spacing—and even those were surprisingly rare. The only hints of potentially meaningful errors were a dozen pairs of duplicated entries, where the same genus appeared twice in the listings. It’s easy to see how that would happen in a project that went on for almost three decades; indeed, it’s amazing there weren’t more duplicates.

In any case, I came away from this project with great respect for Sepkoski’s accomplishment, but that doesn’t mean that the curve reproduced above represents the final word on the history of life. It’s not even clear that the main features of the curve and its overall shape give an accurate portrait of changes in global biodiversity.

In constructing any such historical time series, certain biases and distortions are hard to overcome. Of particular importance in this case, fossils from more recent intervals are more likely to survive and to be discovered than those from more ancient times. This “pull of the recent” effect raises questions about the steep upward trend that dominates the Sepkoski curve from the Cretaceous to the present. Has evolution really been going crazy with innovation throughout the past 150 million years, or is that hockey-stick curve an artifact of preservational and sampling bias?

A newly completed analysis of another big fossil database addresses this question (and others). The data source for the new analysis is the Paleobiology Database, a large collaborative project coordinated by John Alroy of the University of California–Santa Barbara. The Paleobiology Database might be called a metacompilation: It brings together statistical and descriptive information from thousands of more-specialized fossil collections (83,444 at the latest count). Initial work on the database began a decade ago (Sepkoski was an early contributor), but it has shown a recent growth spurt.

Of course the new database is vulnerable to the same kinds of systematic bias that Sepkoski had to confront. There’s no avoiding the fact that, on the whole, younger geological strata are more accessible and better studied, and younger fossils are better preserved. But by organizing the data differently and retaining more information about each taxonomic group, Alroy and his colleagues see an opportunity to correct or compensate for some of the biases. Of particular note, whereas Sepkoski recorded only the first and last known appearance of each genus, Alroy et al. attempt to keep track of every occurrence of an organism. This extra information allows sampling bias to be estimated and corrected.

Consider these hypothetical fossil records, where each dot represents a single occurrence of a fossil organism in one of nine labeled intervals:

Alroy.png

In both cases Sepkoski’s protocol would merely indicate that the taxonomic group originated in period 3 and became extinct in or after period 8. The new database records each time unit in which the fossil was found and, whenever possible, the number of occurrences per interval. This data might seem like superfluous detail. After all, if an organism was alive in periods 3 and 8, we can safely infer that it must have existed in periods 4, 5, 6 and 7 as well, whether or not fossil evidence has come to light. But it turns out that recording occurrences rather than just chronological ranges allows for some helpful statistical magic.

As I understand it, the scheme works something like this. Suppose we could gather together all the fossils ever collected by paleontologists, and sort them into bins according to age. Because of the various sampling and preservational biases, the bins for fairly recent periods (say 50 million years ago, in the Tertiary) would be much fuller than the bins for earlier times (say 400 million years ago, in the Devonian). Any bin with more specimens would be likely to exhibit more diversity as well, simply because rare organisms have a better chance of showing up at least once in a larger sample. But we can control for this bias through a simple subsampling procedure: Draw a fixed number of specimens from each bin, making each selection at random and with replacement. The counts of genera in the subsamples should reflect the true diversity of the biota in each bin.
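
The idealized version of the procedure can be sketched in a few lines of Python. This is only the toy scheme described above, with specimens drawn one at a time and with replacement; it is not the procedure Alroy and his colleagues actually used. The bins, genus labels, quota and function name are invented for the illustration.

```python
import random
from typing import List, Optional


def subsampled_genus_count(specimens: List[str], quota: int, trials: int = 200,
                           rng: Optional[random.Random] = None) -> float:
    """Average number of distinct genera seen in `quota` random draws.

    Each draw is made at random and with replacement from one age bin.
    """
    rng = rng or random.Random()
    total = 0
    for _ in range(trials):
        draw = rng.choices(specimens, k=quota)
        total += len(set(draw))
    return total / trials


# Hypothetical bins. The "recent" bin is heavily sampled, so rare genera
# show up; the "ancient" bin is a much smaller sample of a similar biota.
recent_bin = [f"g{i}" for i in range(5) for _ in range(100)] \
           + [f"g{i}" for i in range(5, 20) for _ in range(2)]
ancient_bin = [f"g{i}" for i in range(5) for _ in range(10)] + ["g5", "g6"]

rng = random.Random(2008)  # fixed seed so the run is repeatable
raw_counts = (len(set(recent_bin)), len(set(ancient_bin)))
fair_counts = (subsampled_genus_count(recent_bin, 30, rng=rng),
               subsampled_genus_count(ancient_bin, 30, rng=rng))
print(raw_counts, fair_counts)
```

In this made-up example the raw genus counts differ by almost a factor of three, while the subsampled counts land much closer together, which is the effect the method is after.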

In practice it gets more complicated than that, because we can’t actually sample the entire fossil record at the level of individual specimens; the best we can do is to randomly choose collections of fossils or the publications that describe them. And the publications vary greatly in how much quantitative data they include; some are just lists of species observed.

After many adjustments, refinements and calibrations, Alroy and 34 co-authors have published a diversity curve based on the subsampling technique:

Alroy.png

(Graph courtesy of John Alroy.)

Their article (subscription required) appeared last month in Science, along with 67 pages of supplementary material.

The Sepkoski and the Alroy graphs are twins separated at birth—widely separated. The overall upward trend still exists in the newer graph, but it is much less dramatic, especially in the past 100 million years. Some of the famous mass-extinction events, such as those at the end of the Permian (P) and at the end of the Cretaceous (K), are visible in the new graph but are altered in character; instead of a sudden crash after a sustained build-up, we see something more like a return to normal after a brief, sharp spike in diversity. (Alroy elaborates on the dynamics of mass extinctions in a second recent article, this one in PNAS.)

Looking at the two curves, I arrive at this question: How is the interested but nonexpert reader to evaluate these contrasting views of our planetary past? I want to emphasize that the question animating me is not “Who is right?” but “How can we know who is right?” Is there some way that the ordinary, scientifically literate outsider can form a reasoned judgment about such competing claims to truth?

It was questions like these that got me in trouble the last time I wandered into this area. In 2005 Richard A. Muller of the Lawrence Berkeley National Laboratory and Robert A. Rohde, a graduate student at UC Berkeley, published a report in Nature claiming to detect periodic cycles of rising and falling diversity in the Sepkoski data. Applying Fourier analysis to the time series, they reported finding a strong signal at a period of 62 million years and a weaker one at 140 million years. The claim was controversial from the start, and I decided to take a do-it-yourself approach to understanding the issue. I went back to the original data, reimplemented the analytic methods and tried to assess the robustness of the conclusion. I told the story in an American Scientist column.
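
To give a sense of what looking for a cycle with Fourier analysis involves, here is a bare-bones sketch in Python. It runs a plain discrete Fourier transform over a synthetic series with a built-in 62-million-year cycle. Nothing here is the Sepkoski data or the published method: the series, the one-million-year sampling step and the function name are assumptions, and a real analysis would also have to deal with the long-term trend and with uneven time bins, which this sketch ignores.

```python
import cmath
import math
from typing import List, Tuple


def power_spectrum(series: List[float], step_myr: float) -> List[Tuple[float, float]]:
    """Return (period_in_myr, power) pairs from a plain discrete Fourier transform."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant offset only
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        spectrum.append((n * step_myr / k, abs(coeff) ** 2))
    return spectrum


# Synthetic series: eight full 62-Myr cycles sampled once per million years.
synthetic = [100 + 10 * math.sin(2 * math.pi * t / 62) for t in range(496)]
peak_period, peak_power = max(power_spectrum(synthetic, 1.0), key=lambda pair: pair[1])
print(peak_period)
```

With a clean sine wave the peak is unmistakable. The hard part of the real controversy is deciding how tall a peak has to be before it counts as more than noise.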

The column pleased no one. It certainly didn’t please Muller and Rohde, who objected that I was out of my depth in my amateur attempt to replicate their work. It didn’t please the critics of the Muller-Rohde hypothesis, who thought my focus on certain narrow technical issues deflected attention from deeper conceptual flaws in the argument. And it didn’t please me, because I agreed with the criticisms from both sides.

I should also mention that my column had zero impact on the controversy, which not only continues to rage but has also been extended to the new database. Alroy writes in the PNAS article that some of the peaks and valleys forming the supposed cycles fail to materialize in the new data set. On the other hand, a preprint from Adrian L. Melott of the University of Kansas argues that cycles with periods of 62 and 150 million years emerge from the Paleobiology Database with higher statistical significance than they had in the Sepkoski collection.

All in all, I think I’ll sit this one out. I’ve been itching to get my hands on some records from the new database and implement the subsampling algorithm (which sounds both intriguing and readily accessible). It would be fun to play with these ideas. But I’ll let someone else have the fun this time.

Science builds its credibility on the bedrock idea that experiments and other kinds of results are subject to independent confirmation or refutation. And the advent of computational science has made this egalitarian ideal much more practical than it used to be. Although experiments in high-energy physics remain beyond the means of most amateurs, anything done with a computer rather than a particle accelerator is pretty much fair game these days. Still, there are bounds. If every reader set out to replicate every experiment, the world wouldn’t make much progress.

Shut up and calculate!

Tuesday, August 12th, 2008

In my latest American Scientist column I advert to a famous passage in Leibniz (translation by Robert Latta, 1898):

When controversies arise, there will be no more necessity of disputation between two philosophers than between two accountants. Nothing will be needed but that they should take pen in hand, sit down with their counting tables and (having summoned a friend, if they like) say to one another: Let us calculate.

That final exhortation, “Let us calculate,” is a single word in Leibniz’s Latin text, lending me a title for my column: Calculemus! I might have expressed the same thought more forcefully, quoting somebody or other, as: “Shut up and calculate!”

The column is a manifesto in support of computer programming as a tool for exploring, experimenting and problem-solving—what I like to call inquisitive computing. This is an activity that shouldn’t need defending or promoting, and yet I feel it is becoming a neglected art. These days, computer programming is often viewed mainly as an aspect of software development—but that’s not the kind of programming I have in mind. In software development, the program is the product; it’s what you hand over to the end user. Inquisitive computing is different: The program is just a device for getting an answer. The ultimate goal is not the program itself but the result of running the program. Indeed, once the answer is in hand, the program is often of no further use or interest and can be thrown away.

My gripe is that tools for inquisitive computing are not getting as much attention as they once did. The world of software development offers luxurious, richly appointed programming environments, systems such as Xcode on the Macintosh, or the Eclipse editor favored by many Java programmers. But these systems are not well-adapted to the needs of inquisitive computing, where the emphasis is on low overhead and incremental, trial-and-error methods.

Am I alone in feeling aggrieved about this situation? I’d be interested in knowing what others think. Do you do the kind of programming that I’m describing as inquisitive computing? What tools do you use? Are you happy with them? (Let me hasten to add that I don’t see this as another food fight over the virtues of various programming languages. I’d welcome well-made environments for inquisitive computing based on any language whatever.)

In any case, I don’t want to sound too grumpy about this. The new column is also supposed to be an anniversary celebration: I’ve been writing these essays for 25 years. To mark the occasion I am trying to make some of my earlier work available online. There are no machine-readable copies of the early columns, so my only recourse is to run paper pages through the scanner. The result is a bloated PDF of marginal quality; sorry, it’s the best I can manage. So far I have scanned about a dozen of the columns I wrote for Scientific American and Computer Language in the 80s and for The Sciences in the 90s. I’ll be adding more over the next few weeks. There are links in my publications list.

Big Money

Sunday, August 3rd, 2008

Zimbabwean bank notes, including a ZW$50,000,000,000 Special Agro-Check

(Photo courtesy ZeroOne.)

It’s a cruel irony: As the citizens of Zimbabwe sink into bitter poverty, they are becoming millionaires and billionaires. Inflation is eroding the value of the Zimbabwean dollar so rapidly that everyday transactions turn into lessons in the arithmetic of large numbers. When the photo above was made on July 17, the largest currency denomination in circulation was a note for ZW$50,000,000,000. Last week the nation’s central bank issued a ZW$100,000,000,000 bill. (I’ll spare you the trouble of counting zeroes: That’s 10^11, or 100 billion by American reckoning.)

The Zimbabwean inflation is the worst in the world at the moment, but it is not (yet) setting all-time records. Probably the most famous episode of extreme inflation was that of the German Weimar Republic (a story told vividly in Erich Maria Remarque’s novel The Black Obelisk). In 1921, German marks traded at about 60 to the U.S. dollar; two years later, in December of 1923, the exchange rate was 4.2×10^12 per dollar. The Hungarian inflation following World War II reached even greater numerical heights. In a single year the exchange rate for the Hungarian pengo went from 100 per U.S. dollar to 4×10^29. As Feynman said, astronomical numbers are dwarfed by economical ones.

Takayuki Mizuno, Misako Takayasu and Hideki Takayasu have analyzed the German and Hungarian episodes of “hyperinflation.” (Citation: Physica A 308 (2002) 411; there’s also an arXiv preprint.) Inflation at its worst, they find, proceeds at a doubly exponential rate. In other words, prices rise not just as an exponential function of time—exp(t)—but as an exponentiated exponential—exp(exp(t))—or:

doubleexpt.png

This growth law has a simple meaning in terms of everyday experience. With “ordinary,” single-exponential inflation, prices have a constant doubling time. If bus fare was 1 million last month and 2 million this month, it will be 4 million next month. Under double-exponential growth, the doubling time itself decreases exponentially. In the last months of the Hungarian inflation the doubling time fell from about 20 days to 15 hours.
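
The difference between the two regimes can be spelled out in a short derivation. The functional forms below are assumed for illustration (the constants p_0, k, a and b are placeholders, not fitted values from the paper), and T_2 denotes the local doubling time implied by the instantaneous growth rate of ln p.

```latex
\begin{align*}
\text{single exponential:}\quad
  & p(t) = p_0\, e^{kt},
  & \frac{d \ln p}{dt} &= k,
  & T_2 &= \frac{\ln 2}{k} \\[4pt]
\text{double exponential:}\quad
  & p(t) = \exp\!\left(a\, e^{bt}\right),
  & \frac{d \ln p}{dt} &= a b\, e^{bt},
  & T_2(t) &\approx \frac{\ln 2}{a b}\, e^{-bt}
\end{align*}
```

In the first case the doubling time is a constant; in the second it shrinks by a fixed factor in each fixed interval of time. For scale, a fall from 20 days (480 hours) to 15 hours is a factor of 32, so the doubling time itself was halved five times over.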

On a logarithmic scale, a simple exponential function yields a straight-line graph. Here is the Mizuno-Takayasu evidence that the final phase of the Hungarian inflation was superexponential:

Mizunofg1.jpg

And here are the data for the final six months plotted as log(log(p(t))), showing a simple linear trend:

Mizunofg2.jpg

How does the Zimbabwean economy look when submitted to this kind of scrutiny? I don’t know of a reliable source of data on prices in Zimbabwe, but foreign exchange rates can serve as a rough proxy. Until three months ago, the official ZW$ rate was pegged at roughly 30,000 per US$, but on May 10 the currency was allowed to float free, and the rate immediately jumped to 190,000,000 ZW$ per US$. By July 31 the rate had reached 57,381,544,140. Thus the 50 billion ZW$ note in the photo above was worth a little less than 1 US$ by the end of last month. And that’s at the official rate of exchange; the street value is reportedly about a tenth of the official quote.

Here’s how the official exchange rate has varied in the 84 days between May 10 and August 1, as plotted on a linear scale:

ZW-rates.png

And here’s the same data after a logarithmic transformation:

ZW-log-and-fit.png

Although there’s more bumpiness here than in the Mizuno-Takayasu data, the trend looks reasonably linear to me. The fitted line has slope 0.03358, which yields a doubling time of about nine days. I see no hint of superexponential growth. I’d like to think this is an encouraging sign, a glimmer of hope that Zimbabwe will be spared an even more pernicious phase, when even inflation has inflation.
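
The step from slope to doubling time is simple arithmetic, sketched here in Python. The nine-day figure follows if the slope is measured in base-10 logarithm units per day, which is the assumption made below; read as a natural-log slope, the same number would give a doubling time of about 20 days. As a rough cross-check, the sketch also computes the average slope between the two quoted endpoints (190,000,000 on May 10 and 57,381,544,140 on July 31). The function name is my own invention.

```python
import math
from datetime import date


def doubling_time_days(slope_log10_per_day: float) -> float:
    """Days needed for the rate to double, given a slope in log10 units per day."""
    return math.log10(2) / slope_log10_per_day


# Slope of the fitted line quoted in the text (assumed to be base 10, per day).
fitted = doubling_time_days(0.03358)

# Cross-check from the two endpoint rates quoted in the text.
days = (date(2008, 7, 31) - date(2008, 5, 10)).days
endpoint_slope = (math.log10(57_381_544_140) - math.log10(190_000_000)) / days
endpoint = doubling_time_days(endpoint_slope)

print(round(fitted, 2), days, round(endpoint, 2))
```

The endpoint estimate comes out near ten days rather than nine, which is unsurprising: a line drawn through just two points ignores all the bumps in between that a least-squares fit takes into account.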

Runaway inflation is usually blamed on the incompetence or malevolence of governments and the central banks that implement their policies. In the case of Zimbabwe, the government of Robert Mugabe certainly has a lot to answer for. The country was once the shining success story of southern Africa—I have friends who migrated across the continent to go to school there—but the nation is now a basket case, and inflation is only one of many urgent crises. (The unemployment rate is reported to be 80 percent.) The Mugabe regime can’t escape blame for this situation. Still, it seems that hyperinflation is not to be explained purely in terms of fundamental economic imbalances—too many dollars and not enough goods. Sometimes it seems there is also a psychological component. When you believe that prices will double next week, you raise your own prices in anticipation. It’s a self-reinforcing process.

One sign of such a feedback loop in the inflationary spiral is that inflation sometimes stops even though the underlying economic situation hasn’t really changed. The Weimar hyperinflation ended with the introduction of the Rentenmark, which was set equal to 10^12 old marks but really had no firmer backing than the earlier Papiermark. The change in currency did nothing to solve Germany’s problems of debt and unemployment, but the inflation ended anyway. Evidently, people chose to believe that the value of the Rentenmark would remain stable, and it did.

The central bank of Zimbabwe has just announced a similar effort at currency reform, devaluing the ZW$ by a factor of 10^10. In other words, the ZW$100,000,000,000 note introduced a week ago is equal in value to a new ZW$10 bill. According to press reports, the main motive for the change was simply logistical convenience:

Gideon Gono, the Central Bank governor, … acted because the high rate of inflation was hampering the country’s computer systems. Computers, electronic calculators and automated teller machines at Zimbabwe’s banks cannot handle basic transactions in billions and trillions of dollars. (AP/Baltimore Sun)

But perhaps one can hope that the newly denominated currency will bring more than numerical benefits. Over the weekend, the official exchange rate has held at 6.569 new Zimbabwe dollars to the U.S. dollar. We’ll have to wait a few more days to see if the curve has really flattened out.

Update 2008-09-04: With another month of exchange-rate data, here’s what the situation looks like:

ZW-rates-904.png

ZW-log-and-fit-904.png

The blue line in the semilog graph is the same as the one in the corresponding earlier graph—that is to say, it is fitted to the first 80 days of data. It appears that the inflation rate has diminished slightly since the revaluation at the end of July. But that slightly lower rate is still formidable; in a little more than a month the value of the new Zimbabwe dollar has fallen from about 15 cents (U.S.) to about 2 cents.

Update 2008-10-02: After another month, what passes for good news is that the rate of exponential growth does not seem to be growing:

On the other hand, news reports suggest that the situation in Harare is bleaker than ever. Money is scarce as well as nearly worthless; people stand in line all night for the privilege of withdrawing the equivalent of a dollar or two from their own bank accounts. (Note that the equivalent of $1 U.S. is $ZW137 in the devalued currency issued in August. In pre-devaluation Zimbabwe dollars, it comes to $ZW1.37 trillion.)

Isn’t it curious that both here in the U.S. and in Zimbabwe, the financial pages are filled with such enormous numbers?

Update 2008-11-02: One more month of data:

Still no sign of “hyperinflation”—if that term is taken to mean doubly exponential growth—but that can’t be much solace to the Zimbabweans whose currency has yet again lost three-fourths of its value over the course of a month. Adjusting for the August devaluation, one U.S. dollar now buys 5.6 trillion Zimbabwean dollars.