QCD

19 October 2008

Lattice QCD is something I’ve been trying to understand for 30 years. My latest attempt is chronicled in the new issue of American Scientist.

latticequarks.png

QCD is quantum chromodynamics, the theory of interactions between quarks and gluons. The lattice version of the theory appeals to those of us who like our physics in discrete, countable bits. In the lattice formulation, quarks exist only at the nodes of a four-dimensional spacetime grid, and gluons only on the links connecting neighboring nodes. There’s no evidence that the universe really has such a rectilinear structure, but it turns out to be a useful fiction when you want to calculate things like the masses of quarks. In large measure we are all made of quarks, and so it seems like a good idea to know some basic facts about them.

In my quest to figure out what lattice QCD is all about, I’ve had a lot of help. Years ago, as an editor of magazine articles, I had the splendid opportunity to get one-on-one tutorials from Kenneth G. Wilson and from Claudio Rebbi. I received further help for this latest project, and for that I want to thank G. Peter Lepage of Cornell.

In a paper titled “Lattice QCD for Novices” (written and published in 1998 but posted on the arXiv in 2005), Lepage presented a simple Python program that illustrates some of the key ideas behind lattice QCD. The complete source code for the program is given within Lepage’s article, but to save readers the trouble of gathering the pieces and extracting them from a PDF, I have (with Lepage’s permission) made the program available here in the file qcd.py. I’ve also made some trivial changes for the sake of compatibility with more recent versions of Python.

Finally, thanks too to Massimo Di Pierro of DePaul University for a visualization of lattice QCD in action.

Survey on computing in the sciences

17 October 2008

Do you create software for scientific computing, or use such software in doing research? Then my friend Greg Wilson would like to hear from you. Together with colleagues from the University of Toronto, Simula Research Laboratory and the National Research Council of Canada, he is conducting a survey on practices in scientific computing. Greg plans to report the results next year in an American Scientist article.

Demaine event

16 October 2008

“Between The Folds” uncovers the stories of ten fine artists and intrepid theoretical scientists who abandoned careers and scoffed at hard-earned graduate degrees—all to forge unconventional lives as modern-day paperfolders.

Perhaps such a film should be kept from impressionable youth, lest more degree-scoffers be lost to a life in the creases. Nevertheless, the one-hour documentary, written and directed by Vanessa Gould, is showing online this weekend, today through Sunday only, in connection with the Hamptons International Film Festival. Several of the featured folders come from the world of science and math: Robert Lang, Tom Hull, Erik Demaine, and Erik’s father Martin Demaine.

ErikDemaine1877.jpg

If you’d like still more cinematic experience of Erik Demaine, his recent talk at the opening symposium for the Microsoft New England research center is now online; in it he performs magic tricks as well as showing off origami. (But the video is in a Windows-only format. What’s with that?)

At right Erik makes a hand-waving argument while blindfolded.

Thanks to Barry and Ros for tips on these items.

Spam by the numbers

4 October 2008

Reviewing this month’s batch of incoming junk mail, I stumbled upon the following message:

numberspam440.png

In case that image is too tiny to read, here is the first word in source-code form:

     28    47   34
     74    33
      85  42
      16  43    25    5048     08124   8813    2714
      34  02    25       66   50  31   855        05
       3404     65    88362   00  25   72      01651
       8008     36   42  77   27  81   06     04  40
        72      83   02  32   47  12   24     87  33
        78      03    87100    83844   18      21813
                                  08
                              73634

The basic technique is anything but novel. I can remember green-and-white-striped printouts that had my name emblazoned in the same kind of two-inch-high characters. But why are the characters here formed entirely out of numbers, rather than other ASCII glyphs? And do the numbers themselves mean anything?

I think I know the answer to the first question: The spammer thought a message composed of nothing but numerals might slip through the spam filters. (In my case, at least, it didn’t work. I fished this message out of the garbage pail.)

As for the second question, my immediate guess was that the digits are the output of some simple pseudo-random number generator. That would be an easy way to produce them, and it would also allow the spammer to make each individual message unique. On taking a closer look, however, I realized there was something quite nonrandom about the numbers in the message.

Here is the full list of digits. There are exactly 900 of them. Do you see what’s missing?

284734807433341016202332628542642574418481303116432550480812488
132714721846667434022566503185505580464271163634046588362002572
016511712427000046735580083642772781060440148383627872830232471
224873301464000807803871008384418218130077346262602008225346571
155727363470732323181618223162744253246331737038301533254837881
148802160371074555632302255640217448457046416116253484658726108
147181540061231788804563557807254177278106044014838362787283023
247122487330146400080780387100838462042135220046847482422143746
770236783058460185444521283134537306537546855305024142275437615
010235002438258320577785451436776143066166025853832747551576004
831136831376228235381112678466011047530048032816623514158481030
413446024450055236762111281250031205166204213522004684748242214
374677023678305846018544452128313453730653754685530502414227543
761501023500243825832057778545143677614306616602585383274755157
600483113683137622

There’s nary a 9 in the bunch. And in other respects too the digit distribution looks slightly off-kilter:

digitdist.png

When I tabulated all the correlations between successive digits, that too looked a little fishy, although the sample is too small for any reliable conclusions.

                   s e c o n d
           0  1  2  3  4  5  6  7  8  9
      0   23 12 20  9 17  9  7  7  8  0
      1   11 13 11 12 16  8 13  5 10  0
   f  2   11 11 13 15 14 15  6 14  9  0
   i  3   18 13 15  7 11  8 13 13 12  0
   r  4    8  9 12 13 12 10 22 11 18  0
   s  5   11  7  5 14 12 14  4 10 11  0
   t  6   12 10 15  6  8  7 10 10  6  0
      7    6 10  9 10 12  7  9 11 14  0
      8   12 14  8 24 13 10  0  7  6  0
      9    0  0  0  0  0  0  0  0  0  0
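
Tallies of this kind are easy to reproduce. What follows is a minimal sketch in Python, not the code that produced the table above; the function names are illustrative, and the sample string is only the first 63-digit row of the list, so its counts will not match the full tabulation.

```python
def digit_counts(digits):
    """Return a list whose entry d is the number of times digit d occurs."""
    counts = [0] * 10
    for ch in digits:
        counts[int(ch)] += 1
    return counts


def pair_counts(digits):
    """Return a 10x10 table; table[a][b] counts digit a immediately followed by b."""
    table = [[0] * 10 for _ in range(10)]
    for first, second in zip(digits, digits[1:]):
        table[int(first)][int(second)] += 1
    return table


# First row of the digit list from the message (assumed transcribed correctly).
SAMPLE = "284734807433341016202332628542642574418481303116432550480812488"

print("digit counts:", digit_counts(SAMPLE))
for first, row in enumerate(pair_counts(SAMPLE)):
    print(first, row)
```

To check the whole message, paste all the rows together into one string and pass that string to the same two functions.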

So what’s going on here? I think the pseudo-random generator is still a leading candidate, though it would have to be a badly implemented RNG. The absence of 9s isn’t hard to explain: We only have to suppose that the spammer was working in C and wrote the plausible-looking expression random(9), thinking that would generate integers between 0 and 9.
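
The off-by-one hypothesis is easy to try out. The sketch below uses Python rather than C, and assumes only that the spammer's generator, like Python's random.randrange(9), excludes its upper bound and so returns integers from 0 through 8. The function names and the seed are made up for illustration. The second function computes how likely it is that 900 digits drawn uniformly from 0 through 9 would happen to contain no 9 at all.

```python
import random


def spam_digits(count, bound, seed=None):
    """Generate `count` digits, each drawn from 0 .. bound-1 (bound is excluded)."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(bound)) for _ in range(count))


def chance_of_missing_digit(count):
    """Probability that `count` uniform digits from 0-9 never include one given digit."""
    return 0.9 ** count


digits = spam_digits(900, 9, seed=2008)
print("contains a 9:", "9" in digits)  # always False: randrange(9) tops out at 8
print("chance with a fair 0-9 generator:", chance_of_missing_digit(900))
```

The probability printed on the last line is so small that chance alone is not a serious explanation for the missing 9s; a generator that never emits 9 is.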

On the other hand, maybe it isn’t random. Maybe there’s a secret message-within-the-message. Anybody see a pattern?

While I’m talking spam, I’ll update my ongoing tally of my inbox contents. I can report that September was a good, strong month for spam, with further steady growth continuing the summer-long trend. The stock market is in retreat and credit is tight, but the purveyors of replica watches are undeterred. My receipts have crossed the 5,000-per-month threshold for the first time:

spamcounts.png

And another threshold has also been left behind: For the first time this month, more than half of my spam is written in Russian. (Based on character-set declarations, 2,858 messages out of 5,021 were in Cyrillic scripts, or about 57 percent.)

Update 2008-10-12: In response to a request in the comments, I’ve uploaded the full text (including headers) of the original email. The file is here. Incidentally, I’ve searched my spam archive for other messages like this one, without success. That in itself makes this a peculiar spam. Usually, if I get a spam once, I see dozens of copies or variants within a few days.

Alfonso’s universe

30 September 2008

Tardy announcement: I’ll be giving a talk tomorrow at Harvard. Details. If you’re in the neighborhood, please look in.

Update: Harvard has posted video of the talk; please have a look, if you’re interested, and you’re equipped to play RealMedia files. I’m not equipped, so I haven’t seen it myself. But I’m sure it’s wonderful (thanks to the AV expertise of Bill Countie and Geoff Maness).

Slides are here (PDF), though they won’t make much sense without the narration.

An amiable companion

25 September 2008

GowersCompanionCover.gif

The mail has brought me an early copy of The Princeton Companion to Mathematics, a 1,034-page compendium edited by Timothy Gowers (as well as June Barrow-Green and Imre Leader, associate editors), with contributions from more than 130 other authors. I’ve only just begun to browse through its pages, but already I’m completely charmed. This is one of those books that makes you wish you had a desert island to be marooned on.

In the preface, Gowers is at pains to establish that his book is a companion, not an encyclopedia. What that means, in part, is that authors are allowed to exhibit attitude and personality. Gowers wastes no time in doing so himself. The preface begins by citing a definition of “pure mathematics” written by Bertrand Russell in 1903:

Pure Mathematics is the class of all propositions of the form “p implies q,” where p and q are propositions containing one or more variables, the same in the two propositions, and neither p nor q contains any constants except logical constants….

Russell is allowed to go on in this vein for another eight lines, and then Gowers remarks: “The Princeton Companion to Mathematics could be said to be about everything that Russell’s definition leaves out.”

Since I haven’t yet read more than 5 percent of the book, I’m in no position to review it here, but I think I can say a little more about the nature of what’s in it. Two big sections are essentially reference material: 99 short articles on mathematical concepts (arranged alphabetically) and 96 biographies of mathematicians (arranged chronologically, excluding living persons; the median birthdate is 1822). A third section gives brief accounts of 35 “theorems and problems,” many of them either open or recently solved (the Riemann hypothesis, the Mordell conjecture, Fermat’s Last Theorem) but also including a few classics (the three-body problem, the insolubility of the quintic).

Lots of good stuff in all of those sections, but not a lot of surprises. What I’m really warming up to are the parts of the book where authors are given freer rein to follow their own particular instincts or obsessions and where they express more distinctively personal views.

Gowers himself wrote a 76-page introduction that undertakes to explain what mathematics is all about, not only as a body of knowledge but also as a cultural phenomenon and as a way of thinking about the world. (The last section of the chapter is titled “What do you find in a mathematical paper?” At one point, Gowers begins to sound a little like Russell: “The object of a typical paper is to establish mathematical statements.”)

Seven more essays take another stab at introducing mathematics, this time working from a historical perspective. For the most part the sequence of topics follows an uncontroversial trajectory through the past two millennia: numbers, geometry, algebra, analysis, proof. But the editors have also decided to put algorithms on an equal footing with these subjects, a choice that would have been unlikely 50 years ago. On the other hand, the historical progression culminates in “The Crisis in the Foundation of Mathematics.” The crisis in question is that of the intuitionist rebellion and Gödel’s incompleteness results, events of the 1920s and 30s. It’s rather like a political history of the world that ends with the conflict between communism and fascism. But I suppose that the rest of the book could be taken as an effort to fill in the record of what’s happened since then.

The best bits of all come at the end of this weighty volume. Although the Companion claims to focus on “pure” mathematics, 14 chapters on “The Influence of Mathematics” show a definite leaning toward applications. We get views of mathematics in chemistry, biology, economics, statistics, music and art. And there are a few more narrowly focused essays, on topics such as wavelets, traffic and cryptography.

The book’s last section, titled “Final Perspectives,” is where I would recommend beginning. Here are the contents:

  • The Art of Problem Solving, by A. Gardiner.
  • “Why Mathematics?” You Might Ask, by Michael Harris.
  • The Ubiquity of Mathematics, by T. W. Körner.
  • Numeracy, by Eleanor Robson.
  • Mathematics: An Experimental Science, by Herbert S. Wilf.
  • Advice to a Young Mathematician, with contributions from Sir Michael Atiyah, Béla Bollobás, Alain Connes, Dusa McDuff and Peter Sarnak.
  • A Chronology of Mathematical Events, by Adrian Rice.

Details: Princeton University Press. ISBN: 978-0-691-11880-2. Price: $99. The book’s web page has PDFs of a few chapters, as well as an interview with Gowers. Gowers also has a blog with a few entries pertaining to the book.

Let’s blame the accountants

20 September 2008

Economics has always mystified me, but watching the death spiral on Wall Street this past week has left me even more baffled than usual. I don’t pretend to understand what’s happening. But this morning I’ve begun to wonder if maybe—just maybe—there are some aspects of this debacle that I fail to understand not just because I’m thick but because they actually don’t make sense.

The part that’s puzzling me right now is an accounting rule known as “mark to market.” If I understand correctly, the gist of the mark-to-market principle is that the value of a thing is whatever you could get for it if you were to sell it right now. “Fair value” = “current exit price.” For some kinds of assets there’s a longstanding tradition of assigning value in this way, but in the past year the rule has been applied much more broadly. (See Rule 157 of the Financial Accounting Standards Board, which took effect November 7, 2007.)

Is it really a good idea to equate price and worth quite this rigidly? I’m not trying to overturn all that adamsmithian capitalist orthodoxy about a willing buyer and a willing seller and an invisible hand. And I understand that mark-to-market was introduced as an improvement over the mark-to-wishful-thinking kind of accounting that led to the previous round of scandals, such as Enron. Still, could it be that we’ve gone too far?

Consider the guy who sells umbrellas on the sidewalk around the corner from the New York Stock Exchange. Under mark-to-market rules, his inventory is assigned a much higher value on rainy days and becomes nearly worthless when the sun shines. I suspect that the umbrella guy has a very clear understanding of how the weather affects his business, and yet he doesn’t just throw away all of his stock whenever the sky turns blue. He knows that under those conditions he can’t sell an umbrella at a profit, but he still considers the umbrella to have a certain intrinsic worth. He hangs on to it. He carries it on his books as an asset. This concept of inherent value is apparently too sophisticated for the traders inside the exchange. In that world, if the Emir of Dubai doesn’t need an umbrella on Monday, then all umbrellas are “toxic.” But if the Secretary of the Treasury asks for an umbrella on Friday, then everybody invests in umbrellas.

Surely I’m missing something. It can’t be this stupid.

Marketplace of Ideas interview

5 September 2008

A couple of weeks ago I had the pleasure of chatting with Colin Marshall, whose radio show “The Marketplace of Ideas” is broadcast and webcast from KCSB in Santa Barbara. The main subject of our talk was Group Theory in the Bedroom, but the conversation also wandered into topics like the nature of publishing in the Internet age. The interview is available in streaming audio or as an MP3 file or as a podcast distributed via the iTunes store.

And if you grow weary of my voice, you might browse some of the other interviews—maybe Denis Dutton of “Arts and Letters Daily,” Michael Shermer of The Skeptic, or Steve Wozniak.

Shut up and program!

3 September 2008

This is an update to “Shut up and calculate!” (which was posted here three weeks ago, 2008-08-12).

Many thanks to the readers who have suggested programming languages or environments for inquisitive computing. There are a dozen or so recommendations in the comments to the earlier post, and I’ve received even more by private correspondence. Here is a summary of the suggestions, in no particular order. (Numbers in parentheses indicate the number of times each system was mentioned.)

  • R (2). The open-source version of the S language for statistics. (Also accessible through Sage.)
  • Sage (1). Open-source mathematical software.
  • Haskell (2). Lazy functional programming language.
  • Python (2). Scripting and programming language.
  • MATLAB (2) and Mathematica (1). $$$ mathematical software.
  • Programmable calculators (3). The specific machines mentioned were the HP-15C, the TI-83 and the TI-V200.
  • Yorick (1). An open-source scripting language that emphasizes scientific computation.
  • Octave (2). Open-source mathematics software similar to MATLAB.
  • SuperCollider (1). Language for audio and music synthesis.
  • Fathom and Tinkerplots (1). Data-analysis and graphics software from Key Curriculum Press.
  • DERIVE (1). Successor to muMath; now discontinued.
  • Excel and other spreadsheets (2).
  • APL (1), C (1), Forth (2), Fortran (2). Old favorites.
  • UBASIC (1). BASIC with bignums and rationals; last release seems to be 1998; MS-DOS only.
  • Scala (1). Recent open-source language that uses Java runtime facilities or .NET.
  • “Roll your own” (2). Two readers politely suggested that if I think I know how a programming environment ought to work, then I ought to build one myself.

I’m impressed and surprised by the wide spectrum of responses. It’s not just that there were so many different answers but also that they come from some very distant corners of the computing universe. Several of the systems mentioned are new to me, and I plan to give them a look.

From all of the above it appears that we have some happy campers out there—people who have found programming tools that suit their needs. Others share at least some aspects of my discontent. But given the vastly differing preferences expressed here, it seems unlikely that any one solution could please everyone.

Life Curves

24 August 2008

J. John Sepkoski, Jr., was a fossil-hunter who did most of his digging in the library, sifting through the literature of paleontology to build a detailed, quantitative timeline of life on earth. Focusing on marine animals, he recorded the earliest and the latest known appearances of thousands of ancient organisms. The final edition of his compendium, published in 2002 (three years after his death at age 50), lists dates for more than 36,000 genera.

A few years ago I had a chance to get closely acquainted with Sepkoski’s compendium, when I needed a machine-readable version of the timeline. The listings were published on CD-ROM (remember those?), but the files were merely unstructured plain text. I needed something I could compute with, and so I spent a week or two reformatting the records and importing them into a database. (Others have done the same thing. Shanan Peters of the University of Wisconsin–Madison maintains an online version.)

Here is the summary graph that was the goal of my data-conversion project; it shows the number of extant genera as a function of time, according to Sepkoski’s tally of comings and goings:

Spekoski.png

My brief hands-on experience with Sepkoski’s compilation gave me a sense of how much care went into its preparation. Getting any large data collection into a computer tends to be a fiddly process. Irregularities that a human reader would hardly notice are sand in the gears of automated text processing. Sepkoski’s data files caused less trouble than I expected. The problems I encountered were mainly trivial typographic anomalies—missing punctuation, erratic spacing—and even those were surprisingly rare. The only hints of potentially meaningful errors were a dozen pairs of duplicated entries, where the same genus appeared twice in the listings. It’s easy to see how that would happen in a project that went on for almost three decades; indeed, it’s amazing there weren’t more duplicates.

In any case, I came away from this project with great respect for Sepkoski’s accomplishment, but that doesn’t mean that the curve reproduced above represents the final word on the history of life. It’s not even clear that the main features of the curve and its overall shape give an accurate portrait of changes in global biodiversity.

In constructing any such historical time series, certain biases and distortions are hard to overcome. Of particular importance in this case, fossils from more recent intervals are more likely to survive and to be discovered than those from more ancient times. This “pull of the recent” effect raises questions about the steep upward trend that dominates the Sepkoski curve from the Cretaceous to the present. Has evolution really been going crazy with innovation throughout the past 150 million years, or is that hockey-stick curve an artifact of preservational and sampling bias?

A newly completed analysis of another big fossil database addresses this question (and others). The data source for the new analysis is the Paleobiology Database, a large collaborative project coordinated by John Alroy of the University of California–Santa Barbara. The Paleobiology Database might be called a metacompilation: It brings together statistical and descriptive information from thousands of more-specialized fossil collections (83,444 at the latest count). Initial work on the database began a decade ago (Sepkoski was an early contributor), but it has shown a recent growth spurt.

Of course the new database is vulnerable to the same kinds of systematic bias that Sepkoski had to confront. There’s no avoiding the fact that, on the whole, younger geological strata are more accessible and better studied, and younger fossils are better preserved. But by organizing the data differently and retaining more information about each taxonomic group, Alroy and his colleagues see an opportunity to correct or compensate for some of the biases. Of particular note, whereas Sepkoski recorded only the first and last known appearance of each genus, Alroy et al. attempt to keep track of every occurrence of an organism. This extra information allows sampling bias to be estimated and corrected.

Consider these hypothetical fossil records, where each dot represents a single occurrence of a fossil organism in one of nine labeled intervals:

Alroy.png

In both cases Sepkoski’s protocol would merely indicate that the taxonomic group originated in period 3 and became extinct in or after period 8. The new database records each time unit in which the fossil was found and, whenever possible, the number of occurrences per interval. This data might seem like superfluous detail. After all, if an organism was alive in periods 3 and 8, we can safely infer that it must have existed in periods 4, 5, 6 and 7 as well, whether or not fossil evidence has come to light. But it turns out that recording occurrences rather than just chronological ranges allows for some helpful statistical magic.
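
The difference between the two record-keeping schemes can be sketched in a few lines of Python. The two occurrence records below are invented for illustration (they are not read off the figure), and the function names are hypothetical. Each record maps an interval number to the number of finds in that interval.

```python
# Hypothetical occurrence records: interval number -> number of fossil finds.
SPARSE = {3: 1, 8: 1}
DENSE = {3: 2, 4: 3, 5: 1, 6: 4, 7: 2, 8: 1}


def stratigraphic_range(occurrences):
    """First and last interval with a find: all that a range-only protocol keeps."""
    return min(occurrences), max(occurrences)


def ranged_through(occurrences):
    """Every interval in which the group is inferred to have been alive."""
    first, last = stratigraphic_range(occurrences)
    return list(range(first, last + 1))


def gaps(occurrences):
    """Intervals where existence is inferred but no fossil was actually found."""
    return [i for i in ranged_through(occurrences) if i not in occurrences]


for name, record in (("sparse", SPARSE), ("dense", DENSE)):
    print(name, stratigraphic_range(record), "gaps:", gaps(record))
```

Both records collapse to the same range, 3 through 8, but only the occurrence data reveals that one group was found in every interval and the other in just two.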

As I understand it, the scheme works something like this. Suppose we could gather together all the fossils ever collected by paleontologists, and sort them into bins according to age. Because of the various sampling and preservational biases, the bins for fairly recent periods (say 50 million years ago, in the Tertiary) would be much fuller than the bins for earlier times (say 400 million years ago, in the Devonian). Any bin with more specimens would be likely to exhibit more diversity as well, simply because rare organisms have a better chance of showing up at least once in a larger sample. But we can control for this bias through a simple subsampling procedure: Draw a fixed number of specimens from each bin, making each selection at random and with replacement. The counts of genera in the subsamples should reflect the true diversity of the biota in each bin.
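
Here is a toy version of that procedure in Python. It is a sketch of the idealized specimen-level scheme just described, not the method of Alroy and his colleagues; the bins, genus names, quota and function names are all invented for illustration. The two bins share the same three common genera in the same proportions, but the fuller bin is 100 times larger and has picked up five rare genera along the way.

```python
import random


def raw_diversity(specimens):
    """Number of distinct genera among all specimens in a bin."""
    return len(set(specimens))


def subsampled_diversity(specimens, quota, trials=200, seed=0):
    """Mean number of distinct genera in `quota` specimens drawn with replacement."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        draw = rng.choices(specimens, k=quota)  # sampling with replacement
        total += len(set(draw))
    return total / trials


# Hypothetical bins: identical common genera, very different sample sizes.
COMMON = ["genus_a"] * 5 + ["genus_b"] * 3 + ["genus_c"] * 2
RARE = ["genus_d", "genus_e", "genus_f", "genus_g", "genus_h"]
BINS = {
    "sparse bin": COMMON,
    "full bin": COMMON * 100 + RARE,
}

for label, specimens in BINS.items():
    print(label,
          "raw:", raw_diversity(specimens),
          "subsampled:", round(subsampled_diversity(specimens, quota=10), 2))
```

The raw counts differ (3 genera versus 8), but with a quota of 10 specimens per bin the subsampled counts come out nearly equal, which is the point of the exercise.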

In practice it gets more complicated than that, because we can’t actually sample the entire fossil record at the level of individual specimens; the best we can do is to randomly choose collections of fossils or the publications that describe them. And the publications vary greatly in how much quantitative data they include; some are just lists of species observed.

After many adjustments, refinements and calibrations, Alroy and 34 co-authors have published a diversity curve based on the subsampling technique:

Alroy.png

(Graph courtesy of John Alroy.)

Their article (subscription required) appeared last month in Science, along with 67 pages of supplementary material.

The Sepkoski and the Alroy graphs are twins separated at birth—widely separated. The overall upward trend still exists in the newer graph, but it is much less dramatic, especially in the past 100 million years. Some of the famous mass-extinction events, such as those at the end of the Permian (P) and at the end of the Cretaceous (K), are visible in the new graph but are altered in character; instead of a sudden crash after a sustained build-up, we see something more like a return to normal after a brief, sharp spike in diversity. (Alroy elaborates on the dynamics of mass extinctions in a second recent article, this one in PNAS.)

Looking at the two curves, I arrive at this question: How is the interested but nonexpert reader to evaluate these contrasting views of our planetary past? I want to emphasize that the question animating me is not “Who is right?” but “How can we know who is right?” Is there some way that the ordinary, scientifically literate outsider can form a reasoned judgment about such competing claims to truth?

It was questions like these that got me in trouble the last time I wandered into this area. In 2005 Richard A. Muller of the Lawrence Berkeley National Laboratory and Robert A. Rohde, a graduate student at UC Berkeley, published a report in Nature claiming to detect periodic cycles of rising and falling diversity in the Sepkoski data. Applying Fourier analysis to the time series, they reported finding a strong signal at a period of 62 million years and a weaker one at 140 million years. The claim was controversial from the start, and I decided to take a do-it-yourself approach to understanding the issue. I went back to the original data, reimplemented the analytic methods and tried to assess the robustness of the conclusion. I told the story in an American Scientist column.
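
For anyone who wants a feel for the method, the following Python sketch computes a bare-bones periodogram with a direct discrete Fourier transform. It is not the Muller-Rohde analysis and not the code from the column: it removes only the mean (the published work involved more careful detrending), and the input is a synthetic series built for illustration, with a strong cycle at 62 time steps and a weaker one at 124. The function names are hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Return (period_in_samples, power) pairs for each nonzero DFT frequency."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the mean, nothing fancier
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        result.append((n / k, abs(coeff) ** 2))
    return result


def strongest_periods(series, how_many=2):
    """Periods of the `how_many` highest peaks, strongest first."""
    ranked = sorted(periodogram(series), key=lambda pair: pair[1], reverse=True)
    return [period for period, _ in ranked[:how_many]]


# Synthetic data: one sample per time step, 496 steps (exactly 8 cycles of 62).
SERIES = [10.0
          + math.sin(2 * math.pi * t / 62)
          + 0.4 * math.sin(2 * math.pi * t / 124)
          for t in range(496)]

print(strongest_periods(SERIES))
```

With clean synthetic input the peaks are unmistakable. The hard part with real fossil data is deciding whether a peak of a given height could arise from noise and trend removal alone, which is where the controversy lies.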

The column pleased no one. It certainly didn’t please Muller and Rohde, who objected that I was out of my depth in my amateur attempt to replicate their work. It didn’t please the critics of the Muller-Rohde hypothesis, who thought my focus on certain narrow technical issues deflected attention from deeper conceptual flaws in the argument. And it didn’t please me, because I agreed with the criticisms from both sides.

I should also mention that my column had zero impact on the controversy, which not only continues to rage but has also been extended to the new database. Alroy writes in the PNAS article that some of the peaks and valleys forming the supposed cycles fail to materialize in the new data set. On the other hand, a preprint from Adrian L. Melott of the University of Kansas argues that cycles with periods of 62 and 150 million years emerge from the Paleobiology Database with higher statistical significance than they had in the Sepkoski collection.

All in all, I think I’ll sit this one out. I’ve been itching to get my hands on some records from the new database and implement the subsampling algorithm (which sounds both intriguing and readily accessible). It would be fun to play with these ideas. But I’ll let someone else have the fun this time.

Science builds its credibility on the bedrock idea that experiments and other kinds of results are subject to independent confirmation or refutation. And the advent of computational science has made this egalitarian ideal much more practical than it used to be. Although experiments in high-energy physics remain beyond the means of most amateurs, anything done with a computer rather than a particle accelerator is pretty much fair game these days. Still, there are bounds. If every reader set out to replicate every experiment, the world wouldn’t make much progress.