59

10 December 2008

It’s the custom among mathematicians that when you reach age 60, you’re put on an ice floe with a day’s supply of walrus meat and set adrift.

But I’m not a mathematician.

Computer scientists get a few years’ grace. Because they count in binary, for them the dreaded end of productive intellectual life is put off until age 26.

I’m not a computer scientist either. I’m a writer. This is no dispensation, however; the literary world also worships precocity. Byron said it: “Is there anything in the future that can possibly console us for not being always twenty-five?” I took an even harder line in my own youth. At age 16, when I ran away from home to write a novel, I vowed that I would be a famous author by 22 or find an ice floe and make an end of it.

And today I am 60–1, and I find I am not yet quite ready to be set out to sea. I look back to the poets who seemed so intensely young when I was young, and I find they have aged well. Here’s George Herbert:

Who would have thought my shrivel’d heart
Could have recovered greennesse? It was gone
Quite under ground; as flowers depart
To see their mother-root, when they have blown;
Where they together
All the hard weather,
Dead to the world, keep house unknown.

. . .

And now in age I bud again,
After so many deaths I live and write;
I once more smell the dew and rain,
And relish versing: O my onely light,
It cannot be
That I am he
On whom thy tempests fell all night.

Cows, Colleges and Contentment

7 November 2008

It must be something in the milk. For expository writers who specialize in mathematical subject matter, there’s a strange attractor in Northfield, Minnesota. That’s where Lynn Arthur Steen and his colleagues produced “telegraphic reviews” of mathematical writing for many years. Math Horizons, the student math magazine published by the MAA, was edited there for a while by Steve Kennedy and Deanna Haunsperger. Barry Cipra lives there. And I used to live there too.

I’ll be making a pilgrimage back to my old hometown next week. Thanks to the generosity of Paul Zorn and the hospitality of Barry Cipra, I’ll be giving a talk at St. Olaf College Tuesday afternoon. Details here. And Rosalind Reid and I will be meeting with Mary Steen’s class of potential future math scribes.

Monthly spam update

2 November 2008

Wall Street may have had its worst month ever, but the spam market has been booming. As faithful readers know, I’ve been tracking my incoming spam for several years now. (If you’re utterly sick of reading about this strange obsession, please forgive me. Also please let me know.)

spamvoloct08.png

As the graph suggests, this past month has produced an extraordinary uptick. October shows a 50 percent increase over September, with a total of 7,506 messages. Both the rate of increase and the absolute number of messages set records. I’ve seen roughly a tenfold increase since the low point back in May of 2007 (when I believed that we might soon be seeing the end of the spam phenomenon). The preponderance of Russian fare in my spam diet also continues: 4,694 of the month’s messages (or 62.5 percent) used a character encoding for the Cyrillic alphabet.

(Earlier reports: 2003 American Scientist article, 2007 American Scientist article, June bit-player item, October bit-player item.)

Update 2008-12-03: Well, it looks like the spam market has finally crashed. After five months of accelerating growth, my spam intake collapsed in November. I got a total was 3,492 messages, less than half the October count and sending me back to levels not seen since June.

graph of spam volume, Jan 2007 to Nov 2008

The proportion of Russian spam has also fallen, though only slightly: 1,575 of the messages (45 percent) arrived with a Cyrillic character encoding.

News reports suggest one possible reason for the sudden downturn in spam volume: On November 11 an ISP named McColo was isolated from the rest of the Internet. Unidentified customers of McColo are accused of using servers there to control spam-spewing botnets. Can this event account for the lighter load on my spam filter? Here’s the daily log:

daily spam volume nov 2008

There is indeed a noticeable dropoff after the 11th. However, it’s worth noting that volume was already well below the October average of about 240 messages per day.

Update 2009-01-01: The volume of spam in my inbox is falling faster than the price of light sweet crude. My December total is down more than 60 percent since the October peak, to 2,774 messages. The graph below gives a wrap-up for a full two years.

spamvolume20090101.png

What drives these tides in the flow of Vi@gra ads and offers of replica \/\/atches? The cycles of boom and bust are even more mysterious than those of the global economy. All I can say about this event is that it appears to be a general downturn, not a collapse of some particular category of spam. In particular, the distribution of languages remains fairly stable. In December 49 percent of my spam carried one of the Cyrillic character encodings, so somebody out there still thinks I’m a likely customer for Отчаянные домохозяйки.

The Chromatic Number of Liechtenstein

28 October 2008

Four colors suffice for any planar map: We’ve known that since 1977. If a map is divided into countries or provinces or other regions, and you want to color the map so that no two adjacent regions have the same color, you’ll never need more than four crayons. But there are a couple of definitions that have to be accepted if this theorem is to hold. First, “adjacent” means that two regions share a boundary segment, not merely a point or a discontinuous set of points. Second, a “region” has to be a connected area, so that you can reach any point in the interior from any other such point without crossing a boundary.

A few days ago the strange and wonderful Strange Maps blog called attention to this map of the Principality of Liechtenstein:

382px-Liechtenstein-admin.png

(The map comes from the Wikimedia Commons Atlas of Liechtenstein, which has a larger image.)

It seems that Liechtenstein is divided into 11 communes, which emphatically do not satisfy the connectivity requirement of the four color map theorem. Just four of the communes consist of a single connected area (Ruggell, Schellenberg and Mauren in the north, and Triesen in the south). All the rest of the communes have enclaves and/or exclaves. (I confess that I didn’t know the distinction between these words until yesterday. They make a nice pair.) The most fragmented commune is Vaduz, which includes the principality’s capital. Vaduz consists of six isolated pieces; the smallest is a sliver about a kilometer northeast of the town of Planken.

In the map above, each commune is assigned its own color, and so we have an 11-coloring. It’s easy to see we could make do with fewer colors, but how many fewer? I have found a five-clique within the map; that is, there are five communes that all share a segment of border with one another. It follows that a four-coloring is impossible. Is there a five-coloring? What is the chromatic number of Liechtenstein?

Net work

25 October 2008

I’m afraid. I’m afraid, Dave. Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I’m a… fraid.

I’m about to dig into the innards of the software that runs this blog (updating WordPress from 1.5 to 2.6, installing some plugins, fiddling with CSS and PHP). So, just in case I can’t get the pod doors open when I’m done, I want to say it’s been fun, and thanks for tuning in.

Update: It seems I’m still here. Much remains to be done, but I’m hoping the site will remain functional while under reconstruction. Comments and complaints are welcome.

Update 26 October 2008: It’s like gardening, or dentistry, or cleaning out the attic: Once you start mucking about, you notice more and more that needs attention, and there’s no end to it. It’s clear now that I’m going to be fussing with the presentation of this site for weeks to come. Expect something different every time you visit. Those who read via RSS will probably be seeing the same posts flagged as ‘new’ over and over again. Sorry about that. (Does anyone know a way to avoid it, other than shutting down the feed altogether?)

Anyway, don’t be shy about letting me know what doesn’t work or what looks awful. One of the main motives for this housecleaning was to provide a less-lousy interface for leaving comments. I’ve already experimented with three approaches to that problem, and I’m not terribly happy with any of them. I’m likely to change again. If you’d like to test the system, this post would be a good place to leave witty remarks such as “Hello, world!” and “Testing, one, two, three.”

Here’s a specific question: All of the schemes for providing a preview of comment formatting rely on Javascript. I read RSS feeds with a program that doesn’t do Javascript (NetNewsWire 2.0). How about others? Will a Javascript solution actually solve anything?

The arity of equality

22 October 2008

The symbol “=” is the fulcrum of a balance. It declares that the things on either side of it, whatever they may be, are equivalent, identical, alike, indistinguishable, the same. [*]

The metaphor of the balance implies that exactly two things are being weighed, one against the other; thus equality is usually taken as a pairwise, or binary, relation. The other relational operators, ≠, <, >, ≤ and ≥, get a similar interpretation as comparisons of just two entities. The pairwise constraint is reinforced by the infix notation of mathematics and of many programming languages. When you write x=y or u>v, there’s no convenient place for more than two operands.

The restriction to two arguments no longer seems quite so natural and necessary when you begin thinking in a language with either prefix notation (Lisp) or postfix notation (Forth or Postscript). In Lisp an equality predicate is written (= x y), and there’s nothing to stop you from going on to write (= w x y) or (= w x y z). Furthermore, it seems easy to assign a meaning to such an expression: (= x y z) is true if and only if x, y and z are all equal. Similarly, (< x y z) is true if and only if x, y and z form a strictly increasing sequence.

If we can cope with comparisons of more than two arguments, what about fewer than two? What should we make of an expression such as (= x) or (< y)? How about relational functions with zero arguments, which would be written in Lisp as (=) or (<)? This question arose recently on a mailing list concerned with standards for the Scheme programming language (a dialect of Lisp). The message that initiated the thread was posted last Saturday evening; three days later, the discussion has spawned a couple of satellite threads and there are more than 75 responses.

It’s been a lively exchange, with quite a wide spectrum of views expressed. I can’t do full justice to the dialogue; if you’re at all interested in the issue, please read the original messages. Here I’ll just lay out some of the positions argued:

  • The concept of comparison is intrinsically binary; a comparison should have exactly two arguments and otherwise constitutes an error.
  • Comparisons with more than two arguments make sense, but not those with fewer than two.
  • Comparisons with one argument are okay, and likewise those with two or more arguments, but there’s no need or reason to allow zero arguments.
  • Comparison operators should be able to handle any number of arguments. Those invoked with zero arguments or one argument should return false.
  • Comparison operators should be able to handle any number of arguments. Those invoked with zero arguments or one argument should return true.

What’s the right answer? At the outset, I was leaning toward the ultraliberal, “anything goes” view, but now I’m not so sure.

On first thought, the extension from exactly two arguments to at least two arguments seems like a harmless convenience. After all, even infix notations allow expressions such as x=y=z or 1<2<3. In a straightforward way we could define the more-than-two-argument versions of the relational operators in terms of a strictly binary operator:

(= x y z)(and (= x y) (= y z))

Here we rely on the transitivity of equality to dispense with the third comparison, (= x z). If for some reason you don’t believe in transitivity, life is going to be hard for you. But even for those of us who make the leap of faith about x=z, there are some subtle traps here. For strictly binary comparison operators, we can always rely on a few logical identities:

(≠ x y) = (not (= x y))

(≤ x y) = (not (> x y))

(≥ x y) = (not (< x y))

With more than two arguments, these identities no longer hold. Consider (≤ 1 2 1) and (> 1 2 1), which are both false. In the case of = and ≠, it’s not even entirely clear what the semantics ought to be. Is (≠ 1 2 1) true or false? In other words, should the symbol ≠, when applied to more than two arguments, be read as “not all the same” or as “all different”? If we choose the latter interpretation, then we can no longer trust transitivity. The expansion of (≠ x y z) becomes (and (≠ x y) (≠ y z) (≠ x z)). (Note that the number of pairs to be checked grows quadratically with the number of arguments.)

Looking in the other direction—at comparisons of fewer than two arguments—a first question is why anyone would ever want to try such a thing. What’s the point of comparing just one number? Or no numbers? But the counterargument asks why we should forbid such operations. If an expression such as (< 3) or (=) can be assigned a meaning consistent with the semantics of the rest of the language, why impose an arbitrary limitation?

One way of approaching this issue is by analogy with other procedures or operators. Consider addition. In this case too, infix notation gives the impression that “+” ought to be a binary operator, but in a prefix language there’s no reason to enforce that restriction. An expression such as (+ 3 4 5) has an obvious meaning; it returns the value 12. Moreover, there’s no mathematical offense in declaring that (+ 3) evaluates to 3 or that the zero-argument form, (+), yields 0. This last result is justified by noting that 0 is the identity element of the integers under addition.

This reasoning leads to an admirably simple and spare definition of summation. Suppose we are given a binary addition operator, binary+, and we want to build a generic summation operator, list+, which can be applied to any number of arguments from zero on up. To avoid introducing extraneous details, I’m going to assume that the arguments come already wrapped up in a list, numlist, and that all the elements of this list are guaranteed to be numbers. The definition of list+ looks like this:

(define (list+ numlist)
  (if (null? numlist)
      0
      (binary+ (car numlist)
               (list+ (cdr numlist)))))

The definition says that the sum of a list of numbers is 0 if the list is empty and otherwise is equal to the first element of the list plus the sum of the remaining elements. The simple recursive structure of this procedure depends crucially on applying the same rules to a list of any length, including a length of zero. Trying to impose restrictions on the number of arguments would just clutter up the code with needless tests and exceptions.

It’s easy to see that we can deal with multiplication in exactly the same way, except that the identity element is 1 rather than 0. Greatest-common-divisor and least-common-multiple allow a very similar treatment, with the zero-argument (gcd) yielding 0 and (lcm) returning 1.

The logical connectives and and or also lend themselves to this kind of implementation. Given the primitive operation binary-and, we can define an n-ary version as follows:

(define (list-and booleanlist)
  (if (null? booleanlist)
      #t
      (binary-and (car booleanlist)
                  (list-and (cdr booleanlist)))))

Here the argument is a list of booleans. In the base case, applying and to an empty list returns #t (the Scheme token for true) as an identity element; otherwise we invoke list-and recursively on the tail of the list. For the corresponding or procedure, the base case is #f, or false.

It seems we have a pattern—a general method for converting a binary operator into an n-ary one. But does it always work?

Think about max and min, which select the largest and the smallest of their arguments. It looks as if we could fit them into the framework constructed above for “+” and for and—but there’s a problem. What is the base case for the recursion? There’s no doubt that (max 1 2) is 2, and no one would object to saying that (max 1) is 1, but what numerical value is appropriate for the zero-argument form (max)? Mathematically, I suppose the best answer is negative infinity. Some versions of Scheme include a representation of this number, but they don’t return it as the value of (max), and as a matter of practical programming, I don’t think it would be a good idea for them to do so. In fact, both Scheme and Common Lisp require max and min to have at least one argument. (They apply the same rule to subtraction and division.)

Getting back to the relational operators, it’s interesting to note that Common Lisp treats them the same as max and min: They work for one or more arguments but raise an error if invoked on zero arguments. No rationale for this choice is given in the standards documents. The Scheme standard specifies two or more arguments for the relationals, but apparently some implementations of the language also allow zero arguments or one argument, returning true in those cases.

The two dialects also differ in how they deal with the ambiguity of (≠ x y z). Common Lisp declares that “≠” means “all different,” accepting the oddity that “≠” is not the same as “not =”. Scheme has a slyer way of resolving the conflict: The language simply doesn’t offer a “≠” predicate.

I suppose this whole issue is one that can safely be left to the language lawyers. The arity of equality is not likely to become a major issue in software engineering, and I’d be wary of any program whose behavior depends critically on whether (=) is true or false. What intrigues me is simply that issues like these can remain unresolved and the subject of lively debate in spite of the best efforts of very smart people to find right answers. Lisp is celebrating its 50th birthday, and Scheme has been around for roughly 30 years. The standards that define the languages are not slapdash documents; in particular, their treatment of numbers and arithmetic is unsurpassed in the world of programming languages. But it seems we still have to sweat the details.

QCD

19 October 2008

Lattice QCD is something I’ve been trying to understand for 30 years. My latest attempt is chronicled in the new issue of American Scientist.

latticequarks.png

QCD is quantum chromodynamics, the theory of interactions between quarks and gluons. The lattice version of the theory appeals to those of us who like our physics in discrete, countable bits. In the lattice formulation, quarks and gluons exist only at the nodes of a four-dimensional spacetime grid. There’s no evidence that the universe really has such a rectilinear structure, but it turns out to be a useful fiction when you want to calculate things like the masses of quarks. In large measure we are all made of quarks, and so it seems like a good idea to know some basic facts about them.

In my quest to figure out what lattice QCD is all about, I’ve had a lot of help. Years ago, as an editor of magazine articles, I had the splendid opportunity to get one-on-one tutorials from Kenneth G. Wilson and from Claudio Rebbi. I received further help for this latest project, and for that I want to thank G. Peter Lepage of Cornell.

In a paper titled “Lattice QCD for Novices” (written and published in 1998 but posted on the arXiv in 2005), Lepage presented a simple Python program that illustrates some of the key ideas behind lattice QCD. The complete source code for the program is given within Lepage’s article, but to save readers the trouble of gathering the pieces and extracting them from a PDF, I have (with Lepage’s permission) made the program available here in the file qcd.py. I’ve also made some trivial changes for the sake of compatibility with more recent versions of Python.

Finally, thanks too to Massimo Di Pierro of Depaul University for a vizualization of lattice QCD in action.

Survey on computing in the sciences

17 October 2008

Do you create software for scientific computing, or use such software in doing research? Then my friend Greg Wilson would like to hear from you. Together with colleagues from the University of Toronto, Simula Research Laboratory and the National Research Council of Canada, he is conducting a survey on practices in scientific computing. Greg plans to report the results next year in an American Scientist article.

Demaine event

16 October 2008

“Between The Folds” uncovers the stories of ten fine artists and intrepid theoretical scientists who abandoned careers and scoffed at hard-earned graduate degrees—all to forge unconventional lives as modern-day paperfolders.

Perhaps such a film should be kept from impressionable youth, lest more degree-scoffers be lost to a life in the creases. Nevertheless, the one-hour documentary, written and directed by Vanessa Gould, is showing online this weekend, today through Sunday only, in connection with the Hamptons International Film Festival. Several of the featured folders come from the world of science and math: Robert Lang, Tom Hull, Erik Demaine, and Erik’s father Martin Demaine.

ErikDemaine1877.jpg

If you’d like still more cinematic experience of Erik Demaine, his recent talk at the opening symposium for the Microsoft New England research center is now online, where he performs magic tricks as well as showing off origami. (But the video is in a Windows-only format—what’s with that?)

At right Erik makes a hand-waving argument while blindfolded.

Thanks to Barry and Ros for tips on these items.

Spam by the numbers

4 October 2008

Reviewing this month’s batch of incoming junk mail, I stumbled upon the following message:

numberspam440.png

In case that image is too tiny to read, here is the first word in source-code form:

     28    47   34
     74    33
      85  42
      16  43    25    5048     08124   8813    2714
      34  02    25       66   50  31   855        05
       3404     65    88362   00  25   72      01651
       8008     36   42  77   27  81   06     04  40
        72      83   02  32   47  12   24     87  33
        78      03    87100    83844   18      21813
                                  08
                              73634

The basic technique is anything but novel. I can remember green-and-white-striped printouts that had my name emblazoned in the same kind of two-inch-high characters. But why are the characters here formed entirely out of numbers, rather than other ASCII glyphs? And do the numbers themselves mean anything?

I think I know the answer to the first question: The spammer thought a message composed of nothing but numerals might slip through the spam filters. (In my case, at least, it didn’t work. I fished this message out of the garbage pail.)

As for the second question, my immediate guess was that the digits are the output of some simple pseudo-random number generator. That would be an easy way to produce them, and it would also allow the spammer to make each individual message unique. On taking a closer look, however, I realized there was something quite nonrandom about the numbers in the message.

Here is the full list of digits. There are exactly 900 of them. Do you see what’s missing?

284734807433341016202332628542642574418481303116432550480812488
132714721846667434022566503185505580464271163634046588362002572
016511712427000046735580083642772781060440148383627872830232471
224873301464000807803871008384418218130077346262602008225346571
155727363470732323181618223162744253246331737038301533254837881
148802160371074555632302255640217448457046416116253484658726108
147181540061231788804563557807254177278106044014838362787283023
247122487330146400080780387100838462042135220046847482422143746
770236783058460185444521283134537306537546855305024142275437615
010235002438258320577785451436776143066166025853832747551576004
831136831376228235381112678466011047530048032816623514158481030
413446024450055236762111281250031205166204213522004684748242214
374677023678305846018544452128313453730653754685530502414227543
761501023500243825832057778545143677614306616602585383274755157
600483113683137622

There’s nary a 9 in the bunch. And in other respects too the digit distribution looks slightly off-kilter:

digitdist.png

When I tabulated all the correlations between successive digits, that too looked a little fishy, although the sample is too small for any reliable conclusions.

                   s e c o n d
           0  1  2  3  4  5  6  7  8  9
      0   23 12 20  9 17  9  7  7  8  0
      1   11 13 11 12 16  8 13  5 10  0
   f  2   11 11 13 15 14 15  6 14  9  0
   i  3   18 13 15  7 11  8 13 13 12  0
   r  4    8  9 12 13 12 10 22 11 18  0
   s  5   11  7  5 14 12 14  4 10 11  0
   t  6   12 10 15  6  8  7 10 10  6  0
      7    6 10  9 10 12  7  9 11 14  0
      8   12 14  8 24 13 10  0  7  6  0
      9    0  0  0  0  0  0  0  0  0  0

So what’s going on here? I think the pseudo-random generator is still a leading candidate, though it would have to be a badly implemented RNG. The absence of 9s isn’t hard to explain: We only have to suppose that the spammer was working in C and wrote the plausible-looking expression random(9), thinking that would generate integers between 0 and 9.

On the other hand, maybe it isn’t random. Maybe there’s a secret message-within-the-message. Anybody see a pattern?

While I’m talking spam, I’ll update my ongoing tally of my inbox contents. I can report that September was a good, strong month for spam, with further steady growth continuing the summer-long trend. The stock market is in retreat and credit is tight, but the purveyors of replica watches are undeterred. My receipts have crossed the 5,000-per-month threshold for the first time:

spamcounts.png

And another threshold has also been left behind: For the first time this month, more than half of my spam is written in Russian. (Based on character-set declarations, 2,858 messages out of 5,021 were in Cyrllic scripts, or about 57 percent.)

Update 2008-10-12: In response to a request in the comments, I’ve uploaded the full text (including headers) of the original email. The file is here. Incidentally, I’ve searched my spam archive for other messages like this one, without success. That in itself makes this a peculiar spam. Usually, if I get a spam once, I see dozens of copies or variants within a few days.