Archive for the ‘mathematics’ Category

Nautical numeration

Tuesday, September 8th, 2009

I’ve been goofing off in Nova Scotia for a few days. In Halifax I climbed up to the Citadel, a hilltop fortress built to protect the city from the French and later rebuilt to fend off the Americans; now it welcomes both those nationalities and anyone else willing to pay $12 for the tour and the costume show.

signal-balls-vert.jpg In one room of the museum I discovered the curious diagrams reproduced at right. These figures are extracted from a poster on military codes designed for ship-to-shore communication. Signals were sent by hoisting large balls or disks on a mast reaching high above the fortress ramparts; ships in the harbor could reply by raising similar signal disks on their own masks.

The part of the code shown here obviously pertains to the transmission of numbers, but I’ve never seen a weirder system of numeration. Based on these diagrams and others, I infer that the signal disks were raised on three halyards. (A fourth halyard is present in the diagrams but always seems to be empty.) Each halyard could display zero or one or two disks, and each displayed disk could be in either an upper or a lower position. That comes to five possible patterns per halyard, and thus 53 = 125 patterns in all. What baffles me is why the ten decimal digits were assigned to the specific patterns shown in the diagrams. Who counts in the sequence 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 9, 9?

I’m not even sure I understand the basic signaling protocol. When I first looked at the poster, I assumed that the ten digits were represented thus:

positional-encoding.png

Presumably, the three encodings for ‘9′ were equivalent, and any of them could be used at the signaler’s option.

Later, after much musing over this puzzle, I decided that another interpretation of the diagrams seemed more plausible:

cumulative-encoding.png

In this scheme at least the numerals from 1 through 5 have some kind of logic to them, in that the number of disks shown matches the numeral being transmitted; in essence we have a unary notation within that limited range. On the other hand, if this interpretation of the diagrams is correct, we have to ask: Why did the designers stop at 5? They could have extended the same unary scheme to the range from 1 to 6, and then used doubled disks at the top of the mast for the numerals 7, 8 and 9.

If we were creating a code like this today, I’m sure everyone’s first impulse would be a system based on binary notation. It’s not terribly surprising that the British naval authorities circa 1850 didn’t think of that. But I still find the apparent arbitrariness of this numerical code utterly perplexing. Am I missing some pattern in the data, either obvious or subtle? (I suppose the encoding could be deliberately obscure, but there’s no evidence that this was meant to be a secret code, and the mere existence of the poster—which appears to be a 19th-century artifact—suggests otherwise.)

Added 2009-09-10: Below is the signal mask on which the disks were displayed, seen from the landward side. The disks (actually fabric-covered crossed hoops, which have an approximately circular cross section from any azimuth) were 36 inches in diameter. I think the diameter of the mast is about 12 inches at the base.

photo of Halifax military signal mast

* * *

The Googleverse has so far failed to solve this mystery for me, but in the process of searching I stumbled upon a remarkable book that suggests there was some lucid thinking in the 19th century on themes that we would now identify as information theory. The book is A Manual of Signals: For the Use of Signal Officers in the Field, and for Military and Naval Students, Military Schools, Etc., by Albert J. Myer. The full text is available on Google Books in either HTML or PDF.

Myer founded the U.S. Army Signal Corps just before the outbreak of the Civil War. He had studied medicine, writing a dissertation on “A New Sign Language for Deaf Mutes,” and had also worked as a telegraph operator. Then, in the army, he devised the signaling method commonly known as wigwag, using a single flag waved to the left and the right. Fort Myer in Virginia is named for him. (These biographical snippets come from a Signal Corps history, which offers much further detail.)

Myer’s book was first published in 1864 and then revised in 1868, but it takes quite a modern mathematical approach to problems of communication and information. He begins with a tutorial on permutations and combinations, then applies these ideas to the encoding of signals:

We wish for example to make a large number of signals…. We take any few different and simple known signs, sounds, motions, or indications, which we can easily make, and we join them together, twos or threes, or more at a time, making one after another into many and different and more complex signs or arrangements. Each of these new signs becomes, when a meaning is given to it, a signal. We can increase the number of such signals to any limit by continuing to join together the known signals in greater numbers or in new arrangements.

That’s a pretty good description of Σ*, the set of all strings over a finite alphabet.

* * *

Guest update 2009-09-10: The following comes from Barry Cipra.

I have an idea for an alternative interpretation of the way digits are meant to be represented by disks on halyards. My basic premise is that a distant viewer can tell the difference between a disk being up high and it being down low, but cannot accurately judge which halyard a disk is on unless there’s at least one disk on each halyard. (In particular, two disks on outer halyards could be confused with two disks on adjacent halyards if the mast is at an angle to the distant viewer.) In this case, the digits 0 through 9 can be unambiguously represented as follows:

disks-on-halyards encoding suggested by Barry Cipra

In other words, basically your second interpretation for 0 through 8, but with an extra “anchoring” pip for the numbers 1 through 5, but your first interpretation for the number 9. This makes it clear why the small-number format stops at 5.

If my premise is correct (or accepted as correct), then out of the 216 distinct patterns, there are only 1+5+25+125 = 156 unambiguous signals possible. That is, there is 1 signal with no halyards having any disks, 5 with exactly one halyard showing at least one disk, 25 with exactly two halyards showing at least one disk each, and 125 with disks showing on all three halyards.

But maybe we should also require the meaning of a pattern to be unambiguous when the orientation of the ship is uncertain — i.e., A,B,C should have the same meaning as C,B,A. If I’ve thought through things correctly, this cuts the number of unambiguous signals down to 1+5+15+75 = 96.

It would be great to find out how the Navy really did things!

* * *

Update 2009-09-17: Many thanks to Jim Ward (in the comments) for unearthing a 2007 document by Spurgeon G. Roscoe that illuminates the history of this signaling system, even if it doesn’t quite pin down how the code worked.

Roscoe describes a system of signaling towers founded by Edward, Duke of Kent, during his busy tenure as military commander in the Maritime Provinces at the end of the 18th century. Kent’s optical telegraph line was to run from Halifax to Annapolis in Nova Scotia and on to Fredericton in New Brunswick. It is not known with certainty how much of the network was completed or whether it ever served as a practical communications link. (Roscoe is skeptical, pointing out among other things that the territory around the Bay of Fundy is very foggy.)

The chart on exhibit at the Citadel in Halifax is not directly connected with the Kent telegraph system; it comes from a later era and is identified as a device for ship-to-shore communication; nevertheless, there is an unmistakable familial resemblance. Sketches reproduced by Roscoe give the following code system for the Duke of Kent’s overland telegraph:

numeric code scheme from Roscoe manuscript

This is merely a cyclic permutation of positions 0, 6, 7, 8 and 9 in the Citadel code. (On first glance there also seems to be a mirror reversal involved, but that’s not the case; we’re merely looking at the signal mast from the opposite direction.)

Roscoe says nothing about how to interpret these diagrams—whether, for example, the display for “6″ consisted of six disks or a single disk in the lower right corner, or one of the other schemes discussed above or in the comments. But his description of how messages were composed seems so impossibly cumbersome that it leaves me wondering if this plan was ever really put into practice. Apparently, alphabetic characters did not have codes of their own; each letter was encoded in a sequence of numerals. In one system, “A” was 2, 3, 0; “B” was 1, 4, 0; “Q” was 4; “T” was 2, 3. (The compact, single-digit encoding of “Q” makes this a kind of anti-Hamming code.) Can anything so maladaptive have survived the trial of use in the field?

Update 2009-09-22: This is a response to Carl Witty’s comment. I’m putting it here in the main text rather than in a further comment because I think Witty has essentially solved the mystery.

Witty’s interpretation of the Roscoe manuscript rules out all schemes in which the code is “cumulative” — where the signal for 3, for example, consists of the disks numbered 1, 2 and 3. Instead, digits 1 through 6 are each represented by a single displayed disk, and digits 7 through 9 each consist of a single pair of disks. (The triple code for 9 is to be interpreted as an “OR”: Any of the three positions can be used with the same meaning.) Then these digits — which are really just opaque symbols, with no numerical meaning — are combined in various ways to represent the letters of the alphabet and the numbers from 1 to 99. (Supplementary flags bring the range of numbers up to 499.)

Thus when Roscoe indicates that the letter S has the coding “1, 3, 5,” that means the disks labeled 1, 3 and 5 are all displayed simultaneously in order to transmit an S.

There’s a test for this hypothesis. If the scheme is to be workable, the alphabetic and numerical codes composed by displaying two or three or four digits at once must have a particular form. There can be no repeated digits in any encoding, and no two of the digits can use the same position on the mast. Specifically, in the Duke of Kent code illustrated in the 2009-09-17 update above, there can be no composite code that calls for both a 1 and a 7, or a 3 and an 8 or a 9 and a 5.

Does the code transcribed by Roscoe pass this test? Roscoe actually gives two codes, both apparently found in an 1802 “Signal Book” from Camperdowne Station in Nova Scotia. Roscoe’s second listing of alphabetic and numeric encodings does indeed satisfy the constraints. Indeed, it obeys a stronger restriction: No combination of symbols in the code calls for hoisting more than two disks on a single halyard. For example, the code avoids not only 1 and 7 but also 2 and 7.

The other code that Roscoe lists fails the test — or so it seems at first glance. There are patterns such as “1, 7″ encoding the number 54 and “3, 8″ for 64. But on looking closer, it appears that this code adheres to a different set of constraints: There are no instances where a 6 appears together with either a 1 or 2, or where 7 is combined with 3 or 4, or finally where 8 comes together with 5 or 0. These are exactly the restrictions that have to be applied in the Citadel code!

Outnumbered

Monday, August 10th, 2009

My column in the new issue of American Scientist is about the challenges of computing with very large numbers. The column ends with an open question, which I want to restate here as an invitation to commentary.

Any system of arithmetic in which numbers occupy a fixed amount of space can accommodate only a finite set of numbers. For example, if all numbers have to fit into a 32-bit register, then there are only 232 representable numbers. So here’s the question: If our numbering system can represent only 4,294,967,296 distinct values, which 4,294,967,296 values should we choose to represent?

There is an obvious trade-off here between range and precision: To reach farther out on the number line, we have to leave wider gaps between successive representable numbers. But that’s not the only choice to be made. We must also decide whether to sprinkle the numbers uniformly across the available range or to pack them densely in some regions while spreading them thinly elsewhere. I’m not at all sure what principles to adopt in trying to identify the best distribution of numbers.

To frame the question more clearly, I’d like to introduce a series of toy number systems. In each of these systems a number is represented by a block of six bits. There are 26=64 possible six-bit patterns, and so there can be no more than 64 distinct numbers. A function f(x) maps each bit pattern x to a number. Sometimes it’s convenient to regard the bit patterns themselves as numbers, ordered in the obvious way, in which case f(x) is a function from numbers to numbers.

In the simplest case, f(x) is just the identity function, and so the bit patterns from 000000 through 111111 map into the counting numbers from 0 through 63. A slightly more general scheme allows f(x) to be a linear transformation, f(x)=ax+b, where the parameters a and b are constants. If a=1 and b=0 we again get the nonnegative counting numbers. If a=1 and b=–32, we have a range of integers from –32 to +31. With a=1/64 and b=0, the entire spectrum of numbers is packed into the interval [0,1). Numbers formed in this way are called fixed-point numbers, since there’s an implied radix point at the same position in all the numbers. By adjusting the values of a and b, we can generate fixed-point numbers to cover any range, but they are always distributed uniformly across that range.

The well-known floating-point number system, modeled on scientific notation, is a tad more complicated. In my toy version, three bits are allocated to an exponent t, which therefore ranges from 0 to 7. The other three bits specify a fractional significand s, interpreted as having values in the range 8/16, 9/16, 10/16, …, 15/16. The mapping function is defined as f(t, s)=s×2t. The 64 numbers generated in this way range from 1/2 to 120. (Real floating-point systems, such as those defined by the IEEE standard, not only have more bits but also have a more elaborate structure, with allowance for positive and negative significands and exponents and various other doodads.)

Floating-point numbers are not distributed uniformly; they are densest at the bottom of their range and grow sparse toward the outskirts of the number line. Specifically, the density of binary floating-point numbers is reduced by half at every integer power of 2. In the toy model, the gap between successive representable numbers is 1/16 in the range between 1/2 and 1, then 1/8 between 1 and 2, then 1/4 between 2 and 4, and so on. At the top of the range—beyond 64—the only numbers that exist are those divisible by 8.

The distribution of floating-point numbers describes a piecewise-linear curve, which approximates the function 2x. In other words, if you squint and turn your head sideways, you can see that floating-point numbers look a lot like base-2 logarithms. This suggests another numbering scheme: Instead of approximating logarithms, why not use the real thing? We can simply interpret a binary pattern as a logarithm and generate numbers by exponentiating. For the six-bit model, we can take the first two bits as the integer part of the logarithm (the characteristic) and the last four bits as the fractional part (the mantissa). Assuming we continue to work in base 2, the number-defining function is just f(x)=2x. The resulting system of six-bit numbers has a range from 1 to about 235. (By simple rescaling, we could make the range of these logarithmic numbers match that of the floating-point numbers, 1/2 to 120. I’ve not done so for a trivial reason: So that in the graphs presented below the two curves do not overlap and obscure each other.)

Finally, I want to mention a fourth numbering scheme, called the level-index system. Indeed, what led me to this whole topic in the first place was an encounter with some papers on level-index numbering written in the 1980s by Charles W. Clenshaw and Frank W. J. Olver. The level-index system takes a step beyond logarithms to iterated logarithms and beyond exponents to towers of exponents. The idea is easiest to explain in terms of the mapping function f(x). As with logarithmic numbers, we view the bit pattern x as the concatenation of an integer part (the level) and a fractional part (the index). Then f(x) takes the following recursive form:

level-index-eqn.png

For any level greater than 0, this formula gives rise to a tower of exponentiations. For example, the level-index number 3.25 expands as follows:

tower-of-e.png

In the six-bit model, we can dedicate two bits to the integer level and four bits to the fractional index. The result is a numbering system that covers a very wide range: from 0 to almost 400,000 in 64 steps. The distribution of representable numbers in this range is extremely nonuniform: the gap between the two smallest numbers is 1/16, whereas that between the largest numbers is more than 320,000. (Note also that we have shifted from powers to 2 to powers of e. Actually, a level-index system could be defined on any base, but e has some pleasant properties.)

The graph below compares the distribution of numbers in the four six-bit systems.

linear-curves.png

For fixed-point numbers the graph is necessarily a straight line, since the numbers are distributed at equal intervals. The floating-point graph is a jointed sequence of straight-line segments, with the slope doubling at every power of 2. The logarithmic numbers produce a smooth curve, concave-upward. (Again, this curve could be made to coincide with that of the floating-point numbers.) The level-index curve is also smooth but has an even more pronounced elbow or hockey-stick form.

Plotting the same data on log-linear scales clarifies certain details:

log-curves.png

Now it’s the logarithmic system that yields a straight-line graph. The floating-point curve approximates a parallel straight line. The fixed-point graph is concave-downward, while the level-index curve is sigmoid.

By twiddling various knobs and dials, we could create a number system that would approximate any monotonic line or curve in this space, giving us fine-grained control over the distribution of numbers. For example, on the semilogarithmic graph a straight line of any positive slope can be generated by adjusting the base of the logarithmic number system or the radix of a floating-point system; a larger base or radix yields a steeper slope. The inflection of the level-index curve can be altered in the same way, by choosing different bases. If we wished, we could interpolate between the fixed-point curve and the logarithmic curve by creating number systems in which f(x) is some polynomial, such as x2. If we wanted a flatter curve than the fixed-point system, we could build numbers around a function such as f(x)=√2.

So many ways of counting! Let me count the ways.

And so I return to the main question. On what basis should we choose a number system for a digital computer? Let’s leave aside practicalities of implementation: Suppose we could do arithmetic with equal efficiency using any of these schemes. Which of the curves should we prefer? There are plausible arguments for allocating a greater share of our numerical resources to smaller numbers, on the grounds that we spend most of our time on that part of the number line—it’s like the middle-C octave on the piano keyboard. That would seem to favor the logarithmic or floating-point system over fixed-point numbers. But can we make the argument quantitative, so that it will tell us the ideal slope for the logarithmic line, and hence the mathematically optimal base or radix? (Does Benford’s law help at all in making such a decision?)

And what about the level-index system, which skews the distribution even further, providing higher resolution in the neighborhood of 1 in exchange for a sparser population out in the boondocks? An argument in favor of this strategy is that overflow is a more serious failure than loss of precision; the choice is between a less-accurate answer and no answer at all. Level-index numbers grow so fast that they can effectively eliminate overflow: Although it’s a finite system, you can’t fall off the end of the number line.

Some years ago Nick Trefethen, a numerical analyst, wrote down some predictions for the future of scientific computing, including these remarks:

One thing that I believe will last is floating point arithmetic. Of course, the details will change, and in particular, word lengths will continue their progression from 16 to 32 to 64 to 128 bits and beyond…. But I believe the two defining features of floating point arithmetic will persist: relative rather than absolute magnitudes, and rounding of all intermediate quantities. Outside the numerical community, some people feel that floating point arithmetic is an anachronism, a 1950s kludge that is destined to be cast aside as machines become more sophisticated. Computers may have been born as number crunchers, the feeling goes, but now that they are fast enough to do arbitrary symbolic manipulations, we must move to a higher plane. In truth, however, no amount of computer power will change the fact that most numerical problems cannot be solved symbolically. You have to make approximations, and floating-point arithmetic is the best general-purpose approximation idea ever devised.

At this point, predicting the continued survival of floating-point formats seems like a safe bet, if only because of the QWERTY factor—the system is deeply entrenched, and any replacement would have to overcome a great deal of inertia. But Trefethen’s “two defining features of floating point arithmetic” are not actually unique to the floating-point system; logarithmic and level-index numbers (and doubtless others as well) also employ “relative magnitudes” and require “rounding of all intermediate quantities.” It may well be that floating point is the best general-purpose approximation idea ever devised, but I’m not convinced that all the alternatives have been given serious evaluation.

Free Pi

Two further notes. Because of space constraints, my American Scientist column appeared with a painfully truncated bibliography. I have posted an ampler list of references here.

Also, would anyone like to have 10 million digits of pi on microfiche cards? While scrounging in my files for interesting lore on large numbers, I came upon a yellow envelope in which 10,015,000 digits of pi fill up nine fiche cards. The envelope bears the notation “rec’d from Y. Kanada, Univ. of Tokyo, 1983.” The 10 million digits set a record back then, which I noted in a brief Scientific American news item. It seems a shame to destroy such a curious artifact; on the other hand, I can think of no earthly use for it.

Kanada’s group has gone on to calculate more than a trillion digits of pi, but he didn’t send me the 900,000 microfiche cards it would take to hold the listing.

Not up to norm

Saturday, June 20th, 2009

I have a new column out in American Scientist, on “compressive sensing”:

When you take a photograph with a digital camera, the sensor behind the lens has just a few milliseconds to gather in a huge array of data. A 10-megapixel camera captures some 30 megabytes—one byte each for the red, green and blue channels in each of the 10 million pixels. Yet the image you download from the camera is often only about 3 megabytes. A compression algorithm based on the JPEG standard squeezes the file down to a tenth of its original size. This saving of storage space is welcome, but it provokes a question: Why go to the trouble of capturing 30 megabytes of data if you’re going to throw away 90 percent of it before anyone even sees the picture? Why not design the sensor to select and retain just the 3 megabytes that are worth keeping?

The answer to this question leads to some ingenious mathematics and technology—but I’ll leave all that for the column itself. Here, regrettably, I need to try to patch up a few soft spots in my telling of the story. To make the best of the situation, perhaps I can find something interesting to say about my mistakes.

Here’s the first trouble spot. I wrote:

If we have N equations in N unknowns (and if the equations are all distinct) the system is certain to have a unique solution.

The equations in question are linear, and they have coefficients restricted to the set {0,1}. In my first draft, the parenthesis read “(and if the equations are all linearly independent),” but I wanted to avoid that bit of jargon, and I persuaded myself that because of the restriction on the values of the coefficients, distinct equations would necessarily be linearly independent. That’s wrong. Take the system:

\[ \begin{split}
0x + 0y + 1z& = 2 \\
1x + 1y + 0z& = 3 \\
1x + 1y + 1z& = 5 \\
\end{split}\]

We have three distinct equations in three unknowns with all coefficients in {0,1}, but the system is not linearly independent because the third equation is the sum of the first two. And the system has infinitely many solutions (namely all those that satisfy \(x + y = 3).\)

So I’m left wondering: Is there some way I could have explained the point correctly without a long digression on linear independence, the rank of a matrix and other niceties of linear algebra? Peter Renz, who pointed out my error in the first place, suggested this phrasing: “What is needed is that each equation adds information to the whole, so that no equation can be derived from the others.” I like that way of putting it, but on the other hand it doesn’t tell you how to look at a system of equations and determine whether each one adds information. Still, maybe it’s the best we can do.

The second expository pothole reads as follows:

For each vector element \(x\), the least-squares rule calculates \(x^2\) and then sums all the results. The search for a sparsest vector can be framed in similar terms, the only change being that \(x^2\) is replaced by \(x^0\). The zeroth power of \(0\) is \(0\), but for any other value of \(x\), \(x^0\) is equal to \(1\).

A reader noted that he had been taught—and had gone on to teach others—that \(0^0\) is an “indeterminate form,” without a definite value. And indeed it’s true: \(0^0 = 1\) if we take the expession as the limit of \(x^0\) as \(x\) approaches \(0\), but \(0^0 = 0\) if we view it as the limit of \(0^x\) as \(x\) goes to \(0\) from above. Given this discontinuity, I should not have declared so glibly that \(0^0 = 0\), as if there could be no controversy about it. Instead I could have phrased it along the lines of, “In this context it’s convenient to adopt the convention that \(0^0\) is equal to \(0\).”

But, on looking at the situation a little more closely, the whole business seems really murky. Here’s a passage from Concrete Mathematics, by Graham, Knuth and Patashnik:

Some textbooks leave the quantity \(0^0\) undefined, because the functions \(x^0\) and \(0^x\) have different limiting values when \(x\) decreases to \(0\). But this is a mistake. We must define

\(x^0 = 1\) for all \(x\),

if the binomial theorem is to be valid when \(x=0, y=0\), and/or \(x=-y\). The theorem is too important to be arbitrarily restricted! By contrast, the function \(0^x\) is quite unimportant.

In spite of this authoritative pronouncement, however, workers in certain other fields adopt the opposite convention. The tangled sentences that got me into this pickle were written an attempt to explain—again without introducing some hard-core terminology—the difference between the \(L_2\) and the \(L_0\) vector norms. And it appears that the usual definition of the \(L_0\) norm just doesn’t work unless \(0^0 = 0\). But I had never bothered to think clearly about this until my reader raised the question.

Suppose we’re given the four-dimensional vector \(\mathbf{x} = (2,0,1,5)\), and we’re asked to define its length \(\|\mathbf{x}\|\). The most familiar definition of length is the Euclidean “square root of the sum of the squares” formula, which corresponds to the \(L_2\) norm:

\[\|\mathbf{x}\|_2 = (2^2 + 0^2 + 1^2 + 5^2)^\frac{1}{2} \approx 5.477.\]

The \(L_1\) norm is defined in the same way but with first powers rather than second powers:

\[\|\mathbf{x}\|_1 = (2^1 + 0^1 + 1^1 + 5^1)^1 = 8.\]

The \(L_1\) norm is really just the simple sum of the vector components. In the same way that the \(L_2\) norm measures Euclidean distance, the \(L_1\) norm implements the Manhattan, or taxicab, metric.

Following the same scheme, we can invent lots of higher norms. Here’s the length of our example vector in the \(L_5\) norm:

\[\|\mathbf{x}\|_5 = (2^5 + 0^5 + 1^5 + 5^5)^\frac{1}{5} \approx 5.011.\]

We can generalize this idea in a formula that defines the \(L_p\) norm for any positive integer \(p\):

\[\|\mathbf{x}\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^\frac{1}{p}.\]

As \(p\) increases, the \(L_p\) calculation gives greater and greater weight to the larger components of the vector. By taking the limit as \(p \to \infty\), we can even define the \(L_\infty\) norm, which is essentially a max operation—only the largest element contributes. For our example vector, \(\|(2,0,1,5)\|_\infty = 5\).

What about going in the other direction, to \(p = 0\)? In a fuzzy-headed, hand-waving way, it’s clear that what we want in this case is the sum of the zeroth-powers of the vector components. In other words, we simply want to count the components, ignoring differences in their magnitudes. Unfortunately, though, we can’t just plug \(p=0\) into the formula. That would give:

\[\|\mathbf{x}\|_0 = \left( \sum_{i=1}^n |x_i|^0 \right)^\frac{1}{0},\]

which just won’t do. The most serious problem, of course, is the \(\frac{1}{0}\) exponent, but there’s also the question of what meaning to assign to \(0^0\). As I understand it, the communities of workers who care about using the \(L_0\) norm solve these problems by introducing a special-case definition such as this:

\[\|\mathbf{x}\|_0 = \sum_{i=1}^n |x_i|^0,\]

along with the further proviso that \(0^0 = 0\). Accordingly, \(\|\mathbf{x}\|_0\) is a count of the nonzero components of the vector \(\mathbf{x}\). For our running example,

\[\|\mathbf{x}\|_0 = (2^0 + 0^0 + 1^0 + 5^0) = (1 + 0 + 1 + 1) = 3.\]

I guess this works, but if we need a special case anyway, wouldn’t it be easier just to define the \(L_0\) norm directly as the count of the nonzero elements? In formal terms this could be done using some sort of delta function instead of \(x^0\). Taking this approach, the \(0^0\) issue would never enter the case—and maybe I’d have a better chance of staying out of trouble.

Treats Tropiques

Wednesday, June 17th, 2009

I’ve let the tropical mathematics bandwagon pass me by. It’s not that I’ve been unaware. I noticed the stream of preprints, the special session at a joint mathematics meeting a few years back, workshops in Palo Alto and Moscow, upcoming events in Montreal and Berkeley. I’ve noticed all this, but I’ve been resisting it. Maybe it was the term “tropical” that put me off—a little too whimsical, with too-deliberate designs on my curiosity. But now the tropical craze has made it to the cover of Mathematics Magazine, with a lead article by David Speyer and Bernd Sturmfels (preprint version available here). I guess it’s finally time for me to figure out what the fuss is all about. The notes below—a work in progress—trace my efforts to do so.

MathMag-cover-img0514.jpg

For starters, what is that word “tropical” supposed to mean? Speyer and Sturmfels explain: “The adjective tropical was coined by French mathematicians, including Jean-Eric Pin, in honor of their Brazilian colleague Imre Simon.” Pin, in a 1998 paper (.pdf), deflects the credit to another French mathematician, Dominique Perrin, again noting that the name honors “the pioneering work of our brazilian colleague and friend Imre Simon.” Simon himself, in a 1988 paper (.ps), attributes the term to yet a third French mathematician, Christian Choffrut. Apparently, no one wants to lay claim to the word, and I can’t entirely blame them. Speyer and Sturmfels go on: “There is no deeper meaning in the adjective ‘tropical’. It simply stands for the French view of Brazil.”

The fundamental objects in the tropical world are the real numbers \(\mathcal{R}\) augmented with \(\infty\), and two operations on those numbers, tropical addition, denoted \(\oplus\), and tropical multiplication, \(\otimes\). Tropical addition is simply the minimum operation; that is, a \(\oplus\) b is equal to the lesser of a and b. For example:

\[ \begin{split}
8 \oplus 3& = 3 \\
-8 \oplus 3& = -8 \\
x \oplus \infty& = x
\end{split}\]

Tropical multiplication is equivalent to addition in conventional (non-tropical) arithmetic: a \(\otimes\) b = a + b. Some examples:

\[ \begin{split}
8 \otimes 3& = 11\\
-8 \otimes 3& = -5\\
x \otimes \infty& = \infty .
\end{split}\]

So that’s the key idea in a nutshell: plus is min and times is plus. This looks like a fairly strange and arbitrary reshuffling of definitions. First we get rid of addition, then we bring it back but call it multiplication. Why not just leave addition alone and redefine multiplication as min? There is a reason for doing it the other way around. In tropical arithmetic, the min operation \(\oplus\) plays the same role that addition plays in ordinary arithmetic, and \(\otimes\) has the same role as multiplication. In particular, consider the distributive law:

\[ a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c) .\]

This rule holds correctly when \(\oplus\) is min and \(\otimes\) is +, but it wouldn’t work if the definitions were swapped. The commutative and associative laws are also valid in tropical arithmetic. Speyer and Sturmfels point out that the system even fulfills “the Freshman’s Dream”:

\[ (a \oplus b)^{3} = a^{3} \oplus b^{3} .\]

On the other hand, there’s trouble with subtraction. No value of \(x\) can satisfy an equation such as \(x \oplus 2 = 5\). Because of this asymmetry, the algebraic structure of tropical arithmetic is classified as a semiring (whereas ordinary arithmetic on \(\mathcal{R}\) is a ring). Here’s another bit of jargon for jargon’s sake: the tropical semiring is idempotent. This just means that \(a \oplus a \oplus a \oplus … \oplus a = a\).

And now you ask: What’s it all good for?

As I understand it, Imre Simon and those French mathematicians who take a tropical view of Brazil were interested in the tropical semiring mainly in connection with automata theory. The idea goes something like this. A finite-state automaton can be represented as a directed graph, where the nodes of the graph are states of the machine and the edges are transitions between states. Suppose the edges are assigned numeric labels, and a path from an initial state to a final state accumulates the sum of all the edges traversed along the way. There may be multiple paths that connect the same initial and final nodes, and so it makes sense to ask which of these paths has the minimal sum. These questions are conveniently addressed in the algebra of tropical arithmetic, where the summation of the labels is the tropical product \(\otimes\) and the taking of a minimum is the tropical sum \(\oplus\). (Expressing the rules in a formal algebra rather than an ad hoc convention helps with certain proofs about the automata.)

Automata theory may be where it all began, but the current interest in tropical mathematics seems to have a different focus. The emphasis has shifted to algebraic geometry—an area of mathematics that I look upon with equal parts of fascination and fright. The main aim in this realm is to understand the solution sets of polynomial equations.

If you plot the values of an ordinary polynomial, you get a smooth curve. The graph below at right shows the simple quadratic function \(y=2x^2+x-1\) with its quadratic.png two real roots at \(x=-1\) and \(x=1/2\). What is the equivalent graph for a tropical polynomial? To construct such a beast, we first have to decide what \(2x^2\) might mean in a tropical context. In ordinary arithmetic, \(2x^2\) is shorthand for \(2 \times x \times x\), and so the obvious translation is \(2 \otimes x \otimes x\); translating back to conventional arithmetic, this expression becomes \(2 + x + x\), or in other words \(2x + 2\). Thus the quadratic term \(2x^2\) becomes a linear function, and the same transformation applies to higher powers also: In general \(kx^{n}\) is converted into \(nx + k\). tropical-quad.png Thus the exponent in the original equation becomes the integer coefficient that determines the slope of a line. The graph of the tropical polynomial is shown at left. The green line traces the complete function \(y=2x^2 \oplus x \oplus -1\); it is the union of segments and rays drawn from the three lines \(y=2x+2\), \(y=x\) and \(y=-1\). For each value of \(x\), the function takes on the least value of \(y\) that lies on one of these lines (the “lower envelope” of the lines). Graphs of all tropical polynomials share this same basic appearance: They are piecewise linear functions and they are always concave downward.

Just as the ordinary polynomial \(2x^2+x-1\) can be factored into the form \((x+1)(2x-1)\), the tropical polynomial \(y=2x^2 \oplus x \oplus -1\) has the unique factorization

\[2 \otimes (x \oplus -2) (x \oplus -1).\]

The constants –2 and –1 appearing in these factors correspond to the “bend points” in the graph of the function, and they play a role analogous to the roots of the ordinary polynomial. They are values of the variable \(x\) where two or more of the linear constraints are simultaneously satisfied. The locus of all such points is the solution set, the object of particular interest to the algebraic geometers.

For a polynomial in one variable, the solution set generally consists of isolated points. The situation gets hairier in higher dimensions, where the solution sets become algebraic “curves”—though they sure don’t look very curvaceous. Take the two-variable tropical polynomial function \(z(x,y) = x^2 \oplus y^2 \oplus 1\). The three terms of the polynomial define three planes in \(\mathcal{R}^3\), and the value of the function is the lower envelope of these planes:

xy-polynomial-3d.png

The “bend points” are now “bend lines” where the planes intersect. Projected onto the \(x,y\) plane, this tropical “curve” has a distinctive Y-shaped form. zxy-solution-set.png At right, the rays in red represent the solution set. They divide the plane into three regions, in each of which a different linear relation has minimal height; the rays themselves define the locus of points where two (or, at the origin, all three) of the terms are minimal. This trident motif is preserved in all tropical curves, not just this simple example. The illustration on the cover of Mathematics Magazine shows a more complicated graph, part of a proof of the tropical version of Bézout’s theorem. (The theorem states, with various caveats, that two algebraic curves of degree m and n intersect in mn points. If I understand correctly, it carries over fully from “temperate” to tropical mathematics.)

The mere idea of reconstructing all of arithmetic with different operations, and then building higher-level structures (algebra, geometry) on top of this new foundation—all that’s enough fun to justify the project. But it seems there are also practical applications. Speyer and Sturmfels discuss a biological problem: Given genetic sequences that define pairwise distances between species, construct a phylogenetic tree showing the evolutionary history of the species. Tropical arithmetic turns out to be useful in finding a tree where all the distances are metrically consistent—that is, they obey the triangle inequality.

That’s about as far as I’ve been able to get in my explorations of the tropical latitudes. But in closing I’d like to say another word about the naming of things in science and mathematics. My sober, fuddy-duddy opinion is that we’d be better off in the long run giving things plain, descriptive names, without too much metaphorical spin on them. Physicists once had a lot of fun with quarks and gluons, charm and strangeness, infrared slavery and ultraviolet freedom; but eventually the joke wears thin. I gather that biologists are now regretting the game of giving genes names like Sonic Hedgehog. Tropical mathematics may also get tired of explaining over and over again about “the French view of Brazil.” On the other hand, I have to admit that if all those papers and workshops had talked about min-plus algebra, they never would have caught my eye.

Update 2009-08-25: I’ve just learned that Imre Simon died August 12.

Alpha bravo

Sunday, May 17th, 2009

WolframAlpha, the new whatchamacallit from Stephen Wolfram, is up and running. It’s also stumbling: I’ve gotten a fair number of error messages. But now is not the time for nitpicky griping. I just want to say bravo to the whole idea.

I applaud because Alpha invites us to use a computer for actual computing. Did you know that this machine that brings you YouTube and Facebook can also evaluate integrals and calculate correlation coefficients? What a thought!

I have ranted elsewhere about the sad decline of casual, inquisitive programming, and the lack of tools that will draw more people into that mode of exploring their world. WolframAlpha is not exactly the answer to my plea. It is a powerful computational tool cleverly disguised as a mild-mannered search engine. You’re not writing programs; you’re just “searching” for the roots of x2 + x – 1 or the limit of (1 + 1/n)n as n goes to infinity or the sine of 1050. I have a feeling that this search-engine metaphor may wear thin pretty quickly, but it may also serve its obvious purpose—breaking down barriers for those who wouldn’t dream of writing a line of Python or Lisp but who are masters of Google.

WolframAlpha is not the first thing of its kind. (My Sage friends will be quick to point out that they have had an online notebook interface up and running for a couple of years.) I’ve only just begun to explore Alpha, and I’m sure there are both treasures and pitfalls I haven’t glimpsed yet. Still, for the moment, I just want to cheer it on.

Himalayan mathematics

Monday, April 20th, 2009

Browsing through some notes I jotted down sometime in 1988, I come upon this sentence:

Mathematicians feel about computers much as the Nepalese feel about jet aircraft.

Did I read that somewhere, or did I make it up? My notes give no clue to the provenance of the thought, and Google is unhelpful.

Bracketology: The Madness of March

Friday, March 13th, 2009

The American Institute of Physics has just sent me a press release, whose subject line I have borrowed for the title of this post. The release discusses the mathematics of an upcoming college basketball tournament:

Mathematicians who study problems like these use combinatorics which helps them determine exactly how many possible outcomes there could be. Starting with 64 teams with two possible outcomes for each team - a win or a loss - the number of possible outcomes for the tournament is a staggering 2 to the power of 64: that is, 2 multiplied by itself 64 times, or 18,446,744,073,709,551,616 possibilities.

Not being much of a sports fan, I hadn’t realized exactly how basketball tournaments work. Now that I understand, I find it cheering. I’m going to root for the outcome in which all 64 teams win.

Wrong number

Wednesday, February 25th, 2009

Is it just me, or does it happen to everyone? When I try to correct someone else’s arithmetic, I can count on making an error of my own. For instance: I was reviewing William Goldbloom Bloch’s clever and provocative book The Unimaginable Mathematics of Borges’ Library of Babel. I noted in passing a numerical error, trivial in significance even though it’s enormous in magnitude. Bloch had occasion to introduce the number:

blochnumber.png

To give a sense of this number’s size, he mentioned that it has 33 million digits. I pointed out that the true digit count is exponentially larger: 1033,013,740. Now I’ve received a correction to my correction, in a note from Michael Rosenthal:

The author [of the book] stated 33 million digits as the length, but Hayes getting closer states that the length is (10 to the 33,013,730). Isn’t the actual number of digits ((10 to the 33,013,730) + 1)?

Of course Rosenthal is right. There are 10n n-digit decimal numbers, but 10n is not one of them. On the other hand, Rosenthal also gives a double-barrelled demonstration of my point about the perils of correcting other people’s arithmetic: In the course of setting me straight, he makes a typographical slip of his own, twice mistranscribing 33,013,740.

My correspondence with Rosenthal provoked me to take a closer look at where that enormous number came from in the first place. This has led me into some diverting adventures with factorials and logarithms. But it has also put me in the position of proposing another correction to a published number, which means I am in grave danger of making still more mistakes.

Bloch’s book is a mathematical companion to “The Library of Babel,” a brief ficción by the Argentinian fabulist Jorge Luis Borges. In the Library of Babel all books are written in an alphabet of just 25 symbols. Each book has 410 pages, and each page has 40 lines of 80 characters, for a total of 1,312,000 characters per volume. We are told that the library possesses one copy of every possible volume composed in this way. Thus the number of volumes on the library’s shelves is 251,312,000. For convenience of reference let us call this number B (for Book, or Babel, or Borges, or Bloch).

The much larger number mentioned in my review enters Bloch’s story when he asks how many ways we might arrange the books on the library shelves. The answer, of course, is the factorial of B. If we imagine that the shelves have discrete slots that hold one book each, then we have B choices for the first slot, B–1 choices for the second slot, B–2 for the third, and so on, until we come to the final slot where there’s just one book left to choose. The product of all these numbers is B!.

So what sort of a number is B!? Can we have a look at it? In this age of computational wonders, surely we can just fire up Sage or Mathematica, or write a four-liner in Lisp, and have the digits of that number served to us on a silver platter, no?

No. We can’t.

After three decades of playing with computers, I am still childishly agog at the ease with which I can tap a few keys and calculate a gargantuan quantity such as 52!, the number of permutations of a standard deck of playing cards:

8065817517094387857166063685640376
6975289505440883277824000000000000

I can even calculate 10,000!, unleashing a Niagara of digits that spill out onto my screen in a long gray torrent—some 35,660 of them by the time the flood abates. (Digression: How many of those digits are trailing zeros? This is a question Fred Gruenberger would always ask when he talked about factorials in his old Popular Computing newsletter, back in the days when computing that many digits took a bit of doing.)

Computing has come a long way, but even with so much numerical horsepower at our fingertips, the direct calculation of B! is still totally unthinkable. If Bloch had been right about the 33 million digits, the computation might be feasible, if tedious, but computing a number with 1033,013,740 digits is just ridiculous, with or without the extra digit I forgot.

If we can’t compute B! outright, maybe we can at least get its logarithm. For a rough estimate of the magnitude—in other words, for counting the digits—the realm of logarithms is the right place to be. Since the factorial of n is a product,

factorial.png

the logarithm of the factorial has to be a sum:

logfactorial.png

I don’t know a quick and easy way to perform this summation, but I do know how to evaluate a slightly different sum, one where every term on the right hand side of the equation is log n. Since there are n of these terms, the sum is obviously just n log n. What we have calculated in this way is the logarithm of nn. Thus nn can be taken as a crude approximation—a very sloppy upper bound—to the value of n!. For our particular case we have log BB = B log B. When I plug in the numerical value of B, and use base-10 logarithms, I come up with

logbtob.png

Exponentiating both sides gives this answer:

btob.png

Hmm. There’s a problem here. This is supposed to be a loose upper bound, but it’s smaller than the value we were expecting for B!. Clearly, BB can’t be less than B!.

Bloch reports that he came to his estimate of B! via Stirling’s approximation. If you look up this bit of mathematical technology, you’ll find a formula something like this:

stirlingformula.png

It would be easy to translate the formula directly into computer code, but it’s pointless to do so if the aim is to calculate B!. Again the only practical way to proceed is to take logarithms of both sides. Then the product in the formula becomes a sum of three terms. The first term, log nn, we already know: It’s n log n. The second term, log e–n, is simply –n if we choose to work with natural logarithms; if we want to stick with base-10 logs, it’s –n/log 10. And the third term we don’t need to worry about; at the level of precision we can hope to achieve in these calculations, ½ log 2πn is too small to notice. So, keeping just the first two terms, we have a rough-and-ready approximation to Stirling’s approximation, which ought to work well enough for very large n. B certainly seems to qualify as a large n; when I plug in its value, here’s what I get:

blogbminusb.png

I have plodded my way slowly and methodically through the steps of this computation in an effort to persuade myself that I might finally have it right. At this point I’m fairly sure that log B! is closer to 101,834,103 than it is to 1033,013,740, but I’m chary of making a stronger claim. With all my crossing back and forth between the safe world of logarithms and the uninhabitable wasteland of exponentials, I worry that I’ve misplaced a factor of e or, worse, mistaken the logarithm of a number for the number itself. If I’ve goofed again somewhere in this calculation, I trust that readers will let me know.

I want to emphasize that my aim in telling this story is certainly not to embarrass Bloch, whose work I admire. His book does a splendid job of explaining ideas in combinatorics, number theory and even topology, in a literary context where such themes don’t get much exposure. His little numerical error—if it is an error—has no impact on the meaning of the passage where it appears.

Indeed, it’s hard to imagine a circumstance where the difference between the two proposed values of B! could have any earthly consequence. Both numbers are always going to be ghostly presences in our universe—actors whose voice we hear from offstage but who never show their face under the lights. A certain breed of radical constructivist would argue that the numbers we’re talking about don’t exist at all. I don’t find that doctrine helpful or persuasive, but I do think there’s a distinction worth making between numbers we can actually hold in our hands and these strange, outsized objects that come equipped with their own exclamation points. Computing with numbers this large is like working with toxic or radioactive materials. You never get to touch anything directly; all manipulations are performed by remote control from behind a blast shield. The experience can be exciting, but afterwards you don’t come away with a sense that you really know what you’re doing.

Fred Gruenberger would have wanted to know how many trailing zeros there are in B!. Can we answer?

Math, fonts, and HTML

Monday, February 16th, 2009

In the latest issue of American Scientist I write about the awkward and ugly business of trying to present mathematical typography on the web. All the while I was writing the column, I had constantly in mind the uncomfortable knowledge that the column itself—with all of its examples of equations and arcane symbols—would have to be prepared for presentation in HTML. The conversion went more smoothly than I expected, thanks to the energy and expertise of Anna Lena Phillips. Nevertheless, I recommend reading the PDF version.

One issue that arises in this story goes beyond the particular problems of making mathematical notation intelligible. Why is it that a web page can’t include the fonts that are needed to display the text properly? It’s routine for web documents to incorporate images and audio and video and various kinds of code that affect the display of information—CSS, JavaScript, etc. When it comes to fonts, however, the web server is not empowered to supply what’s needed; instead, it has to play a silly guessing game with the client computer. “Do you have ‘Times New Roman’?” the server asks. “No, well how about ‘Times’? ‘Georgia’? Okay, give me anything with serifs, please.” It would be easier for both parties if the server could just package up a font and send it along—or so it seems to me.

This is not a new idea. Most of the recent discussion of embedable or linkable fonts focuses on legal impediments, which seem quite manageable to me (though of course I’m not a lawyer, nor am I an owner of typeface copyrights). Are there other reasons to back away from such a proposal? The bandwidth required seems moderate by present-day standards, and there are obvious schemes for avoiding waste by sending only glyphs that are actually needed in a document and not already present on the client computer. Publishers and advertisers would doubtless abuse the facility with garish attempts to attract attention, but we survived the <marquee> and <blink> tags back in the nineties, and an attack of circus-poster typefaces seems no more obnoxious. Am I missing something obvious?

jsMath

Sunday, December 28th, 2008

A famous equation:

$$\nabla\times\mathbf{E}=-\frac{\partial\mathbf{B}}{\partial{t}}$$

Imagine that—mathematics on the Web!

Double-click for the LaTeX source. There’s more info in the tiny box you ought to find in the lower right hand corner of your browser window.

I’ll have more to say about all this in a few days.