Archive for the ‘mathematics’ Category

Not up to norm

Saturday, June 20th, 2009

I have a new column out in American Scientist, on “compressive sensing”:

When you take a photograph with a digital camera, the sensor behind the lens has just a few milliseconds to gather in a huge array of data. A 10-megapixel camera captures some 30 megabytes—one byte each for the red, green and blue channels in each of the 10 million pixels. Yet the image you download from the camera is often only about 3 megabytes. A compression algorithm based on the JPEG standard squeezes the file down to a tenth of its original size. This saving of storage space is welcome, but it provokes a question: Why go to the trouble of capturing 30 megabytes of data if you’re going to throw away 90 percent of it before anyone even sees the picture? Why not design the sensor to select and retain just the 3 megabytes that are worth keeping?

The answer to this question leads to some ingenious mathematics and technology—but I’ll leave all that for the column itself. Here, regrettably, I need to try to patch up a few soft spots in my telling of the story. To make the best of the situation, perhaps I can find something interesting to say about my mistakes.

Here’s the first trouble spot. I wrote:

If we have N equations in N unknowns (and if the equations are all distinct) the system is certain to have a unique solution.

The equations in question are linear, and they have coefficients restricted to the set {0,1}. In my first draft, the parenthesis read “(and if the equations are all linearly independent),” but I wanted to avoid that bit of jargon, and I persuaded myself that because of the restriction on the values of the coefficients, distinct equations would necessarily be linearly independent. That’s wrong. Take the system:

\[ \begin{split}
0x + 0y + 1z& = 2 \\
1x + 1y + 0z& = 3 \\
1x + 1y + 1z& = 5 \\
\end{split}\]

We have three distinct equations in three unknowns with all coefficients in {0,1}, but the system is not linearly independent because the third equation is the sum of the first two. And the system has infinitely many solutions (namely all those that satisfy \(x + y = 3).\)

So I’m left wondering: Is there some way I could have explained the point correctly without a long digression on linear independence, the rank of a matrix and other niceties of linear algebra? Peter Renz, who pointed out my error in the first place, suggested this phrasing: “What is needed is that each equation adds information to the whole, so that no equation can be derived from the others.” I like that way of putting it, but on the other hand it doesn’t tell you how to look at a system of equations and determine whether each one adds information. Still, maybe it’s the best we can do.

The second expository pothole reads as follows:

For each vector element \(x\), the least-squares rule calculates \(x^2\) and then sums all the results. The search for a sparsest vector can be framed in similar terms, the only change being that \(x^2\) is replaced by \(x^0\). The zeroth power of \(0\) is \(0\), but for any other value of \(x\), \(x^0\) is equal to \(1\).

A reader noted that he had been taught—and had gone on to teach others—that \(0^0\) is an “indeterminate form,” without a definite value. And indeed it’s true: \(0^0 = 1\) if we take the expession as the limit of \(x^0\) as \(x\) approaches \(0\), but \(0^0 = 0\) if we view it as the limit of \(0^x\) as \(x\) goes to \(0\) from above. Given this discontinuity, I should not have declared so glibly that \(0^0 = 0\), as if there could be no controversy about it. Instead I could have phrased it along the lines of, “In this context it’s convenient to adopt the convention that \(0^0\) is equal to \(0\).”

But, on looking at the situation a little more closely, the whole business seems really murky. Here’s a passage from Concrete Mathematics, by Graham, Knuth and Patashnik:

Some textbooks leave the quantity \(0^0\) undefined, because the functions \(x^0\) and \(0^x\) have different limiting values when \(x\) decreases to \(0\). But this is a mistake. We must define

\(x^0 = 1\) for all \(x\),

if the binomial theorem is to be valid when \(x=0, y=0\), and/or \(x=-y\). The theorem is too important to be arbitrarily restricted! By contrast, the function \(0^x\) is quite unimportant.

In spite of this authoritative pronouncement, however, workers in certain other fields adopt the opposite convention. The tangled sentences that got me into this pickle were written an attempt to explain—again without introducing some hard-core terminology—the difference between the \(L_2\) and the \(L_0\) vector norms. And it appears that the usual definition of the \(L_0\) norm just doesn’t work unless \(0^0 = 0\). But I had never bothered to think clearly about this until my reader raised the question.

Suppose we’re given the four-dimensional vector \(\mathbf{x} = (2,0,1,5)\), and we’re asked to define its length \(\|\mathbf{x}\|\). The most familiar definition of length is the Euclidean “square root of the sum of the squares” formula, which corresponds to the \(L_2\) norm:

\[\|\mathbf{x}\|_2 = (2^2 + 0^2 + 1^2 + 5^2)^\frac{1}{2} \approx 5.477.\]

The \(L_1\) norm is defined in the same way but with first powers rather than second powers:

\[\|\mathbf{x}\|_1 = (2^1 + 0^1 + 1^1 + 5^1)^1 = 8.\]

The \(L_1\) norm is really just the simple sum of the vector components. In the same way that the \(L_2\) norm measures Euclidean distance, the \(L_1\) norm implements the Manhattan, or taxicab, metric.

Following the same scheme, we can invent lots of higher norms. Here’s the length of our example vector in the \(L_5\) norm:

\[\|\mathbf{x}\|_5 = (2^5 + 0^5 + 1^5 + 5^5)^\frac{1}{5} \approx 5.011.\]

We can generalize this idea in a formula that defines the \(L_p\) norm for any positive integer \(p\):

\[\|\mathbf{x}\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^\frac{1}{p}.\]

As \(p\) increases, the \(L_p\) calculation gives greater and greater weight to the larger components of the vector. By taking the limit as \(p \to \infty\), we can even define the \(L_\infty\) norm, which is essentially a max operation—only the largest element contributes. For our example vector, \(\|(2,0,1,5)\|_\infty = 5\).

What about going in the other direction, to \(p = 0\)? In a fuzzy-headed, hand-waving way, it’s clear that what we want in this case is the sum of the zeroth-powers of the vector components. In other words, we simply want to count the components, ignoring differences in their magnitudes. Unfortunately, though, we can’t just plug \(p=0\) into the formula. That would give:

\[\|\mathbf{x}\|_0 = \left( \sum_{i=1}^n |x_i|^0 \right)^\frac{1}{0},\]

which just won’t do. The most serious problem, of course, is the \(\frac{1}{0}\) exponent, but there’s also the question of what meaning to assign to \(0^0\). As I understand it, the communities of workers who care about using the \(L_0\) norm solve these problems by introducing a special-case definition such as this:

\[\|\mathbf{x}\|_0 = \sum_{i=1}^n |x_i|^0,\]

along with the further proviso that \(0^0 = 0\). Accordingly, \(\|\mathbf{x}\|_0\) is a count of the nonzero components of the vector \(\mathbf{x}\). For our running example,

\[\|\mathbf{x}\|_0 = (2^0 + 0^0 + 1^0 + 5^0) = (1 + 0 + 1 + 1) = 3.\]

I guess this works, but if we need a special case anyway, wouldn’t it be easier just to define the \(L_0\) norm directly as the count of the nonzero elements? In formal terms this could be done using some sort of delta function instead of \(x^0\). Taking this approach, the \(0^0\) issue would never enter the case—and maybe I’d have a better chance of staying out of trouble.

Treats Tropiques

Wednesday, June 17th, 2009

I’ve let the tropical mathematics bandwagon pass me by. It’s not that I’ve been unaware. I noticed the stream of preprints, the special session at a joint mathematics meeting a few years back, workshops in Palo Alto and Moscow, upcoming events in Montreal and Berkeley. I’ve noticed all this, but I’ve been resisting it. Maybe it was the term “tropical” that put me off—a little too whimsical, with too-deliberate designs on my curiosity. But now the tropical craze has made it to the cover of Mathematics Magazine, with a lead article by David Speyer and Bernd Sturmfels (preprint version available here). I guess it’s finally time for me to figure out what the fuss is all about. The notes below—a work in progress—trace my efforts to do so.

MathMag-cover-img0514.jpg

For starters, what is that word “tropical” supposed to mean? Speyer and Sturmfels explain: “The adjective tropical was coined by French mathematicians, including Jean-Eric Pin, in honor of their Brazilian colleague Imre Simon.” Pin, in a 1998 paper (.pdf), deflects the credit to another French mathematician, Dominique Perrin, again noting that the name honors “the pioneering work of our brazilian colleague and friend Imre Simon.” Simon himself, in a 1988 paper (.ps), attributes the term to yet a third French mathematician, Christian Choffrut. Apparently, no one wants to lay claim to the word, and I can’t entirely blame them. Speyer and Sturmfels go on: “There is no deeper meaning in the adjective ‘tropical’. It simply stands for the French view of Brazil.”

The fundamental objects in the tropical world are the real numbers \(\mathcal{R}\) augmented with \(\infty\), and two operations on those numbers, tropical addition, denoted \(\oplus\), and tropical multiplication, \(\otimes\). Tropical addition is simply the minimum operation; that is, a \(\oplus\) b is equal to the lesser of a and b. For example:

\[ \begin{split}
8 \oplus 3& = 3 \\
-8 \oplus 3& = -8 \\
x \oplus \infty& = x
\end{split}\]

Tropical multiplication is equivalent to addition in conventional (non-tropical) arithmetic: a \(\otimes\) b = a + b. Some examples:

\[ \begin{split}
8 \otimes 3& = 11\\
-8 \otimes 3& = -5\\
x \otimes \infty& = \infty .
\end{split}\]

So that’s the key idea in a nutshell: plus is min and times is plus. This looks like a fairly strange and arbitrary reshuffling of definitions. First we get rid of addition, then we bring it back but call it multiplication. Why not just leave addition alone and redefine multiplication as min? There is a reason for doing it the other way around. In tropical arithmetic, the min operation \(\oplus\) plays the same role that addition plays in ordinary arithmetic, and \(\otimes\) has the same role as multiplication. In particular, consider the distributive law:

\[ a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c) .\]

This rule holds correctly when \(\oplus\) is min and \(\otimes\) is +, but it wouldn’t work if the definitions were swapped. The commutative and associative laws are also valid in tropical arithmetic. Speyer and Sturmfels point out that the system even fulfills “the Freshman’s Dream”:

\[ (a \oplus b)^{3} = a^{3} \oplus b^{3} .\]

On the other hand, there’s trouble with subtraction. No value of \(x\) can satisfy an equation such as \(x \oplus 2 = 5\). Because of this asymmetry, the algebraic structure of tropical arithmetic is classified as a semiring (whereas ordinary arithmetic on \(\mathcal{R}\) is a ring). Here’s another bit of jargon for jargon’s sake: the tropical semiring is idempotent. This just means that \(a \oplus a \oplus a \oplus … \oplus a = a\).

And now you ask: What’s it all good for?

As I understand it, Imre Simon and those French mathematicians who take a tropical view of Brazil were interested in the tropical semiring mainly in connection with automata theory. The idea goes something like this. A finite-state automaton can be represented as a directed graph, where the nodes of the graph are states of the machine and the edges are transitions between states. Suppose the edges are assigned numeric labels, and a path from an initial state to a final state accumulates the sum of all the edges traversed along the way. There may be multiple paths that connect the same initial and final nodes, and so it makes sense to ask which of these paths has the minimal sum. These questions are conveniently addressed in the algebra of tropical arithmetic, where the summation of the labels is the tropical product \(\otimes\) and the taking of a minimum is the tropical sum \(\oplus\). (Expressing the rules in a formal algebra rather than an ad hoc convention helps with certain proofs about the automata.)

Automata theory may be where it all began, but the current interest in tropical mathematics seems to have a different focus. The emphasis has shifted to algebraic geometry—an area of mathematics that I look upon with equal parts of fascination and fright. The main aim in this realm is to understand the solution sets of polynomial equations.

If you plot the values of an ordinary polynomial, you get a smooth curve. The graph below at right shows the simple quadratic function \(y=2x^2+x-1\) with its quadratic.png two real roots at \(x=-1\) and \(x=1/2\). What is the equivalent graph for a tropical polynomial? To construct such a beast, we first have to decide what \(2x^2\) might mean in a tropical context. In ordinary arithmetic, \(2x^2\) is shorthand for \(2 \times x \times x\), and so the obvious translation is \(2 \otimes x \otimes x\); translating back to conventional arithmetic, this expression becomes \(2 + x + x\), or in other words \(2x + 2\). Thus the quadratic term \(2x^2\) becomes a linear function, and the same transformation applies to higher powers also: In general \(kx^{n}\) is converted into \(nx + k\). tropical-quad.png Thus the exponent in the original equation becomes the integer coefficient that determines the slope of a line. The graph of the tropical polynomial is shown at left. The green line traces the complete function \(y=2x^2 \oplus x \oplus -1\); it is the union of segments and rays drawn from the three lines \(y=2x+2\), \(y=x\) and \(y=-1\). For each value of \(x\), the function takes on the least value of \(y\) that lies on one of these lines (the “lower envelope” of the lines). Graphs of all tropical polynomials share this same basic appearance: They are piecewise linear functions and they are always concave downward.

Just as the ordinary polynomial \(2x^2+x-1\) can be factored into the form \((x+1)(2x-1)\), the tropical polynomial \(y=2x^2 \oplus x \oplus -1\) has the unique factorization

\[2 \otimes (x \oplus -2) (x \oplus -1).\]

The constants –2 and –1 appearing in these factors correspond to the “bend points” in the graph of the function, and they play a role analogous to the roots of the ordinary polynomial. They are values of the variable \(x\) where two or more of the linear constraints are simultaneously satisfied. The locus of all such points is the solution set, the object of particular interest to the algebraic geometers.

For a polynomial in one variable, the solution set generally consists of isolated points. The situation gets hairier in higher dimensions, where the solution sets become algebraic “curves”—though they sure don’t look very curvaceous. Take the two-variable tropical polynomial function \(z(x,y) = x^2 \oplus y^2 \oplus 1\). The three terms of the polynomial define three planes in \(\mathcal{R}^3\), and the value of the function is the lower envelope of these planes:

xy-polynomial-3d.png

The “bend points” are now “bend lines” where the planes intersect. Projected onto the \(x,y\) plane, this tropical “curve” has a distinctive Y-shaped form. zxy-solution-set.png At right, the rays in red represent the solution set. They divide the plane into three regions, in each of which a different linear relation has minimal height; the rays themselves define the locus of points where two (or, at the origin, all three) of the terms are minimal. This trident motif is preserved in all tropical curves, not just this simple example. The illustration on the cover of Mathematics Magazine shows a more complicated graph, part of a proof of the tropical version of Bézout’s theorem. (The theorem states, with various caveats, that two algebraic curves of degree m and n intersect in mn points. If I understand correctly, it carries over fully from “temperate” to tropical mathematics.)

The mere idea of reconstructing all of arithmetic with different operations, and then building higher-level structures (algebra, geometry) on top of this new foundation—all that’s enough fun to justify the project. But it seems there are also practical applications. Speyer and Sturmfels discuss a biological problem: Given genetic sequences that define pairwise distances between species, construct a phylogenetic tree showing the evolutionary history of the species. Tropical arithmetic turns out to be useful in finding a tree where all the distances are metrically consistent—that is, they obey the triangle inequality.

That’s about as far as I’ve been able to get in my explorations of the tropical latitudes. But in closing I’d like to say another word about the naming of things in science and mathematics. My sober, fuddy-duddy opinion is that we’d be better off in the long run giving things plain, descriptive names, without too much metaphorical spin on them. Physicists once had a lot of fun with quarks and gluons, charm and strangeness, infrared slavery and ultraviolet freedom; but eventually the joke wears thin. I gather that biologists are now regretting the game of giving genes names like Sonic Hedgehog. Tropical mathematics may also get tired of explaining over and over again about “the French view of Brazil.” On the other hand, I have to admit that if all those papers and workshops had talked about min-plus algebra, they never would have caught my eye.

Update 2009-08-25: I’ve just learned that Imre Simon died August 12.

Alpha bravo

Sunday, May 17th, 2009

WolframAlpha, the new whatchamacallit from Stephen Wolfram, is up and running. It’s also stumbling: I’ve gotten a fair number of error messages. But now is not the time for nitpicky griping. I just want to say bravo to the whole idea.

I applaud because Alpha invites us to use a computer for actual computing. Did you know that this machine that brings you YouTube and Facebook can also evaluate integrals and calculate correlation coefficients? What a thought!

I have ranted elsewhere about the sad decline of casual, inquisitive programming, and the lack of tools that will draw more people into that mode of exploring their world. WolframAlpha is not exactly the answer to my plea. It is a powerful computational tool cleverly disguised as a mild-mannered search engine. You’re not writing programs; you’re just “searching” for the roots of x2 + x – 1 or the limit of (1 + 1/n)n as n goes to infinity or the sine of 1050. I have a feeling that this search-engine metaphor may wear thin pretty quickly, but it may also serve its obvious purpose—breaking down barriers for those who wouldn’t dream of writing a line of Python or Lisp but who are masters of Google.

WolframAlpha is not the first thing of its kind. (My Sage friends will be quick to point out that they have had an online notebook interface up and running for a couple of years.) I’ve only just begun to explore Alpha, and I’m sure there are both treasures and pitfalls I haven’t glimpsed yet. Still, for the moment, I just want to cheer it on.

Himalayan mathematics

Monday, April 20th, 2009

Browsing through some notes I jotted down sometime in 1988, I come upon this sentence:

Mathematicians feel about computers much as the Nepalese feel about jet aircraft.

Did I read that somewhere, or did I make it up? My notes give no clue to the provenance of the thought, and Google is unhelpful.

Bracketology: The Madness of March

Friday, March 13th, 2009

The American Institute of Physics has just sent me a press release, whose subject line I have borrowed for the title of this post. The release discusses the mathematics of an upcoming college basketball tournament:

Mathematicians who study problems like these use combinatorics which helps them determine exactly how many possible outcomes there could be. Starting with 64 teams with two possible outcomes for each team - a win or a loss - the number of possible outcomes for the tournament is a staggering 2 to the power of 64: that is, 2 multiplied by itself 64 times, or 18,446,744,073,709,551,616 possibilities.

Not being much of a sports fan, I hadn’t realized exactly how basketball tournaments work. Now that I understand, I find it cheering. I’m going to root for the outcome in which all 64 teams win.

Wrong number

Wednesday, February 25th, 2009

Is it just me, or does it happen to everyone? When I try to correct someone else’s arithmetic, I can count on making an error of my own. For instance: I was reviewing William Goldbloom Bloch’s clever and provocative book The Unimaginable Mathematics of Borges’ Library of Babel. I noted in passing a numerical error, trivial in significance even though it’s enormous in magnitude. Bloch had occasion to introduce the number:

blochnumber.png

To give a sense of this number’s size, he mentioned that it has 33 million digits. I pointed out that the true digit count is exponentially larger: 1033,013,740. Now I’ve received a correction to my correction, in a note from Michael Rosenthal:

The author [of the book] stated 33 million digits as the length, but Hayes getting closer states that the length is (10 to the 33,013,730). Isn’t the actual number of digits ((10 to the 33,013,730) + 1)?

Of course Rosenthal is right. There are 10n n-digit decimal numbers, but 10n is not one of them. On the other hand, Rosenthal also gives a double-barrelled demonstration of my point about the perils of correcting other people’s arithmetic: In the course of setting me straight, he makes a typographical slip of his own, twice mistranscribing 33,013,740.

My correspondence with Rosenthal provoked me to take a closer look at where that enormous number came from in the first place. This has led me into some diverting adventures with factorials and logarithms. But it has also put me in the position of proposing another correction to a published number, which means I am in grave danger of making still more mistakes.

Bloch’s book is a mathematical companion to “The Library of Babel,” a brief ficción by the Argentinian fabulist Jorge Luis Borges. In the Library of Babel all books are written in an alphabet of just 25 symbols. Each book has 410 pages, and each page has 40 lines of 80 characters, for a total of 1,312,000 characters per volume. We are told that the library possesses one copy of every possible volume composed in this way. Thus the number of volumes on the library’s shelves is 251,312,000. For convenience of reference let us call this number B (for Book, or Babel, or Borges, or Bloch).

The much larger number mentioned in my review enters Bloch’s story when he asks how many ways we might arrange the books on the library shelves. The answer, of course, is the factorial of B. If we imagine that the shelves have discrete slots that hold one book each, then we have B choices for the first slot, B–1 choices for the second slot, B–2 for the third, and so on, until we come to the final slot where there’s just one book left to choose. The product of all these numbers is B!.

So what sort of a number is B!? Can we have a look at it? In this age of computational wonders, surely we can just fire up Sage or Mathematica, or write a four-liner in Lisp, and have the digits of that number served to us on a silver platter, no?

No. We can’t.

After three decades of playing with computers, I am still childishly agog at the ease with which I can tap a few keys and calculate a gargantuan quantity such as 52!, the number of permutations of a standard deck of playing cards:

8065817517094387857166063685640376
6975289505440883277824000000000000

I can even calculate 10,000!, unleashing a Niagara of digits that spill out onto my screen in a long gray torrent—some 35,660 of them by the time the flood abates. (Digression: How many of those digits are trailing zeros? This is a question Fred Gruenberger would always ask when he talked about factorials in his old Popular Computing newsletter, back in the days when computing that many digits took a bit of doing.)

Computing has come a long way, but even with so much numerical horsepower at our fingertips, the direct calculation of B! is still totally unthinkable. If Bloch had been right about the 33 million digits, the computation might be feasible, if tedious, but computing a number with 1033,013,740 digits is just ridiculous, with or without the extra digit I forgot.

If we can’t compute B! outright, maybe we can at least get its logarithm. For a rough estimate of the magnitude—in other words, for counting the digits—the realm of logarithms is the right place to be. Since the factorial of n is a product,

factorial.png

the logarithm of the factorial has to be a sum:

logfactorial.png

I don’t know a quick and easy way to perform this summation, but I do know how to evaluate a slightly different sum, one where every term on the right hand side of the equation is log n. Since there are n of these terms, the sum is obviously just n log n. What we have calculated in this way is the logarithm of nn. Thus nn can be taken as a crude approximation—a very sloppy upper bound—to the value of n!. For our particular case we have log BB = B log B. When I plug in the numerical value of B, and use base-10 logarithms, I come up with

logbtob.png

Exponentiating both sides gives this answer:

btob.png

Hmm. There’s a problem here. This is supposed to be a loose upper bound, but it’s smaller than the value we were expecting for B!. Clearly, BB can’t be less than B!.

Bloch reports that he came to his estimate of B! via Stirling’s approximation. If you look up this bit of mathematical technology, you’ll find a formula something like this:

stirlingformula.png

It would be easy to translate the formula directly into computer code, but it’s pointless to do so if the aim is to calculate B!. Again the only practical way to proceed is to take logarithms of both sides. Then the product in the formula becomes a sum of three terms. The first term, log nn, we already know: It’s n log n. The second term, log e–n, is simply –n if we choose to work with natural logarithms; if we want to stick with base-10 logs, it’s –n/log 10. And the third term we don’t need to worry about; at the level of precision we can hope to achieve in these calculations, ½ log 2πn is too small to notice. So, keeping just the first two terms, we have a rough-and-ready approximation to Stirling’s approximation, which ought to work well enough for very large n. B certainly seems to qualify as a large n; when I plug in its value, here’s what I get:

blogbminusb.png

I have plodded my way slowly and methodically through the steps of this computation in an effort to persuade myself that I might finally have it right. At this point I’m fairly sure that log B! is closer to 101,834,103 than it is to 1033,013,740, but I’m chary of making a stronger claim. With all my crossing back and forth between the safe world of logarithms and the uninhabitable wasteland of exponentials, I worry that I’ve misplaced a factor of e or, worse, mistaken the logarithm of a number for the number itself. If I’ve goofed again somewhere in this calculation, I trust that readers will let me know.

I want to emphasize that my aim in telling this story is certainly not to embarrass Bloch, whose work I admire. His book does a splendid job of explaining ideas in combinatorics, number theory and even topology, in a literary context where such themes don’t get much exposure. His little numerical error—if it is an error—has no impact on the meaning of the passage where it appears.

Indeed, it’s hard to imagine a circumstance where the difference between the two proposed values of B! could have any earthly consequence. Both numbers are always going to be ghostly presences in our universe—actors whose voice we hear from offstage but who never show their face under the lights. A certain breed of radical constructivist would argue that the numbers we’re talking about don’t exist at all. I don’t find that doctrine helpful or persuasive, but I do think there’s a distinction worth making between numbers we can actually hold in our hands and these strange, outsized objects that come equipped with their own exclamation points. Computing with numbers this large is like working with toxic or radioactive materials. You never get to touch anything directly; all manipulations are performed by remote control from behind a blast shield. The experience can be exciting, but afterwards you don’t come away with a sense that you really know what you’re doing.

Fred Gruenberger would have wanted to know how many trailing zeros there are in B!. Can we answer?

Math, fonts, and HTML

Monday, February 16th, 2009

In the latest issue of American Scientist I write about the awkward and ugly business of trying to present mathematical typography on the web. All the while I was writing the column, I had constantly in mind the uncomfortable knowledge that the column itself—with all of its examples of equations and arcane symbols—would have to be prepared for presentation in HTML. The conversion went more smoothly than I expected, thanks to the energy and expertise of Anna Lena Phillips. Nevertheless, I recommend reading the PDF version.

One issue that arises in this story goes beyond the particular problems of making mathematical notation intelligible. Why is it that a web page can’t include the fonts that are needed to display the text properly? It’s routine for web documents to incorporate images and audio and video and various kinds of code that affect the display of information—CSS, JavaScript, etc. When it comes to fonts, however, the web server is not empowered to supply what’s needed; instead, it has to play a silly guessing game with the client computer. “Do you have ‘Times New Roman’?” the server asks. “No, well how about ‘Times’? ‘Georgia’? Okay, give me anything with serifs, please.” It would be easier for both parties if the server could just package up a font and send it along—or so it seems to me.

This is not a new idea. Most of the recent discussion of embedable or linkable fonts focuses on legal impediments, which seem quite manageable to me (though of course I’m not a lawyer, nor am I an owner of typeface copyrights). Are there other reasons to back away from such a proposal? The bandwidth required seems moderate by present-day standards, and there are obvious schemes for avoiding waste by sending only glyphs that are actually needed in a document and not already present on the client computer. Publishers and advertisers would doubtless abuse the facility with garish attempts to attract attention, but we survived the <marquee> and <blink> tags back in the nineties, and an attack of circus-poster typefaces seems no more obnoxious. Am I missing something obvious?

jsMath

Sunday, December 28th, 2008

A famous equation:

$$\nabla\times\mathbf{E}=-\frac{\partial\mathbf{B}}{\partial{t}}$$

Imagine that—mathematics on the Web!

Double-click for the LaTeX source. There’s more info in the tiny box you ought to find in the lower right hand corner of your browser window.

I’ll have more to say about all this in a few days.

Zimaths

Tuesday, December 16th, 2008

A friend pointed me to Zimaths, a magazine for high school mathematics students, published in Zimbabwe in happier times. (The last issue available online is from 2005. The next-to-the-last issue has a helpful tutorial on inflation, which was then running at about 600 percent per year in Zimbabwe. The rate has gone up considerably since then, and lately the country has had even worse troubles.)

Looking back to the first issue of Zimaths, from October of 1996, I happened upon a list of problems from the Zimbabwe Maths Olympiad of that year. Problem 5 looked like something I might be able to do in my head:

The number a is randomly selected from the set {0, 1, 2, 3, …, 98, 99}. The number b is selected from the same set. What is the probabilty that the number 3a + 7b has a units digit equal to 8?

Clearly, all positive integer powers of 3 and 7 are odd numbers not divisible by 5, and so they must end in 1, 3, 7 or 9. Running through the first few of those powers (1, 3, 9, 27, …; 1, 7, 49, 343, …) suggested that all possible terminal digits appear with equal probability. The sum of two such odd numbers must be an even number. It seemed like a good bet that all even terminal digits would be equally likely—that is, each of 0, 2, 4, 6 and 8 would appear with probability 1/5. I checked this hypothesis by listing all possible pairs of the four allowed odd digits (with addition done modulo 10):

1 + 1 = 2
1 + 3 = 4
1 + 7 = 8
1 + 9 = 0
3 + 3 = 6
3 + 7 = 0
3 + 9 = 2
7 + 7 = 4
7 + 9 = 6
9 + 9 = 8

Sure enough, in this list of 10 combinations each even digit appears twice, confirming that the probability of seeing an 8 should be 0.2. If I had been competing in the Olympiad, I would have written down that answer and gone on to the next problem without a further thought.

Later in the day, though, I was looking for something I could use to test a new toy—the latest version of Mathematica. Here’s what I came up with:

f[] := Mod[3^RandomInteger[{0, 99}]
         + 7^RandomInteger[{0, 99}], 10]
BarChart[Part[#,2]&
    /@ Sort[Tally[Table[f[], {100000}]]]]

This generates 100,000 values of 3a + 7b mod 10 and tallies them up:

Zisums.png

So I’ve discovered a bug in Mathematica, right? I know there’s nothing wrong with my program, and as for a mistake in my thinking—well, that’s just unthinkable.

Update: Thank you, commenters.

Svat’s question — “How do you think that error could have been avoided?” — goes right to the heart of the matter, but I need to start with a prior question: “How do I know which of my answers is the erroneous one?” When a computer program and my own reasoning yield different results, how should I resolve the conflict? Where should I look first for a mistake?

I don’t expect to find a universal and eternal answer to this question. On the contrary, everything depends on one’s personal equation. Some of us are better at analytical thinking and some of us are bit-players. Nevertheless, in this case I think it’s pretty clear that the computer calculation relies on fewer and simpler assumptions. It asks us to trust the random number generator and certain arithmetic operations, but if we grant that the implementation of those functions is correct, the program can hardly go wrong. Its structure follows directly from the problem statement. (Of course there’s also statistical uncertainty, but if we’re patient enough, we can reduce that to any level we please.)

The probability analysis depends crucially — as I discovered to my chagrin — on properly counting all the cases. My mistake was a very elementary one; I suspect it is also a very common one. It was when I saw the empirical results, with probability values clustered near 3/16, that I realized I should be looking at 16 cases rather than 10.

The point I want to make here isn’t about mathematical truths but about the art of problem solving — even when, as in this case, there’s no artistry involved. Barry’s variation on the problem (see the comments) offers me another sterling opportunity to make a fool of myself. So I present below, in first-person present-tense stream-of-consciousness diary style, a record of my wary engagement with that problem.

  • First, why the shift from {0..99} to {1..100}? Because including zeroth powers would contaminate each term in the sum with a 1. For example, 2n mod 10 is {2, 4, 6, 8} for n={1..100}, but for n={0..99} there’s a 1 percent chance of turning up a 1. That would make the sum much messier.
  • 1n and 5n are easy. After a further moment’s thought: So is 6n. So every sum will include a term equal to (1 + 5 + 6) mod 10, or in other words 2.
  • 4n is {4, 6} and 9n is {1, 9}. We can combine them — but carefully, now, let’s not make the same mistake again — to produce a single term of {3, 5, 5, 7}.
  • We already know that 3n and 7n both yield {1, 3, 7, 9}. And 2n and 8n generate {2, 4, 6, 8}. Therefore the whole sum is
    {2}1 + {3, 5, 5, 7}1 + {1, 3, 7, 9}2 + {2, 4, 6, 8}2,

    where the superscripts indicate the number of elements we are to choose (independently at random with replacement) from each of these sets. (Not sets, really. Bags.)

  • Parity: There are an odd number of odd terms, so the sum will be odd.
  • Is there some cute number-theoretical trick that’s going to make this easy? I don’t see it.
  • Stuck. Back to the computer to do it the unclever way:

    g[]:=Mod[Sum[i^RandomInteger[{1,100}],{i,1,9}],10]

    Tallying 100,000 repetitions of this function produces the following result:

    barrysums.png

    Repeating the computation a few times more suggests that the slight excess of 7s is not statistical noise. So now I think I know the answer, but I don’t know where it came from.

  • Back to the probability calculation. The 4 × 4 sum matrix for the original problem was easy (once I knew what to do with it). This one is 1 × 4 × 4 × 4 × 4 × 4. There must be a shortcut; that’s too much to do with pencil and paper.
  • [After a break.] Aha: The sum matrix for {2, 4, 6, 8}2 is exactly the same as the matrix for {1, 3, 7, 9}2 (after permuting some columns and rows). Thus the sum can be expressed as:

    {2}1 + {3, 5, 5, 7}1 + {1, 3, 7, 9}4.

    However, this still looks like a 4 × 256 matrix, which exceeds my patience.

  • Back to the computer again, this time to do all those little sums. What I want is something along the lines of:

    Outer[PlusMod, {1, 3, 7, 9}, {1, 3, 7, 9}],

    where Outer is like a cross product; but instead of multiplying it applies the PlusMod procedure, which I define to do addition modulo 10. The result returned by the expression above is the answer to the original Zimbabwe Olympiad problem:

    {{2, 4, 8, 0},
     {4, 6, 0, 2},
     {8, 0, 4, 6},
     {0, 2, 6, 8}}
    

    For Barry’s more-elaborate variation the full program (without shortcuts) is:

    Sort[Tally[Flatten[
      Outer[PlusMod, {1}, {2, 4, 6, 8}, {1, 3, 7, 9},
         {4, 6}, {5}, {6}, {1, 3, 7, 9},
         {2, 4, 6, 8}, {1, 9}]]]]
    

    The Outer[] expression yields 1,024 values, which are then tallied up. The returned value is:

    {{1, 204}, {3, 204}, {5, 205}, {7, 206}, {9, 205}}

    This appears to be our answer. Barry asked about the frequency of the digit 5: It is 205/1024. The 7 digit is most common by a slight edge, at 206/1024.

  • Going back to the simulation program and running 1,024 × 100,000 iterations to get better statistics yields this result (about an hour later):
    {{1, 20400852},
     {3, 20396224},
     {5, 20501004},
     {7, 20601096},
     {9, 20500824}}
  • In the end I still wonder if there’s an easier way. Barry?

Cows, Colleges and Contentment

Friday, November 7th, 2008

It must be something in the milk. For expository writers who specialize in mathematical subject matter, there’s a strange attractor in Northfield, Minnesota. That’s where Lynn Arthur Steen and his colleagues produced “telegraphic reviews” of mathematical writing for many years. Math Horizons, the student math magazine published by the MAA, was edited there for a while by Steve Kennedy and Deanna Haunsperger. Barry Cipra lives there. And I used to live there too.

I’ll be making a pilgrimage back to my old hometown next week. Thanks to the generosity of Paul Zorn and the hospitality of Barry Cipra, I’ll be giving a talk at St. Olaf College Tuesday afternoon. Details here. And Rosalind Reid and I will be meeting with Mary Steen’s class of potential future math scribes.