Light-field photography

14 May 2012

Beyond digital photography lies computational photography, which holds out the promise of extracting more bits from every photon. When I wrote about this idea a few years ago, I had no hands-on experience with computational cameras. The first widely available “light-field camera,” called the Lytro, was announced last fall. Mine finally arrived last week. So far I’ve taken only a few dozen pictures, so this is a very preliminary report. But the camera itself is a very preliminary product, so perhaps it’s appropriate to treat it as a preview of things to come.

Here are a couple of pictures to play with. First, more magnet balls, with evidence that I’ve finally figured out how to stack them up in something resembling a hexagonal close-packed configuration.

And some flowers, which seem to be the front-running subject matter for Lytro photos. These are orchids at a street-corner stand, with extra inventory on display for Mother’s Day.

I trust that you’ve figured out the trick: Clicking on a point in the image refocuses on the depth of the scene at that point. (You can also double-click to zoom.)

How is this magic accomplished? The Stanford doctoral dissertation of Ren Ng sets forth the basic idea. Ng is the founder and CEO of Lytro Inc.

This dissertation introduces a new approach to everyday photography, which solves the longstanding problems related to focusing images accurately. The root of these problems is missing information. It turns out that conventional photographs tell us rather little about the light passing through the lens. In particular, they do not record the amount of light traveling along individual rays that contribute to the image. They tell us only the sum total of light rays striking each point in the image. To make an analogy with a music-recording studio, taking a conventional photograph is like recording all the musicians playing together, rather than recording each instrument on a separate audio track.

In this dissertation, we will go after the missing information. With micron-scale changes to its optics and sensor, we can enhance a conventional camera so that it measures the light along each individual ray flowing into the image sensor. In other words, the enhanced camera samples the total geometric distribution of light passing through the lens in a single exposure. The price we will pay is collecting much more data than a regular photograph. However, I hope to convince you that the price is a very fair one for a solution to a problem as pervasive and long-lived as photographic focus. In photography, as in recording music, it is wise practice to save as much of the source data as you can.

So we have a radically different kind of camera here. It has a lens, but there’s no focusing ring. Focusing is left to the computational post-processing of the image.
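To make computational refocusing a little more concrete, here is a minimal sketch of the shift-and-add method that is standard in the light-field literature. It is a simplification, not Lytro's actual pipeline. It assumes the raw light field has already been decoded into sub-aperture images, one per small patch (u, v) of the lens aperture. Each view sees the scene from a slightly different position, so objects at different depths shift by different amounts from view to view. Shifting every view back in proportion to its aperture offset, then averaging, lines up one depth plane (sharp) and smears the rest (blurry). The function name `refocus`, the parameter `alpha` and the integer-pixel shifts are illustrative choices of mine.

```python
import numpy as np


def refocus(subapertures, alpha):
    """Synthesize a photo focused at one depth by shift-and-add (a sketch).

    subapertures: dict mapping an aperture offset (u, v) to a 2-D image,
        the scene as seen through one small patch of the lens.
    alpha: pixels of shift per unit of aperture offset; each value of
        alpha brings a different depth plane into alignment.
    """
    total = None
    for (u, v), image in subapertures.items():
        # Undo this viewpoint's parallax for objects at the chosen depth.
        # Integer shifts with wrap-around keep the sketch simple.
        shift = (round(alpha * v), round(alpha * u))
        shifted = np.roll(image, shift=shift, axis=(0, 1))
        total = shifted if total is None else total + shifted
    # In-focus points add up coherently; out-of-focus points are spread out.
    return total / len(subapertures)


# A single bright point whose apparent position moves 2 pixels per unit of
# aperture offset, seen from a 3 x 3 grid of viewpoints.
base = np.zeros((9, 9))
base[4, 4] = 1.0
views = {(u, v): np.roll(base, shift=(-2 * v, -2 * u), axis=(0, 1))
         for u in (-1, 0, 1) for v in (-1, 0, 1)}
sharp = refocus(views, alpha=2)   # parallax undone: the point is crisp again
blurry = refocus(views, alpha=0)  # no correction: the point is smeared over 9 pixels
```

Choosing a different `alpha` after the fact is the whole trick. The exposure has already recorded every view, so no focusing ring is needed.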

The after-the-fact refocusing is most dramatic in close-up shots, with a wide ratio between the distances of near and far objects. You can see the effect by clicking around in this image of a pine tree putting out some exuberant spring growth.

The camera is capable of coming in even closer, as in this macro shot of some lichens and mosses on rocks in western Massachusetts. (The gray stalks are two or three millimeters tall.)

Getting much of a focal range is harder when you’re dealing with more distant subjects. This was my best attempt to catch a red-winged blackbird in a marsh below Danehy Park in Cambridge.

When you pull back even farther, the camera optics have enough depth of field that everything is in focus at once. Ordinarily, that high depth of field would be a virtue, but it strangely takes the fun out of Lytro pictures.

The gallery on the Lytro web site includes lots more examples, including some classic pull-focus tricks like spider webs and raindrops on windowpanes.

•     •     •

Just a few years ago, the immediate reaction to the Lytro camera would have been: “How do you refocus after you print your pictures? You can’t click on paper.” Times change. The Lytro makes more sense in an age when people share their pictures on Facebook or Twitter instead of printing and framing them. Yet I still have misgivings about Lytro’s scheme for distributing and publishing images. The Lytro photographs displayed above are not hosted on bit-player.org, as this text is; they are embedded through <iframe> tags, providing a window onto content hosted at lytro.com. That’s the only way I can post them here. This is an annoyance to a curmudgeonly control freak like me; I want to retain possession of my own work, as well as control its presentation.

There is no fundamental reason the images have to be hosted at lytro.com. The refocusing algorithms are not running on the Lytro servers. What’s embedded in the iframes is essentially a stack of images with different focal points, along with a Flash application that responds to clicks by displaying the appropriate image from the stack. (The use of Flash is another annoyance. Lytro evidently has a non-Flash version of the software, since the photos are viewable with Flashless devices such as the iPad, but there’s no readily accessible way to choose the non-Flash version on other platforms.)
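A viewer of this kind needs very little logic. The following is a hypothetical sketch of a click handler for such a focal stack; I don't know how Lytro's player actually chooses an image, and it may well use a precomputed depth map instead. Here the stack is a list of same-sized grayscale arrays. For the clicked pixel, the sketch picks the slice with the most local contrast, scored as the mean squared Laplacian in a small window. The names `laplacian` and `pick_slice` and the window `radius` are assumptions made for illustration.

```python
import numpy as np


def laplacian(img):
    """4-neighbour discrete Laplacian (edges wrap around, for simplicity)."""
    return (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
            np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4.0 * img)


def pick_slice(stack, x, y, radius=3):
    """Return the index of the stack image that is sharpest near pixel (x, y).

    In-focus detail has strong local contrast (large Laplacian);
    blurred regions are smooth (small Laplacian).
    """
    best_index, best_score = 0, -1.0
    for i, img in enumerate(stack):
        lap = laplacian(np.asarray(img, dtype=float))
        window = lap[max(0, y - radius):y + radius + 1,
                     max(0, x - radius):x + radius + 1]
        score = float(np.mean(window ** 2))
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical two-image stack: one slice blurred to flat gray,
# the other showing a crisp checkerboard.
blurred = np.full((10, 10), 0.5)
crisp = (np.indices((10, 10)).sum(axis=0) % 2).astype(float)
choice = pick_slice([blurred, crisp], x=5, y=5)  # 1, the crisp slice
```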

The camera itself has an unconventional tubular design, but it fits the hand well enough and has a satisfying heft and solidity. The only physical controls are power, zoom and the shutter button. The one severe problem with the camera hardware is that the viewing screen is too small (1 square inch) and too coarse; also, it’s useless in bright sunlight. Often, you can’t see what you’re about to photograph, and afterwards you can’t see what you’ve captured until you upload the image to a computer.

The software that runs on the computer has its own issues. For now it is Macintosh-only; a Windows version is promised, but there’s no mention of Linux. When importing images, the software hogs the CPU, setting off a tremendous whoosh of fan noise. And once you have the images loaded into the software, there’s not actually much you can do with them, other than add metadata or send them to the Lytro web site. There are no tools for cropping, correcting colors, sharpening, etc. You can export a JPEG version, but it’s of course merely a static pixel array. (The JPEGs are 1080 by 1080 pixels, with quality in the range you’d expect from a good cell-phone camera.)

The full light-field photos are stored in 16-megabyte “.lfp” files, but there’s no public documentation on the format of those files. As far as I know, no software other than Lytro’s own can read the files. If there are any plans for, say, a Photoshop plugin or a software development kit, they are not discussed publicly.

Is the Lytro the first chapter in the future of photography, or a novelty that will fade after a year or two? I suspect the answer will depend on how quickly Lytro is able to develop and release new features for the software and new models of the camera. At the moment, what the camera offers is a single trick: focusing after rather than before you press the shutter button. It’s a neat trick, but probably not neat enough to support a whole new photographic infrastructure. The light-field technique could offer more. Lytro has promised that a future version of the software will allow control not just of focus but of depth of field, so that you can choose a version of the image in which everything is in focus at once. There are still more possibilities, even including shifting the camera’s apparent point of view after the picture is taken. But it remains to be seen whether such features can be brought to market before people lose patience or interest.

Statistical mechanics of magnet balls

4 May 2012

They come out of the can in a gleaming 6 × 6 × 6 cubic crystal. It took me a day to figure out how to get them back into the can. But that’s not the deepest mystery about these curiously powerful little ferromagnetic balls.

216 jumbo magnet balls in a cubic array

A web search turns up lots of sites that sell the magnets under various brand names, and a few more web pages that warn of their dangers. There are also videos and picture galleries of interesting constructions, such as polyhedra or a Möbius band. But I haven’t been able to find anything on the questions that intrigue me most: For a set of N magnet balls, what is the ground-state configuration—the geometric arrangement of lowest energy? How about the state of lowest free energy? Informally: Given a handful of magnet balls, what is the shape they most “want” to assume?

About 100 magnet balls arranged to form a Möbius strip

You can get some rough intuition about these matters by using your fingertips. Take a random clump of magnet balls and try to pull them apart. How much force do you have to apply when you tug in various directions? How does the cluster break apart? I find that the balls usually peel off in long strings of pearls, going directly from a three-dimensional aggregate to a one-dimensional chain. This behavior is not entirely surprising. After all, the magnets are dipoles, and so they can reduce their total energy by lining up north-south-north-south-north-south….
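The preference for chains can be checked with a back-of-the-envelope calculation. The sketch below models each ball as a point dipole at its center. For a uniformly magnetized sphere the external field is exactly dipolar, so this is a reasonable model, though real magnets may not be perfectly uniform. The units are chosen so that the ball diameter, the magnetic moment and μ0/4π are all 1. Under these assumptions, a head-to-tail pair is bound twice as tightly as a side-by-side antiparallel pair, and a side-by-side parallel pair repels.

```python
import numpy as np


def dipole_energy(p1, m1, p2, m2):
    """Interaction energy of two point dipoles (units with mu0/(4 pi) = 1).

    U = [m1.m2 - 3 (m1.r_hat)(m2.r_hat)] / r**3, where r = p2 - p1.
    """
    r_vec = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    return (m1 @ m2 - 3.0 * (m1 @ r_hat) * (m2 @ r_hat)) / r**3


# Two touching balls of unit diameter, each with a unit moment.
north_x, south_x = (1, 0, 0), (-1, 0, 0)
head_to_tail = dipole_energy((0, 0, 0), north_x, (1, 0, 0), north_x)       # -2
side_antiparallel = dipole_energy((0, 0, 0), north_x, (0, 1, 0), south_x)  # -1
side_parallel = dipole_energy((0, 0, 0), north_x, (0, 1, 0), north_x)      # +1, repulsive
```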

Once you have a long chain, you can reduce its energy a little bit further by connecting head to tail to form a closed loop. But then, when you play around with the resulting bracelet of beads, you soon discover that the circular configuration is not at the bottom of the energy spectrum. The loop—if it’s long enough—tends to collapse on itself, with the strands on opposite sides zipping together in the middle, forming what RNA chemists would call a double-ended hairpin structure.
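Under the same point-dipole assumption (unit diameter, unit moments, μ0/4π = 1), we can compare the total energy of a straight chain with that of a closed ring of the same balls. The ring's moments point along the circle, head to tail. Every pair in such a ring turns out to attract, and closing the loop removes the two unpaired ends, so the ring comes out lower for 20 balls. The sketch does not explore the hairpin collapse, which would require searching over many more configurations. The helper names are illustrative.

```python
import math

import numpy as np


def dipole_energy(p1, m1, p2, m2):
    """Point-dipole pair energy in units where mu0/(4 pi) = 1."""
    r_vec = p2 - p1
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    return (m1 @ m2 - 3.0 * (m1 @ r_hat) * (m2 @ r_hat)) / r**3


def total_energy(positions, moments):
    """Sum the interaction energy over every pair of balls."""
    n = len(positions)
    return sum(dipole_energy(positions[i], moments[i], positions[j], moments[j])
               for i in range(n) for j in range(i + 1, n))


def straight_chain(n):
    """n touching unit balls along the x axis, all pointing along +x."""
    positions = [np.array([float(k), 0.0, 0.0]) for k in range(n)]
    moments = [np.array([1.0, 0.0, 0.0]) for _ in range(n)]
    return positions, moments


def closed_ring(n):
    """n balls on a circle sized so neighbours touch; moments tangent to it."""
    radius = 1.0 / (2.0 * math.sin(math.pi / n))  # chord between neighbours = 1
    positions, moments = [], []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        positions.append(np.array([radius * math.cos(theta),
                                   radius * math.sin(theta), 0.0]))
        moments.append(np.array([-math.sin(theta), math.cos(theta), 0.0]))
    return positions, moments


chain_20 = total_energy(*straight_chain(20))
ring_20 = total_energy(*closed_ring(20))
```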

Bracelet and zippered hairpin

Note the alignment of the beads in the zippered region: The arrangement is rectilinear, as in a square lattice. By bringing together more strands of beads, we can grow this zippered arrangement into a fully two-dimensional, planar pattern. However, when you try this experiment with half a dozen short strands, you quickly discover that there’s more than one way of combining them. The dipoles give each strand an orientation, and so adjacent chains can be either parallel or antiparallel. The rectilinear habit of the zippered loop comes from antiparallel alignments. Parallel strands assume a quite different pattern, with triangular or hexagonal symmetry. It’s the difference between Kansas and Tennessee:

Kansas and Tennessee

It might look as though you could convert one of these arrangements into the other just by squeezing and skewing, but that’s not the case at all. If you could see the north and south poles of each ball magnet, they would look something like this:

Kansas and Tennessee dipoles

Which of these arrangements lives in the deeper energy well? Kansas is stabilized by a multitude of local rectangular flux loops, which link adjacent antiparallel rows. (There may be weak longer-range attractions as well.) The parallel strands in Tennessee, in contrast, form one big global flux tube, with highly favorable interactions within the fabric of the layer but with nothing to help close the flux loops that exit one end of the state and re-enter the other.

Both kinds of planar sheets are happy to roll up into hollow cylinders. The resulting tubes (which can also be made by stacking rings) are notably sturdy, stable and stiff. More than any other structures I’ve discovered in playing with the magnet balls, the tubes seem to have the quality that Buckminster Fuller used to call tensegrity. The Tennessee roll-up is slightly stronger than the Kansas model. (I’ve tried constructing hemispherical endcaps for the tubes, without success.)

Hollow cylinders with rectangular and triangular symmetry in the wall fabric

What about a fully three-dimensional, space-filling lattice? We already know about the simple cubic lattice, because that’s the configuration that comes out of the shipping container. But my attempts to build multiple layers of the hexagonal close-packed lattice have all failed. A two-layered Tennessee will not lie flat. Or, looking at it another way, magnetic cannonballs cannot be stored in the classic Keplerian heap. Even a tetrahedral pile with just four balls is violently unstable and spontaneously rearranges itself into a linear chain or a flat square.

Looking at all these forms, I see an analogy with carbon chemistry. [Warning: half-baked ideas ahead!] According to the analogy, the cubic lattice of magnet balls is like diamond. Not that it has the same geometry as diamond, but it is the most symmetrical arrangement, and the only one that fills three-dimensional space. The Tennessee pattern with its hexagonal symmetry is analogous to graphene or graphite—a substance that is actually more stable than diamond but has reduced symmetry. Continuing in this scheme, the Tennessee roll-up has to be a buckytube.

•     •     •

Whatever the true identity of the lowest-energy configuration, it’s a state you would expect to see emerge spontaneously only after an eternity of gradual cooling toward zero temperature. For those of us with a shorter attention span, the state of lowest free energy might be of greater interest. Let’s define free energy as

A = U – TS

where U is the ordinary internal energy—the stuff we were trying to minimize in the paragraphs above—T is the temperature and S is entropy. In this context, temperature is not what the thermometer in the room reads. It’s a measure of how vigorously we agitate the system of balls, for example by shaking them in a box. As for the entropy, let’s think of it as a count (strictly, the logarithm of the count) of the microstates per macrostate. The free-energy formula, as I understand it, implies the following: If we take random samples from a population at temperature T, then the configurations we’re most likely to see are those that balance the imperatives to minimize U and to maximize S. The value of T determines the relative weight assigned to energy and entropy.
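The balancing act can be put in a few lines of arithmetic. If a macrostate has energy U and contains W microstates, each at Boltzmann weight exp(−U/T), its total weight is W·exp(−U/T) = exp(−(U − T ln W)/T) = exp(−A/T), taking S = ln W (Boltzmann's constant set to 1). So the most probable macrostate is the one with the lowest free energy. The two "shapes" and all their numbers below are invented purely to illustrate the crossover; they are not measurements of magnet balls.

```python
import math


def macrostate_probabilities(states, temperature):
    """Boltzmann probabilities of macrostates at a given 'temperature'.

    states: dict name -> (U, W), the energy of the macrostate and the
        number of microstates it contains. Entropy is S = ln W, so the
        free energy is A = U - T ln W and each weight is exp(-A/T).
    """
    free_energy = {name: u - temperature * math.log(w)
                   for name, (u, w) in states.items()}
    # Shift by the minimum before exponentiating, to avoid overflow.
    a_min = min(free_energy.values())
    weights = {name: math.exp(-(a - a_min) / temperature)
               for name, a in free_energy.items()}
    z = sum(weights.values())
    return {name: wt / z for name, wt in weights.items()}


# Made-up numbers: a tightly bound shape that can form in only one way,
# versus a looser shape that can be realized in many ways.
states = {"ring": (-10.0, 1), "floppy chain": (-8.0, 100)}
cold = macrostate_probabilities(states, temperature=0.1)   # energy wins: ring
hot = macrostate_probabilities(states, temperature=100.0)  # entropy wins: chain
```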

I tried the obvious experiment. I put some magnet balls in a Tupperware box and shook vigorously. The result was noisy and uninformative. The magnetic forces are so strong that it would take a whole lot of shaking to have any observable effect. Indeed, I think the box might disintegrate before the cluster of magnet balls did.

So I tried a different approach. I looked at very small clusters, typically two or three balls, and watched what happened when they collided. I arranged the collisions by having the balls slide or roll down the walls of a large china bowl, meeting at the bottom. In effect, the height and steepness of the walls play the role of temperature in this procedure. Here are my notes from the first series of experiments, in which each pairing was repeated 20 times:

results of collisions between pairs, trios, quads

I thought I noted a bias toward small, open rings. To explore this possibility a little further, I tried colliding individual balls with progressively larger rings or ringlike clusters.

Collisions of single balls with small rings generally making larger rings

These results also suggested a preference for symmetrical n-gons, especially pentagons and hexagons, with diminishing effects as n gets smaller than 5 or larger than 6. Again it’s tempting to interpret the results in light of organic chemistry, where the bond angles of carbon atoms favor 5- or 6-member rings (cyclopentane, cyclohexane, benzene); smaller rings are very hard to make, while larger ones are floppy and fragile.

But perhaps I’m a little too eager to believe these chemical analogies. After all, when you start with ring-shaped ingredients, you might expect to get ring-shaped products. Here’s what I saw after colliding various small linear chains:

Products of collisions of linear molecules

The preference for rings has largely disappeared; noncyclic clusters predominate. Bashing together five-bead strands produces quite a zoo of exotic shapes, hinting at still more diversity as the size of the molecules increases.

Performing these experiments is tedious, and the results are probably not trustworthy. When dumping balls into a bowl, many factors are hard to control, and some of them cannot easily be randomized either. An important example is the impact geometry when two chains meet in a collision. Coming together end-to-end might well produce a different outcome than meeting broadside.

Rather than work on refining my experimental techniques, I would like to try simulating the system. Doing all this inside a computer allows for greater control, better statistics and better randomness; besides, we can do some tricks that would be difficult in the physical world, such as turning magnetism on and off whenever it’s convenient. However, creating an accurate simulation looks difficult and messy. Magnetic forces are harder to calculate than the simple inverse-square-law forces of most n-body simulations. Moreover, the program might also have to include friction and angular momentum. (Some clusters of beads can roll, whereas others slide.)
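To give a sense of what such a simulation has to compute, here is a sketch of the force and torque between two point dipoles. It again treats each ball as a point dipole and uses units where μ0/4π = 1. Unlike an inverse-square force, this force falls off as 1/r⁴ and depends on the orientations of both magnets. It is generally not directed along the line between the centers. The torque is what would make clusters twist and realign. Contact forces, friction and rolling, which the text notes a real program would need, are left out.

```python
import numpy as np


def dipole_field(m, r_vec):
    """Field of point dipole m at displacement r_vec (mu0/(4 pi) = 1)."""
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    return (3.0 * (m @ r_hat) * r_hat - m) / r**3


def dipole_force(m1, m2, r_vec):
    """Force on dipole m2 due to dipole m1, with r_vec = position2 - position1."""
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    return (3.0 / r**4) * ((m1 @ r_hat) * m2 + (m2 @ r_hat) * m1
                           + (m1 @ m2) * r_hat
                           - 5.0 * (m1 @ r_hat) * (m2 @ r_hat) * r_hat)


def dipole_torque(m1, m2, r_vec):
    """Torque on dipole m2 in the field of m1: tau = m2 x B1."""
    return np.cross(m2, dipole_field(m1, r_vec))


x_hat = np.array([1.0, 0.0, 0.0])
y_hat = np.array([0.0, 1.0, 0.0])
pull = dipole_force(x_hat, x_hat, x_hat)    # head to tail: attraction along -x
push = dipole_force(x_hat, x_hat, y_hat)    # side by side, parallel: repulsion
twist = dipole_torque(x_hat, y_hat, x_hat)  # crosswise m2 is turned toward +x
```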

I’ve been able to find just one example of such a program, mentioned in a thread at Physics Forums. Perhaps there are others.

I have the persistent sense that I am retracing the footsteps of others, but I have not been able to spot their tracks. The arXiv, the American Journal of Physics and the IOP journals all seemed like good prospects, but my searches have come up empty. I’ll be grateful for any pointers.

Kepler’s snowflake

28 April 2012

The Kepler conjecture—the one about stacking cannonballs or oranges—is now the Hales theorem, though with a cruelly lingering asterisk. Thomas C. Hales announced his proof in 1998, and it was published in 2005 and 2006 (links). But the referees were unable to fully verify all the details, including some 5,000 nonlinear optimization problems solved by computer. Hales has continued to work on refinements to the proof, both simplifying the arguments and exploring formal methods of verification. 


Johannes Kepler in 1610, an age of fanciful extravagance in collars and mustaches; from Wikipedia.

Over the years I’ve read parts of Hales’s proof, but I realized the other day that I had never looked at Johannes Kepler’s much earlier contribution to this discussion. It turns out that Kepler’s essay on the subject is a little gem, an amiable work by an affable fellow, showing a lively mind at play. The conjecture appears in The Six-Cornered Snowflake, a pamphlet published in 1611 and offered as a New Year gift to Johann Matthäus Wacker von Wackenfels, who was then Kepler’s patron. Apparently Wacker was more than a moneybags; he had studied law in Strasbourg and Geneva, took a doctoral degree in Padua, and had literary interests. Kepler’s essay includes a lot of learned banter addressed to Wacker, suggesting the two men may have been genuine friends. Some of the jokes involve bilingual puns—Latin and German.

Kepler’s main subject, as you might well guess from his title, is the hexagonal symmetry of snowflakes. He writes:

There must be some definite cause why, whenever snow begins to fall, its initial formations invariably display the shape of a six-cornered starlet. For if it happens by chance, why do they not fall just as well with five corners or with seven? Why always with six…?

Kepler hexagons

The idea we now know as the Kepler conjecture is introduced in one of several speculative attempts to solve this puzzle. Kepler describes the familiar stack-of-cannonballs geometry for packing spheres and then makes a bold claim about it, phrased not as a conjecture but as a fact that stands on its own, without need of demonstration: “The packing will be the tightest possible, so that in no other arrangement could more pellets be stuffed into the same container.” The only alternative he considers is the simple cubic lattice. He does not calculate the actual density of either lattice (\(\pi/\sqrt{18}\) vs. \(\pi/6\)). In the woodcut above, Kepler deconstructs a tetrahedral heap of 35 spheres into five horizontal layers; he clearly understood that the packing could be extended throughout an unbounded volume of space.
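Kepler skipped the arithmetic, but it takes only a few lines. The sketch below computes both densities from the standard cubic unit cells, using spheres of radius 1. The simple cubic cell holds one sphere in a cube whose edge is one diameter. The cannonball (face-centered cubic) cell holds four spheres, which touch along the face diagonal, so the edge is 2√2 radii. The last lines check the 35-ball heap in the woodcut: each layer holds a triangular number of balls.

```python
import math


def packing_density(spheres_per_cell, cell_edge):
    """Fraction of a cubic unit cell filled by spheres of radius 1."""
    sphere_volume = 4.0 / 3.0 * math.pi
    return spheres_per_cell * sphere_volume / cell_edge**3


# Simple cubic: one sphere per cube whose edge is 2 radii.
simple_cubic = packing_density(1, 2.0)               # pi/6, about 0.524
# Cannonball (face-centered cubic): four spheres; face diagonal = 4 radii.
cannonball = packing_density(4, 2.0 * math.sqrt(2.0))  # pi/sqrt(18), about 0.740

# Kepler's tetrahedral heap: layer k holds the k-th triangular number.
layers = [k * (k + 1) // 2 for k in range(1, 6)]   # [1, 3, 6, 10, 15]
heap_size = sum(layers)                             # 35
```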

From the cannonball packing, Kepler passes on to other themes, including the arrangement of seeds in a pomegranate, the hexagonal cells of honeycombs and the symmetries of flowers (often fivefold); this last subject presents an opportunity for a further digression on the golden ratio and the Fibonacci numbers. But when Kepler finally returns to the snowflake, none of these ideas offers much help, and he is left with still more questions:

I will grant that, as flakes fall from above through steamy air, some incrustation on the plumes can occur from the vapor that comes in contact with them. But why at six points? What is the origin of the number six? Who carved the nucleus, before it fell, into six horns of ice? What cause is it that prescribes in that surface, which is now in the very act of condensing, six points in a circle for six prongs to be welded to them?

At this point Kepler comes up with a charming and highly creative idea, which also happens to be utterly preposterous. He proposes that snowflakes have six points because space has three dimensions:

While these starlets are falling, they consist of three feathered diameters, joined crosswise at one point, with their six extremities equally distributed in a sphere; consequently they fall on only three of the feathered prongs, and tower aloft with the remaining three, opposite those on which they fall, on the same diameters prolonged, until those, on which they rested, buckle, and the remainder, until then upright, sag onto the level in the gaps between them.

In other words, a pristine snowflake is not a planar hexagonal shape but a three-dimensional structure rather like a toy jack, with components aligned along three Cartesian coordinate axes. The flake collapses into a flat, six-pointed configuration only when it falls to earth. According to this theory, the number six is not something arbitrary or accidental but a direct consequence of the way the universe is put together.

Kepler did not have the vocabulary of Cartesian coordinate systems to describe his idea—Descartes would not invent it until 20 years later, incidentally after reading The Six-Cornered Snowflake—and so Kepler had to adopt a more roundabout description:

But is this perhaps the cause of the three diameters, that there is the same number of dimensions in animals? After all they have upper and lower parts, front and back, left and right.

Almost half of Kepler’s essay is given to the defense of this three-dimensional hypothesis, and then, at the end, all of his speculation is spoiled by a conflict with cold, wet reality:

For as I write it has again begun to snow, and more thickly than a moment ago. I have been busily examining the little flakes. Well, they have been falling, all of them, in radial pattern, but of two kinds: some very small with prongs inserted all the way round… But scattered among them were the rarer six-cornered starlets of the second kind, and not one of them was anything but flat, whether it was floating or coming to earth, with the plumes in the same plane as their stem.

In the end Kepler abandons his inquiry without reaching any conclusion. He throws the problem out for the chemists and physicists to solve—which they did, three centuries later.

Scientific discourse has changed a great deal in the past 400 years. When Hales wrote up his proof of the Kepler conjecture, he did not include jocular asides to his program officer at the National Science Foundation. And we don’t see a lot of published papers these days that end with the admission, “I have not yet got to the bottom of this.” The negative capability that Kepler embraces is part of what makes his essay so appealing.

I read the Kepler essay in an edition published in 1966, which prints the Latin text and an English translation on facing pages:

Kepler, Johannes. 1966. The Six-Cornered Snowflake. Edited and translated from the Latin by Colin Hardie, with essays by L. L. Whyte and B. F. J. Mason. Oxford, U.K.: Oxford University Press.

There’s also a more recent edition, which I have not seen, from a small publishing house in Philadelphia:

Kepler, Johannes. 2010. The Six-Cornered Snowflake. Translation by Jacques Bromberg, with essays by Owen Gingerich and Guillermo Bleichmar. Philadelphia, Penna.: Paul Dry Books.

World3, the video

15 April 2012

Still more on World3 and The Limits to Growth. Two weeks ago I gave a talk at Harvard on all this, and the video is now online. Look for the Brian Hayes–March 30 link near the bottom of the page.

The slides are here.

And, by the way, the slides were done with deck.js, a JavaScript-and-HTML framework I had never tried. Both preparing the slides and presenting them went very smoothly. Running the talk from within a browser has obvious advantages when you’re talking about Internet things; you don’t have to break out of PowerPoint or Keynote to go to a web page. Using MathJax for TeX stuff is effortless. So is posting your slides on the web. On the other hand, navigation may not be obvious at a glance. Use the arrow keys. The M and G keys also do helpful things.

World3, the public beta

15 April 2012

Forty years ago I had my first close encounter with mathematical models of doomsday. The Limits to Growth, published in the spring of 1972, offered a grim vision of environmental and economic collapse, based on the implacable logic of a computer simulation called World3. For extra nerd-cred authenticity, the results of the simulation were set forth in crude black-and-white graphs reproduced directly from line-printer output.

ASCII infographics from The Limits to Growth

I wrote about The Limits to Growth and World3 back in 1993. Now I have revisited the subject in my newly published American Scientist column. Buried deep within the new column is a note mentioning that I’ve been working to re-implement the World3 model in JavaScript. “The result of this exercise is at http://bit-player.org/limits,” the column says.

If you follow that link, you’ll find it’s true: There’s a rudimentary version of the model you can play with (if you have the right browser).

Screen image of the JavaScript World3 model

But I have to tell you, it was a near thing. When the magazine was shipped to the printers three weeks ago, the program was unfinished. Two weeks ago, it was finally running but giving weird results. A week ago the output was still nonsense. Since then I’ve had more anxious moments, late nights, and occasions to ponder the foolishness of publicly announcing vaporware. For a while it looked like I might have to admit defeat and write a sheepish apology for promising something I couldn’t deliver. Never again, I said to myself. And yet, when it’s all over, I get such a kick out of building a thing like this.

Herewith a few notes, mostly technical, on the building process.

What the model models. For background on World3—where the project came from, who did it, and whether you should worry about the model’s bleak predictions—please see either of my columns. Very briefly, the model traces interactions among five main components of the global ecosystem and economy: the human population, agriculture, industry, nonrenewable resources and pollution. If you could strip the model down to its mathematical essentials, it would be a system of coupled differential equations, something like the Lotka-Volterra equations for predator-prey populations. But the model is actually formulated in the language of “system dynamics,” a simulation methodology invented in the 1950s by Jay W. Forrester of MIT, with heavy influence from control theory and servomechanisms.
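For readers who haven't met them, the Lotka-Volterra equations are short enough to sketch. Prey x grow at rate αx and are eaten at rate βxy; predators y grow at rate δxy and die at rate γy. None of this code is part of World3, and the parameter values are arbitrary. I use a fourth-order Runge-Kutta step for accuracy, which is my choice for the sketch. The quantity δx − γ ln x + βy − α ln y is conserved by the exact equations, so its drift is a handy check on the integration.

```python
import math


def lotka_volterra(state, alpha, beta, gamma, delta):
    """Rates of change: prey grow and are eaten; predators eat and die off."""
    prey, predators = state
    return (alpha * prey - beta * prey * predators,
            delta * prey * predators - gamma * predators)


def rk4_step(f, state, dt, *params):
    """One classical fourth-order Runge-Kutta step for a small ODE system."""
    def add(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = f(state, *params)
    k2 = f(add(state, k1, dt / 2), *params)
    k3 = f(add(state, k2, dt / 2), *params)
    k4 = f(add(state, k3, dt), *params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def invariant(state, alpha, beta, gamma, delta):
    """Conserved quantity of the exact equations."""
    prey, predators = state
    return (delta * prey - gamma * math.log(prey)
            + beta * predators - alpha * math.log(predators))


# Illustrative parameters; the equilibrium is prey = 20, predators = 10.
params = (1.0, 0.1, 1.5, 0.075)
state = (10.0, 5.0)
history = [state]
for _ in range(2000):  # 20 time units with dt = 0.01
    state = rk4_step(lotka_volterra, state, 0.01, *params)
    history.append(state)
```

Both populations cycle endlessly around the equilibrium rather than settling down, which is the kind of behavior the comparison with World3 has in mind.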

The key elements of a system dynamics model are represented by levels and rates, or less formally vats and valves. Here’s the population section of the World3 model:

vats and valves in the population section of the World3 model

The orange rectangles are levels, which integrate the inflow and outflow of people. The rates of flow are determined by the valves, represented here as hourglass-shaped icons (a graphic device borrowed from control theory and industrial engineering).
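In code, a level is simply a variable that accumulates. A rate is a quantity recomputed from the current state at every time step. Here is a minimal sketch of one level with two valves, integrated with Euler's method. The constant birth and death fractions are hypothetical simplifications; in World3 the corresponding rates depend on many other variables in the model. The function name `run_population` is mine.

```python
def run_population(initial, birth_fraction, death_fraction, dt, years):
    """Integrate one level (population) fed by one valve and drained by another.

    Each step computes the rates (flows per year) from the current level,
    then adds the net flow times dt to the level (Euler integration).
    """
    population = initial
    trajectory = [population]
    for _ in range(int(round(years / dt))):
        births = birth_fraction * population   # inflow valve
        deaths = death_fraction * population   # outflow valve
        population += (births - deaths) * dt   # the level integrates net flow
        trajectory.append(population)
    return trajectory


traj = run_population(100.0, 0.03, 0.01, dt=1.0, years=10)
# With dt = 1, each step multiplies the level by 1.02, so traj[-1] = 100 * 1.02**10.
```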

Twiddling the knobs. If you go and play with the model, most of the controls should be pretty obvious. The original World3 model extended over 200 years, from 1900 to 2100; the model duration slider can extend the horizon out to 2400. The time step slider controls the integration interval (the variable dt within the model); if you set it to a value greater than 1 year, you’re likely to see spurious short-period oscillations caused by undersampling. The initial resources multiplier affords control over a variable that turns out to be crucial in determining the fate of World3. Note that the resources curve in the graph shows the fraction of resources remaining, so it always begins at the same initial value; the slider setting effectively determines the rate of depletion. The final slider, output consumed, provides access to another variable that can greatly alter the outcome of the simulation. This quantity is the fraction of industrial output that is diverted into consumption, defined as anything nonproductive; the output not consumed is reinvested in agriculture, industry and the extraction of natural resources. High consumption acts as a damping or friction term, curtailing the positive feedbacks that lead to much of the future unpleasantness in the model. As I snidely commented in 1993: “The model seems to be telling us to invest less in farms and factories and to spend more on frippery and fast cars. Armaments also fall into the category of nonproductive spending, so perhaps we need a good vigorous war every few decades.”
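One way a too-large dt can manufacture oscillations shows up in the simplest possible model: a level that drains at a rate proportional to its contents, d(level)/dt = −k·level. Each Euler step multiplies the level by (1 − k·dt). Once k·dt exceeds 1, that factor turns negative, and the computed level flips sign at every step. The result is a spurious oscillation with a period of 2·dt. The decay constant here is hypothetical. This illustrates the general effect; I'm not claiming it is the precise mechanism inside World3.

```python
def euler_decay(level, k, dt, steps):
    """Euler integration of d(level)/dt = -k * level."""
    values = [level]
    for _ in range(steps):
        level += -k * level * dt   # same as level *= (1 - k*dt)
        values.append(level)
    return values


smooth = euler_decay(100.0, k=0.75, dt=1.0, steps=6)   # factor 0.25: steady decay
jagged = euler_decay(100.0, k=0.75, dt=2.0, steps=6)   # factor -0.5: sign flips each step
```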

A few snapshots of model behavior. My purpose in writing this program was not really to explore the behavior of the World3 model; for that there are several more versatile and more trustworthy implementations out there (including at least one that runs within a web page). I just wanted to understand what’s inside the box. Nevertheless, now that the model is running, I may as well point out a few of its tricks.

Here’s the spurious oscillation mentioned above, with dt = 2:

spurious oscillations caused by setting the integration interval to too large a value

A quite different kind of oscillation appears when the initial stock of natural resources is set to a very high value (32×):

oscillations with a period of about 150 years, observed when the resource base is very large

These oscillations, with complex waveforms and a period of roughly 150 years, are not caused by integration error but probably represent a natural behavioral mode of the model itself (analogous to the population cycles seen in predator-prey models).

We get a much calmer vision of the future by setting the consumption fraction to 0.51, rather than the World3 default of 0.43:

Setting consumption to a higher value tames the overshoot phenomenon

Siphoning off some of the capital that would otherwise fuel rapid growth tames the overshoot-and-collapse regime of the standard run. However, the outcome is hardly utopian. Life expectancy and food per capita remain permanently low; so does industrial output, which can be taken as a proxy for wealth.

Putting World3 in a web browser. The original World3 model ran on mainframe hardware—IBM 360 and 370 machines. Now it fits into a web browser on a laptop or an iPad. I try to keep my sang froid about such things, but the fact is I’m just plain astounded by the march of progress.

I should emphasize that the entire JavaScript computation is happening on your computer, not my server. The program is downloaded once and then executed locally each time you press the Run button. And it’s not even much of a computational burden. The gradual unfolding of the graphs across the years is a deliberate animation effect, not a reflection of actual computation speed. (The Run Fast button is meant to eliminate artificial delays, but it doesn’t quite achieve that yet.)

DYNAMO and toposorting. The 1972 version of World3 was written in a language called DYNAMO, created a decade earlier by Phyllis Fox and Alexander Pugh, who were then part of Forrester’s group at MIT. In lexical and syntactic structure DYNAMO is what you’d expect of a language from the punch-card era—six-letter variable names and ALL CAPS—but in other respects it’s an interesting early experiment, with a programming style that falls somewhere between procedural and declarative.

One feature is particularly noteworthy. In DYNAMO a model is defined by a set of “equations” (really assignment statements) that can be written down in any order but have to be executed in a sequence that takes into account the way one equation depends on others, so that every variable is evaluated before it is used. The DYNAMO compiler reordered the equations automatically. This was an early application of topological sorting; the first efficient algorithms for this process were developed circa 1960, in connection with PERT project-scheduling methods.
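For readers who haven’t met topological sorting, here is a minimal sketch of one standard method, Kahn’s algorithm (published in 1962). I don’t know which algorithm the DYNAMO compiler actually used. Each equation is represented by the name of the variable it defines and the list of variables it reads; the names A, B and C are placeholders.

```javascript
// Kahn's algorithm: order equations so every variable is computed before it is used.
// deps maps each variable to the variables its equation reads.
function topoSort(deps) {
  const names = Object.keys(deps);
  const pending = new Map(names.map((n) => [n, 0])); // unmet inputs per variable
  const readers = new Map(names.map((n) => [n, []])); // equations that read each variable
  for (const [v, inputs] of Object.entries(deps)) {
    for (const u of inputs) {
      if (!readers.has(u)) throw new Error(`undefined variable: ${u}`);
      pending.set(v, pending.get(v) + 1);
      readers.get(u).push(v);
    }
  }
  const ready = names.filter((n) => pending.get(n) === 0);
  const order = [];
  while (ready.length > 0) {
    const u = ready.shift();
    order.push(u);
    for (const v of readers.get(u)) {
      pending.set(v, pending.get(v) - 1);
      if (pending.get(v) === 0) ready.push(v);
    }
  }
  // Anything left over sits on (or downstream of) a loop of dependencies.
  if (order.length !== names.length) throw new Error("dependency loop: no valid order");
  return order;
}

// Equations may be written in any order; the sort finds a workable sequence.
console.log(topoSort({ C: ["A", "B"], A: [], B: ["A"] })); // [ 'A', 'B', 'C' ]
```

Given a loop such as { X: ["Y"], Y: ["X"] }, no variable ever becomes ready and the function throws, which is the situation described in the next paragraph.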

Topological sorting takes a directed graph and reduces it to a linear list of nodes satisfying the following constraint: If the graph includes a directed edge uv, then u appears before v in the list. This ordering is possible only if the graph is acyclic, with no loops of directed edges. As it happens, the network of equations for the World3 model is not acyclic. Here’s one section of the network that violates the no-loops rule:

Causal loop marked

If you try to assign a value to Labor Utilization Fraction (near the upper left), you’ll see that you first have to know the value of Jobs; before you can evaluate Jobs, you need to know Potential Jobs in Service Sector; etc. Continuing to trace through the red arrows reveals that before you can calculate Labor Utilization Fraction, you need to know Labor Utilization Fraction. Uh oh. The model can be made computable only by artificially interrupting such loops. In this instance the break is made by assigning an arbitrary initial value to the variable Labor Utilization Fraction Delayed.
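Why does a delayed variable break the loop? Here is a minimal sketch, assuming that Labor Utilization Fraction Delayed is computed as a first-order exponential smooth (DYNAMO’s SMOOTH); that is my assumption, not something stated above. A smoothed value behaves like a level: its new value is computed from the previous moment’s values only, so it never has to wait for the current value of the variable it tracks. The makeSmooth helper is hypothetical.

```javascript
// First-order exponential smooth, in the DYNAMO form S.K = S.J + (DT/D)(X.J - S.J).
// makeSmooth is a hypothetical helper, not part of any library.
function makeSmooth(initial, delay) {
  const s = { J: initial, K: initial }; // the initial value must be supplied by hand
  return {
    // The current value is available before anything else is computed this step.
    value: () => s.K,
    // Advance one step, using the tracked input's value from the previous moment (J).
    step(inputJ, dt) {
      s.J = s.K;
      s.K = s.J + (dt / delay) * (inputJ - s.J);
    },
  };
}

const lufd = makeSmooth(1, 2); // illustrative: initial value 1, delay 2 years
lufd.step(0, 1);               // the tracked variable was 0 at the previous moment
console.log(lufd.value());     // 0.5
```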

From DYNAMO to JavaScript. In a 1989 memoir, Forrester tells this story about the origins of system dynamics:

An expert computer programmer, Richard Bennett, worked for me when I was writing the 1958 article, “Industrial Dynamics—A Major Breakthrough for Decision Makers,” for the Harvard Business Review…. For that article I needed computer simulations and asked Bennett just to code up the equations so we could run them on our computer. However, Dick Bennett was a very independent type. He said he would not code the program for that set of equations but would make a compiler that would automatically create the computer code.

Bennett’s policy in this matter was sensible and wise; I foolishly ignored it. I didn’t want a general-purpose compiler for system-dynamics models; I just wanted to implement this particular model. So I didn’t bother to structure the code in a way that would separate the model equations from the algorithms that process those equations. Big mistake.

My original plan was to write a prototype version in Lisp (my native tongue) and then redo it in JavaScript for wider distribution. I abandoned that idea when I ran short of time—another mistake. I was ignoring a Brooksism: Plan to throw one away; you will anyhow. The program now running is the one I need to throw away. The code is a mess. Please avert your eyes.

I don’t blame JavaScript for this situation. This is the third JavaScript project I’ve taken on in the past few months, and I’m finding the whole ecosystem—JavaScript itself plus HTML5 and CSS3, along with the developer tools built into Google Chrome—quite a pleasant place to work and play.

My basic strategy was to make a line-for-line translation of DYNAMO statements into JavaScript. A typical DYNAMO equation looks like this:

SC.K = SC.J + (DT)(SCIR.JK - SCDR.JK).

SC is service capital, a level (or vat) variable; SCIR and SCDR are rate (or valve) variables representing the service capital investment rate and the service capital depreciation rate; DT is the numerical integration period; juxtaposed parentheses indicate multiplication. And what about the appended letters J, K and JK? They are “timescripts”: J designates the previous moment, K the current moment and JK the interval between J and K. All this notation carries over into JavaScript with remarkably little fuss. If we represent a variable such as SC as a JavaScript object, the timescript notation is unchanged, with SC.J and SC.K denoting properties of the object SC.
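Here is a minimal sketch of that translation, with made-up numbers rather than World3 parameters. Each variable is a plain JavaScript object, and the timescripts J, K and JK become property names. The advance helper, which shifts K into J before the next step, is my own illustration of one way to handle the bookkeeping.

```javascript
// Illustrative values, not World3 parameters.
const DT = 0.5;                // integration interval, in years
const SC = { J: 100, K: NaN }; // service capital, a level
const SCIR = { JK: 10 };       // service capital investment rate
const SCDR = { JK: 4 };        // service capital depreciation rate

// SC.K = SC.J + (DT)(SCIR.JK - SCDR.JK).
SC.K = SC.J + DT * (SCIR.JK - SCDR.JK);
console.log(SC.K); // 103

// Hypothetical bookkeeping step: before the next time step,
// the current value becomes the previous one.
function advance(v) {
  v.J = v.K;
}
advance(SC);
```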

Apart from transcribing the equations, it’s also necessary to provide half a dozen special operations such as smoothing, delaying and clipping signals. And there is a kludgy “table” facility for piecewise linear approximations of arbitrary functions.
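A table function of this kind takes only a few lines. This is an illustration under stated assumptions, not DYNAMO’s exact semantics: y values are given at evenly spaced x values from xMin to xMax, inputs in between are interpolated linearly, and inputs outside the range are held at the end values. The name makeTable is hypothetical.

```javascript
// Hypothetical helper: build a table function from y values sampled at
// evenly spaced x values between xMin and xMax.
function makeTable(ys, xMin, xMax) {
  const step = (xMax - xMin) / (ys.length - 1);
  return function lookup(x) {
    // Assumption: inputs outside the table's range are held at the end values.
    if (x <= xMin) return ys[0];
    if (x >= xMax) return ys[ys.length - 1];
    const pos = (x - xMin) / step;
    const i = Math.floor(pos);
    const frac = pos - i;
    // Linear interpolation between neighboring table entries.
    return ys[i] + frac * (ys[i + 1] - ys[i]);
  };
}

const t = makeTable([0, 10, 20, 40], 0, 3);
console.log(t(1.5), t(2.5), t(-1), t(5)); // 15 30 0 40
```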

Canvas vs. SVG. My last JavaScript project used Scalable Vector Graphics, so for this one I decided to try the main alternative, the HTML “canvas” element. Drawing on the canvas is very fast, but that’s about the only nice thing I can find to say about it. The canvas is simply a rectangular array of pixels, and drawn objects have no structure apart from the pixels that compose them. The curves making up a World3 graph cannot be moved or rescaled or otherwise altered without redrawing the entire graph. The animation effect, in which all the curves seem to gradually elongate, is an illusion: At each time step the entire curve is redrawn from the beginning. Thus drawing a curve of 400 segments actually calls for 80,200 operations.
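The 80,200 figure is a triangular number. If step k draws a curve of k segments, each time starting over from the beginning, the total for a 400-segment curve is

```latex
\sum_{k=1}^{400} k = \frac{400 \times 401}{2} = 80{,}200
```

whereas drawing each new segment only once would take 400 operations.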

SVG offers friendlier facilities. Not only can you draw objects piece by piece, but the objects retain their identity as objects; they become part of the DOM, the document object model. You can address them individually, change their colors and other properties, transform their geometry. It would be easy, for example, to highlight and label a curve on mouseover.

Firefox is the new Internet Explorer. (But so is the new Internet Explorer.) Making stuff for the web has become a lot more fun in the past year or two, thanks in large measure to the WHATWG process. There’s an accelerated pace of change in the standards community, and browser makers have been quickly implementing the latest proposals. For this project I needed not only the canvas element but also the “range” input element, which is supposed to create a slider-type control widget. In the Chrome, Safari and Opera browsers both of these components worked out of the box. Missing from that list of compatible browsers is Internet Explorer—the perennial Think Different browser. Also missing is Firefox, which is a little more surprising. It turns out that sliders have been on the agenda of the Firefox development crew for six years, but they remain unimplemented.

With mixed feelings, I installed a polyfill that allows the slider code to run in Firefox. (Thanks Frank Yan!) My feelings are mixed because this sort of spackling does nothing to encourage the Firefox developers to address the problem.
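Deciding whether to load a polyfill calls for a feature test. A common one, sketched here (it is not necessarily what Frank Yan’s polyfill does), relies on the rule that a browser which doesn’t recognize an input type treats the element as a plain text field. The document object is passed in as a parameter so the function can be exercised outside a browser; the function name is my own.

```javascript
// Feature test for <input type="range">. A browser that doesn't recognize
// the type treats the element as a text field, so el.type reads back "text".
function supportsRangeInput(doc) {
  const el = doc.createElement("input");
  el.setAttribute("type", "range");
  return el.type === "range";
}

// In a page, the polyfill would be loaded only where needed, e.g.:
// if (!supportsRangeInput(document)) { /* load the slider polyfill */ }
```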

The big bug. Getting the program to the point where it would run at all was a tedious chore (150 equations to be retyped from a marginally legible printout), but was otherwise unremarkable. After I fixed a few typos and misplaced semicolons, the code compiled and ran without throwing error messages. Then the real challenge began. The output looked nothing like the graphs published in The Limits to Growth. My World3 was a much nicer place, with gradual population growth and a slow but steady gain in industrial output and food production. Try as I might, I could not get the world system to collapse in ruins the way it’s supposed to.

This went on for more than a week.

Yes, I did consider the possibility that my program was correct and the dozens of other implementations over the past 40 years were all wrong. But I’m not quite that much of an egomaniac.

What was the bug that caused me so much grief? JavaScript experts will see the error immediately. In one of those initialization routines needed to break a cycle in the graph of dependencies, I had written something like this:

if (typeof(v) == Number) { return v }.

The problem is that (typeof(v) == Number) will return false no matter what the type of v happens to be. If v is not a number, the predicate is obviously false. If v is a number, the result of typeof(v) is not Number but "number". As a Lisp guy, I just can’t get used to JavaScript’s stringiness. (I could have said (v.constructor == Number), but I didn’t.)
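Here are the buggy test, the fix and the constructor alternative side by side. The function names are mine.

```javascript
// The buggy test: typeof yields a string such as "number", and that string
// never matches the Number constructor function, so this is always false.
function isNumberBuggy(v) {
  return typeof v == Number;
}

// The fix: compare against the string "number".
// (Note that typeof NaN is also "number".)
function isNumber(v) {
  return typeof v === "number";
}

// The constructor test works for primitives, which are auto-boxed when a
// property is read, but v.constructor throws on null or undefined, hence the
// guard. Unlike typeof, it also accepts Number objects made with new Number().
function isNumberByConstructor(v) {
  return v != null && v.constructor === Number;
}

console.log(isNumberBuggy(42), isNumber(42), isNumberByConstructor(42)); // false true true
```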

Glitches. Given this evidence of my slapdash coding and testing practices, it’s fair to ask how many bugs infest the rest of the program and whether any of the results should be trusted.

An obvious validation strategy is to set all parameters to default values and then compare the output of the program with that of the original 1972 model. That’s not so easy. The graphs published in The Limits to Growth have no numerical scales (apart from markers for the endpoints of the time axis). Hence all I can do is check that the peaks and valleys have the right phase relationships. My eyeball says most of them match reasonably well, but this methodology does not inspire great confidence.
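Because the published graphs lack numerical scales, one way to make the eyeball comparison a little more systematic is to compare only the order in which the curves reach their maxima, which doesn’t depend on scale. A minimal sketch; the series names and numbers are invented for illustration.

```javascript
// Time at which a series reaches its maximum (the first time, if there are ties).
function peakTime(times, values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return times[best];
}

// Names of the series, ordered by the time of their peaks. Rescaling a series
// by any positive factor leaves this ordering unchanged.
function peakOrder(times, seriesByName) {
  return Object.keys(seriesByName).sort(
    (a, b) => peakTime(times, seriesByName[a]) - peakTime(times, seriesByName[b])
  );
}

const years = [1900, 1950, 2000, 2050];
console.log(peakOrder(years, { population: [1, 2, 3, 2], food: [1, 3, 2, 1] }));
// [ 'food', 'population' ]
```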

Some smaller-scale features of the curves also demand attention.

One conspicuous oddity is not a bug—or at least not my bug. Take a look at this detail of a graph of population (orange), birth rate (yellow), death rate (purple) and life expectancy (gray):

Glitch in 1940

It looks like something really strange happened in 1940. And in World3 something did: There’s an abrupt switch between two table functions, changing the effect of health services on lifespan. The death rate plunges; there’s a brief blip in the birth rate; life expectancy ratchets upward and then keeps growing steadily. The abruptness of the transition looks highly unrealistic, but this is not the result of a programming error. It’s part of the model specification.
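In DYNAMO an abrupt switch of this kind is typically written with the CLIP function, which, as I understand its definition, returns its first argument when the third is greater than or equal to the fourth, and its second argument otherwise. A sketch, assuming the World3 switch is a CLIP with 1940 as the threshold; the lifetimeMultiplier function and its two inputs are hypothetical stand-ins for the two table functions.

```javascript
// DYNAMO-style CLIP(a, b, x, y): a if x >= y, otherwise b.
function clip(a, b, x, y) {
  return x >= y ? a : b;
}

// Hypothetical: choose between two table-derived effects at a switch year.
function lifetimeMultiplier(time, effectBefore, effectAfter, switchYear = 1940) {
  return clip(effectAfter, effectBefore, time, switchYear);
}

console.log(lifetimeMultiplier(1939.5, 1.0, 1.3)); // 1 (old table still in force)
console.log(lifetimeMultiplier(1940, 1.0, 1.3));   // 1.3 (new table takes over at once)
```

Nothing in the switch interpolates between the old and new values, which is why the curves jump in a single time step.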

On the other hand, the little blips in the birth and death curves at the very start of the simulation are not part of the model specification. I think I understand where they come from: It’s an initialization problem. The initial birth and death rates are not in equilibrium with other elements of the model, and it takes several iterations to eliminate the imbalances. As far as I can tell, however, these glitches do not appear in the 1972 output, so I must have misunderstood something about the model structure or the initialization procedure. I’m still looking into it.

It’s in the nature of writing software—or writing English prose, for that matter—that as soon as you finish a project, you see all the mistakes and missed opportunities with great clarity, and you feel that if you could just start over and do it all again, you’d finally get it right. I’m feeling that impulse right now, and I may act on it. But in light of my recent experience, I’m not making any promises.

Painting the world with pixels

4 March 2012

You are guiding your cart down the aisles of the supermarket when the price tag on the Cheerios beckons to you. Literally. An animated figure on the shelf tag waves and signals for you to come closer. When the array of cameras embedded in the shelf gets a better look at you, the tag offers you a deal, lowering the posted price. Enjoy your breakfast.

This fantasy of “dynamic pricing” is not a product of my own fevered imagination. I heard about it the other day at the March meeting of the American Physical Society, in a session on reflective color displays. Animated and interactive price tags were suggested as a motivating application for this technology. According to a speaker from Hewlett-Packard, the other components of the personalized pricing system (the cameras, the image-processing software, the communications network) are already available or soon will be; the main constraint is the display, which needs to provide reasonable image quality (comparable to newspapers) at low cost and low power. HP is at work on meeting this need (and so are several other companies and research groups). For some details of the Hewlett-Packard work, see this technical report.

I yield to no one in my technophilic ardor, but I have to say that the prospect of animated, haggling grocery-store price tags does not fill me with yearning for the future. And I suspect the idea might meet with a certain amount of legal and social resistance. (What are the acceptable criteria for adjusting prices offered to a given customer? Age? Gender? Race?)

But if the underlying display technology succeeds—and the demos already look pretty impressive—it’s not just price tags that I worry about. We could find ourselves up to our eyeballs in pixels. The aim is to manufacture the display material by a high-volume, roll-to-roll process. If they can get the cost down to $100 per square meter, a supermarket shelf tag might cost a dime or a quarter, a bumper sticker would be a dollar or two, an advertising card in the subway might be worth $10, and a billboard by the highway $20,000. Wrapping an entire Wal-Mart in digital ads could be done for roughly $1 million. Each of these surfaces would present a moving, video-like image—a window onto another world, covering up some fraction of this one.
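Working backward from those prices at $100 per square meter gives the area implied for each surface (area = cost ÷ $100/m²):

```latex
\begin{aligned}
\text{shelf tag: } & \$0.10\text{--}\$0.25 \;\Rightarrow\; 0.001\text{--}0.0025\ \mathrm{m}^2 \ (10\text{--}25\ \mathrm{cm}^2)\\
\text{bumper sticker: } & \$1\text{--}\$2 \;\Rightarrow\; 0.01\text{--}0.02\ \mathrm{m}^2\\
\text{subway card: } & \$10 \;\Rightarrow\; 0.1\ \mathrm{m}^2\\
\text{billboard: } & \$20{,}000 \;\Rightarrow\; 200\ \mathrm{m}^2\\
\text{Wal-Mart exterior: } & \$1{,}000{,}000 \;\Rightarrow\; 10{,}000\ \mathrm{m}^2
\end{aligned}
```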

Sign reading

Opelousas, La., 14 January 2001

Security theater on the web

3 March 2012

Perhaps the most important security concept within modern browsers is the idea of the same-origin policy. The principal intent for this mechanism is to make it possible for largely unrestrained scripting and other interactions between pages served as a part of the same site (understood as having a particular DNS host name, or part thereof), whilst almost completely preventing any interference between unrelated sites.

That’s Michal Zalewski of Google, in his Browser Security Handbook (now also available in expanded form as a book, The Tangled Web). I had thought I understood the same-origin policy, both how it works and what it’s for. Turns out I was totally wrong about the how-it-works part—about how the policy is enforced by the browser. Now that I’ve been straightened out on that point, I’m more confused than ever about the purpose of the policy.

This uncomfortable episode in my education began with my post about knowls, the little drawers full of knowledge that I look upon as a step toward footnotes on the web. In that article I pointed out a problem with the concept of a “knowlpedia,” a public repository of knowls: the same-origin policy won’t allow a web page loaded from one server to incorporate HTML content from another server. Thus when you are reading a web page hosted at bit-player.org, code within that page can freely access knowls that are also stored at bit-player.org, but it cannot retrieve knowls from aimath.org.

Harald Schilly, the author of the knowl code, wrote back that he had a one-line fix for the same-origin problem. The one line is “Access-Control-Allow-Origin: *”; it’s an HTTP header to be returned by the server of the “foreign” content. I was skeptical of this solution. In fact, I was absolutely certain it could not work. Let me explain why.

If a web browser is going to prevent illicit cross-site communication, how can it do so? Easy! Your browser knows the origin of the page you’re looking at right now: It came from http://bit-player.org:80 (where “http” designates the Hypertext Transfer Protocol, “bit-player.org” is the host name, and “80” is the port number). If code within this page tries to access foreign HTML—say by requesting a knowl at http://aimath.org:80—the browser detects the mismatched host names and refuses to allow the request. I had always assumed that in a case like this the requesting packets are never sent from your computer to the foreign server. That’s why I was so sure that no amount of fiddling with server configurations could have any effect on the same-origin policy, because the request would be blocked long before it reached the server.
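The origin comparison itself is easy to express. A minimal sketch using the standard WHATWG URL class (built into browsers and Node.js), whose origin property combines scheme, host and port and omits a port that is the default for the scheme. The function name sameOrigin and the knowl paths are my own inventions.

```javascript
// Two URLs share an origin when scheme, host and port all match.
// URL.origin normalizes away a default port, so :80 on http is dropped.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const page = "http://bit-player.org:80/2012/the-knowl-post";
console.log(sameOrigin(page, "http://bit-player.org/knowls/example.html")); // true
console.log(sameOrigin(page, "http://aimath.org/knowls/example.html"));     // false
```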

This was my mental model of same-origin enforcement, and it still strikes me as the most efficient, sensible and even obvious solution. However, the model is totally wrong. Browsers do not block a request that violates the cross-origin rules. The browser merrily sends the request to the foreign server, awaits the response, and then dumps the content of the response without inserting it into the displayed document or otherwise showing it to the user. This behavior seems so pointless and wasteful, and possibly risky, that I had to confirm for myself that it really happens. It’s not hard to do so. The debugging tools built into modern browsers will show you the headers of each request and response. Here’s what Firefox reports when I try to get a knowl from aimath.org:

Request headers:
Accept:text/html, */*; q=0.01
Accept-Encoding:gzip, deflate
Accept-Language:en-us,en;q=0.5
Connection:keep-alive
DNT:1
Host:aimath.org
Origin:http://bit-player.org
Referer:http://bit-player.org/2012/the-knowl-post
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac
     OS X 10.7; rv:10.0.2)
     Gecko/20100101 Firefox/10.0.2

Response headers:
Accept-Ranges:bytes
Connection:close
Content-Length:489
Content-Type:text/html
Date:Sat, 03 Mar 2012 16:34:53 GMT
Server:Apache/2.0.64 (Red Hat)

Note that the response headers indicate a content length of 489 bytes. None of that content is actually loaded into the page or displayed to the reader, but the server is sending it. I confirmed this with a packet sniffer (Wireshark) that intercepts data moving over the network connection. The full content of the requested knowl is sent back to the browser, but then it’s deep-sixed before anybody sees it.

What I don’t get is why browsers implement the same-origin policy in this roundabout way. What’s the point of sending the request if you know you’re going to ignore the response? I suppose web-site redirection is one scenario where sending the message might not be futile: If aimath.org redirects the request back to bit-player.org, then the transaction can be allowed to proceed. But how common is that?

Very likely there’s some other good reason for doing it this way. (Having been persuaded that my first hypothesis was totally bogus, I’m willing to entertain the possibility that I still don’t understand clearly.) But none of the tutorials and reference documents I’ve consulted (see list below) have explained it to me.

The “Access-Control-Allow-Origin: *” header that Schilly mentioned is part of a recent W3C draft standard called Cross-Origin Resource Sharing, or CORS, that lifts some of the strictures imposed by the same-origin policy. For simple requests, the browser will accept and display results from a foreign site if the appropriate header is included in the response. Thus the third-party site is given a measure of control over whether or not cross-origin requests are allowed. The CORS proposal goes back to 2005, but browsers began supporting it only in 2009 or 2010. (Opera hasn’t caught up yet; Internet Explorer does it a little differently.)
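Here is a simplified sketch of the check the browser applies to the response. It ignores preflight requests and other parts of the specification, and the function name and header-object format are my assumptions. One detail of the CORS rules is worth noting: when a request carries credentials such as cookies, the wildcard * is not honored; the server must echo the requesting origin and also send Access-Control-Allow-Credentials: true.

```javascript
// Simplified CORS response check for a request sent from requestOrigin.
// headers: response header names (lower-cased) mapped to their values.
// Preflight requests and other parts of the specification are omitted.
function corsAllows(requestOrigin, headers, withCredentials = false) {
  const allowed = headers["access-control-allow-origin"];
  if (allowed === undefined) return false; // no header: the response is discarded
  if (withCredentials) {
    // With cookies attached, "*" is not enough: the origin must be echoed
    // and credentials explicitly allowed.
    return allowed === requestOrigin &&
      headers["access-control-allow-credentials"] === "true";
  }
  return allowed === "*" || allowed === requestOrigin;
}

const requester = "http://bit-player.org";
console.log(corsAllows(requester, { "content-type": "text/html" }));              // false
console.log(corsAllows(requester, { "access-control-allow-origin": "*" }));       // true
console.log(corsAllows(requester, { "access-control-allow-origin": "*" }, true)); // false
```

Applied to the aimath.org response shown earlier, which carries no Access-Control-Allow-Origin header, the check fails and the content is thrown away.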

I don’t pretend to understand all the implications of this change in the way the web works. Presumably, the scenario the designers have in mind is something like this:

Naive User visits sneakthief.com, a web site that plays amusing videos of kittens while running a JavaScript program that requests cross-origin access to fortknox.com, sending along the cookies that authenticate Naive User as an account holder at Fort Knox. If fortknox.com responds with the keys to the vault, they will be transmitted back to sneakthief.com. The protection against this outcome is our faith that fortknox.com will not carelessly set the Access-Control-Allow-Origin header. I would have felt a little safer if the response from fortknox.com were blocked unconditionally, regardless of header flags. And safer still if the web worked my way, and the request were blocked before it could even be sent.

Zalewski comments that the main rationale for introducing CORS is that there are so many other ways of undermining or circumventing the same-origin policy (iframes, server-side proxies, JSONP, hidden forms, Flash, Java) that we might as well build a well-structured and well-documented facility for doing what everybody is doing anyway. In other words, leave the doors unlocked so nobody will smash a window while breaking in.

This may well be the wisest policy. Zalewski offers this meditation in the epilogue to his book:

I am haunted by the uncomfortable observation that in real life, modern societies are built on remarkably shaky ground. Every day, each of us depends on the sanity, moral standards, and restraint of thousands of random strangers—from cab drivers, to food vendors, to elevator repair techs…. In this sense, our world is little more than an incredibly elaborate honor system that most of us voluntarily agree to participate in. And that’s probably okay….

It’s difficult to understand, then, why we treat our online existence in such a dramatically different way…. The only explanation I can see is that humankind has had thousands of years to work out the rules of social engagement in the physical realm…. Unfortunately for us, we have difficulty transposing these rules to the online ecosystem, and this world is so young, it hasn’t had the chance to develop its own, separate code of conduct yet.

Other resources on this topic:

 

PDF vs. HTML

18 February 2012

The March–April American Scientist has been out for a couple of weeks. My “Computing Science” column looks at the future of scientific illustration in a world where we do most of our reading not on paper but on a screen of some kind—a screen that has a computational engine behind it.

My hope for those future illustrations is that they won’t just sit on the page looking pretty. They’ll do stuff. They’ll make good use of the available computing machinery. They’ll invite the reader to interact and explore. Below is my attempt to give an example of the kind of illustration I have in mind. It’s an interactive version of the familiar population pyramid, based on data from the U.N. and built with HTML, CSS, SVG and the d3.js JavaScript library from Michael Bostock. Please play.

I’m optimistic and enthusiastic about the digital future of science publishing. The digital present is another matter. One indicator of how far we still have to go is that some of you have no idea what that population-pyramid illustration is supposed to look like, because it doesn’t appear on your screen. If you’re reading this with an older web browser, you may see a static placeholder illustration or an error message or nothing at all. If you’re reading the RSS feed, I offer my apologies, but there’s nothing I can do to help.

Apart from problems of accessibility and compatibility, there’s a deeper issue I want to address here. If the aim is to enhance the literature of science, we have to keep in mind that very little of that literature takes the form of HTML, CSS, SVG and JavaScript. Almost all journal articles and preprints are distributed as PDFs. I have no idea how to make my little animated population pyramid work inside a PDF.

“PDF” stands for Faux Paper Document. PDFs look just like their printed prototypes, with all the typographical niceties—justification, hyphenation, kerning of letter pairs, ligatures. It’s as if the words and pictures had been skimmed off the page and pasted onto the screen (which is in fact pretty close to how the process works). This fidelity to the print tradition follows directly from the history of PDF: It was an outgrowth of PostScript, which was an outgrowth of InterPress, which was developed as software for use in the printing trades. The successful imitation of paper documents is a laudable achievement. As a culture, we have several hundred years of effort invested in learning how to present information effectively and attractively on the printed page; we shouldn’t let that go to waste. However, using a multigigahertz, multigigabyte computing machine as a stand-in for a sheet of dried cellulose seems a bit of a waste. It’s rather like early printers striving to reproduce the stylistic quirks of scribes writing with quill pens.

HTML can’t match the designerly refinement of PDF, but it is more versatile, livelier, and even playful. HTML documents do tricks. Where PDF is for the suits, HTML/CSS/JavaScript is the home of hackers.

But, again, few scholarly papers are published in HTML. There are a number of reasons for this, but to me the one that seems most salient is a certain lack of thinginess. In my column I write:

Why do authors and readers prefer PDFs for this kind of publication? One factor may be this: A PDF is something you possess. You download it from a server, give it a name, store it in a folder. It’s yours; it stays put. A website built out of HTML has a different character. It’s not a thing you own but a place you visit. You can’t take it home with you—although perhaps you can send a postcard or keep a small souvenir in the form of a bookmark.

“HTML” is an abbreviation for Highly Temporary Markup Language.

If we take this view seriously, then what the world needs is a way to encapsulate HTML documents so that they become first-class, discrete objects—things you can keep, rename, pass around, copy, delete, annotate, modify. For years I thought that this capability should be built into web browsers—that you should be able to press a button to download and store a fully functional and totally self-contained local copy of any web page. My favorite browser of the 1990s, called iCab, came pretty close to this ideal, but for many modern web pages the client-side approach is either unworkable or undesirable. With pages that rely on technologies such as AJAX, it’s not possible to make a fully self-sufficient local copy. And often there are page elements you don’t really want to include in your private copy, such as navigational menus and “Like” buttons and comment forms.

An alternative strategy is to let the author of the HTML document take responsibility for creating a downloadable, autonomous version, with content tailored for that environment. A technology for doing this sort of thing already exists: It’s the EPUB format used by various eBook readers. An EPUB document is essentially a collection of HTML files, CSS stylesheets and various forms of metadata wrapped up in a zip archive. SVG is supported, along with MathML. The 3.0 standard says JavaScript is also acceptable, but that statement is accompanied by a list of warnings that sound like the side-effect disclosures in a pharmaceutical ad.
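To make the “zip archive” description concrete, here is a sketch of the two fixed entries that begin an EPUB container, following the Open Container Format part of the EPUB specification: a mimetype file, which must be the first entry in the archive and stored uncompressed, and META-INF/container.xml, which points to the package document that lists the HTML, CSS and SVG content. The path OEBPS/content.opf is a common convention, assumed here rather than required, and the zipping step itself is left out.

```javascript
// Sketch of the two fixed entries at the start of an EPUB container.
// The package path is an assumed convention, not a requirement.
function epubSkeleton(packagePath = "OEBPS/content.opf") {
  const containerXml = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">',
    "  <rootfiles>",
    `    <rootfile full-path="${packagePath}" media-type="application/oebps-package+xml"/>`,
    "  </rootfiles>",
    "</container>",
  ].join("\n");
  return [
    // Must be the first entry in the zip archive, stored without compression.
    { name: "mimetype", data: "application/epub+zip" },
    // Tells the reading system where to find the package document, which in
    // turn lists the HTML, CSS, SVG and other content files.
    { name: "META-INF/container.xml", data: containerXml },
  ];
}

for (const entry of epubSkeleton()) console.log(entry.name);
```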

Apple’s new iBooks Author program also produces some kind of encapsulated HTML, but of course it works only in the Apple sector of the universe. If you’d rather affiliate with a different proprietary dominion, there’s something called CDF from Wolfram Research. (The initials stand for “Conrad’s Document Format.”)

Still another approach would be to stick with PDF but make it more fun. According to the PDF 1.7 specification (which, as you might guess, comes in the form of a PDF), a JavaScript compiler is supposed to be available within PDF documents, and Adobe has an Acrobat JavaScript API document. But as far as I can tell scripting is commonly used only for validating forms and playing slideshows.

The latest version of Acrobat does have a pretty cool interactive viewer for three-dimensional objects. Three years ago Alyssa Goodman and her colleagues at Harvard published a paper in Nature that made use of that viewer. This was apparently the first scientific publication to include a 3D PDF. I don’t know of another example since. And I have never seen any other kind of interactive graphics embedded in a PDF.

For more on all this, I invite you to read my column in the format of your choice: good old-fashioned paper, fine artisanal HTML (with whiz-bang JavaScript graphics), or the pixelated form of paper we call PDF.

17 x 17 = $289.00

8 February 2012

This just in from Bill Gasarch: The quest for a rectangle-free four-coloring of the 17-by-17 grid is over. If you don’t know what that’s all about, and you’d like to find out, see Bill’s blog post from 2009 or my earlier comments (one, two, three).

rectangle-free four-coloring of the 17-by-17 grid

The coloring above (along with several others) was found by Bernd Steinbach of Freiberg University and Christian Posthoff of the University of the West Indies. They win the $289 prize that Gasarch had offered for a solution.

Gasarch has posted more details on the Computational Complexity blog. But apparently we’re going to have to wait until May to learn how the coloring was found. The one interesting clue revealed so far is that the paper will be presented at the International Symposium on Multiple-Valued Logic.
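Checking a candidate coloring is far easier than finding one. A monochromatic rectangle exists exactly when two rows share the same color in two different columns, so for each pair of rows it suffices to check, color by color, whether the rows agree on that color in more than one column. A sketch, with colors encoded as integers (my choice of representation):

```javascript
// grid[r][c] is the color (an integer) of cell (r, c).
// Returns true if no four cells at the corners of a rectangle share one color.
function isRectangleFree(grid) {
  for (let r1 = 0; r1 < grid.length; r1++) {
    for (let r2 = r1 + 1; r2 < grid.length; r2++) {
      const seen = new Set(); // colors on which this row pair already agrees
      for (let c = 0; c < grid[r1].length; c++) {
        const color = grid[r1][c];
        if (color !== grid[r2][c]) continue;
        if (seen.has(color)) return false; // a second column with the same shared color
        seen.add(color);
      }
    }
  }
  return true;
}

console.log(isRectangleFree([[0, 1], [1, 0]])); // true
console.log(isRectangleFree([[0, 0], [0, 0]])); // false
```

For a 17-by-17 grid that is 136 row pairs times 17 columns, trivial work for any computer.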

Addendum 2012-02-11: For the benefit of those who don’t read the comments, I repeat a remark from reader Craig:

What I don’t like about this solution (or, indeed, the solutions to other related problems) is the arbitrariness. Is it not the case that you should be able to permute the rows and columns of any solution and arrive at a new solution? That being the case, I feel that you should be able to adjust the rows and columns here in order to highlight some kind of revealing pattern in the colouring.

Indeed, the matrix of dots shown above is one of (17!)² equivalent solutions. How should we choose one of those permutations to serve as a representative of the class?

Personally, I’m not optimistic about finding a “revealing pattern” in any of the 126513546505547170185216000000 permuted matrices. If there were some simple, concisely described rule governing the arrangement of the dots—other than the no-monochrome-rectangle criterion itself—then we could make use of that rule to find a solution, or at least to reduce the search space. When Steinbach and Posthoff reveal their secret method, maybe we’ll learn that such a rule exists, but I doubt it.
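The count is easy to confirm with JavaScript’s BigInt arithmetic; the number is far beyond the range in which ordinary double-precision numbers are exact. It counts pairs of row and column permutations; a coloring with symmetries would yield fewer distinct matrices, so strictly it is an upper bound on the size of the equivalence class. Including the 4! color orderings multiplies it by 24.

```javascript
// Exact factorial using BigInt.
function factorial(n) {
  let result = 1n;
  for (let k = 2n; k <= n; k++) result *= k;
  return result;
}

const rowColumnPermutations = factorial(17n) ** 2n;
console.log(rowColumnPermutations.toString()); // 126513546505547170185216000000
// Also counting the 4! ways to relabel the colors:
console.log((rowColumnPermutations * factorial(4n)).toString());
```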

Even if we can’t make a pretty picture by permuting rows and columns, it would still be useful to have some canonical ordering, so that we could easily determine whether two arrays are members of the same equivalence class. I’m not sure how best to do that, but if we assign an ordering to the colors, we can at least sort them.

sorted matrix

There is still some arbitrariness here. We have not dealt with the 4! orderings of the colors. Is there a smarter way to go about it?
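One simple heuristic for this kind of sorting, sketched below, alternately sorts the rows and then the columns lexicographically (with colors numbered 0 to 3 in some chosen order) until neither changes. I don’t know that this is how the sorted matrix above was produced, and it is not guaranteed to be a true canonical form: equivalent matrices may settle into different sorted arrangements. The iteration cap is a precaution in case the alternation does not settle.

```javascript
// Compare two rows (arrays of color numbers) lexicographically.
function lexCompare(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function transpose(m) {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Alternately sort rows and columns until both are in lexicographic order,
// or until maxRounds is reached. Row and column permutations preserve the
// rectangle-free property, so the result is an equivalent coloring.
function sortRowsAndColumns(matrix, maxRounds = 100) {
  let m = matrix.map((row) => row.slice());
  for (let round = 0; round < maxRounds; round++) {
    const rowsSorted = m.slice().sort(lexCompare);
    const bothSorted = transpose(transpose(rowsSorted).sort(lexCompare));
    if (JSON.stringify(bothSorted) === JSON.stringify(m)) return m;
    m = bothSorted;
  }
  return m;
}

console.log(sortRowsAndColumns([[1, 0], [0, 1]])); // [ [ 0, 1 ], [ 1, 0 ] ]
```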

The Knowl Post

5 February 2012

Imagine, if you will, the year 2020, when a billion people around the planet are at their screens. And each is able to withdraw from a great repository any fragments of anything that has been published, as well as the private documents he or she has access to. So, you’re able to bring to your screen not just encyclopedias, not just novels, not just the works of Horace and Cicero and Marcus Aurelius and Shakespeare and Goethe, but obscure stuff from South America and Africa that people have written in the last 5 minutes. And [you're able] to make comments and footnotes and to transclude and quote from anything else that’s published, with automatic royalty.

That’s Ted Nelson, the prophet of hypertext, writing in Byte in 1990. We still have a few years to go before 2020, and yet almost everything Nelson foresaw is already upon us. More than two billion people around the world are sitting at their screens. The encyclopedias and novels are online, along with Horace, Cicero and the rest. Twitter will show you what’s been written in South America and Africa in the last five minutes. For commenting, transcluding and quoting, we have Facebook and many other thriving channels of commerce and communication (even blogs like this one). So what remains to be invented before we arrive in Nelson’s stately pleasure dome of Xanadu? Well, “automatic royalty” hasn’t shown up yet. More surprising—and more annoying—we still don’t have footnotes on the web.

It’s a worrisome lack. How are scholars to annotate and document—not to mention digress and distract—without footnotes or some similar device? In a recent essay Alexandra Horowitz cites Edward Gibbon, J. L. Austin, Nicholson Baker and David Foster Wallace as authors whose work would be diminished or even destroyed if stripped of notes. I would add Vladimir Nabokov to the list, for his novel-in-notes Pale Fire. Martin Gardner deserves mention as well—not for his columns in Scientific American, where footnotes and marginalia were sadly forbidden, but for his annotated editions of Lewis Carroll and others. And the most important example of all is the Talmud, the archetypal hypertext of modern times. We should be able to do stuff like that in HTML!

Why are footnotes not a standard feature of web life? Perhaps because the founders believed that the central mechanism of the web—the anchor tag <a href= … >—adequately served the purpose. And people do use <a> tags for notes. For example, there’s the Wikipedia protocol, where clicking on a footnote link[1] sends you on a trip to the bottom of the page, where you’ll find a helpful caret to take you back where you came from. This scheme does a good job of mimicking the ink-on-paper experience of footnotes—and ignores all the capabilities and possibilities of a new medium.

Back in 1995, the HTML 3.0 proposal included a <fn> tag for footnotes. But HTML 3.0 was stillborn, and the 3.2 version that replaced it a few months later dropped footnote support (and much else). HTML5 now offers the <aside> tag, which sounds like it ought to be the semantically correct way to mark up footnotes. But I have yet to see any website in the wild actually using <aside> that way. And the (still tentative) HTML5 standard suggests that footnotes were not the original intent of the tag:

The aside element … can be used for typographical effects like pull quotes or sidebars, for advertising, for groups of nav elements, and for other content that is considered separate from the main content of the page.

It’s not appropriate to use the aside element just for parentheticals, since those are part of the main flow of the document.

Yeah, whatever. But what makes the web such a grand playground is that you can always build your own tools and toys if you don’t like the standard kit. Which brings me to all those blue links with dotted underlines in the paragraphs above. I assume you’ve tried them out by now. The device is called a knowl, and I discovered it the other day while browsing the home page of the American Institute of Mathematics in Palo Alto. The knowl’s inventor is Harald Schilly of the University of Vienna.

It’s all done with a dab of jQuery and a dollop of CSS. The markup in the HTML file is just a mutant anchor tag:

<a knowl="wikitag.html">Wikipedia protocol</a>

where the normal “href” has been replaced by “knowl.” The jQuery code installs an onClick handler for each instance of this structure found in the text; the onClick function calls out to the server, loads the content of the note, and inserts that text into the document tree as the next sibling of the element that contains the link. CSS rules add a bit of border styling. The little drawer-like pane thus opens just below the current paragraph (or whatever other HTML element contains the reference to the knowl).
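To make the mechanism concrete, here is a rough sketch of the same behavior written with plain DOM calls rather than jQuery. It is not Schilly's code. It assumes the note files are HTML fragments served from the page's own origin; the class name `knowl-output` is hypothetical, and the optional `loadNote` parameter is an illustrative convenience so the sketch can run without a server.

```javascript
// Rough sketch of knowl-style notes using plain DOM calls instead of jQuery.
// Not the original knowl code. Assumptions: note files are HTML fragments
// served from the page's own origin, and "knowl-output" is a hypothetical
// class name for the CSS that draws the drawer's border.
function attachKnowls(root, doc, loadNote) {
  // Default loader fetches the file named in the knowl attribute.
  const load = loadNote || (url => fetch(url).then(response => response.text()));

  for (const anchor of root.querySelectorAll("a[knowl]")) {
    let pane = null; // this link's drawer, created on the first click

    anchor.addEventListener("click", event => {
      event.preventDefault();
      if (pane) {
        // Later clicks simply open or close the existing drawer.
        pane.hidden = !pane.hidden;
        return Promise.resolve(pane);
      }
      pane = doc.createElement("div");
      pane.className = "knowl-output";
      // Insert the drawer as the next sibling of the element containing the
      // link, so it opens just below the current paragraph.
      anchor.parentElement.insertAdjacentElement("afterend", pane);
      return load(anchor.getAttribute("knowl"))
        .then(html => {
          pane.innerHTML = html;
          return pane;
        })
        .catch(() => {
          // A network error or blocked cross-origin request lands here,
          // rather than leaving the drawer silently empty.
          pane.textContent = "(note could not be loaded)";
          return pane;
        });
    });
  }
}
```

In a browser, calling `attachKnowls(document, document)` once the page has loaded would wire up every link of the form `<a knowl="wikitag.html">`; with no third argument, notes are fetched from the server.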

The knowl is not quite everything I would wish for in a footnote utility. A minor issue is that the “next-sibling” rule for placing the text of the knowl doesn’t always do the right thing. A less-minor issue is that the authoring process is overly arduous. Every knowl goes into a separate file, so writing the text requires a change of mental focus and also creates a lot of file-system clutter. I’d rather include the text of the note at the point of reference (the way it’s done in LaTeX, say). That could be accomplished with an easy change to the JavaScript. On the other hand, the system’s reliance on JavaScript is in itself problematic. The world of knowls is off-limits to those who write on hosted platforms such as WordPress.com or Blogger, because they forbid JavaScript in posts. I wonder if the same visual effects could be achieved entirely with CSS3 animations?
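Putting the note text at the point of reference might indeed need only a small change in where the text comes from. In this sketch the note travels on the link itself in a `data-note` attribute, which is a hypothetical convention and not part of the knowl code; links without one fall back to loading the file named in `knowl`, and `fetchText` is a placeholder for whatever loader the page already uses.

```javascript
// Sketch of inline notes: the note text rides along on the link itself,
// LaTeX-style, in a hypothetical data-note attribute:
//   <a knowl="" data-note="Text of the note.">Wikipedia protocol</a>
// Links without data-note fall back to loading the file named in knowl.
// fetchText is a placeholder for whatever loader the page already uses.
function resolveNote(anchor, fetchText) {
  const inline = anchor.getAttribute("data-note");
  if (inline !== null) {
    // No separate file and no server round trip.
    return Promise.resolve(inline);
  }
  return fetchText(anchor.getAttribute("knowl"));
}
```

Quotation marks and ampersands inside the attribute would need escaping, so for long notes a hidden element placed next to the link might be a more comfortable carrier.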

When Schilly and his colleagues at AIM developed the knowl, their goal was not to satisfy my footnote fetish. As a matter of fact, they seem to have a rather different vision of how knowls might be used—not as a medium for the Shandean digressions of self-indulgent writers like me but as shared nuggets of wisdom, offered as a public resource. David Farmer, director of programs at AIM, writes: “I envision a time when the Internet has a repository of such knowls, reliable and ready to be referenced anywhere.”

Unfortunately, there’s a technical impediment to making that vision a reality. When I started writing this post, I thought it would be only appropriate to transclude the definition of transclusion that’s given in one of the knowls above. So I constructed a knowl linking to the file on aimath.org where that text exists. This knowl has such a link. If you click it, you’ll find that it doesn’t work: The little blue drawer slides open, but it is empty. The reason is that the page you are reading now was downloaded from bit-player.org, but when you click that knowl, it tries to access HTML content from aimath.org; that attempt runs afoul of the browser’s “same-origin policy.” There are ways of evading this security provision, but they come with a faint scent of hackery. Without them, though, I don’t think we can have our public repository of knowls. It looks like we’ll each have to serve up our own wit and drollery.
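The same-origin rule that empties the aimath.org drawer is simple to state: a script may read a response only if the requested URL has the same scheme, host, and port as the page. Here is a minimal sketch of that comparison using the standard `URL` class (available in browsers and Node.js); the function name `sameOrigin` is hypothetical, and real browsers apply further rules this ignores.

```javascript
// Sketch of the browser's same-origin comparison: a script may read a
// response only if scheme, host, and port all match the page's. The URL
// class treats an explicit default port (80 for http, 443 for https) as no
// port, so "http://bit-player.org:80/" and "http://bit-player.org/" agree.
// Simplification: opaque origins (file: URLs, sandboxed frames) are ignored.
function sameOrigin(pageUrl, requestUrl) {
  const page = new URL(pageUrl);
  // Resolve relative links such as "wikitag.html" against the page's URL.
  const target = new URL(requestUrl, page);
  return page.origin === target.origin;
}
```

By this test `sameOrigin("http://bit-player.org/", "wikitag.html")` is true, while a request to `http://aimath.org/` is not. Neither is `http://www.bit-player.org/`: a different host name counts as a different origin.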

Update 2012-02-28: Harald Schilly has shown me that transclusion is not a vain dream after all. The solution relies on a fairly recent technology called Access Control for Cross-Site Requests, and it requires a cooperating server to set an appropriate flag in an HTTP header. This knowl is fetched from appspot.com, the hosting arm of the Google App Engine, which allows cross-site connections from anywhere. If you are reading this page with a recent WebKit browser (Google Chrome, Safari) or a recent Mozilla browser (Firefox), the magic drawer should open with a note inside. But Opera does not implement the method, and I think Internet Explorer is also a holdout, although I don’t have the means to check. I was totally unaware of this trick, and I’m grateful to Schilly for enlightening me.
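The "appropriate flag" is the `Access-Control-Allow-Origin` response header; the technology was later renamed Cross-Origin Resource Sharing (CORS). Below is a simplified sketch of the browser-side decision for an ordinary GET sent without cookies or other credentials. The function name `corsAllows` is hypothetical, and the full rules (credentialed requests, preflight checks for other kinds of request) live in the WHATWG Fetch standard.

```javascript
// Simplified sketch of the CORS check for a non-credentialed GET.
// The cooperating server opts in with a response header such as
//   Access-Control-Allow-Origin: *
// which is presumably what a host allowing cross-site connections from
// anywhere would send.
function corsAllows(allowOriginHeader, requestOrigin) {
  if (allowOriginHeader == null) return false; // server did not opt in
  const value = allowOriginHeader.trim();
  if (value === "*") return true; // any origin may read the response
  // Otherwise the header must name exactly the requesting origin;
  // a comma-separated list of origins is not valid here.
  return value === requestOrigin;
}
```

Without the header, the browser withholds the response from the script, which is why the aimath.org knowl opens onto an empty drawer.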

^ Hi! I’m a Wikipedia-style footnote. I feel lonely and exposed and painfully out of context down here.