Archive for November, 2009

El Farol Highway

Friday, November 27th, 2009

traffic-jam-9146.jpg

I got caught Wednesday night in the national pre-Thanksgiving traffic jam. As I was approaching Baltimore, an electronic signboard announced:

Delays on I-95 and I-895

Suggest I-695

Of course I immediately thought: But everyone will read the sign and take I-695, so that road will be jammed too. Then I thought: But everyone who reads the sign will realize that everyone else will also read the sign, and so they’ll not choose 695. Then I thought….

Hmm. There’s something familiar about this problem.

I’m not telling which road I took, but I can report that it was the only congestion-free segment of my trip.

The birth of the giant component

Friday, November 20th, 2009

The waste product of my document scanning project is a slag heap of extracted staples:

staples-in-bowl-2667.jpg

The other day I made a discovery: If you grab one of the discarded staples and lift it, the whole ball of tangled, mangled metal comes along, leaving behind only a few stragglers in the bottom of the bowl.

ball-of-old-staples-closer-2691.jpg

When I noticed this, my first thought was “Hmm, that’s funny.” My second thought was “Oh, of course: Erdős-Rényi.” And my third thought–well, I’m still working on my third thought, as well as thoughts four, five and six.

Erdős and Rényi are Paul Erdős and Alfréd Rényi, who wrote a big paper on “The Evolution of Random Graphs” 50 years ago (Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 1960, 5:17–61). Their paper wasn’t quite the debut appearance of random graphs in the mathematical literature, but it’s usually cited as the theory’s point of origin.

In one version of the Erdős-Rényi process, you start with a set of n isolated vertices and then add random edges one at a time; specifically, at each stage you choose two vertices at random from among all pairs that are not already connected, then you draw an edge between them. It turns out there’s a dramatic change in the nature of the graph when the number of edges reaches n/2. Below this threshold, the graph consists of many small, isolated components; above n/2, the fragments coalesce into one giant component that includes almost all the vertices. “The Birth of the Giant Component” was later described in greater detail in an even bigger paper–it filled an entire issue of Random Structures and Algorithms (1993, 4:233–358)–by Svante Janson, Donald E. Knuth, Tomasz Luczak and Boris Pittel.

What made me think of a connection between Erdős-Rényi graphs and my hairball of staples? What I had in mind was something like this: The long, straight, middle part of a staple corresponds to an edge of a graph, and the bent end pieces, which can grab on to each other, are the vertices. Thus a single staple is a graph consisting of two vertices connected by one edge. When two staples hook up, two of their vertices merge and we’re left with a connected graph of three vertices and two edges. Since each staple contributes one edge and at most two vertices to the graph, the number of edges must be at least half the number of vertices. Thus the graph is always over the threshold for forming a giant component, according to Erdős and Rényi.

The counting part of this analysis seems okay, but I’m afraid the rest of it doesn’t hold up very well. Whatever is going on in the staple ball, the evolution of the system is not well modeled by the Erdős-Rényi process of adding edges to a fixed set of vertices. Instead, each staple brings both an edge and two vertices. The crucial event that makes the cluster hang together is the merging of vertices when staples link hands; this merging has no counterpart in the Erdős-Rényi process.

The underlying problem here is that Erdős-Rényi graphs are purely topological–there’s no concept of distance, and any two vertices are equally likely to have an edge joining them. But the staple graph has important geometric constraints. Two vertices can be joined by an edge only if the distance between them is approximately equal to the length of a staple.

The geometric structure suggests trying a different kind of model–perhaps the kind that describes the molecular structure of liquids and solids. Water molecules, for example, are linked together by a network of hydrogen bonds; each hydrogen atom in one molecule can bond to the oxygen atom in another molecule. But the bonds cannot extend over arbitrary distances; they reach only between neighboring molecules. In the resulting three-dimensional structure the basic motif is a tetrahedron with an oxygen atom at its center and hydrogen atoms at the four corners. (There’s also a schematic two-dimensional model known as square ice.) We might imagine something similar going on with the staples, where the two bent ends can form hydrogen-bond-like links to other nearby staples.

But there’s a problem with such chemical models as well. Atoms have a fixed valence (more or less–let’s not quibble); old-staples-detail-2697.jpgin water, for example, each hydrogen atom can form a hydrogen bond with only one oxygen atom. But we have no reason to suppose that the hooked ends of a staple can attach to only one other staple. As a matter of fact, if such a restriction were enforced, then the staples could form only chains and rings, not dense clusters. In a close look at the actual clusters, it’s easy to find places where three or more staples are all hooked together at the same point. By carefully teasing apart the cluster, I have spotted vertices that appear to have a degree of at least six.

Yet another physical process that might provide a model for the staple graph is diffusion-limited aggregation. This is the mechanism responsible for the filigree pattern in the banner at the top of the bit-player web page. It is generated by sticky particles that drift at random until they wander onto the substrate or touch another particle that is already in contact (directly or indirectly) with the substrate. For staples, I suppose the drifters would be tumbling dumbbells with sticky ends–somewhat harder to simulate.

Another factor to keep in mind is that spatial dimensions are surely important here. For one thing, there’s just more room to maneuver in three dimensions, with more opportunities to glom onto a neighbor. But in the specific case of staples, there’s another reason: Confined to a plane, they have a hard time linking up:

staples-on-plate-detail-2681.jpg

Dispersed on a flat plate, they refuse to coagulate even when swirled vigorously. The reason, presumably, is that secure links form only when the staples can turn 90 degrees and interlock.

In this connection it would seem significant that these are used staples, somewhat varied in shape, with hooked ends that had been bent approximately 180 degrees in the process of stapling and that mostly retained an angle greater than 90 degrees after being pried out the papers. I wondered how shiny new staples would behave, and so I tried the experiment. (Materials and methods: 630 Stanley Bostitch chisel-point staples, model SBS191/4CP, freshly dispensed from an open-jaw Swingline stapler.)

ball-of-new-staples-crop-2700.jpg

I was mildly surprised at the result. Although the aggregation was somewhat looser and more delicate, it really wasn’t that much different. Again we witness the birth of a giant component.

Information is physical

Wednesday, November 11th, 2009

I’m still busy digitizing a lifetime’s accumulation of clippings from magazines and journals, along with heaps of old tech reports, memos, and miscellaneous other cruft. There’s something slightly eerie about the process. So far I’ve emptied out a dozen file drawers, run several hundred pounds of paper through the scanner, and created thousands of PDFs. Yet my laptop is not a gram heavier. The glib explanation is that I’m just scraping pure information off the pages, leaving behind the ink and cellulose; I’m saving the bits and recycling the atoms. But is information so readily dematerialized? One of the manila folders I have just dredged up out of a filing cabinet is bulging with publications by the late Rolf Landauer, including several papers on the theme “Information is physical!”

I first met Rolf circa 1980. I had written a brief Scientific American article about some recent developments in optical computing technologies, and Rolf called to tell me I should never do anything so reckless and foolish and tasteless again. He took a dim view of photonics. This initial encounter was not a promising start to a friendship, but we got over it. He put me on his mailing list, which meant that I got a fat envelope once or twice a year, with reprints or preprints of his own latest work and often copies of other papers he thought I should being paying attention to.

Four of the articles in my Landauer folder have very similar titles:

  • Information is Physical (Physics Today, 1991).
  • The Physical Nature of Information (Physics Letters A, 1996).
  • Information is a Physical Entity (Physica A, 1999).
  • Information is Inevitably Physical (In Feynman and Computation, 1999).

If Landauer had lived longer (he died in 1999), I like to think that the next installment in the series would have been titled even more emphatically: Information is Physical, Damn It!

In all of these essays, Landauer’s thesis is straightforward:

Information is inevitably tied to a physical representation. It can be engraved on stone tablets, denoted by a spin up or down, a charge present or absent, a hole punched in a card, or many other alternative physical phenomena. It is not just an abstract entity; it does not exist except through a physical embodiment. It is, therefore, tied to the laws of physics and the parts available to us in our real physical universe.

This notion is obvious and totally uncontroversial–except to those who think it’s totally wrong. Doubters tend to focus on mathematical entities. Surely the integers exist as abstractions, independent of stone tablets and punchcards, no? And triangles would have three sides even if all the matter in the universe were annihilated–right? When it comes to numbers like π and e, one might well argue that they can exist only as abstractions; they can never be given a complete physical representation.

Landauer did not argue strenuously for his constructivist position within mathematics itself, but he did take a hard line about mathematical methods in the physical sciences:

There is a tendency to think of mathematics as a tool which somehow existed before and outside of our physical world. Mathematics, in turn, allowed the formulation of physical laws which then run the world, much as a process control computer runs a chemical plant. Here, instead, we emphasize that information handling has to be done in the real physical world, and the laws of physics exist as instructions for information handling in that real world. It, therefore, makes no sense to invoke operations, in the laws of physics, which are not executable, at least in principle, in our real physical world.

Our accepted laws of physics invoke continuum mathematics, which is, in turn, based on the notion that any required degree of precision can be obtained by invoking enough successive operations. But our real universe is unlikely to allow an unlimited sequence of totally reliable operations. The memory size is likely to be limited, perhaps, because the universe is limited. Even in an unlimited universe it is a strong presumption to invoke the possibility of assembling an arbitrarily large organized memory structure. Furthermore, in a world full of deleterious processes including noise, corrosion, electromigration, incident alpha particles and cosmic rays, earthquakes and spilled cups of coffee, it would be unreasonable to assume that each step in an unlimited sequence of operations can be carried out infallibly.

Those alpha particles and spilled cups of coffee bring me back to my little document-scanning project–my kitchen-table version of Google Books. I am well aware that my digitized archives are not disembodied abstractions, that the information I’ve scanned from Rolf’s preprints is still physical even if it’s less tangible, and that the bits remain vulnerable to all the perils of a material world. Indeed, the thought of losing all the files I’ve scanned–now that the paper originals are beyond recall–makes me itchy to plug in the back-up drive.

But the process of replicating the bits–which is even easier than capturing them in the first place–sends my mind off on another tangent. As Rolf said, we can represent information in many physical forms: as marks on paper, as magnetized domains on a metal-coated disk, as packets of electric charge, as base pairs in a DNA molecule, as beads on an abacus. When we build machinery to process this information, we can choose among many different computing technologies: transistors, brass gears, neurons, rubber bands and tinker toys, quantum dots, even photons in optical waveguides (though Rolf despised that last possibility, and he was skeptical about the quantum dots).

Somehow, this proliferation of physical embodiments for information does not strengthen the conviction that information is subordinate to its physical representation. When we can write the same message in so many forms–everything from lines in the sand to holograms–the message itself begins to seem just as substantial as the physical medium, and perhaps more enduring. I have digital documents that began life on eight-inch floppy disks 20 years ago. The files have migrated a dozen times or more to other media: five-and-a-quarter-inch floppies, three-and-a-half-inch floppies, Zip drives, digital audio tapes, CD-ROMs, a succession of hard disks. Most of the physical objects making up that long chain of transmission have long since succumbed to coffee spills, corrosion and other hazards, or else they have simply gotten lost. Yet the data persists, a sort of standing wave in the river of hardware rushing toward obsolescence and oblivion. Under the circumstances, it can be hard to keep in mind that the information depends for its very existence on those delicate shards of matter. It goes against the grain of the whole apparatus of computer science, where automata theory, the Church-Turing thesis, and the Turing equivalence of programming languages all encourage us to think that abstractions come first, and implementation is secondary.

Euclid1703.png

These musings are not meant as an attempt to refute Landauer’s assertion. I still have to concede that I cannot record or express a pattern of bits without resorting to some physical medium, if only the gray matter in my own head. But the notion sits uncomfortably; it’s a conundrum. I wish I had a chance to chat with Rolf about it. But, sadly, Rolf Landauer is no longer physical.

Note: As far as can tell, none of the four Landauer papers I mention above are available online without payment. I am therefore taking the liberty of posting my scan of Rolf’s preprint of the information-is-a-physical-entity paper. I would also like to call attention to two recent articles about Landauer: a discussion of his contributions to solid-state physics by Bertrand I. Halperin and David J. Bergman, and a biographical memoir by Charles H. Bennett and Alan B. Fowler.

A comment on comment spam

Friday, November 6th, 2009

Someone out there is being paid to post comments on bit-player.org–and doubtless on tens of thousands of other blogs as well. The comments are mostly bland and inoffensive, sometimes effusive, always hastily composed. “Thanks for article..good work,” they say. “Amazing!!” “i like your article and i will be wating your net article….”

The payload attached to each of these comments is a link to a web site that someone wants to promote. Some of the sites are selling goods or services; others are billboards full of pay-per-view ads; a fair number are mysterious to me, being written in languages I don’t understand. I would not be astonished to learn that some of the sites are distributing malware.

Years ago, the first wave of comment spam was powered by scripts that flooded blogs and wikis and forums with hundreds of postings full of program-generated gibberish and long lists of links. That abuse was stopped by captchas and other simple filters, like the one I’ve been using here on bit-player. Another important defense is the “nofollow” tag, which instructs search engines to ignore links in comments, thereby eliminating the incentive of gaining PageRank points.

The comment spam arriving now is not generated by a Perl script. Somewhere in the world a person is being paid to read these very sentences, then to prove his or her humanity to the Turing-test filter, and finally to write a few words in response and sneak in a paid link. I’m both fascinated and appalled to learn that the Internet economy can support this activity. What’s the going rate for writing comment spam? Is it worth a penny to get your link briefly exposed to the vast daily readership of bit-player.org? How about a tenth of a penny?

I have a sinking feeling that the people doing this work are themselves victims of a scam, and that they’ll never see even the tenth of a penny. They have probably succumbed to a 21st-century version of the ads I used to see on matchbook covers: “Work at home! Make $500 a week stuffing envelopes in your spare time!”

Of all the ways that poor and desperate people are exploited, this is not the worst. Presumably the work is safe and sanitary, and it even rewards literacy. Some of my comment spammers would surely have interesting ideas to contribute if only they had the luxury of time.

All the same, this kind of commercial graffiti is not something I want to encourage. The available countermeasures include prohibiting all links in comments, holding all comments until a moderator approves them, or requiring commenters to register with a verifiable email address. None of these options appeals to me, but I may have to consider them if the problem persists. For now, though, I’m going to continue the human approach–manually deleting spammy comments as quickly as I can get to them. I am also closing comments on all but the 10 most-recent items on bit-player; the spammers seem to favor older posts.

I have to add that spotting comment spam is not always as easy as you might think. Consider this comment, which came in response to a story about editorial changes at Scientific American magazine:

Many times, when i read your American Scientist columns, I have asked myself that is any other country’s scientist didn’t give anything to the world?

The text of the comment is pertinent to the topic; it raises a question that’s entirely appropriate in this context; and there’s clear evidence that the author has actually been reading bit-player (and even my American Scientist columns) rather than merely spewing comments at random. This is someone I would like to be able to welcome into the community. But the link associated with the comment was an ad for a web-hosting service, and another comment from the same IP number advertised a different service. Was I wrong to hit the delete button?

You’re welcome to comment below, but without spammy links, please.