Archive for the ‘science’ Category

PageRank for physicists

Thursday, April 20th, 2006

Scientists are selfless seekers after truth, unswayed by worldly emoluments, immune to the tawdry enticements of fame, indifferent to prizes and honors. Thus I can’t quite imagine why anyone would bother ranking a collection of scientific papers by applying the algorithm that Google uses to decide which Web pages deserve the most prominent display. Nonetheless, if you’re an author of anything published in Physical Review or its various offshoots over the past century or so, P. Chen, H. Xie, S. Maslov and S. Redner have calculated the Google PageRank of your publications. The database they analyzed consists of 353,268 papers published in various American Physical Society journals between 1893 and 2003.

The best-known measures of “impact” for scholarly publications are based on counting citations; a paper’s impact rises when other papers refer to it. The PageRank algorithm applies this principle recursively. If paper A is cited by paper B, that fact will be weighted more heavily if paper B itself gets many citations. Chen et al. adapted this method of evaluation to the 3,110,839 citations within their network of Physical Review articles. They found that most of the papers with a high PageRank score are well-known works that also get high marks under other kinds of citation analysis. But there were some surprises—papers they call scientific gems. An example is a 1980 article by H. Rosenstock and C. Marquardt titled “Cluster formation in two dimensional random walks: Application to photolysis of silver halides.” Only three other articles in the database cite Rosenstock and Marquardt, and thus it remains an obscure publication from the point of view of most citation analyses; but it rises to 85th place in the PageRank standings because one of those three citing papers is itself a highly rated work. (The latter paper, by T. Witten and L. Sander, is “Diffusion-Limited Aggregation, a Kinetic Critical Phenomenon,” with 680 citations as of June 2003. Chen et al. note: “The Witten and Sander article has only 10 references; thus a substantial fraction of its fame is exported to [Rosenstock and Marquardt] by the Google PageRank algorithm.)

Unfortunately, the database used by Chen, Xie, Maslov and Redner is not publically available, and so there is no convenient way for physicists to check their own PageRank standings. But, then again, physicists are so egoless, I’m sure they wouldn’t bother anyway.

See: Finding Scientific Gems with Google, on the arXiv.

Today’s catch

Tuesday, March 28th, 2006

Every morning I go fishing in the arXiv. Or at least that’s the way I’ve been thinking about this daily ritual: I cast my net over the waters and look to see what strange and wonderful creatures I’ve brought up from the deeps. Today it hit me that I have the metaphor backwards. I’m the fish, and what’s going on here is that the arXiv dangles lures in front of me to see if I’ll take the bait. Some days I’m just not biting. Today, however, I was snapping at one hook after another. I’ll be the first to admit that I sometimes choose a brightly colored bit of fluff in preference to a nutritious worm. (Although I did pass on today’s proof that P ≠ NP—or was it the converse?)

physics/0603229
Title: Laws of Graph Evolution: Densification and Shrinking Diameters
Authors: Jure Leskovec, Jon Kleinberg, Christos Faloutsos

How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log (log n))).

cond-mat/0603718
Title: Statistical Mechanics of Community Detection
Authors: Joerg Reichardt, Stefan Bornholdt

Starting from a general ansatz, we show how community detection can be interpreted as finding the ground state of an infinite range spin glass…. The community structure of the network is interpreted as the spin configuration that minimizes the energy of the spin glass with the spin states being the community indices. We elucidate the properties of the ground state configuration to give a concise definition of communities as cohesive subgroups in networks that is adaptive to the specific class of network under study. Further we show, how hierarchies and overlap in the community structure can be detected….

physics/0603215
Title: Computer simulation of language competition by physicists
Authors: Christian Schulze, Dietrich Stauffer

… About every ten days a human language dies out, and in Brazil already more than half of the indigenous languages have vanished as a result of the European conquest. On the other hand, Latin has split in the last two millennia into several languages, from Portuguese to Romanian…. Thus similar to biology, also languages can become extinct or speciate into several daughter languages.

In contrast to biology, humans do not eat humans of other languages as regular food, and thus one does not have a complex ecosystem of predators eating prey as in biology. Instead, languages are meant for communication, and thus there is a tendency of only one language dominating in one region, like German in Germany etc. Will globalisation lead to all of us speaking one language in the distant future? For physics research, that situation has already arrived many years ago….

Thus in the history mankind we may have had first a rise, and later a decay, in the number of different languages spoken. In Papua New Guinea there are now 103 languages, each spoken by about 103 people; can this situation survive if television and mobile phones become more widespread there?

While we cannot answer these questions, we can at least simulate such ”survival of the fittest” among languages, in a way similar but not identical to biology….

physics/0510151
Title: Trainspotting: Extraction and Analysis of Traffic and Topologies of Transportation Networks
Authors: Maciej Kurant, Patrick Thiran

The knowledge of real-life traffic pattern is crucial for good understanding and analysis of transportation systems. This data is quite rare. In this paper we propose an algorithm for extracting both the real physical topology and the network of traffic flows from timetables of public mass transportation systems.

math.SP/0603630
Title: Sharp bounds for eigenvalues of triangles
Authors: B. Siudeja

The purpose of this paper is to prove the following theorem.

Theorem 1.1. Let T be a triangle in a plane of area A and perimeter L. Then the first eigenvalue λT of the Dirichlet Laplacian on T satisfies

\\frac{\\pi^2L^2}{16A^2} \\le \\lambda_T \\le \\frac{\\pi^2L^2}{9A^2}

The constants 9 and 16 are optimal.

math-ph/0603065
Title: Quasicrystals: algebraic, combinatorial and geometrical aspects
Authors: Edita Pelantová, Zuzana Masáková

Voronoi tiling

math-ph/0603068
Title: A Spinorial Formulation of the Maximum Clique Problem of a Graph
Authors: Marco Budinich, Paolo Budinich

In this paper we propose a new representation of the maximum clique problem in complex space. After a brief review of this famous NP-complete problem, we show how the adjacency matrix of a graph can be expressed as the square of a symmetric complex matrix. The vectors forming this matrix have zero length and Cartan has shown that this geometry can be treated elegantly with spinors…. We finish with a formulation of the maximum clique problem in this formalism and show that each graph uniquely identifies a spinor whose properties surely deserve deeper studies.

Old Man River

Wednesday, March 22nd, 2006

Tobol and Tavda Rivers, from Space Shuttle

Luna B. Leopold, the foremost American student of rivers and the landscapes they create, died February 23 at age 90. I never met him, but I’ve been a follower and a fan of his work, which I first encountered in 1966, when Scientific American published an article titled “River Meanders,” by Leopold and W. B. Langbein. Not many magazine articles stay with you for 40 years; this one was unforgettable. Before I read the article, landform features such as the sinuous path of a stream had seemed quite arbitrary or accidental—or they would have seemed so if I had ever bothered to think about them at all. Now they were suddenly revealed to have an explanation, an internal logic. With a bit of physics and mathematics and imagination, you could follow the steps of the dance by which rivers carve the land and the land guides the rivers.

Many years later I caught up with Leopold’s major textbook, Fluvial Processes in Geomorphology (co-authored with M. Gordon Wolman and John P. Miller, published in 1964 by W. H. Freeman). This is a much more engaging and accessible work than the title might suggest, and I can recommend it without reservation as summer-vacation reading, especially if you’re going to spend your vacation in a fluvial landscape—perhaps kayaking the Colorado. The book begins with a chapter composed almost entirely of questions provoked by simply looking at the land. Why do the Sandia Mountains in New Mexico rise abruptly and steeply on their west front but slope more gradually to the east? How did the Susquehanna River in Pennsylvania carve a channel straight through Kittatinny Mountain, Blue Mountain and Tuscarora Mountain, whereas lesser streams flow parallel to these ridges? How did ancient beaches on the coast of California wind up stranded 30 or 40 feet above sea level? In 1964 Leopold and his colleagues didn’t yet have answers for all these questions, but they did have a methodology for approaching them, as well as faith that the forces shaping the surface of the earth were something we could ultimately expect to understand.

The methodology was largely mathematical, and it’s interesting that Leopold, Wolman and Miller felt it necessary to apologize for that. “We are aware that the feelings of professional geomorphologists about numbers, graphs, and formulas range from acceptance and enthusiasm to bewiderment and forthright hostility. We have not gone out of our way to be mathematical, but whenever we felt that mathematics contributed either clarity or brevity to the discussion, it has been used.”

A notable early success of Leopold’s quantitative method was his discovery that the shape of a river channel and the flow of water through it can be understood through a set of simple power laws. At any given point along the course of a river, the width and the depth of the channel and the velocity of the water are each proportional to some power of the total quantity of water passing that point per unit time. For most rivers, the total volume increases steadily downstream; hence the width, the depth and the velocity must all increase monotonically as a river flows from its headwaters to its mouth. When this finding was first published (in a 1953 paper written by Leopold and Thomas Maddock, Jr.) it was surprising and controversial. Everyone knew that rivers get wider downstream, but the common impression was that they also get slower (because the landscape is generally flatter), and in most cases shallower (because of sediment buildup). Leopold and his colleagues made extensive measurements along some 20 rivers and showed that it just isn’t so. Width, depth and speed all increase steadily in the downstream direction. (The graphs below are for the Powder River at Locate, Montana. Straight lines plotted on a log-log scale are the telltale mark of a power law.)

Power River graphs: width, depth and velocity versus discharge

Leopold’s explanation of river meandering also came as a surprise to many people. One might guess that anything falling from a height—including water flowing from the mountains to the sea—would seek out the line of steepest descent, or in other words the shortest path to the bottom. But a meandering river is obviously not following the shortest route to the sea. In some cases the length of the river is four or five times that of a direct route.

The image at the top of this article, made from the Space Shuttle Endeavour in 1994, shows the elaborate meanders of the Tobol and Tavda rivers in the West Siberian Plain. Why do rivers adopt (or create) such sinuous channels? The key to understanding Leopold’s answer to this question is the curvature of the path—a measure of how sharply it twists and turns. Assuming the path is reasonably smooth, it’s possible to measure the curvature at each point along a river’s course and to calculate a total curvature: the sum (or the integral) of the curvatures measured at all points along the path. Actually, the quantity of greatest interest is the sum of the squares of all the measured curvatures. Leopold found that for any given total channel length, the favored route is the one that minimizes the total squared curvature. In other words, the river tries to distribute its “budget” of curvature as evenly as possible all along its course, avoiding sharp bends where the square of the curvature would be very large. If such a sharp bend should ever develop, its banks would be rapidly eroded, restoring a minimum-total-curvature configuration.

I am fascinated by the way that Leopold and his colleagues happened upon this least-curvature principle. In the 1966 Scientific American article Leopold and Langbein write:

We first recognized the principal characteristics of the actual curve traced out by a typical river meander in the course of a mathematical analysis aimed at generating meander-like curves by means of “random walk” techniques…. In our random-walk study one of the constraints we adopted was that the path was to begin at some point A and end at some other point B in a given number of steps. In other words, the end points and the length of the path were fixed but the path itself was “free.”

The mathematics involved in finding the average, or most probable, path taken by a random walk of fixed length had been worked out in 1951 by Hermann von Schelling of the General Electric Company. The exact solution is expressed by an elliptic integral, but in our case a sufficiently accurate approximation states that the most probable geometry for a river is one in which the angular direction of the channel at any point with respect to the mean down-valley direction is a sine function of the distance measured along the channel.

Note that the path itself is not a sine wave; it is a curve generated by letting the direction vector vary sinusoidally as one moves along the length of the path. (Leopold calls it a “sine-generated curve.”)

There’s more to this story than I can tell here. If the goal is to minimize the total squared curvature of the path, that’s easy enough to do: Just draw a straight line from A to B. This suggests again that the line of steepest descent ought to be the favored route for a river. The reason for lengthening the path and then bending it into a sinuous curve is not fully explained in the Scientific American article, but Leopold and Langbein discuss it in a 1962 paper titled “The Concept of Entropy in Landscape Evolution.” Here they also give details about some remarkable early experiments in computer simulations of river action.

Leopold transformed the study of landscapes by insisting on quantitative methods, but he was certainly not just a numbers guy. Luna Leopold was the son of Aldo Leopold, one of the founders and heroes of the conservation movement in the U.S. It was Luna who assembled and edited his father’s essays, which were published posthumously in 1948 as A Sand County Almanac. Luna Leopold himself later became a passionate advocate for environmental preservation, entering into the controversies over the Alaska pipeline and a proposal to build an airport in the Florida Everglades.

Leopold studied civil engineering and meteorology before taking a Ph.D. in geology at Harvard. He had a long career at the U.S. Geological Survey, serving as Chief Hydrologist for a decade. During his years in Washington he continued doing field work by setting up gaging stations on Watts Branch, a stream that flows into the city from the eastern Maryland suburbs, joining the Anacostia River a few miles from the Capitol. In a 1994 book, A View of the River, Leopold urged amateurs in “backyard hydrology” to do the same with any nearby stream, and he offered some practical advice: “The easiest way to measure velocity is by floats, and the best float is an orange peel. It has just the right specific gravity to float nicely at the surface, it is brightly colored and thus easily seen, and it is readily available.”

After leaving the Geological Survey in 1972, Leopold taught at the University of California, Berkeley. He was still actively working in the weeks before his death. In an obituary for the Washington Post Patricia Sullivan writes:

Well known for his scientific fieldwork, he also made bows and arrows, hunted and fished, rode horses, composed piano and guitar music, danced, flew planes, painted landscapes, wrote poetry, bound books, acted on stage, built furniture, claimed to cook strawberry shortcake in a camp Dutch oven and told campfire stories. He floated on a raft through the Grand Canyon to measure the depth of the Colorado River.

The portrait below (courtesy University of California, Berkeley) shows him at work on the East Fork River near Boulder, Wyoming, circa 1974, “attired in his typical Stetson, Filson cruiser coat and gloves.” When we make the movie of his life, we’re going to have to bring back Gary Cooper.

Luna B. Leopold

One further note: The Department of Earth and Planetary Science at Berkeley has set up a web site, The Virtual Luna Leopold Project, that lists 182 of Leopold’s publications and provides PDF copies of more than 100 of them, including all the articles and papers mentioned above, and also an excellent American Scientist article from 1962, titled simply “Rivers.”

Rediscovering America

Sunday, February 5th, 2006

In today’s New York Times (registration required), Gina Kolata writes on rediscoveries and reinventions in the sciences. Her essay is based in part on the experience of Rakesh V. Vohra of Northwestern University, who has discovered that one of his own results has been rediscovered at least 16 times. Kolata also quotes Stephen M. Stigler of the University of Chicago, who points out that the very notion of repeated rediscovery is something that has been repeatedly rediscovered. (But Stigler modestly does not mention Stigler’s Law of Eponymy, which says that discoveries are never named for the people who first discovered them.)

In keeping with this theme of oft-repeated repetition, there’s a recent paper in the arXiv that re-covers some of the same ground. M. V. Simkin and V. P. Roychowdhury of the University of California at Los Angeles review 20th-century work on various probabilistic models, on the structure of networks, on branching processes in genetics and taxonomy, on self-organized criticality, and more. Over and over again, they find that things have been discovered over and over again. For example:

The distribution of words by the frequency of their use follows a power law. This fact was discovered sometime before 1916 by Estoup, re-discovered in 1928 by Condon, and once more in 1935 by Zipf. Nowadays it is widely known as Zipf’s law. To explain this observation, Simon proposed [a] model…. In 1976 Price used Simon’s model to explain a power-law distribution of citations to scientific papers, which he discovered in 1965. He proposed to call it “cumulative advantage process”. Simon’s model was re-discovered in 1992 by Günter et al and in 1999 by Barabasi and Albert. In the latter case it acquired a new name: “preferential attachment”.

(In the quotation above I have silently elided several references, as well as a long passage describing Simon’s model.)

Although Simkin and Roychowdhury don’t say so directly, I get the impression that they take a dim view of all this rediscovery, seeing it as wasteful duplication of effort. But it’s time for someone to stand up in defense of rediscovery. When we find that the same mathematical law turns up in distant realms of life, that in itself is a discovery of some note, and surely worth rediscovering a time or two. Simkin and Roychowdhury conclude: “We estimate that scientists are busy re-discovering America about 2/3 of time.” I would respond by quoting Alvin E. Roth and Marilda A. Oliveira Sotomayor, in Two-sided Matching: A Study in Game-theoretic Modeling and Analysis:

Columbus is viewed as the discover of America, even though every school child knows that the Americas were inhabited when he arrived, and that he was not even the first to have made a round trip, having been preceded by Vikings and perhaps by others. What is important about Columbus’s discovery of America is not that it was the first, but that it was the last. After Columbus, America was never lost again; no subsequent explorers can claim its discovery.

Update: Soon after writing the above I discovered that the Kolata article had already been discovered by Lance Fortow, who also (will coincidence never cease?) quotes another version of the Columbus quote.