Archive for the ‘social science’ Category

Dotted lines

Tuesday, October 5th, 2010

Where I grew up, a dotted line ran through the neighborhood, just beyond my back yard. On maps, that line marked the boundary between the city of Philadelphia and its inner-ring suburbs. On the ground, it was a racial divide—absolute and knife-edge sharp. Our side was all white. The public schools I attended had an enrollment of roughly 8,000, with just three black students. The community on the other side of the line was our racial mirror image, almost entirely black.[*]

Revisiting the old neighborhood 50 years later, I have been pleased to find the boundary softened and blurred somewhat. Families have drifted across the line in both directions. However, it’s not yet time to celebrate the end of residential segregation in American cities.

I’ve recently learned about a remarkable set of maps showing population distribution by race and ethnicity in more than 100 metropolitan areas. The maps were created by Eric Fischer, a Bay Area programmer with an interest in cartography and urban life (and also, incidentally, the author of a wonderfully detailed history of ASCII). Here’s Fischer’s map of the Philadelphia area, based on block-level data from the 2000 U.S. Census:

Eric Fischer map of race and ethnicity in Philadelphia

Each dot represents 25 people, coded according to the color key at right. The image is at reduced resolution, and I’ve had to crop it slightly to fit this space. For a clearer view I recommend looking at the full-size and full-resolution images (3,000 × 3,000 pixels), which are all available on Fischer’s Flickr stream under a Creative Commons license.

Below is a detail of Philadelphia’s western boundary. The Schuylkill River winds along the right edge of the frame; I’ve added a black circle to mark my childhood turf.

close up of Philadelphia's western boundary

And here’s Fischer’s map of Detroit, the most extreme case in the whole collection, with the city’s northern boundary sharply delineated along Eight Mile Road:

Eric Fischer's map of race and ethnicity in Detroit

I suppose no one will be shocked to learn that racial divisions persist in the U.S., but I do think these maps offer particularly vivid evidence. Fischer was inspired to create the maps by earlier work of Bill Rankin, a historian and cartographer currently at Harvard. Using Census data, Rankin mapped the distribution of income as well as the geography of race and ethnicity in neighborhoods of Chicago and its suburbs. Rankin writes:

Any city-dweller knows that most neighborhoods don’t have stark boundaries. Yet on maps, neighborhoods are almost always drawn as perfectly bounded areas, miniature territorial states of ethnicity or class.

An apt example of those “miniature territorial states” is on exhibit in the map of Philadelphia reproduced below, which was prepared in 1936 by the Home Owners’ Loan Corporation:

1936 Philadelphia redlining map

The boundary lines drawn here determined where home mortgages were available; the red “hazardous” areas were effectively off-limits to lenders. It’s well known that there was a strong correlation between race and the “redlined” areas of such maps. (At the time, the West Philadelphia neighborhood near where I would later live was rated “still desirable,” or in other words mostly white. The change came after World War II.)

The process of creating a “miniature states” map from distribution data involves at least two levels of abstraction. First you have to carve the mapped area into distinct regions, choosing where to draw the boundaries either by eye or by some algorithmic method. Then you flatten the data within each region, turning what is surely a heterogeneous area into a uniformly pink or blue or yellow district.

For his Chicago maps Rankin adopted a more direct alternative: In each census block he drew a dot of the appropriate color for each 25 people of a given racial group or income category. The dots were randomly placed within the blocks. For example, my boyhood block near Philadelphia is listed in the 2000 census as having a total population of 95, of whom 66 are white, 25 are black, and 4 are Asian. Thus there ought to be 2.64 red dots, one blue dot and 0.16 green dots in the map area corresponding to that block. (How best to deal with fractional dots is an interesting methodological question.) Rankin drew the maps with the ArcGIS geographic information system, which has a built-in function for random-dot mapping. When Fischer undertook his 100-city mapping project, he wrote his own code for dot placement, based on a simple approximation. Instead of choosing random coordinates within the polygons that define the census blocks, Fischer placed the dots at random within disks of equivalent area, centered on an “internal point” that the Census Bureau specifies for each block. With this scheme some of the dots may stray outside the bounds of a census-block polygon, but the inaccuracy is probably minor at the scale of a metropolitan area. If one were to refine the technique, it might be helpful to replace the arbitrary constant of 25 persons per dot with a parameter that depends on the scale of the map. Thus close-up views of neighborhoods would have finer resolution.
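The disk approximation described above can be sketched in a few lines of Python. This is only an illustration, not Fischer's actual code: the function names, the block area, and the coordinates are all made up for the example, and the treatment of fractional dots (stochastic rounding, where 2.64 dots becomes 3 dots with probability 0.64 and 2 dots otherwise) is just one plausible answer to the methodological question raised in the text.

```python
import math
import random


def dot_count(population, people_per_dot=25, rng=random):
    """Turn a head count into a whole number of dots.

    Assumption: fractional dots are handled by stochastic rounding, so the
    fractional part is the probability of drawing one extra dot.
    """
    whole, frac = divmod(population / people_per_dot, 1.0)
    return int(whole) + (1 if rng.random() < frac else 0)


def dots_in_disk(center_x, center_y, block_area, n_dots, rng=random):
    """Place n_dots uniformly at random in a disk whose area equals block_area.

    The disk is centered on the block's internal point (center_x, center_y).
    """
    radius = math.sqrt(block_area / math.pi)
    dots = []
    for _ in range(n_dots):
        # Taking the square root of a uniform variate keeps the dot density
        # uniform over the disk instead of piling dots up near the center.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        dots.append((center_x + r * math.cos(theta),
                     center_y + r * math.sin(theta)))
    return dots


if __name__ == "__main__":
    # Head counts quoted in the text; the area and center are hypothetical.
    block = {"white": 66, "black": 25, "asian": 4}
    for group, people in block.items():
        n = dot_count(people)
        print(group, n, dots_in_disk(0.0, 0.0, 10_000.0, n))
```

Making `people_per_dot` a parameter is what would allow the resolution to vary with map scale, as suggested at the end of the paragraph.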

To my taste, the dotty style of mapping has at least two major advantages over the “miniature states” approach. First, a single graphic device successfully conveys two kinds of information; it shows overall population density as well as racial/ethnic composition. (The empty areas of these maps are sometimes as intriguing as the populated ones—it’s fascinating to see how much land we are willing to cede for golf courses, airports and cemeteries.) Second, in the dotted maps, boundary lines are not imposed on the data but rather emerge from the data. Moreover, we can see at a glance just how hard-edged or fuzzy each boundary is.

•     •     •

Beyond matters of cartographic technique, there is the question of what social meaning we should attribute to these maps. Why are so many cities divided into large monochrome domains? In the 1950s, the whites-only status of some neighborhoods was enforced by coercive means—legal and illegal, and occasionally violent. That has changed, and yet the boundaries persist. Why? This is a huge question, the subject of learned dissertations, and I don’t pretend to have an answer. But I would like to say a word about one well-known mathematical model that seems to offer hope of a benign explanation.

 

In the late 1960s Thomas C. Schelling, an economist now at the University of Maryland, devised a simple lattice model of residential segregation. Quoting myself:

Black and white residents, initially scattered at random over the nodes of the lattice, were assumed to prefer living among neighbors of the same race; those who were unhappy with their current surroundings could move. Schelling’s most provocative finding was that it doesn’t take vicious bigotry to produce a sharply segregated housing pattern; even the mildest preference for neighbors of the same race leads to a phase separation.

Thus we are invited to believe that our social landscape is a product of congregation rather than segregation.

Do the Rankin and Fischer maps lend any support to this notion? Well, the maps don’t look much like computer simulations of the Schelling model (there are dozens on the web), which tend to yield sinuous, pulsing blobs of population, like zebra stripes or leopard spots, and not at all like Eight Mile Road. But maybe that’s just because the simulations are run on a perfectly uniform background, whereas real cities have rivers and freeways and other physical barriers, as well as political and administrative boundaries, not to mention gradations in the size and price of houses. I suppose it’s appropriate to say that further research is needed.
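For readers who want to experiment, here is a minimal sketch of a Schelling-style simulation on a uniform background. It is not Schelling's original procedure: the wrapping square grid, the eight-cell neighborhood, the fraction of empty cells, and the rule that unhappy agents jump to a randomly chosen vacancy are all simplifying assumptions chosen to keep the code short.

```python
import random


def make_grid(size, empty_frac, rng):
    """Random size-by-size grid of 'B', 'W', or None (an empty cell)."""
    cells = [None if rng.random() < empty_frac else rng.choice("BW")
             for _ in range(size * size)]
    return [cells[i * size:(i + 1) * size] for i in range(size)]


def like_fraction(grid, row, col):
    """Fraction of occupied neighbors (8 cells, wrapping edges) matching this agent."""
    size = len(grid)
    me = grid[row][col]
    same = occupied = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            other = grid[(row + dr) % size][(col + dc) % size]
            if other is not None:
                occupied += 1
                same += (other == me)
    # An agent with no neighbors at all is counted as content.
    return same / occupied if occupied else 1.0


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the sweep to a random
    empty cell. Returns the number of unhappy agents found."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    unhappy = [(r, c) for r, c in cells
               if grid[r][c] is not None and like_fraction(grid, r, c) < threshold]
    empties = [(r, c) for r, c in cells if grid[r][c] is None]
    rng.shuffle(unhappy)
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec] = grid[r][c]
        grid[r][c] = None
        empties[i] = (r, c)  # the vacated cell becomes available
    return len(unhappy)


def mean_like_fraction(grid):
    """Average like-neighbor fraction over all agents: a crude segregation index."""
    size = len(grid)
    vals = [like_fraction(grid, r, c)
            for r in range(size) for c in range(size) if grid[r][c] is not None]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    rng = random.Random()
    grid = make_grid(30, 0.1, rng)
    print("before:", round(mean_like_fraction(grid), 3))
    for _ in range(50):
        step(grid, 0.4, rng)  # agents want at least 40 percent like neighbors
    print("after: ", round(mean_like_fraction(grid), 3))
```

A barrier such as a river or a municipal line could be added by forbidding moves across it or by shrinking neighborhoods along it, which is one way to explore whether a non-uniform background yields sharper edges.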

It will probably be a few years before we have block-level results from the 2010 census. When those numbers start coming in, I look forward to revised versions of these maps. I’m hoping they’ll look a little fuzzier.

[Note: Schelling's main paper on the segregation model does not seem to be available online. The journal reference is: Schelling, Thomas C. 1971. Dynamic models of segregation. Journal of Mathematical Sociology 1:143–186. Dietrich Stauffer and Christian Schulze have written a lucid, somewhat critical, description and evaluation of Schelling's model, available at arXiv:0710.5237.]

Update 2010-10-24: Bill Rankin writes to let me know that he has a Philadelphia map prepared with his more-precise technique of placing dots at random within the bounds of census-block polygons. Below is a detail of West Philadelphia and some of the adjacent suburbs. The complete map is available here.

race distribution in West Philadelphia and suburbs, map prepared by Bill Rankin

 

The big blip

Monday, May 24th, 2010

If you were an astute or lucky stock trader on the afternoon of May 6, you could have bought shares of Accenture PLC for a penny each and sold them a minute later for almost $40. Or you could have invested in Sotheby’s for about $30 a share and, if your timing was right, sold out at a price of $99,999.9999. Did you miss those moneymaking opportunities? Don’t kick yourself too hard. Those particular trades were canceled by the exchanges as “clearly erroneous.” But millions of other bizarre transactions were allowed to stand, even though prices were fluctuating wildly.

A preliminary report on these events was released last week by a joint committee of the Commodity Futures Trading Commission and the Securities and Exchange Commission. The report reads a lot like an inquiry into an airplane crash, evoking both horror and fascination. But whereas the investigators of aircraft accidents usually come up with a likely cause, the CFTC/SEC committee makes clear that they don’t yet understand what happened on May 6, and it seems possible we’ll never know.

daylong-avg-prices.png

Throughout that day, stock prices were trending lower, a decline attributed mainly to worries about the European economy. But those concerns can’t account for the extraordinary crevasse the market fell into and then climbed out of between 2:30 and 3:00 p.m. The Dow Jones Industrial Average (blue) and the Standard and Poor’s 500 index (green) both lost 6 or 7 percent of their value in less than 10 minutes, then gained it all back. If those price changes are extrapolated to all U.S. stocks, something like a trillion dollars went missing for half an hour. (The red line in the graph, labeled E-Mini S&P 500, refers to a stock futures contract, which I’ll discuss below.)

What could cause such rapid whipsawing? The first speculations implicated a “fat-finger trade”—a data-entry error. There have been several such events in recent years; for example, in 2005 a Japanese broker who meant to sell 1 share of stock at a price of 610,000 yen keyed in instructions to sell 610,000 shares at 1 yen. However, the committee finds no evidence of such goofs on May 6.

The committee also dismisses the Procter & Gamble theory, put forward by commentators on CNBC who noticed a particularly sharp break in the stock of that company (one of the 30 Dow components).

The decline in PG did not begin until 2:44 p.m., well after the broader market indices, which began their precipitous drop at approximately 2:40 p.m. Accordingly, early reports that an inordinately large trade in PG may have triggered the broad market decline do not appear well founded.

Various kinds of deliberate mischief have also been mentioned as possible causes. Maybe some secretive hedge fund has found a way to manipulate the market to its own advantage. Or a hacker might have infiltrated the computer networks that handle stock transactions. The glitch could even be an act of international terrorism. Again, the committee finds no signs of such malevolence but can’t entirely rule out the possibility.

The committee gives closer scrutiny to high-volume trading on the stock futures market, and in particular to the E-Mini S&P 500 futures, which offer a mechanism for betting on the value of the S&P 500 index a few weeks in the future. Traffic in S&P 500 futures was unusually heavy on May 6, and it spiked at the time of the big dip:

E-mini-price-and-volume.png

The price excursions were wide enough to trigger a “Stop Logic” system that halted trading for five seconds. Furthermore, transactions initiated by a single firm accounted for some 9 percent of the trading volume in the critical half-hour, and all of that firm’s activity was on the selling side. (The committee report does not name this firm, but others have identified it as Waddell & Reed, a mutual fund in Overland Park, Kansas.) So, do we blame it all on a mutual fund run amok in the KC suburbs? The committee thinks further investigation is warranted, but they also note that the same firm has made similar trades in the past, as have many other parties, all without causing a ripple in the wider market.

Two more items of Wall Street arcana that get a lot of attention in the report are stop-loss orders and stub quotes. A stop-loss order causes a stock to be sold automatically if the price falls below a specified threshold. Traders enter such orders in the expectation that the sale will take place at a price near the threshold level, but if prices are falling rapidly, there’s no assurance of that. For a few minutes on May 6, certain stop-loss orders had the effect not of stopping losses but of maximizing them. At the instant when the orders were executed, there were no purchase offers at any price higher than a penny, and so that’s the price the stocks sold for. The offers of $0.01 are thought to have been “stub quotes,” placed by brokers who act as market-makers and who are therefore obliged always to have both buy and sell orders in place. Stub quotes are a way of meeting this obligation at times when the broker doesn’t really want to be in the market. Trades are never supposed to be executed at the stub price, but that’s what happens if no one else is buying. (Transactions at $100,000 per share reflect stub quotes at the other end of the scale, for shares that no one else is willing to sell.)

•     •     •

If the Commodity Futures Trading Commission and the Securities and Exchange Commission don’t know what went wrong on May 6, then I’m sure I don’t know either. But a couple of points seem pretty obvious (which may be why the committee left them unstated).

First, whatever happened on May 6 must have been driven by the internal dynamics of the securities markets, not by events in the larger economy. No changes in the business prospects of Accenture PLC would justify 4,000-fold swings in the company’s market value within half an hour.

Second, there’s got to be some instability at work here—some positive feedback loop. A thousand-point dip in the Dow wasn’t just a freak coincidence, where millions of stockholders acting independently all chose to sell at the same moment, and then a few minutes later changed their minds and decided to buy. Rather, there must have been some mechanism whereby one trader’s decision to buy or sell induced other traders to do the same.

The committee report points out that stop-loss orders create one such destabilizing loop, which is hard-wired into the market machinery. If a stop-loss order on a particular stock is activated at $100, say, the sale of those shares might drive the market price down to $95, triggering more stop-loss orders and lowering the price still further, in a runaway cascade. More generally, any trading strategy that calls for following trends or tracking “market momentum” is susceptible to this kind of instability. For any one individual, selling out when the market sags may or may not be a prudent policy; but if everyone adopts such a rule, the outcome is certain disaster.
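The runaway loop is easy to caricature in code. The toy model below is a sketch under loud assumptions: every executed stop-loss order knocks the price down by the same fixed amount (`impact`), nobody steps in to buy, and the price cannot fall below a one-cent floor. None of these numbers come from the committee report.

```python
def stop_loss_cascade(price, stops, impact, floor=0.01):
    """Return the price path as resting stop-loss orders fire in sequence.

    price  -- market price after some initial shock
    stops  -- trigger prices of the resting stop-loss orders
    impact -- price drop caused by executing one order (toy constant)
    floor  -- lowest possible price (assumed: a one-cent bid)
    """
    pending = sorted(stops, reverse=True)  # highest trigger fires first
    path = [price]
    while pending and price <= pending[0]:
        pending.pop(0)
        price = max(price - impact, floor)
        path.append(price)
    return path


if __name__ == "__main__":
    stops = [100, 98, 96, 94, 80]
    # Each sale moves the price past the next trigger: a cascade.
    print(stop_loss_cascade(100.0, stops, impact=2.5))
    # A smaller impact leaves the next trigger untouched: the loop dies out.
    print(stop_loss_cascade(100.0, stops, impact=1.0))
```

In this caricature the cascade runs only while the impact of each sale exceeds the gap to the next trigger price, so whether a dip fizzles or snowballs depends on how densely the stop orders are stacked.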

Positive feedbacks of some kind surely had a role in the crash of May 6, but they can’t be the whole story. If a wave of self-reinforcing selling accounts for the sudden dive in prices, what explains the equally sudden turnaround and recovery? And there’s an even deeper question. It’s not hard to dream up models in which every random fluctuation is amplified by positive feedback, but the result is an economy that experiences weird jolts and hiccoughs all the time. A useful theory of May 6 has to explain not only what happened on that day but also why it doesn’t happen routinely.

Some analysts have compared the May 6 event with the stock market crash of October 1987, which was even deeper than the recent dip, although it played out over a period of days rather than minutes. I have vivid memories of this event; I followed it on the radio (no CNBC in those days) and then I read the post-mortem reports. But apparently my memory is faulty in certain crucial details. The crash was blamed in large part on “program trading,” which I took to mean that computer programs were making buy and sell decisions in real time. The root of the problem, as I understood it then, was that multiple programs controlling large investments all shared the same basic logic, so that they would all react in the same way to changing market conditions. It turns out, though, that the computing machinery of the time was not up to operating in this online regime. Instead, the economic models were run in batch mode, and the trades were executed after the fact. There were people in the loop.

Today, in contrast, thousands of computers are plugged directly into the markets, and program trading is everywhere. The big hedge funds and other major players install their servers in colocation facilities next door to the major exchanges, as a way of reducing communication latency. For “high frequency traders,” transactions are routinely completed in about a third of a millisecond. From the point of view of these firms, the sudden market collapse on May 6 played out in slow motion. During the 10 minutes of tumbling prices, a trading rate of three transactions per millisecond allows time for 1.8 million transactions.

Perhaps, then, the much-feared runaway automation of 1987 has finally caught up with us in 2010. Ironically, though, the CFTC/SEC report hints that if automated trading was behind the May 6 glitch, the problem might not be the presence of these traders but rather their sudden withdrawal from the market. Julie Creswell tells the story in The New York Times:

RED BANK, N.J. — Above the Restoration Hardware in this Jersey Shore town, not far from the Navesink River, lurks a Wall Street giant.

Here, inside the humdrum offices of a tiny trading firm called Tradeworx, workers in their 20s and 30s in jeans and T-shirts quietly tend high-speed computers that typically buy and sell 80 million shares a day.

But on the afternoon of May 6, as the stock market began to plunge in the “flash crash,” someone here walked up to one of those computers and typed the command HF STOP: sell everything, and shutdown.

According to Creswell, high-frequency traders account for between 40 and 70 percent of all the trading volume on U.S. securities markets, so the sudden departure of these market participants would certainly have a noticeable effect.

Almost everything about the stock market has changed utterly in the years since 1987. Back then, trading was done by guys in colorful blazers yelling at one another on the floor of the New York Stock Exchange. That trading floor still exists, but it’s a kind of Wall Street theme park, maintained for the benefit of visiting high school classes and CNBC cameras. Most of the actual trading in NYSE stocks is done across the river in Jersey City by electronic “matching engines” that line up offers to sell with bids to buy. Once there were “specialists” in each stock who were expected to intervene with their own capital to damp out unwarranted price fluctuations. That role has not disappeared entirely, but in most modern markets no one has legal responsibility for maintaining stability. In 1987 most stocks could be bought and sold in only one venue; now, transactions are automatically routed to whatever exchange offers the best terms, including the ominously named “dark pools,” where shares change hands anonymously. Back then, brokerage fees and other transaction costs were high enough to discourage strategies such as high-frequency trading; now there is much less friction in the market. It’s a new world.

Even though the CFTC and the SEC have not yet sorted out the causes of the May 6 blip, they are already proposing remedies. The basic tool is the time out: When the market throws a tantrum, it will be told to sit in the corner for a few minutes. Many such rules already exist, some of them going back to 1987. The rationale is that a pause in trading will allow time for “additional liquidity to enter the market.” In other words, if everyone is selling in a panic, we wait a little while for some buyers to show up. Of course the pause might also allow time for more sellers to join the stampede.

A year ago, I was writing about the uneasy relations between economics and the engineering discipline known as control theory. That was in the context of macroeconomics, where the aim is to control cycles of boom and bust with a time scale of years or decades. The challenges of controlling securities markets are rather different: The time scale is much shorter, which means you have to act quicker, but on the other hand it’s much easier to measure what’s happening, to gather information second by second. But the biggest impediment to effective control is the same in both cases: It’s hard to control the dynamics of a system when you don’t understand those dynamics—when you can’t reliably predict what the system will do in the absence of control or how it will respond to control actions. Given the human element in economic affairs—including the likely presence of actors who will try to subvert any control strategy—it’s not clear that we can ever have that kind of predictive power.

 

The linguistic arrow of time

Sunday, February 24th, 2008

Two recent notes on the Language Log, by Sally Thomason and Mark Liberman, discuss a nutty book, The Secret History of the English Language, by M. J. Harper. I haven’t read the book, but according to the Language Loggers, Harper contends that everybody has the history of European languages totally backwards. We’ve been taught that Latin gave rise to Italian, French, Spanish and the other Romance languages, and that English comes from Germanic roots with an important dash of Romance. The real chronology is just the opposite, Harper says. Liberman gives this precis:

[T]he history, according to Harper, is that English developed into French, which developed into Provençal, which developed into Italian; and then at some point, say around 400 B.C., some Italian merchants invented Latin as a form of shorthand.

I mention this curious thesis not because I believe anyone should take it seriously, or even because I want to defend it under the constitution’s Freedom of Wiftiness clause. But there’s an interesting mental exercise here: Can we refute this notion without resorting to mere dull historical facts? Suppose we had no documentary evidence bearing on the history of languages, and we ignored giveaways such as vocabulary items that betray their time of origin. From internal clues alone, could we deduce that Latin came before French or English? Without the labels, how would we know that Old English is older than Middle English, which in turn is older than modern English?

The challenge is rather like that of doing evolutionary biology without the fossil record. Could we look at the fishes, amphibians, reptiles, birds and mammals, and from their anatomy and physiology alone determine which groups arose earlier and which later? For the biological case, there’s a widely accepted premise that a trend toward increasing complexity defines the arrow of time. The vertebrate heart, for example, has two chambers in fishes, then three in amphibians and reptiles, and four in birds and mammals. Although there are exceptional cases where this kind of reasoning will lead you astray, it seems to work more often than not.

If there’s a similar principle in linguistics, however, I don’t know what it is. When it comes to grammatical complexity, the arrow of time seems to point the other way. Latin, for example, had a more elaborate system of inflection in nouns and adjectives than the languages descended from it. English went through a similar decline in declensions, losing case and gender markers on adjectives and abandoning its thee’s and thou’s. So maybe the rule is that simpler languages come later? But that can’t be universally true, unless we accept the implausible assumption that the very first languages were immensely complicated. Furthermore, if there is a monotonic trend toward simpler syntax, where are we headed?

Many linguists would dispute the assertion that languages show a consistent tendency to become either simpler or more complex. Yes, English has lost the word endings that once marked nouns as accusative, dative, instrumental, etc.; but in compensation it has acquired a more nuanced system of prepositions and stricter rules about word order. In this view languages do not evolve from some primitive state toward greater sophistication; nor, contrary to Miss Thistlebottom’s dire predictions, do languages degenerate into brutish grunts whenever someone splits an infinitive or dangles a participle. Changes in grammatical structure could be nothing more than a random walk through the space of all possible linguistic features. But if that’s the case, then there’s not much hope of finding an intrinsic marker of priority between pairs of languages. And so Latin really could have been invented by a bunch of Italian-speaking merchants.

Even if pairwise comparisons are problematic, though, perhaps we could find a “thermodynamic” arrow of time in the overall evolutionary pattern of a large family of languages. In biology, speciation is generally a one-way process: lines that diverge almost never reconverge. Fishes and finches have a common ancestor but they will have no common descendants. Thus the graph of relations among species is a tree, with a suppositional single root (the progenitor of all living things) and lots of leaves and branches, but no closed loops. If languages evolve in roughly the same way as living organisms, then we should be able to orient ourselves along the time axis by observing whether branches split or merge. By this argument, it’s far more likely that Latin underwent fission to produce Italian, Spanish, Catalan, French, etc., than that a dozen closely related Romance languages underwent fusion to create Latin.

There are at least two problems with this line of reasoning. First, although measures of lexical or grammatical similarity yield a tree of language relations, they don’t provide a sure-fire method of identifying the root of that tree. If you gather various words for ‘100’, you might construct a tree that includes this fragment:

kemtree.png

The conjectural k’mtom form is the reconstructed Proto-Indo-European root from which all the other terms—and many more—are thought to derive. The pattern of connections in the tree is based on judgments of lexical similarity: hundred is closer to hundert than it is to cento or cent. This linkage pattern is an invariant of the topology: It survives intact no matter how you choose to present the tree geometrically. But the identification of k’mtom as the root of the tree is not something that follows directly from a comparison of the words themselves. The diagram below shows exactly the same tree:

huntree.png

It has the same nodes and the same pattern of connections between them; the only thing that has changed is the choice of which node to designate as root. And if we knew nothing else about the chronology of the languages, there would be no obvious reason for preferring one layout over the other. (But notice that we can’t produce any arbitrary tree without doing violence to the network structure. In particular, Harper’s fantasy of going from English to French to Italian to Latin doesn’t work.)
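The re-rooting argument can be checked mechanically. In the sketch below the leaf words are the ones cited above, but the two internal nodes, labeled "Germanic" and "Italic", are placeholders assumed for illustration; the actual diagrams may use different intermediate forms. The code stores the tree as undirected edges, orients it away from whichever node is chosen as root, and confirms that the set of connections is unchanged.

```python
from collections import deque

# Undirected edges of the tree fragment. The internal labels "Germanic" and
# "Italic" are hypothetical stand-ins for the intermediate nodes.
EDGES = [
    ("k'mtom", "Germanic"), ("k'mtom", "Italic"),
    ("Germanic", "hundred"), ("Germanic", "hundert"),
    ("Italic", "cento"), ("Italic", "cent"),
]


def adjacency(edges):
    """Build a neighbor map from a list of undirected edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def root_at(adj, root):
    """Orient the tree away from `root`; return a {child: parent} map."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


def undirected_edges(parent):
    """Forget the orientation again: the topology shared by every rooting."""
    return {frozenset((child, par)) for child, par in parent.items()
            if par is not None}


if __name__ == "__main__":
    adj = adjacency(EDGES)
    from_pie = root_at(adj, "k'mtom")
    from_english = root_at(adj, "hundred")
    # Same connections, different choice of root.
    print(undirected_edges(from_pie) == undirected_edges(from_english))
    # A direct English-to-French link is not among the connections.
    print(frozenset(("hundred", "cent")) in undirected_edges(from_pie))
```

Any node can serve as root without altering the edge set, but a chain running directly from hundred to cent to cento would require edges the similarity data never supplied.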

The second problem with trying to derive a chronology from the language tree is that the tree isn’t a tree; it’s a DAG, a directed acyclic graph. Languages drift apart, but then they merge again. English is a prime example: Its deepest roots are in the West Germanic languages of the Angles, the Saxons and the Frisians, but English also received important later contributions from the Danish and the Norman French. Thus if we took seriously the idea that languages undergo fission but never fusion, we would have to conclude that English was the source of all those other languages rather than the product of their merger.

Perhaps I’m missing something important. Maybe there really is some intrinsic clue to the direction of language evolution—some way of looking at the internal structure of Latin and English and saying which came first. Even if not, though, I’m not buying the idea of English as Ursprache and Latin as shorthand Italian.

Update 2008-02-28: Richard E. Dickerson of UCLA alerts me to an earlier and perhaps even more florid bit of nuttery in the same genre. It’s a book published in 1883 by Charles Lassalle: Origin of the Western Nations & Languages Showing the Construction and Aim of Punic; Recovery of the Universal Language; Reconstruction of Phoenician Geography; Asiatic Source of the Dialects of Britain; Principal Emigrations from Asia; and Description of Scythian Society. With an Appendix, Upon the Connection of Assyrian with the Languages of Western Europe and Gaelic with the Languages of Scythia. This is one of those works where the title tells all (and then some), but the complete volume is available through Google Books, and I couldn’t resist having a look. Here’s how Lassalle begins his story (I would quote briefly if that were possible, but…):

HAVING made scientific discoveries which, on account of their great importance and extent, have not been accomplished without heavy sacrifices—having, in fact, abandoned my business to follow up with more freedom, ardour and unity of action, the Scents that had offered themselves to me when following in literary leisure certain historical and linguistic researches which seemed and have turned out to be of the utmost significancy; having also recognised that I must, for a time, entirely give myself up to the study of my discoveries, or I might never arrive at the solution that was looming before me at a distance; not knowing even where my task was leading me, and, therefore, not at liberty to form an opinion whether my work would occupy me a longer or a shorter time; having arrived at the conclusion of the task I had imposed upon myself, and been successful far beyond my ambition and expectations; having, moreover, been several times stimulated and sent to seek deeper into the channels of science by the incredulity I met from many, that a commercial man could be successful upon subjects which, until now, had baffled all the efforts of learned professors, though their common sense should have told them, that upon topics so simple and technical as those of history, geography, and languages, a travelled commercial man, acquainted with most of the Western languages and some of the old ones, had, at least, as much chance to arrive at a linguistic discovery, and enlarge it upon geographical and historical bearings which his personal experience permitted him to grasp, as a sedentary professor, who, though much versed in Greek and Latin, was generally not familiar with many of our commercial Western languages, and had not the opportunity of comparing the various customs and dialects which so often meet the eye and ear of a mercantile man.

HAVING now reached a period, though not yet a full sentence, I stop.