bit-player | An amateur's outlook on computation and mathematics

Two recent notes on the Language Log, by Sally Thomason and Mark Liberman, discuss a nutty book, The Secret History of the English Language, by M. J. Harper. I haven’t read the book, but according to the Language Loggers, Harper contends that everybody has the history of European languages totally backwards. We’ve been taught that Latin gave rise to Italian, French, Spanish and the other Romance languages, and that English comes from Germanic roots with an important dash of Romance. The real chronology is just the opposite, Harper says. Liberman gives this precis:

[T]he history, according to Harper, is that English developed into French, which developed into Provençal, which developed into Italian; and then at some point, say around 400 B.C., some Italian merchants invented Latin as a form of shorthand.

I mention this curious thesis not because I believe anyone should take it seriously, or even because I want to defend it under the Constitution’s Freedom of Wiftiness clause. But there’s an interesting mental exercise here: Can we refute this notion without resorting to mere dull historical facts? Suppose we had no documentary evidence bearing on the history of languages, and we ignored giveaways such as vocabulary items that betray their time of origin. From internal clues alone, could we deduce that Latin came before French or English? Without the labels, how would we know that Old English is older than Middle English, which in turn is older than modern English?

The challenge is rather like that of doing evolutionary biology without the fossil record. Could we look at the fishes, amphibians, reptiles, birds and mammals, and from their anatomy and physiology alone determine which groups arose earlier and which later? For the biological case, there’s a widely accepted premise that a trend toward increasing complexity defines the arrow of time. The vertebrate heart, for example, has two chambers in fishes, then three in amphibians and reptiles, and four in birds and mammals. Although there are exceptional cases where this kind of reasoning will lead you astray, it seems to work more often than not.

If there’s a similar principle in linguistics, however, I don’t know what it is. When it comes to grammatical complexity, the arrow of time seems to point the other way. Latin, for example, had a more elaborate system of inflection in nouns and adjectives than the languages descended from it. English went through a similar decline in declensions, losing case and gender markers on adjectives and abandoning its thee’s and thou’s. So maybe the rule is that simpler languages come later? But that can’t be universally true, unless we accept the implausible assumption that the very first languages were immensely complicated. Furthermore, if there is a monotonic trend toward simpler syntax, where are we headed?

Many linguists would dispute the assertion that languages show a consistent tendency to become either simpler or more complex. Yes, English has lost the word endings that once marked nouns as accusative, dative, instrumental, etc.; but in compensation it has acquired a more nuanced system of prepositions and stricter rules about word order. In this view languages do not evolve from some primitive state toward greater sophistication; nor, contrary to Miss Thistlebottom’s dire predictions, do languages degenerate into brutish grunts whenever someone splits an infinitive or dangles a participle. Changes in grammatical structure could be nothing more than a random walk through the space of all possible linguistic features. But if that’s the case, then there’s not much hope of finding an intrinsic marker of priority between pairs of languages. And so Latin really could have been invented by a bunch of Italian-speaking merchants.

Even if pairwise comparisons are problematic, though, perhaps we could find a “thermodynamic” arrow of time in the overall evolutionary pattern of a large family of languages. In biology, speciation is generally a one-way process: lines that diverge almost never reconverge. Fishes and finches have a common ancestor but they will have no common descendants. Thus the graph of relations among species is a tree, with a suppositional single root (the progenitor of all living things) and lots of leaves and branches, but no closed loops. If languages evolve in roughly the same way as living organisms, then we should be able to orient ourselves along the time axis by observing whether branches split or merge. By this argument, it’s far more likely that Latin underwent fission to produce Italian, Spanish, Catalan, French, etc., than that a dozen closely related Romance languages underwent fusion to create Latin.

There are at least two problems with this line of reasoning. First, although measures of lexical or grammatical similarity yield a tree of language relations, they don’t provide a sure-fire method of identifying the root of that tree. If you gather various words for ‘100’, you might construct a tree that includes this fragment:

The conjectural k’mtom form is the reconstructed Proto-Indo-European root from which all the other terms—and many more—are thought to derive. The pattern of connections in the tree is based on judgments of lexical similarity: hundred is closer to hundert than it is to cento or cent. This linkage pattern is an invariant of the topology: It survives intact no matter how you choose to present the tree geometrically. But the identification of k’mtom as the root of the tree is not something that follows directly from a comparison of the words themselves. The diagram below shows exactly the same tree:

It has the same nodes and the same pattern of connections between them; the only thing that has changed is the choice of which node to designate as root. And if we knew nothing else about the chronology of the languages, there would be no obvious reason for preferring one layout over the other. (But notice that we can’t produce any arbitrary tree without doing violence to the network structure. In particular, Harper’s fantasy of going from English to French to Italian to Latin doesn’t work.)
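For readers who like to see such things in code, here is a minimal Python sketch of the re-rooting argument. It is an illustration only: the internal node names `germanic_ancestor` and `romance_ancestor` are hypothetical placeholders, and the edge list is a simplified stand-in rather than a copy of the diagrams. The point is that choosing a different root changes the parent-child directions but leaves the set of undirected connections untouched, and that Harper's chain would need a link the tree does not contain.

```python
from collections import deque

# Hypothetical, simplified edge list. The two "*_ancestor" nodes are
# placeholders invented for this sketch, not reconstructed word forms.
EDGES = [
    ("k'mtom", "germanic_ancestor"),
    ("k'mtom", "romance_ancestor"),
    ("germanic_ancestor", "hundred"),
    ("germanic_ancestor", "hundert"),
    ("romance_ancestor", "cento"),
    ("romance_ancestor", "cent"),
]


def neighbors(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def root_at(edges, root):
    """Return {node: parent} for the tree oriented away from `root`."""
    adj = neighbors(edges)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


def undirected(parent):
    """Forget the direction: the set of links implied by a parent map."""
    return {frozenset((child, par))
            for child, par in parent.items() if par is not None}


by_pie = root_at(EDGES, "k'mtom")       # conventional layout
by_english = root_at(EDGES, "hundred")  # same tree, different root

# Re-rooting changes who is whose parent but not the pattern of links.
same_topology = undirected(by_pie) == undirected(by_english)

# Harper's English -> French step would need a direct hundred-cent link.
harper_link_exists = frozenset(("hundred", "cent")) in undirected(by_pie)
```

Under these assumptions `same_topology` is true and `harper_link_exists` is false: any node can serve as root, but no choice of root turns the tree into a chain running from English through French and Italian to Latin.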

The second problem with trying to derive a chronology from the language tree is that the tree isn’t a tree; it’s a DAG, a directed acyclic graph. Languages drift apart, but then they merge again. English is a prime example: Its deepest roots are in the West Germanic languages of the Angles, the Saxons and the Frisians, but English also received important later contributions from the Danish and the Norman French. Thus if we took seriously the idea that languages undergo fission but never fusion, we would have to conclude that English was the source of all those other languages rather than the product of their merger.
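The same point can be sketched in a few lines of Python. The graph below is a deliberately crude simplification that uses only the language names mentioned above, with edges pointing from contributor to recipient; the function names are made up for this sketch. A "fission only" rule amounts to demanding that no node have more than one parent, and the check shows which orientation of the edges satisfies that demand.

```python
# Hypothetical influence graph: each pair is (contributor, recipient).
# This is a simplification for illustration, not a full language history.
INFLUENCES = [
    ("West Germanic", "English"),
    ("Danish", "English"),
    ("Norman French", "English"),
]


def parent_counts(edges):
    """Count, for every node, how many edges point into it."""
    counts = {}
    for src, dst in edges:
        counts.setdefault(src, 0)
        counts[dst] = counts.get(dst, 0) + 1
    return counts


def is_fission_only(edges):
    """True if every node has at most one parent, as in a rooted tree."""
    return all(n <= 1 for n in parent_counts(edges).values())


# The historical orientation gives English three parents, so it fails.
forward_ok = is_fission_only(INFLUENCES)

# Flip every edge and the rule is satisfied, with English as the source.
reversed_edges = [(dst, src) for src, dst in INFLUENCES]
reversed_ok = is_fission_only(reversed_edges)
```

Here `forward_ok` comes out false and `reversed_ok` true, which is exactly the trap: a rule that forbids mergers would pick the backwards chronology.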

Perhaps I’m missing something important. Maybe there really is some intrinsic clue to the direction of language evolution—some way of looking at the internal structure of Latin and English and saying which came first. Even if not, though, I’m not buying the idea of English as Ursprache and Latin as shorthand Italian.

Update 2008-02-28: Richard E. Dickerson of UCLA alerts me to an earlier and perhaps even more florid bit of nuttery in the same genre. It’s a book published in 1883 by Charles Lassalle: Origin of the Western Nations & Languages Showing the Construction and Aim of Punic; Recovery of the Universal Language; Reconstruction of Phoenician Geography; Asiatic Source of the Dialects of Britain; Principal Emigrations from Asia; and Description of Scythian Society. With an Appendix, Upon the Connection of Assyrian with the Languages of Western Europe and Gaelic with the Languages of Scythia. This is one of those works where the title tells all (and then some), but the complete volume is available through Google Books, and I couldn’t resist having a look. Here’s how Lassalle begins his story (I would quote briefly if that were possible, but…):

HAVING made scientific discoveries which, on account of their great importance and extent, have not been accomplished without heavy sacrifices—having, in fact, abandoned my business to follow up with more freedom, ardour and unity of action, the Scents that had offered themselves to me when following in literary leisure certain historical and linguistic researches which seemed and have turned out to be of the utmost significancy; having also recognised that I must, for a time, entirely give myself up to the study of my discoveries, or I might never arrive at the solution that was looming before me at a distance; not knowing even where my task was leading me, and, therefore, not at liberty to form an opinion whether my work would occupy me a longer or a shorter time; having arrived at the conclusion of the task I had imposed upon myself, and been successful far beyond my ambition and expectations; having, moreover, been several times stimulated and sent to seek deeper into the channels of science by the incredulity I met from many, that a commercial man could be successful upon subjects which, until now, had baffled all the efforts of learned professors, though their common sense should have told them, that upon topics so simple and technical as those of history, geography, and languages, a travelled commercial man, acquainted with most of the Western languages and some of the old ones, had, at least, as much chance to arrive at a linguistic discovery, and enlarge it upon geographical and historical bearings which his personal experience permitted him to grasp, as a sedentary professor, who, though much versed in Greek and Latin, was generally not familiar with many of our commercial Western languages, and had not the opportunity of comparing the various customs and dialects which so often meet the eye and ear of a mercantile man.

HAVING now reached a period, though not yet a full sentence, I stop.

Update 2019-11-15: Twelve years later, the author of The Secret History of the English Language has taken note of my critique and has “comprehensively demolished” it. The wreckage is at http://www.applied-epistemology.com/phpbb2/viewtopic.php?p=45123#45123.

The linguistic arrow of time