I have a few questions for the complexity theorists among us.
Have you ever tried to explain to your grandmother why NP is named NP? Does she get it when you say that problems labeled NP-complete are the hardest problems in NP, but NP-hard problems might be harder, and not in NP?
My concern here is not with the difficulty of the concepts—there’s not much we can do about that—but with the notation and terminology. Do locutions like P^#P and NISZK and (NP ∩ coNP)/poly roll trillingly off your tongue? How about EXP, EEXP, NEXP, PEXP and SUBEXP? And while we’re on the subject of EXP and friends, I’ve been wondering how to pronounce NEXPTIME. (I’m kind of hoping the “P” is silent, as in Pterodactyl.)
Presumably, the argot of complexity theory works well enough for communication among members of the club—those of you, say, who know that “IP” is neither “internet protocol” nor “intellectual property” but “interactive proof.” If you’d ever like to talk to the rest of the world, however, I think there might be room for improvement.
Let’s start with the term “complexity” itself. The root meaning of “complex” is “made up of many interconnected parts”; for example, the innards of a mechanical clock are complex. Is this the best word to describe the property of being hard to compute? To me, it seems the difficulty of an exponential algorithm is not the complexity of the task but the mere mind-numbing repetition of the same operations. An exponential heap of manure is no more complex than a polynomial heap of manure, it just takes longer to shovel it.
Another gripe: WHAT’S WITH ALL THE CAPITAL LETTERS? IS IT REALLY NECESSARY TO SHOUT ALL THE TIME?
But the thing that worries me most is a certain opacity in the names of complexity classes. According to the Complexity Zoo, there are currently 465 named classes, which is quite a menagerie. With that many classes, you can’t really expect people to keep track of meanings and relations without some internal cues from the structure of the names themselves.
There are hints of structure in the naming scheme, with terms assembled from stems, prefixes and suffixes, and combined according to quasi-grammatical rules. For example, the superscript in P^#P indicates the action of an “oracle”: The class encompasses problems that can be solved in polynomial time with the help of a #P (“sharp-P” or “counting-P”) oracle. The prefix “co-” in “coNP” designates the complement of NP: If a “Does there exist…?” question is in NP, then the corresponding “Does there not exist…?” question is in coNP. These devices are applied consistently; as far as I can tell, superscripting and the “co-” prefix always mean the same thing. Unfortunately, other elements of the names are not so easy to parse. The letter P generally stands for “polynomial” (except where it’s “probabilistic”). N usually denotes “nondeterministic” (but NC is “Nick’s Class”). Likewise the prefix D is for “deterministic” (except that it’s usually omitted, and sometimes it means “difference” or “dynamical” instead). B stands for “bounded-error” (except that BH is “Boolean hierarchy” and “BPd(P)” is “Polynomial Size d-Times-Only Branching Program”). Q is for “quantum” (except “QH” is the “query hierarchy” and “QP” is “quasi-polynomial time”).
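Just to see how far the consistent part of the grammar goes, here is a toy sketch, entirely my own invention, that mechanizes those two reliable devices (the “co-” prefix and the oracle superscript, written here as “^”) and throws up its hands at everything else:

```python
# A toy illustration, not a real tool: the two devices that ARE applied
# consistently can be mechanized; the irregular stems need a lookup table.

READINGS = {"P": "polynomial time",
            "NP": "nondeterministic polynomial time",
            "#P": "counting-P"}

def read_class(name):
    if name.startswith("co"):        # "co-" always means complementation
        return "complement of " + read_class(name[2:])
    if "^" in name:                  # superscript always means oracle access
        base, oracle = name.split("^", 1)
        return read_class(base) + " with a " + read_class(oracle) + " oracle"
    return READINGS.get(name, name + " (consult the Zoo)")

print(read_class("coNP"))   # complement of nondeterministic polynomial time
print(read_class("P^#P"))   # polynomial time with a counting-P oracle
```

Two rules compose nicely; it’s the irregular stems that defeat any such scheme.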
The sad truth is, the naming conventions for furniture at Ikea make for a more consistent language than those of complexity theory.
How did we get into this fix? Back in 1974, when complexity theory was still young and impressionable, Don Knuth published “A Terminological Proposal” in SIGACT News, the quarterly newsletter of what was then the ACM Special Interest Group on Automata and Computability Theory. (By the way, the issue in which Knuth’s letter appears, Vol. 6, No. 1, seems to be missing from the ACM Digital Library. Can anyone supply a copy for scanning?) Knuth had polled 30 of his friends, asking for advice on the best adjective to describe problems “at least as difficult to solve in polynomial time as those of the Cook-Karp class NP.” His own suggestions were “Herculean,” “formidable” and “arduous,” but none of those got strong support. The people he consulted had proposals of their own: “impractical,” “intractable,” “prodigious,” “exorbitant,” “Sisyphean” and several others. The idea I like best came from Shen Lin, who suggested the abbreviation PET:
PET stands for “probably exponential time”. But if this gets proved, PET stands for “provably exponential time”. And if the proof goes the other way, it stands for “previously exponential time”.
On the whole, I think it’s just as well that we are not now speaking of Herculean or Sisyphean or PET problems, but I haven’t much enthusiasm for the solution that was adopted, either:
The “winning” write-in vote is the term NP-hard, which was put forward mainly by several people at Bell Labs, after what I understand was considerable discussion. Similar if not identical proposals were made by Steve Cook, by Ron Rivest, and in earlier publications by Sartaj Sahni. This term is intended for use together with another new one, NP-complete, which abbreviates ‘polynomial complete’ and is at the same time more exact.
Knuth endorses “NP-hard” and “NP-complete” primarily because they have precise definitions and thus meet the needs of the insiders, the “automata-theorists.” But he also goes on to address the issue of communicating with a wider audience, and his thoughts are worth quoting at some length:
I don’t consider it a major goal to invent completely descriptive terminology for every conceivably interesting question of the type considered here; the major goal is to have a good term to use for the masses, in the one case which experience shows is almost omnipresent. To say NP-hard actually smacks of being a little too technical for a mass audience, but it’s not so bad as to be unusable; fortunately the term does easily generalize to a lot of other cases considered by automata theorists, so it appears to be a good compromise. When more technicalities are introduced, there is less need for a universally accepted term or notation. Although it’s very interesting to consider, e.g., log-space reductions, I don’t think it’s necessary to come up with a special short name for them.
If I understand this passage correctly, Knuth is arguing that although “NP-complete” and “NP-hard” may be a bit abstruse for casual coffeeshop chatter, they are the only terms that nonspecialists will ever need to learn, and so they won’t be too much of a burden. But that’s not quite the way it has worked out.
I suppose that if I’m going to complain about somebody else’s notation, I ought to propose an alternative of my own that I believe to be better. I can’t do that, but I would like to point out that certain other fields of study offer models that might be emulated. The most obvious model is organic chemistry, where the naming problem is orders of magnitude larger than the collection of 465 items in the complexity zoo. The sesquipedalian names of organic molecules are not exactly pretty, but the notational system has the wonderful property that if you know the name of a thing, you also know its composition and structure. Wouldn’t it be grand to have the same kind of perspicuous scheme for complexity classes, so that just naming a class would tell you where it lies in the inclusion diagram?
A saving grace of complexity theorists everywhere is that they are not without a sense of humor when it comes to the obscurity of their own jargon. For example, there is a complexity class named PINC, which the Zoo translates as “Polynomial Ignorance of Names of Classes.” And you don’t want to miss Alan T. Sherman’s discourse (or descant) on superpolylogarithmic subexponential functions (.ps).
Finally, I should mention a discovery I’ve made at the end of Knuth’s terminological proposal. Everyone knows that settling the P = NP question will earn you a prize of $1 million from the Clay Mathematics Institute. I suspect that many are unaware of another prize offer, made more than 30 years ago. Knuth writes:
I’m willing to give a prize of one live turkey to the first person who proves that P = NP.
Note that this appears to be a one-way-only offer. Proving that P ≠ NP won’t win you the turkey. (You have three days left until Thanksgiving.)
“The sad truth is, the naming conventions for furniture at Ikea make for a more consistent language than those of complexity theory.”
Ouch!! But it’s not too far off the mark. I guess people can get used to anything :)
A musing on the same topic:
http://weblog.fortnow.com/2006/07/naming-complexity-classes.html
Really, the most egregiously stupid names should be changed. And too bad for the literature. I nominate NC.
Wow, a lot of questions. Since no one else has tried to answer them, I might as well…
Have you ever tried to explain to your grandmother why NP is named NP?
Curiously, my mammaws have never asked. (I’m from the South, and sometimes when you’re from the South, you call them mammaws.)
Does she get it when you say that problems labeled NP-complete are the hardest problems in NP, but NP-hard problems might be harder, and not in NP?
It might be easier to try this translation. Take something of the form X-Y and think of it as “Y for X.” So NP-complete is “complete for NP” and NP-hard is “hard for NP.” This probably doesn’t begin to solve your problem, but the idea is that something that is NP-hard is “at least as hard as NP,” while something that is NP-complete essentially characterizes NP itself.
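For the record, the textbook definitions behind this reading go something like the following (standard material, stated here only for reference):

```latex
% L is NP-hard: every problem in NP polynomial-time reduces to L.
L \text{ is NP-hard} \iff \forall L' \in \mathrm{NP} :\ L' \le_p L
% L is NP-complete: it is NP-hard and also a member of NP itself.
L \text{ is NP-complete} \iff L \in \mathrm{NP} \ \text{and}\ L \text{ is NP-hard}
```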
Do locutions like P^#P and NISZK and (NP ∩ coNP)/poly roll trillingly off your tongue?
Not really. “P to the sharp-P” rolls OK for me.
How about EXP, EEXP, NEXP, PEXP and SUBEXP?
Yeah, those roll. They’re eksp, ee-eksp, en-eksp, pee-eksp, suhb-eksp, etc.
And while we’re on the subject of EXP and friends, I’ve been wondering how to pronounce NEXPTIME.
The P isn’t silent; it’s pronounced “en-eksp-time.” Note that the N is a prefix, as in NP.
Presumably, the argot of complexity theory works well enough for communication among members of the club—those of you, say, who know that “IP” is neither “internet protocol” nor “intellectual property” but “interactive proof.” If you’d ever like to talk to the rest of the world, however, I think there might be room for improvement.
I hear what you’re saying, but I think the phrase “interactive proof” is very suggestive; it’s actually one of the best names we have. It would be cumbersome to write out the phrase every time we want to talk about the class of problems, wouldn’t it? And what better way to abbreviate “interactive proof” than…
Let’s start with the term “complexity” itself. The root meaning of “complex” is “made up of many interconnected parts”; for example, the innards of a mechanical clock are complex. Is this the best word to describe the property of being hard to compute? To me, it seems the difficulty of an exponential algorithm is not the complexity of the task but the mere mind-numbing repetition of the same operations. An exponential heap of manure is no more complex than a polynomial heap of manure, it just takes longer to shovel it.
When a complexity theorist talks about complexity, he or she doesn’t mean that algorithms are complex. Complexity is a property of problems. Computational problems (those that require a great deal of time to answer) are complex in the sense that the encoding of the problem instance does not readily help us find the answer; somehow, the answer is embedded deep in the problem, and uncovering it requires substantial effort, no matter what we try. So in a sense, complexity can be seen as a property of the problem and its presented encoding. For example, there are no NP-complete problems written in unary (instead of binary), unless P = NP. This is a theorem of Piotr Berman from the 1970s.
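To make the encoding point concrete, here’s a back-of-the-envelope illustration (mine, not part of Berman’s proof): the same number is exponentially longer written in unary than in binary, so an algorithm taking about n steps is exponential in the length of a binary input but only linear in the length of a unary one.

```python
# Same number, two input lengths: the encoding changes what "fast" means.
n = 1_000_000
binary_len = n.bit_length()   # about log2(n) symbols: 20
unary_len = n                 # one mark per unit: 1,000,000 symbols

# An algorithm running in ~n steps is exponential in binary_len
# (since n is roughly 2**binary_len) but only linear in unary_len.
print(binary_len, unary_len)  # 20 1000000
```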
Another gripe: WHAT’S WITH ALL THE CAPITAL LETTERS? IS IT REALLY NECESSARY TO SHOUT ALL THE TIME?
YES
The sad truth is, the naming conventions for furniture at Ikea make for a more consistent language than those of complexity theory.
Funny, but it’s not quite that bad.
How did we get into this fix?
For the examples you gave, it is because taking the first letter of a word is a bad hash function.
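To belabor the joke with a sketch (the word list is mine):

```python
# First-letter "hashing" of complexity-theory vocabulary: nothing but collisions.
words = ["polynomial", "probabilistic", "prover", "promise",
         "nondeterministic", "Nick"]
buckets = {}
for w in words:
    buckets.setdefault(w[0].upper(), []).append(w)
print(buckets)
# {'P': ['polynomial', 'probabilistic', 'prover', 'promise'],
#  'N': ['nondeterministic', 'Nick']}
```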
I suppose that if I’m going to complain about somebody else’s notation, I ought to propose an alternative of my own that I believe to be better. I can’t do that, but I would like to point out that certain other fields of study offer models that might be emulated. The most obvious model is organic chemistry, where the naming problem is orders of magnitude larger than the collection of 465 items in the complexity zoo. The sesquipedalian names of organic molecules are not exactly pretty, but the notational system has the wonderful property that if you know the name of a thing, you also know its composition and structure. Wouldn’t it be grand to have the same kind of perspicuous scheme for complexity classes, so that just naming a class would tell you where it lies in the inclusion diagram?
Yes; the trouble is that new classes and definitions are still being invented. The situation is not like organic chemistry in that respect, and I’m not sure when it will be. We don’t know how important all these classes will be: some are obviously important, and some we just don’t know about. Maybe they’ll all be of vital importance someday, in which case a more systematic approach will be absolutely vital. For example, few people knew what on earth PPAD was until two-player Nash was shown to be PPAD-complete. Now many researchers are studying PPAD.
There are many ways to define a computational model, and for each one you can define a corresponding complexity class, and maybe throw in a time bound on top for good measure. I don’t know how we could systematize this defining process in a nice way, and maybe we can’t do it at all (by some diagonalization argument). Right now we’re cataloging everything we know so far, and I agree it’s a mess, but Scott et al.’s efforts to better organize our knowledge are a good first step.
I am grateful to Ryan Williams for all of his help, and especially for the advice on pronunciation. (Although I can’t help wondering if some of his choices might not be Sutherun variants. I’ve heard tell there’s a Motown version of E-X-P-T-I-M-E.)
No problem. I think you’re on to a good idea for a new complexity theme song… many researchers work on simulating randomized algorithms efficiently using algorithms that use no random sources; this fascinating area is called “derandomization.” A major open problem is whether or not BPP (a certain class of problems that can be solved efficiently with randomness) is different from EXPTIME. Thus, I submit:
“E-X-P, T-I-M-E,
Find out if it’s B-P-P
E-X-P, T-I-M-E,
Take care, it’s tricky!”
With more lyrics, this might supplant “Find the Longest Path” (sung to the tune of “For the Longest Time” by Billy Joel).
Brian: Sorry I’m late! (Just discovered your blog.) Fortunately Ryan already said most of what I wanted to.
One of my first blog posts was about whether complexity theory took a wrong turn in denoting every model of computation by an incomprehensible-looking acronym. I completely (NP-completely?) agree with you that these acronyms present a daunting obstacle to the would-be theoretical computer science popularizer. Maybe the only thing to do is to play up the classes’ aura of mystery and obscurity — though I suspect that works better with the physicists’ terms, like “quark” and “supersymmetry,” which at least sound cool.
But just like you, I’ve tried to think of a better convention (just as a thought experiment — there’s no hope of actually switching at this point), and have failed completely. As Ryan points out, the key problem is that unlike in organic chemistry, we’re not combining some fixed collection of elements in new ways. Instead, we’re constantly inventing new elements (quantum, interactive, multi-prover, etc.) and combining them with previous elements — and even inventing new ways of combining the elements — and then we need a short name for each of the resulting combinations (at least if it turns out to be important, which of course is hard to predict in advance).