The Oracle of Wolfram

In a comment on my earlier note about Wolfram Alpha, Daniel Asimov takes me to task for failing to explain “what Wolfram Alpha is.” I’ll accept the criticism, but I have to add that the question he raises is a real toughie. What, indeed, is Wolfram Alpha? Much of the prerelease hype (e.g., CIO, ZDNet, Telegraph) suggested it was to be some kind of search engine—a “Google killer,” or else, as Steven Levy wrote, “more like the anti-Google.” Another common theme (Infotoday, Guardian) suggests that Alpha is a manifestation of the semantic web, “that thing that Sir Tim Berners-Lee has been banging on about.”

There have been lots of other attempts to answer the ontological question. Jonathan Zittrain (via the New York Times) calls Alpha a “computable almanac.” Larry Greenemeier, writing for Scientific American, says: “Think of it as Ask Jeeves with [a] PhD.” Stephen Wolfram, the creator of Alpha, tells Rudy Rucker: “If anything, you might call it a platonic search engine, unearthing eternal truths that may never have been written down before.” Yuri Alkin, in a blog called Connections, had the wit to present the question to Alpha itself: “Who are you?” he asked. The polite reply, worthy of HAL or Commander Data, was “I am a computational knowledge engine.”

After a few weeks of sporadic poking around with Alpha, I’m finally ready to take my own shot at answering the big question. If you ask me, Wolfram Alpha is an oracle. Not an oracle in the computer-science sense, a hypothetical black box that simplifies complexity analysis by always giving the correct answer for queries of a specific form. I mean an oracle in the Greek-mythology sense: a sibyl in a cave or a temple, whose responses to questions are often helpful but tend to be enigmatic and require careful interpretation. Sometimes you get just the answer you were looking for. Sometimes you get no answer at all. Sometimes the answer leaves you more perplexed than when you began.

All in all, perhaps it’s better to set aside the question of what Alpha is and ask what it can do.

It can do your homework (or your students’ homework):

Query: Limit (x^p)^(1/p) as p->0

Answer: \(\lim_{p \to 0}(x^p)^{1/p} = x\)
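
Alpha's answer is easy to check by hand, at least under the assumption (mine, not stated in the query) that \(x\) is a positive real number and \(p \neq 0\). In that case the expression does not depend on \(p\) at all:

\[
(x^p)^{1/p} = \exp\!\left(\tfrac{1}{p}\,\ln x^p\right) = \exp\!\left(\tfrac{1}{p}\cdot p\,\ln x\right) = \exp(\ln x) = x ,
\]

so the limit as \(p \to 0\) is simply \(x\). For negative or complex \(x\) the choice of branch matters, and this short argument does not carry over as written.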

It can graph a function:

Query: plot sin(x)/x from -10 to 10

Answer:

It makes a handy desk calculator:

Query: 12 choose 3

Answer: 220

Query: factor 8549176323

Answer: 3 × 127 × 22438783 (3 distinct factors)
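
Both calculator answers can be cross-checked in a few lines of Python. This is only a sketch: `prime_factors` is a hypothetical helper written for this check, using plain trial division, and it says nothing about how Alpha computes its own results.

```python
from math import comb


def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n, with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        # After 2, only odd candidates need to be tried.
        d += 1 if d == 2 else 2
    if n > 1:
        # Whatever remains has no divisor up to its square root, so it is prime.
        factors.append(n)
    return factors


print(comb(12, 3))                # binomial coefficient "12 choose 3"
print(prime_factors(8549176323))  # compare with Alpha's factorization
```

Trial division is adequate here because the largest prime factor only has to be tested against divisors up to its square root, which is a few thousand.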

It provides access to a rich trove of “curated” data:

Query: molecular weight vanadium dioxide

Answer: 82.9403 (grams per mole)

It offers links to “live” data, updated in real time, on topics such as the weather and financial markets:

Query: weather Buenos Aires

Answer:

But the big payoff of a service like this lies in combining factual queries with mathematical or algorithmic analysis. Surely that’s what a “computational knowledge engine” should be good at, no? I’ve been trying hard to make Alpha perform in this way. So far I’ve found the process pretty frustrating.

Here’s a case study. Remembering an old story about Kansas being flatter than a pancake, I submitted the query, “flattest state in the U.S.” The response was another question: “Did you mean ‘fastest state in the U.S.?’”

Well, no, I didn’t mean that, but out of curiosity I clicked the link to learn which is the fastest state in the U.S. The reply, in its entirety, was this:

The oracle was in deep enigma mode. I decided to go back to the “flattest” question. Let me add that I hadn’t really expected my first query to work; a ranking of states by flatness is not something you’d find in an almanac (computable or otherwise), and indeed the concept of flatness has various possible definitions. I thought I could give Alpha some help by being more explicit.

Query: All US states maximum elevation - minimum elevation

Answer: Did you mean: US states maximum elevation minimum elevation

I wasn’t quite sure how to respond here, but it doesn’t cost anything to try, so I accepted Alpha’s rephrasing of the query. What I got back was not the answer I was looking for, but it was not entirely without interest:

Query: US states maximum elevation minimum elevation

Answer:

The scatterplot of highest and lowest elevations by states tipped me off that the data I’m looking for are in the system somewhere. Indeed, one of those dots in the lower left corner, with both lowest and highest elevations near zero, is probably the answer to my question (at least if we define flatness as the difference between maximum and minimum elevation). But how to identify the dot? Or, for that matter, how to identify the conspicuous outlier—the one state with a minimum elevation well above 1,000 feet?

I allowed myself to be distracted by the latter question. There are a couple of obvious guesses for the state with the highest lowest elevation, so I tried one:

Query: Colorado minimum elevation

Answer: 3314 feet

Hmm. That’s not the outlier in the scatterplot; 3300 feet is well off the chart. That means at least one state was clipped from the graph. Another query makes this more obvious:

Query: US states minimum elevation

Answer:

It appears that ranks 1 through 4 lie somewhere above the top edge of the graph. Is there some way to force Alpha to plot the complete data set, without arbitrary cropping? For some kinds of plotting, I’ve figured out how to control the range of the independent variable (see the command “plot sin(x)/x from -10 to 10” above), but in this context I’ve not discovered the key, if there is one. And, as far as I can tell, there is no warning given when a plot is chopped.

Nevertheless, I was able to identify the four missing states. Accompanying the rank-order graph above was a helpful list, whose first entries were: Colorado 3314, Wyoming 3100, New Mexico 2844, and Utah 2001. The highest visible dot in the graph represents the fifth state in the sequence—Montana, with a minimum elevation of 1801 feet. The list gave the first five states and the last five in the ranking. This looked promising. If I could get a complete list of minimum elevations for all the states, and then the corresponding list of maximum elevations, perhaps Alpha could also give me the differences. I would ask it to alphabetize both lists, then subtract them element by element, and finally take the minimum of the result, or else sort again according to magnitude.

A button next to the truncated list of states promised “More.” I pressed it. Now I had the first 10 and the last 10 states, but I was still missing the 30 in the middle. Something else had changed as well: All the numbers were different, with the list of elevations beginning 1010, 945, 867. After a moment’s perplexity, I realized that Alpha had decided to shift from feet to meters. No matter. Three more presses of the “More” button finally got me a complete list of minimum state elevations (in meters). And the same rigmarole soon produced the analogous list of maximum elevations (again in meters).
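
The new numbers are consistent with a plain unit conversion. The sketch below assumes that 1010, 945 and 867 correspond to the first three entries of the earlier list and that Alpha rounded to the nearest meter; both are guesses. The factor of 0.3048 meters per foot is exact.

```python
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot

# First three entries of the list Alpha reported in feet.
minimum_elevations_ft = {"Colorado": 3314, "Wyoming": 3100, "New Mexico": 2844}


def feet_to_meters(feet: float) -> int:
    """Convert feet to meters, rounded to the nearest whole meter (assumed rounding)."""
    return round(feet * FEET_TO_METERS)


minimum_elevations_m = {
    state: feet_to_meters(feet) for state, feet in minimum_elevations_ft.items()
}
print(minimum_elevations_m)
```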

But now I was stumped. How do I sort the list alphabetically? Can I subtract one list from another? Can I do anything to transform the output of a command? Is there any way to compose commands, so that the output of one routine becomes the input of another? Not a clue.
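
For comparison, here is a minimal sketch of that plan (alphabetize both lists, subtract element by element, take the minimum) in Python, an ordinary programming language rather than anything Alpha accepts. The state names and elevations are invented placeholders, not Alpha's data, and the function names `elevation_ranges` and `flattest` are hypothetical.

```python
def elevation_ranges(min_elev, max_elev):
    """Alphabetize both lists of (state, feet) pairs, then subtract element by element."""
    mins = sorted(min_elev)    # tuples sort by state name first
    maxes = sorted(max_elev)
    if len(mins) != len(maxes):
        raise ValueError("the two lists cover different numbers of states")
    ranges = []
    for (state_lo, lo), (state_hi, hi) in zip(mins, maxes):
        if state_lo != state_hi:
            raise ValueError(f"lists do not line up: {state_lo} vs {state_hi}")
        ranges.append((state_lo, hi - lo))
    return ranges


def flattest(min_elev, max_elev):
    """Return the (state, range) pair with the smallest elevation range."""
    return min(elevation_ranges(min_elev, max_elev), key=lambda pair: pair[1])


# Invented example data, each list in descending rank order as a ranking would give it.
minimum_ft = [("Hillshire", 900), ("Ridgemont", 400), ("Flatland", 100)]
maximum_ft = [("Ridgemont", 5200), ("Hillshire", 2400), ("Flatland", 350)]

print(elevation_ranges(minimum_ft, maximum_ft))
print(flattest(minimum_ft, maximum_ft))
```

The point of the sketch is composition: the output of one step (a sorted list) is the input of the next (a subtraction, then a minimum).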

But perhaps I could do it the other way, slicing the salami crossways instead of longitudinally. Instead of compiling a list of maxes and a list of mins and then subtracting, I could subtract lowest point from highest point state by state and then list the results. Searching through various help files and lists of examples, I eventually came to a page on “Elevation Data,” with a subcategory “Minimum and Maximum Elevations.” And there, at the bottom of the page, was this suggested query: “Montana maximum elevation - minimum elevation.” Clicking on it gave me the result “11,007 feet.” So I could get the elevation range for a single state. All that remained was to persuade Alpha to map the same computation over all the states….

But wait. That’s where this story began, with the query “All US states maximum elevation - minimum elevation.” It didn’t work when I tried it before, and it still doesn’t work now.

I tried some minor variations in phrasing and punctuation, such as this one:

Query: (US states maximum elevation) - (US states minimum elevation)

Answer: 4341 feet

What does the number 4341 mean? A “Show Details” button led to the explanation:

Instead of subtracting the vectors element by element, the program is taking the median of each elevation list and then subtracting. (If I had wanted to do that, I wouldn’t have known how to ask for it.)
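
The two readings of the query give quite different results. The following sketch contrasts them on invented placeholder data (not Alpha's figures); it assumes the “Show Details” explanation means exactly what is described above, a median of each list followed by one subtraction.

```python
from statistics import median

# Invented (minimum, maximum) elevations in feet for three made-up states.
elevations_ft = {
    "Flatland": (100, 350),
    "Hillshire": (900, 2400),
    "Ridgemont": (400, 5200),
}

lows = [lo for lo, _ in elevations_ft.values()]
highs = [hi for _, hi in elevations_ft.values()]

# Reading 1: collapse each list to its median, then subtract once.
difference_of_medians = median(highs) - median(lows)

# Reading 2: subtract state by state, keeping one range per state.
per_state_ranges = {state: hi - lo for state, (lo, hi) in elevations_ft.items()}

print(difference_of_medians)
print(per_state_ranges)
```

In this toy data set the single number from the first reading does not match the range of any individual state, which is why it is of no help in finding the flattest one.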

Finally, shown below in full detail is what came back after one further attempt to formulate the “flattest state” query:

Who asked about Albanian currency? I guess this is what the sibyl says when she’s tired of listening to all of my questions.

* * *

Wolfram Alpha is an ambitious project, as its makers would be the first to proclaim. Here’s what the “About” page tells us:

Wolfram|Alpha’s long-term goal is to make all systematic knowledge immediately computable and accessible to everyone. We aim to collect and curate all objective data; implement every known model, method, and algorithm; and make it possible to compute whatever can be computed about anything.

It’s hard to resist making fun of these lofty and all-encompassing aims, especially when a fairly simple geographic query returns a result expressed in units of Albanian Lek-feet. All the same, I still applaud the attempt to create such a service, and I hope that Stephen Wolfram and his colleagues achieve some reasonable fraction of their goals.

The main sticking point, it seems pretty obvious, is not in collecting and curating data or in formulating models, methods and algorithms. It’s the access part. How am I to communicate with the system? How am I to specify which bits of systematic knowledge I’d like to retrieve, and how do I tell Alpha which models, methods and algorithms to apply? For more than 50 years the answer to this question has generally been a programming language of some kind. The designers of Wolfram Alpha have deliberately turned their back on that option, in favor of a natural-language interface. I’m sure they made this choice with the best of motives, in order to reach out to a wider audience that might be intimidated by formal notation. Unfortunately, the natural-language interface is so limited that we’re effectively left with no notation at all.

In a way, talking to Wolfram Alpha is rather like communicating in a natural language, but a foreign one that you don’t happen to speak. With grunts and gestures and a few stray nouns you may be able to get across the most rudimentary touristic needs (“Where toilet?” or “How much?”), but if you want to carry on a real conversation, you need more vocabulary and, most of all, you need grammar. I’m skeptical that Wolfram Alpha will ever be of much use without such a linguistic structure.

13 Responses to The Oracle of Wolfram

Evgeny says:

26 June 2009 at 1:58 am

Great things take time to mature; W|A is just too new to be as intuitive as you wish it to be. My guess is that it will take some time, maybe even a couple of years, before the “access” part of W|A will be as easy as Google. I am sure Wolfram wished it would be this way from day one, and you wish it as well, but unfortunately things just don’t work this way: it takes time for an interface to become good. (Also, it is not excluded that it may become worse, or just not good enough anymore, and die; see the AltaVista example.)

PS: Just for fun, you may ask W|A to “Make me a child”.

Alexander says:

26 June 2009 at 4:43 am

Based on my experience with natural languages, I can tell that it usually takes several sentences to describe a computation. In your case it might be “Consider all US states. For each state subtract lowest elevation from highest elevation. Find minimum difference.” Regrettably, Wolfram doesn’t seem to have a notion of sentence yet.

I also feel that in the case of Wolfram, natural language is there to stay. It is probably the whole point of it. Fuzziness of expression may be a deliberate goal. I’ll try to explain what I mean.

To store and access huge amounts of data, one has to enforce some structure into it. That is, to split it into topics, categories, sub-categories, etc. Structure, however, is not uniquely defined. There is a multitude of solutions, without a single “right one”. So, to some extent, fuzziness is inherent to the problem from outset. As your data set grows, the question of “which piece of data falls under which sub-(sub-sub-)category” becomes increasingly painful and frustrating.

In Wolfram, I think, they deliberately turned away from this approach in favor of a different one, inspired by Web search engines, effectively emulating Web-like structureless data storage. In a search engine there is no such thing as a “right” or even “correct” query. There are only queries which (mysteriously) work, and which (mysteriously) fail to do so. Although, in the case of Wolfram all is a bit different, since they actually keep all of their data under control. Therefore the design of the search engine is likely to be very different from the common Web varieties.

Natural language may appear to be just vague enough in a setting where no structure in the data is assumed (or, equivalently, all possible structures are assumed simultaneously). On the other hand, I don’t see any reason why they should extend this trend beyond data retrieval. Well, at least Albanian foot-leke (with conversion table to US foot-dollars and foot-euros, based on current market value) are fun. I am not sure if, say, Google can be half as hilarious.

My understanding is that they haven’t yet really come up with a full language. What we see is a very basic “expression evaluation” to debug aforementioned data retrieval capabilities.

Larry (IEOR Tools) says:

26 June 2009 at 8:25 am

It seems to me that the WolframAlpha team would benefit greatly by opening up its data and structure to the community à la Wikipedia. I understand they thrive on being a closed system, but this type of “community knowledge sharing” should best reside with subject matter experts in the community.

Perhaps there is an opportunity for a new Wikipedia solution. Perhaps a Knowledgepedia or Infopedia.

Jeff says:

27 June 2009 at 4:00 pm

If you enter a name into Alpha (say, “Brian”) you get all kinds of interesting statistical and demographic information about the name and its popularity over the years. Clearly someone at Wolfram put some time into coding that response.

However, the natural language aspect of Alpha provides no sense of multiple meanings. I asked for “world population projection” and got a response that treated “world population” as a vector and gave its projection on another vector. I wish that there was some way to give feedback so that the programmers could focus the same care they put into the name feature onto other types of searches. Perhaps Alpha should be labeled ‘Beta’ and every search end with a survey on how close it got to what you were really looking for.

Richard says:

28 June 2009 at 11:20 am

The way Wolfram Alpha is developing, it is certain that in due course it is going to be one of the revolutionary search engines around. WolframAlpha is the trailblazer for many other search engines.

Richard
aafter search

Rory says:

3 July 2009 at 8:05 pm

ALL is the currency symbol for the Albanian Lek…

brian says:

4 July 2009 at 7:42 am

@Rory: Good grief, that’s it! The cosmos makes a little more sense this morning, now that you’ve cleared that mystery up.

Daniel Bigham says:

6 July 2009 at 8:05 pm

I think you’re hitting the nail on the head. Like yourself, I’ve noticed that in many cases, W|A contains the data needed to answer my query, but it isn’t able to transform my English query into whatever structure it needs to compute the answer.

If you ask me, this is the area of improvement that is needed the most… and not surprisingly, it is the most difficult problem to solve.

Jim Ward says:

7 July 2009 at 3:35 pm

Delaware has the lowest ‘us states mean elevation’ by far.

Roger Williams says:

12 July 2009 at 6:48 pm

Thanks for this well-written critique. I think the most telling response is the one about the English sentence description of a computation.

I use Mathematica and the biggest hurdle to using its awesome power is syntax. I am a programmer and you still have to get your head in the “functional mode” before you can use it in the way it was designed.

Right now I have been working on the “translation” for 2 sentences:

1) give me a list of all of the schools attended by the children of Republican presidents (can be answered with no code and Freebase Parallax), and
2) give me a list of all days in 2009 when the Google stock price declined and it rained in Columbus Ohio (about 40 lines of Mathematica code)

I am nowhere near close to how the “translation” is (or should be) done.

Regards..

Pacha Nambi says:

13 July 2009 at 9:34 pm

I like the potential of Wolfram Alpha but it has a long way to go. The answers I get from it remind me of my ex-wife! Sometimes, her answers made sense; other times, I got more confused by her answers and wondered if she really understood my questions in the first place!

Jim Ward says:

14 July 2009 at 10:14 am

You can do individual states, with something like ‘Florida maximum elevation - minimum elevation’. Haven’t figured out how to loop over all the states to get a table.

Florida 344.5 ft
DC 410.1 ft
Delaware 449.5 ft
Louisiana 541.3 ft
Kansas 3363 ft

Jim Ward says:

14 July 2009 at 10:27 am

You could try entering ‘Wolfram|Alpha isn’t sure what to do with your input.’