Googling for graphs

The latest cute trick from Google is a service for producing quantitative graphics on demand. For example, here’s a bar chart of the traffic to this web site over the past year:

bar chart from Google Charts

The cute part is how you get Google to produce the graphic. The whole specification—including data, scales, labels and stylistic preferences—gets packed into a URL. In this case, the URL reads as follows (with line breaks added for clarity):

http://chart.apis.google.com/chart?
chs=450x300
&cht=bvs
&chbh=27,6
&chtt=bit-player.org+hits+per+month
&chco=ee2233
&chf=bg,s,efedea
&chxt=x,y
&chxl=0:|J|F|M|A|M|J|J|A|S|O|N|D|
      1:|0|20,000|40,000|60,000|80,000|100,000
&chd=t:54.9,59.1,61.3,69.5,83.6,77.6,60.3,60.5,69.2,77.3,86.0,-1

(If you want the key to decode all this curious bric-a-brac, see the Google Chart API Developer’s Guide.)
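
Here is a minimal sketch in Python of how such a URL might be assembled programmatically, rather than pasted together by hand. The parameter names and values are the ones from the URL above; urlencode will percent-encode the pipes and commas, which the service should decode on its end.

from urllib.parse import urlencode

BASE = "http://chart.apis.google.com/chart"

# Monthly hit counts, expressed as percentages of the 100,000 axis maximum;
# -1 marks the missing December value, as in the URL above.
hits = [54.9, 59.1, 61.3, 69.5, 83.6, 77.6, 60.3, 60.5, 69.2, 77.3, 86.0, -1]

params = {
    "chs": "450x300",                        # chart size in pixels
    "cht": "bvs",                            # bar chart, vertical, stacked
    "chbh": "27,6",                          # bar width and spacing
    "chtt": "bit-player.org hits per month", # chart title
    "chco": "ee2233",                        # bar color
    "chf": "bg,s,efedea",                    # solid background fill
    "chxt": "x,y",                           # which axes get labels
    "chxl": "0:|J|F|M|A|M|J|J|A|S|O|N|D|"
            "1:|0|20,000|40,000|60,000|80,000|100,000",
    "chd": "t:" + ",".join(str(v) for v in hits),   # text-encoded data
}

print(BASE + "?" + urlencode(params))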

There’s something vaguely unsettling about this whole idea. When the web was young, a URL was a resource locator—in other words, the address of a document stored on a server somewhere. Times have changed, and most web servers now assemble pages by issuing queries to a database rather than by grabbing documents from a simple file system; the “back end” server fetches some text here, an image there, an ad from a third source. Still, the basic paradigm remains information retrieval; the vast majority of web sites are in the business of serving up some kind of pre-cooked static content, even when it’s embellished with dynamic sauce. But I really don’t think that Google Charts works that way. I don’t think they’ve produced an archive of all possible graphs and were just waiting for me to send the URL that retrieved the one you see above.

The charts service raises anew a question that comes up all the time in computing: What should we compute just once and store away in case it’s needed again later, and what should we recompute every time? Years ago we had precomputed tables of logarithms and trig functions, but now it’s easier to just calculate those numbers on the fly. Evidently graphs have now entered that realm as well—things so cheap to compute it’s not worth hanging onto them. Note that the graph you’re looking at above is not one that I produced when I wrote and published this page; instead, when you opened the page, your web browser (or RSS reader) sent a new request to Google’s servers, which then regenerated the graph from scratch. If a thousand people read the page, the graph will be recreated a thousand times. It’s monstrously profligate and inefficient, but I guess that’s what makes computers so wonderful.

One constraint on this mechanism is the length of the URL. Is there a limit? I didn’t know the answer, and so I turned—where else?—to Google. I quickly found a helpful web page that says the standards are silent on this issue, but some web browsers and servers impose a limit of as few as 2,000 characters. The Google Charts designers probably had this limit in mind when they created their scheme for cramming data into a URL. There are actually three encodings available. Above I chose a method in which each data point is a number between 0.0 and 100.0 and values are separated by commas; this allows for a few hundred data points before you start bumping into the 2,000-character limit. The most compact encoding packs each data item into a single ASCII character in the ranges from A=0 to Z=25, a=26 to z=51 and 0=52 to 9=61. Not the most intuitive encoding, but it allows for a further cute trick: graphing words and phrases.

http://chart.apis.google.com/chart?chs=200x200&cht=lc&chd=s:Agraphisworth1000words
line chart from Google Charts

Copy the URL into the location bar of a browser window, change the text at the end of the line, and you can see a graph of your name and address.
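
To make the encoding concrete, here is a small Python sketch; the helper functions are my own, not part of the API. One direction scales numbers into the range 0 to 61 and maps them to characters; the other direction shows why any string of letters and digits is a legal data set.

import string

# The 62-character alphabet described above: A=0 ... Z=25, a=26 ... z=51, 0=52 ... 9=61.
SIMPLE = string.ascii_uppercase + string.ascii_lowercase + string.digits

def simple_encode(values, lo=0.0, hi=100.0):
    """Scale numbers from the range [lo, hi] into 0..61 and return a chd=s: string."""
    chars = [SIMPLE[round((v - lo) / (hi - lo) * 61)] for v in values]
    return "s:" + "".join(chars)

def simple_decode(s):
    """The reverse: turn a string such as 'Agraphisworth1000words' back into numbers."""
    return [SIMPLE.index(c) for c in s]

print(simple_encode([54.9, 59.1, 61.3, 69.5, 83.6, 77.6]))  # prints "s:hklqzv"
print(simple_decode("Agraphisworth1000words"))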

This entry was posted in computing.

6 Responses to Googling for graphs

  1. Kurt says:

    If a thousand people read the page, the graph will be recreated a thousand times.

    I’m willing to bet that Google has some clever caching strategies so that high-volume web pages get their graphs stored so they don’t have to be regenerated for each hit.

  2. brian says:

    Yes, they could surely cache the most popular graphs. The question I’m curious about is whether it’s worthwhile to do so. Obviously there’s a storage cost for keeping things, but more important, there’s a computational cost. You’ve got to examine every URL that comes in to see if it matches one of the saved graphs. The routine might go like this (a rough sketch in code follows the outline):

    1. Receive a URL.
    2. Apply a hash function.
    3. Look up the hash in a table.
          3a. If you’ve never seen this hash before, set a counter to 1.
          3b. If you’ve seen the hash before, increment its counter.
    4. Compare the count with the popularity threshold:
          (count < threshold) –> generate a fresh graph
          (count = threshold) –> generate a fresh graph and cache it
          (count > threshold) –> retrieve the cached copy

    (I’m ignoring the possibility of hash collisions, which could make matters worse.)
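
    In code, the routine might look something like this rough sketch; the threshold and the names are invented, and collisions are still ignored.

    import hashlib

    THRESHOLD = 100   # hypothetical popularity threshold
    counts = {}       # hash of URL -> number of times seen
    cache = {}        # hash of URL -> previously generated chart (say, PNG bytes)

    def serve_chart(url, generate):
        """generate() is whatever function actually renders a chart for a URL."""
        key = hashlib.md5(url.encode()).hexdigest()   # steps 1-2: receive and hash
        counts[key] = counts.get(key, 0) + 1          # step 3: look up and count
        n = counts[key]
        if n < THRESHOLD:                             # step 4: compare with threshold
            return generate(url)                      # fresh graph
        if n == THRESHOLD:
            cache[key] = generate(url)                # fresh graph, and cache it
        return cache[key]                             # cached copy from here on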

    Note that the overhead of hashing and table lookup and counting applies to every URL received, not just the popular ones. Is it worth the bother? I don’t know. Perhaps we could find out by experiment: Try requesting the same graph repeatedly, and see if at some point the response time drops significantly. But note the following “usage policy”:

    Use of the Google Chart API is subject to a query limit of 50,000 queries per user per day. If you go over this 24-hour limit, the Chart API may stop working for you temporarily. If you continue to exceed this limit, your access to the Chart API may be blocked.

    This suggests that they are at least bothering to keep track of the IP number that requests come from. And of course many of us suspect that Google knows all and keeps everything.
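
    If I were to run the timing experiment, a crude version might look like the sketch below; the number of requests and the pause between them are arbitrary, chosen to stay far below the quoted limit.

    import time
    from urllib.request import urlopen

    URL = ("http://chart.apis.google.com/chart?"
           "chs=200x200&cht=lc&chd=s:Agraphisworth1000words")

    for i in range(20):                # a modest number of requests
        start = time.time()
        with urlopen(URL) as response:
            response.read()            # fetch the whole image
        print(f"request {i + 1}: {time.time() - start:.3f} s")
        time.sleep(1)                  # be polite to the service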

  3. Kurt says:

    Well, this raises more questions for me. The IP number from which the chart request comes belongs to the person browsing, not to the host site. Of course, the web browser is supposed to pass the referring URL along with other info about the requester, so Google can still keep track of which host sites are generating a lot of requests. And I suppose that any site that generates 50,000 hits per day can afford to generate its own graphs in-house.

  4. brian says:

    Yes, if you load the page 50,000 times, I get away clean but Google puts you on the no-fly list.

  5. Pingback: New Year’s To-Do List « Learning Computation