<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/1.5.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Googling for graphs</title>
	<link>http://bit-player.org/2007/googling-for-graphs</link>
	<description>An amateur's outlook on computation and mathematics.</description>
	<pubDate>Wed, 23 Jul 2008 19:18:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=1.5.2</generator>

	<item>
 		<title>Comment on Googling for graphs by: brian</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1560</link>
		<pubDate>Fri, 14 Dec 2007 04:26:54 +0000</pubDate>
		<guid>http://bit-player.org/2007/googling-for-graphs#comment-1560</guid>
					<description>Yes, if you load the page 50,000 times, I get away clean but Google puts you on the no-fly list.</description>
		<content:encoded><![CDATA[	<p>Yes, if you load the page 50,000 times, I get away clean but Google puts you on the no-fly list.
</p>
]]></content:encoded>
				</item>
	<item>
 		<title>Comment on Googling for graphs by: Kurt</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1559</link>
		<pubDate>Fri, 14 Dec 2007 02:31:25 +0000</pubDate>
		<guid>http://bit-player.org/2007/googling-for-graphs#comment-1559</guid>
					<description>Well, this raises more questions for me.  The IP number from which the chart request comes belongs to the person browsing, not the host site.  Of course, the web browser is supposed to pass along the referring URL along with other info about the requester, so Google can still keep track of which host sites are generating a lot of requests.  And I suppose that any site which generates 50,000 hits per day can afford to generate its own graphs in-house.</description>
		<content:encoded><![CDATA[	<p>Well, this raises more questions for me.  The IP number from which the chart request comes belongs to the person browsing, not the host site.  Of course, the web browser is supposed to pass along the referring URL along with other info about the requester, so Google can still keep track of which host sites are generating a lot of requests.  And I suppose that any site which generates 50,000 hits per day can afford to generate its own graphs in-house.
</p>
]]></content:encoded>
				</item>
	<item>
 		<title>Comment on Googling for graphs by: brian</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1558</link>
		<pubDate>Thu, 13 Dec 2007 15:02:50 +0000</pubDate>
		<guid>http://bit-player.org/2007/googling-for-graphs#comment-1558</guid>
					<description>Yes, they could surely cache the most popular graphs. The question I'm curious about is whether it's worthwhile to do so. Obviously there's a storage cost for keeping things, but more important there's a computational cost. You've got to examine every URL that comes in to see if it matches one of the saved graphs. The routine might go like this:

1. Receive a URL.
2. Apply a hash function.
3. Look up the hash in a table.
&amp;#160; &amp;#160; &amp;#160;  3a. If you've never seen this hash before, set a counter to 1.
&amp;#160; &amp;#160; &amp;#160;  3b. If you've seen the hash before, increment its counter.
4. Compare the count with the popularity threshold:
&amp;#160; &amp;#160; &amp;#160;   (count &amp;#60; threshold) --&amp;#62; generate a fresh graph
&amp;#160; &amp;#160; &amp;#160;  (count = threshold) --&amp;#62; generate a fresh graph and cache it
&amp;#160; &amp;#160; &amp;#160;  (count &amp;#62; threshold) --&amp;#62; retrieve the cached copy

(I'm ignoring the possibility of hash collisions, which could make matters worse.)

Note that the overhead of hashing and table lookup and counting applies to &lt;em&gt;every&lt;/em&gt; URL received, not just the popular ones. Is it worth the bother? I don't know. Perhaps we could find out by experiment: Try requesting the same graph repeatedly, and see if at some point the response time drops significantly. But note the following &quot;usage policy&quot;:

&lt;blockquote&gt;Use of the Google Chart API is subject to a query limit of 50,000 queries per user per day. If you go over this 24-hour limit, the Chart API may stop working for you temporarily. If you continue to exceed this limit, your access to the Chart API may be blocked.&lt;/blockquote&gt;

This suggests that they are at least bothering to keep track of the IP number that requests come from. And of course many of us suspect that Google knows all and keeps &lt;em&gt;everything&lt;/em&gt;.</description>
		<content:encoded><![CDATA[	<p>Yes, they could surely cache the most popular graphs. The question I&#8217;m curious about is whether it&#8217;s worthwhile to do so. Obviously there&#8217;s a storage cost for keeping things, but more important there&#8217;s a computational cost. You&#8217;ve got to examine every URL that comes in to see if it matches one of the saved graphs. The routine might go like this:</p>
	<p>1. Receive a URL.<br />
2. Apply a hash function.<br />
3. Look up the hash in a table.<br />
&nbsp; &nbsp; &nbsp;  3a. If you&#8217;ve never seen this hash before, set a counter to 1.<br />
&nbsp; &nbsp; &nbsp;  3b. If you&#8217;ve seen the hash before, increment its counter.<br />
4. Compare the count with the popularity threshold:<br />
&nbsp; &nbsp; &nbsp;  (count &lt; threshold) &#8594; generate a fresh graph<br />
&nbsp; &nbsp; &nbsp;  (count = threshold) &#8594; generate a fresh graph and cache it<br />
&nbsp; &nbsp; &nbsp;  (count &gt; threshold) &#8594; retrieve the cached copy</p>
	<p>(I&#8217;m ignoring the possibility of hash collisions, which could make matters worse.)</p>
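	<p>For concreteness, the routine above might be sketched in Python as follows. This is only an illustration, not anything Google is known to do: the class name <code>PopularityCache</code>, the <code>render</code> callback, and the default threshold of 3 are all hypothetical. A Python dict does the hashing and lookup of steps 2 and 3, and because it compares full keys, it also sidesteps the collision problem.</p>

```python
from collections import Counter
from typing import Callable, Dict


class PopularityCache:
    """Cache a rendered graph once its URL has been requested `threshold` times."""

    def __init__(self, render: Callable[[str], bytes], threshold: int = 3) -> None:
        self.render = render          # hypothetical graph generator
        self.threshold = threshold    # hypothetical popularity threshold
        self.counts: Counter = Counter()
        self.cache: Dict[str, bytes] = {}

    def handle(self, url: str) -> bytes:
        # Steps 2-3: the Counter hashes the URL and looks it up;
        # a missing key starts at 0, so this covers both 3a and 3b.
        self.counts[url] += 1
        count = self.counts[url]
        # Step 4: compare the count with the popularity threshold.
        if count > self.threshold:
            return self.cache[url]        # retrieve the cached copy
        image = self.render(url)          # generate a fresh graph
        if count == self.threshold:
            self.cache[url] = image       # ...and cache it
        return image
```

	<p>Even in this toy version, every request pays for the counter update, whether or not its graph ever becomes popular.</p>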
	<p>Note that the overhead of hashing and table lookup and counting applies to <em>every</em> URL received, not just the popular ones. Is it worth the bother? I don&#8217;t know. Perhaps we could find out by experiment: Try requesting the same graph repeatedly, and see if at some point the response time drops significantly. But note the following &#8220;usage policy&#8221;:</p>
	<blockquote><p>Use of the Google Chart API is subject to a query limit of 50,000 queries per user per day. If you go over this 24-hour limit, the Chart API may stop working for you temporarily. If you continue to exceed this limit, your access to the Chart API may be blocked.</p></blockquote>
	<p>This suggests that they are at least bothering to keep track of the IP number that requests come from. And of course many of us suspect that Google knows all and keeps <em>everything</em>.
</p>
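	<p>The experiment proposed above could be set up along these lines. It is a rough sketch under stated assumptions: <code>time_repeated_requests</code> and its parameters are hypothetical names, and <code>fetch</code> stands for whatever zero-argument function actually requests the graph. A sudden drop in the later latencies would hint at caching, though network noise could easily mask or mimic it.</p>

```python
import time
from typing import Callable, List


def time_repeated_requests(fetch: Callable[[], object], repeats: int = 20,
                           pause: float = 0.0) -> List[float]:
    """Call `fetch` `repeats` times and return each call's latency in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()                                  # one request for the same graph
        latencies.append(time.perf_counter() - start)
        time.sleep(pause)                        # stay well under any query limit
    return latencies
```

	<p>In practice <code>fetch</code> might wrap <code>urllib.request.urlopen(chart_url).read()</code> for some chart URL of your choosing, with <code>repeats</code> kept far below the 50,000-query limit quoted above.</p>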
]]></content:encoded>
				</item>
	<item>
 		<title>Comment on Googling for graphs by: Kurt</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1557</link>
		<pubDate>Thu, 13 Dec 2007 01:26:14 +0000</pubDate>
		<guid>http://bit-player.org/2007/googling-for-graphs#comment-1557</guid>
					<description>&lt;blockquote&gt;If a thousand people read the page, the graph will be recreated a thousand times.&lt;/blockquote&gt;I'm willing to bet that Google has some clever caching strategies so that high-volume web pages get their graphs stored so they don't have to be regenerated for each hit.</description>
		<content:encoded><![CDATA[	<blockquote><p>If a thousand people read the page, the graph will be recreated a thousand times.</p></blockquote>
	<p>I&#8217;m willing to bet that Google has some clever caching strategies so that high-volume web pages get their graphs stored so they don&#8217;t have to be regenerated for each hit.
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
