<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Fake fits</title>
	<atom:link href="http://bit-player.org/2010/fake-fits/feed" rel="self" type="application/rss+xml" />
	<link>http://bit-player.org/2010/fake-fits</link>
	<description>An amateur's outlook on computation and mathematics.</description>
	<pubDate>Thu, 17 May 2012 10:09:51 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: unekdoud</title>
		<link>http://bit-player.org/2010/fake-fits#comment-2769</link>
		<dc:creator>unekdoud</dc:creator>
		<pubDate>Thu, 08 Apr 2010 08:48:06 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=625#comment-2769</guid>
		<description>What happens if the coefficients of the quadratic are perturbed, rather than the y-coordinates of points on it?
Another variation: What if the y-coordinates are randomly scaled up or down by small factors (say about 5%), rather than simply adding a unit standard normal? (I believe the effect would be different for different choices of X)</description>
		<content:encoded><![CDATA[<p>What happens if the coefficients of the quadratic are perturbed, rather than the y-coordinates of points on it?<br />
Another variation: What if the y-coordinates are randomly scaled up or down by small factors (say about 5%), rather than simply adding a unit standard normal? (I believe the effect would be different for different choices of X)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Iain</title>
		<link>http://bit-player.org/2010/fake-fits#comment-2748</link>
		<dc:creator>Iain</dc:creator>
		<pubDate>Wed, 31 Mar 2010 13:15:40 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=625#comment-2748</guid>
		<description>Correction, with apologies for multiple comments. I missed out a superscript: A = X * (X’ * X)^{-1} * X’</description>
		<content:encoded><![CDATA[<p>Correction, with apologies for multiple comments. I missed out a superscript: A = X * (X’ * X)^{-1} * X’</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Iain</title>
		<link>http://bit-player.org/2010/fake-fits#comment-2745</link>
		<dc:creator>Iain</dc:creator>
		<pubDate>Wed, 31 Mar 2010 04:34:04 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=625#comment-2745</guid>
		<description>Another comment: the observation of growing coefficients (apparent from the process identified in ivansml's post) might provide some intuition as to why regularization or shrinkage can make sense.</description>
		<content:encoded><![CDATA[<p>Another comment: the observation of growing coefficients (apparent from the process identified in ivansml&#8217;s post) might provide some intuition as to why regularization or shrinkage can make sense.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Iain</title>
		<link>http://bit-player.org/2010/fake-fits#comment-2744</link>
		<dc:creator>Iain</dc:creator>
		<pubDate>Wed, 31 Mar 2010 04:23:55 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=625#comment-2744</guid>
		<description>Using the same notation as the last comment, one can also write down the progression of the y's:

y_{t+1} = A * y_t + e_{t+1}, where A = X * (X' * X) * X'.

This is a "vector autoregression model", specifically a VAR(1) model, http://en.wikipedia.org/wiki/Vector_autoregression . For those interested, a better starting point might be http://en.wikipedia.org/wiki/Autoregressive_model which involves less linear algebra.</description>
		<content:encoded><![CDATA[<p>Using the same notation as the last comment, one can also write down the progression of the y&#8217;s:</p>
<p>y_{t+1} = A * y_t + e_{t+1}, where A = X * (X&#8217; * X) * X&#8217;.</p>
<p>This is a &#8220;vector autoregression model&#8221;, specifically a VAR(1) model, <a href="http://en.wikipedia.org/wiki/Vector_autoregression" rel="nofollow">http://en.wikipedia.org/wiki/Vector_autoregression</a> . For those interested, a better starting point might be <a href="http://en.wikipedia.org/wiki/Autoregressive_model" rel="nofollow">http://en.wikipedia.org/wiki/Autoregressive_model</a> which involves less linear algebra.</p>
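<p>A rough numerical check of the VAR(1) form, assuming NumPy and using the A with the inverse included, A = X * (X' * X)^{-1} * X', as given in the follow-up correction. The starting coefficients and the seed are arbitrary illustrative choices, not values from the original post. The sketch confirms that refitting and re-jiggling gives the same y as applying A and adding fresh noise, and that A is a symmetric projection.</p>

```python
import numpy as np

# Data matrix for a quadratic on x = 1..5: columns 1, x, x^2.
x = np.arange(1, 6, dtype=float)
X = np.column_stack([np.ones_like(x), x, x**2])

# A = X (X'X)^{-1} X', with the inverse from the follow-up correction.
A = X @ np.linalg.inv(X.T @ X) @ X.T

rng = np.random.default_rng(1)        # arbitrary seed
b0 = np.array([1.0, -2.0, 0.5])       # hypothetical starting coefficients
y_t = X @ b0 + rng.standard_normal(5)  # current jiggled points
e_next = rng.standard_normal(5)        # noise for the next round

# Route 1: refit by least squares, then jiggle the fitted curve.
b_next, *_ = np.linalg.lstsq(X, y_t, rcond=None)
y_refit = X @ b_next + e_next

# Route 2: the VAR(1) recursion y_{t+1} = A y_t + e_{t+1}.
y_var = A @ y_t + e_next

same_result = np.allclose(y_refit, y_var)
is_projection = np.allclose(A @ A, A) and np.allclose(A, A.T)
print(same_result, is_projection)
```

<p>Because A is a projection, its eigenvalues are 0 or 1; the three unit eigenvalues correspond to the three directions in which the coefficients can wander.</p>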
]]></content:encoded>
	</item>
	<item>
		<title>By: ivansml</title>
		<link>http://bit-player.org/2010/fake-fits#comment-2741</link>
		<dc:creator>ivansml</dc:creator>
		<pubDate>Tue, 30 Mar 2010 21:41:07 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=625#comment-2741</guid>
		<description>Interesting problem (and a nice opportunity to procrastinate a little...):

I also think it should be a random walk (though with correlated increments): say that at t-th round, you have vector of coefficients b_t, and generate vector of y-coordinates as  y = X * b_t + e, where X is your data matrix (so in your case, first column would be vector of 1's, second column would be vector (1,2,3,4,5) and the third column (1,4,9,16,25)), and e is vector of random errors. Then, OLS estimate of coefficients will be:

b_{t+1} = (X' * X)^{-1} * X' * y = (X' * X)^{-1} * X' * (X * b_t + e) = b_t + (X' * X)^{-1} * X' * e = b_t + u,

where u is a transformed error term: u = (X' * X)^{-1} * X' * e (so in general, if elements of e were independent, elements of u will be correlated). In your example, you pick the same x-coordinates, so X would stay constant in all rounds - if you randomized those as well, I think the result would be the same, only we would have X_t with time index, the transformation e -&#62; u would be different in each round and thus the covariance matrix of errors u would be changing randomly at each round as well.</description>
		<content:encoded><![CDATA[<p>Interesting problem (and a nice opportunity to procrastinate a little&#8230;):</p>
<p>I also think it should be a random walk (though with correlated increments): say that at t-th round, you have vector of coefficients b_t, and generate vector of y-coordinates as  y = X * b_t + e, where X is your data matrix (so in your case, first column would be vector of 1&#8217;s, second column would be vector (1,2,3,4,5) and the third column (1,4,9,16,25)), and e is vector of random errors. Then, OLS estimate of coefficients will be:</p>
<p>b_{t+1} = (X&#8217; * X)^{-1} * X&#8217; * y = (X&#8217; * X)^{-1} * X&#8217; * (X * b_t + e) = b_t + (X&#8217; * X)^{-1} * X&#8217; * e = b_t + u,</p>
<p>where u is a transformed error term: u = (X&#8217; * X)^{-1} * X&#8217; * e (so in general, if elements of e were independent, elements of u will be correlated). In your example, you pick the same x-coordinates, so X would stay constant in all rounds - if you randomized those as well, I think the result would be the same, only we would have X_t with time index, the transformation e -&gt; u would be different in each round and thus the covariance matrix of errors u would be changing randomly at each round as well.</p>
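<p>The identity b_{t+1} = b_t + u can be checked numerically. The following is only a sketch, assuming NumPy, unit-variance normal errors, and a made-up starting vector b; none of these specifics come from the original post.</p>

```python
import numpy as np

# Data matrix described above: columns 1, x, x^2 for x = 1..5.
x = np.arange(1, 6, dtype=float)
X = np.column_stack([np.ones_like(x), x, x**2])

# Matrix mapping the error vector e to the coefficient increment u.
P = np.linalg.inv(X.T @ X) @ X.T

rng = np.random.default_rng(0)      # arbitrary seed
b = np.array([1.0, -2.0, 0.5])      # hypothetical starting coefficients


def refit_step(b, rng):
    """One round: jiggle the curve, refit by OLS, return (b_next, u)."""
    e = rng.standard_normal(len(x))
    y = X @ b + e
    b_next, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b_next, P @ e


b_next, u = refit_step(b, rng)

# The refit coefficients are the old ones plus the transformed error.
step_matches = np.allclose(b_next, b + u)

# For independent unit-variance e, Cov(u) = (X'X)^{-1}; its off-diagonal
# entries are nonzero, so the increments are correlated.
cov_u = np.linalg.inv(X.T @ X)
print(step_matches)
print(cov_u)
```

<p>Calling refit_step in a loop and feeding b_next back in as b traces out the random walk of the coefficients.</p>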
]]></content:encoded>
	</item>
	<item>
		<title>By: Barry Cipra</title>
		<link>http://bit-player.org/2010/fake-fits#comment-2740</link>
		<dc:creator>Barry Cipra</dc:creator>
		<pubDate>Tue, 30 Mar 2010 19:57:29 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=625#comment-2740</guid>
		<description>OK, I admit to cheating, I went straight to the sneak peek before thinking about the question.  But it seems clear (which means I'm likely wrong...) that you're setting yourself up for some sort of random walk in 3 dimensions (the 3 coefficients of your quadratic curves).  I would expect the coefficients to slowly "diffuse" into larger and larger values, with the rate of diffusion inversely proportional to something like the number of (equi-spaced) jiggled points.  It's certainly true that if you jiggle one point on a constant curve (aka, horizontal line) and re-fit the "curve" to the jiggled point, all you're doing is taking a random walk in 1-d.</description>
		<content:encoded><![CDATA[<p>OK, I admit to cheating, I went straight to the sneak peek before thinking about the question.  But it seems clear (which means I&#8217;m likely wrong&#8230;) that you&#8217;re setting yourself up for some sort of random walk in 3 dimensions (the 3 coefficients of your quadratic curves).  I would expect the coefficients to slowly &#8220;diffuse&#8221; into larger and larger values, with the rate of diffusion inversely proportional to something like the number of (equi-spaced) jiggled points.  It&#8217;s certainly true that if you jiggle one point on a constant curve (aka, horizontal line) and re-fit the &#8220;curve&#8221; to the jiggled point, all you&#8217;re doing is taking a random walk in 1-d.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

