Comments on: Fake fits

By: unekdoud

unekdoud — Thu, 08 Apr 2010 08:48:06 +0000

What happens if the coefficients of the quadratic are perturbed, rather than the y-coordinates of points on it?
Another variation: What if the y-coordinates are randomly scaled up or down by small factors (say about 5%), rather than simply adding a unit standard normal? (I believe the effect would be different for different choices of X)

By: Iain

Iain — Wed, 31 Mar 2010 13:15:40 +0000

Correction, with apologies for multiple comments. I missed out a superscript: A = X * (X’ * X)^{-1} * X’

By: Iain

Iain — Wed, 31 Mar 2010 04:34:04 +0000

Another comment: the observation of growing coefficients (apparent from the process identified in ivansml’s post) might provide some intuition as to why regularization or shrinkage can make sense.

By: Iain

Iain — Wed, 31 Mar 2010 04:23:55 +0000

Using the same notation as the last comment, one can also write down the progression of the y’s:

y_{t+1} = A * y_t + e_{t+1}, where A = X * (X’ * X) * X’.

This is a “vector autoregression model”, specifically a VAR(1) model, http://en.wikipedia.org/wiki/Vector_autoregression . For those interested, a better starting point might be http://en.wikipedia.org/wiki/Autoregressive_model which involves less linear algebra.
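
A short worked step may help show why this particular VAR(1) wanders instead of settling down. It takes A with the inverse included, A = X * (X’ * X)^{-1} * X’, as given in the correction posted later in this thread; the LaTeX notation (X^\top for X’) is only a restatement of the thread's symbols.

```latex
A^2 = X (X^\top X)^{-1} \underbrace{X^\top X (X^\top X)^{-1}}_{=\,I} X^\top
    = X (X^\top X)^{-1} X^\top
    = A
```

So A is a projection, and its eigenvalues are all 0 or 1. The part of y_t that lies in the column space of X is carried forward unchanged and picks up fresh noise each round, while the rest is discarded. Eigenvalues equal to 1 are the unit roots that make the process behave like a random walk.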

By: ivansml

ivansml — Tue, 30 Mar 2010 21:41:07 +0000

Interesting problem (and a nice opportunity to procrastinate a little…):

I also think it should be a random walk (though with correlated increments): say that at the t-th round you have a vector of coefficients b_t, and generate a vector of y-coordinates as y = X * b_t + e, where X is your data matrix (so in your case, the first column would be a vector of 1's, the second column would be the vector (1,2,3,4,5) and the third column (1,4,9,16,25)), and e is a vector of random errors. Then the OLS estimate of the coefficients will be:

b_{t+1} = (X’ * X)^{-1} * X’ * y = (X’ * X)^{-1} * X’ * (X * b_t + e) = b_t + (X’ * X)^{-1} * X’ * e = b_t + u,

where u is a transformed error term: u = (X’ * X)^{-1} * X’ * e (so in general, even if the elements of e were independent, the elements of u will be correlated). In your example, you pick the same x-coordinates, so X would stay constant in all rounds. If you randomized those as well, I think the result would be the same, only we would have X_t with a time index, the transformation e -> u would be different in each round, and thus the covariance matrix of the errors u would change randomly from round to round as well.
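
The derivation above can be checked numerically. The following is only a rough sketch in plain Python, not the original post's code: the starting coefficients, the number of rounds, the seed, and all helper names (`transpose`, `matmul`, `inverse`, `matvec`, `refit_step`) are assumptions made for illustration. It refits a quadratic to jiggled points at x = 1..5 and records how far each new coefficient vector is from b_t + u.

```python
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Data matrix from the comment: columns 1, x, x^2 for x = 1..5.
xs = [1, 2, 3, 4, 5]
X = [[1.0, float(x), float(x * x)] for x in xs]
Xt = transpose(X)
M = matmul(inverse(matmul(Xt, X)), Xt)  # (X'X)^{-1} X', a 3x5 matrix

def refit_step(b, rng):
    """Generate y = X b + e with unit normal errors, then return the OLS refit and e."""
    e = [rng.gauss(0.0, 1.0) for _ in xs]
    y = [yi + ei for yi, ei in zip(matvec(X, b), e)]
    return matvec(M, y), e

rng = random.Random(0)       # arbitrary seed (assumption)
b = [0.0, 0.0, 1.0]          # arbitrary starting quadratic y = x^2 (assumption)
path = [b]
max_gap = 0.0                # largest deviation from b_{t+1} = b_t + u seen so far
for _ in range(1000):
    b_next, e = refit_step(b, rng)
    u = matvec(M, e)
    gap = max(abs(bn - (bo + ui)) for bn, bo, ui in zip(b_next, b, u))
    max_gap = max(max_gap, gap)
    b = b_next
    path.append(b)

print("largest deviation from b_t + u:", max_gap)
print("coefficients after 1000 rounds:", path[-1])
```

Up to floating-point rounding, `max_gap` should stay essentially zero, which is the b_{t+1} = b_t + u identity; `path` holds the resulting walk of the three coefficients.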

By: Barry Cipra

Barry Cipra — Tue, 30 Mar 2010 19:57:29 +0000

OK, I admit to cheating, I went straight to the sneak peek before thinking about the question. But it seems clear (which means I’m likely wrong…) that you’re setting yourself up for some sort of random walk in 3 dimensions (the 3 coefficients of your quadratic curves). I would expect the coefficients to slowly “diffuse” into larger and larger values, with the rate of diffusion inversely proportional to something like the number of (equi-spaced) jiggled points. It’s certainly true that if you jiggle one point on a constant curve (aka, horizontal line) and re-fit the “curve” to the jiggled point, all you’re doing is taking a random walk in 1-d.
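
The horizontal-line case is simple enough to try directly. This is a minimal sketch covering only that constant-fit case, not the quadratic; the function names, the 50 rounds, the 2000 repeated walks, and the seed are all assumptions chosen for illustration. The least-squares fit of a constant to n jiggled points is their mean, so each round adds the average of n unit normals, and the variance of the fitted constant should grow roughly like (rounds / n).

```python
import random
import statistics

def final_constant(n_points, n_rounds, rng):
    """Jiggle n_points on a horizontal line, refit the constant, and repeat."""
    c = 0.0
    for _ in range(n_rounds):
        ys = [c + rng.gauss(0.0, 1.0) for _ in range(n_points)]
        c = sum(ys) / n_points  # least-squares fit of a constant is the mean
    return c

def spread(n_points, n_rounds=50, n_walks=2000, seed=0):
    """Variance of the fitted constant after n_rounds, estimated over many walks."""
    rng = random.Random(seed)
    finals = [final_constant(n_points, n_rounds, rng) for _ in range(n_walks)]
    return statistics.pvariance(finals)

var_1 = spread(1)  # one jiggled point: a plain 1-d random walk
var_4 = spread(4)  # four jiggled points: the walk spreads more slowly

print("variance after 50 rounds, 1 point: ", var_1)
print("variance after 50 rounds, 4 points:", var_4)
```

With these settings the two estimates should come out near 50 and 12.5 respectively, consistent with a diffusion rate inversely proportional to the number of jiggled points in this one-dimensional case.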