Previously, I wrote an article about using non-orthogonal basis vectors to model sampled data. Here I show the same idea, reformulated as a linear regression problem.
The linear regression problem is often presented in the context of finding the best fit line to a set of data points. However, it can do much more than that. The algorithm works with the following:
- A set of observed data points, y. This is the input.
- A set of unknown coefficients, $\beta$, that correspond to the weight of each basis vector. This is the output.
- A set of basis vectors, x. These are arranged in a rectangular matrix X; the vectors form the columns of the matrix.
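These three ingredients can be sketched numerically. A minimal NumPy sketch (the data here are arbitrary placeholders; the article's own snippets use MATLAB/Octave):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(8)           # observed data points (the input)
X = rng.standard_normal((8, 3))      # 3 basis vectors as the columns of an 8x3 matrix
# unknown coefficients: one weight per basis vector (the output)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The least-squares solution leaves a residual that is orthogonal to every basis vector, which is one way to check the fit.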
The first example would be the line fitting case. In this case, we have:
- observed values, such as 10, 21, 29, 42.
- a set of basis vectors, e.g. 0,1,2,3 for the first and 1,1,1,1 for the second.
The algorithm solves the problem $y = X\beta$. For the above, this is:

$$\begin{bmatrix} 10 \\ 21 \\ 29 \\ 42 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}$$

Here $X$ is a 4×2 matrix.
The pseudo-inverse can be used to determine the best fit; it solves the problem even when X is not a square matrix. MATLAB/Octave makes this easy with the left-divide (backslash) operator.
y = [10,21,29,42]'; x = [0,1; 1,1; 2,1; 3,1]; beta = x\y; % beta = [10.4; 9.9]
In this example, the best fit occurs with $\beta_1 = 10.4$ and $\beta_2 = 9.9$, i.e. $y \approx 10.4\,t + 9.9$.
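The same fit can be reproduced in NumPy, as a sketch (using `lstsq` in place of Octave's backslash):

```python
import numpy as np

t = np.arange(4.0)
y = np.array([10.0, 21.0, 29.0, 42.0])
X = np.column_stack([t, np.ones_like(t)])    # basis vectors 0,1,2,3 and 1,1,1,1
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta ≈ [10.4, 9.9], matching the Octave result
```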
It should be noted that this is not the only way to define a line. The basis vectors 0,1,2,3 and -3,-2,-1,0 will also define a line. However, the coefficient for the common 0,1,2,3 basis vector will be different in the two cases. The second case corresponds to a model where the observations fall along some linear function, $\beta_1 t$, as well as an affine function, $\beta_2 (t-3)$. The slope will tend to be divided between the two basis vectors when the observations have a different x-intercept than the basis vectors themselves ($t = 0$ for the first, $t = 3$ for the second).
y = [10;21;29;42]; x = [0,-3; 1,-2; 2,-1; 3,0]; beta = x\y; % beta = [13.7; -3.3]
In which case, the best fit is $y \approx 13.7\,t - 3.3\,(t-3)$, which simple algebra ($13.7t - 3.3t + 9.9 = 10.4t + 9.9$) shows to be the same as the expression with respect to the first set of basis vectors.
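A quick numerical check of this equivalence, sketched in NumPy:

```python
import numpy as np

t = np.arange(4.0)
y = np.array([10.0, 21.0, 29.0, 42.0])
X2 = np.column_stack([t, t - 3.0])            # basis vectors 0,1,2,3 and -3,-2,-1,0
beta2, *_ = np.linalg.lstsq(X2, y, rcond=None)
# beta2 ≈ [13.7, -3.3]: combined slope 13.7 - 3.3 = 10.4, intercept -3*(-3.3) = 9.9
fit_first_basis = 10.4 * t + 9.9              # best fit from the first basis set
fit_second_basis = beta2[0] * t + beta2[1] * (t - 3.0)
assert np.allclose(fit_first_basis, fit_second_basis)
```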
The same method can also be used with the sine+cosine case as in the previous article. In this case, $y = \beta_1 \sin(\omega t) + \beta_2 \cos(\omega t)$. Or, in the matrix format, X is an N×2 matrix whose columns are the sampled sine and cosine, $\beta$ is 2×1, and y is N×1. N is the number of observations.
t = [0:1234]'; y = sin(2*pi*t/1107) + 2*cos(2*pi*t/1107); x = [sin(2*pi*t/1107) , cos(2*pi*t/1107)]; beta = x\y; % beta = [1;2]
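The equivalent NumPy sketch, recovering the same coefficients:

```python
import numpy as np

t = np.arange(1235.0)
w = 2 * np.pi / 1107
y = np.sin(w * t) + 2 * np.cos(w * t)
X = np.column_stack([np.sin(w * t), np.cos(w * t)])  # N x 2 basis matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta ≈ [1, 2], as in the Octave comment above
```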
As before, the basis vectors must not be linear combinations of each other, and best results occur when they are not close to linear combinations of each other. For the multiple-sine-wave case, this implies that all frequencies are sufficiently spaced in the frequency domain, or that a long window of data (in time, not just samples) is analyzed. Remember, oversampling increases bandwidth, but not frequency resolution.
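This conditioning effect can be observed numerically. As a sketch (the frequencies and window length here are arbitrary choices, not from the original article), compare the condition number of X for well-separated versus nearly coincident frequencies over the same window:

```python
import numpy as np

t = np.arange(1000.0)

def design(c1, c2):
    # two sinusoidal basis vectors at c1 and c2 cycles per window
    return np.column_stack([np.sin(2 * np.pi * c1 * t / 1000.0),
                            np.sin(2 * np.pi * c2 * t / 1000.0)])

well_spaced = np.linalg.cond(design(10.0, 20.0))    # orthogonal over the window
nearly_equal = np.linalg.cond(design(10.0, 10.05))  # near-collinear columns
assert nearly_equal > well_spaced
```

A large condition number means small noise in y produces large swings in the fitted coefficients, which is the practical cost of closely spaced frequencies.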
To relate the two articles, $\beta = X^{+} y$, where $X^{+}$ will typically be $(X^T X)^{-1} X^T$, as X will typically not be a square matrix and will only have a pseudo-inverse. It can be useful to generate the matrix $(X^T X)^{-1} X^T$ explicitly, as it can be re-used, and the inner-product operation is easier than recomputing the pseudo-inverse.
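A sketch of that re-use pattern in NumPy (the two observation vectors here are made-up examples):

```python
import numpy as np

t = np.arange(1235.0)
w = 2 * np.pi / 1107
X = np.column_stack([np.sin(w * t), np.cos(w * t)])

# Form the pseudo-inverse once...
P = np.linalg.pinv(X)     # equals inv(X.T @ X) @ X.T when X has full column rank

# ...then each new observation vector costs only inner products (one per coefficient)
y1 = np.sin(w * t) + 2 * np.cos(w * t)
y2 = 3 * np.sin(w * t) - np.cos(w * t)
beta1 = P @ y1            # ≈ [1, 2]
beta2 = P @ y2            # ≈ [3, -1]
```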