              LINEAR TRANSFORMATIONS AND MATRICES [Continued]

                                                     SPH 5421 notes.015

              HOW LINEAR ALGEBRA RELATES TO REGRESSION

Assume that the random variable Y is related linearly to several
predictors X1, X2, ..., Xp, as follows:

[1]       Yi = B0 + B1*X1i + B2*X2i + ... + Bp*Xpi + ei,

where ei is 'measurement error'.  The subscript i denotes the i-th
individual in the dataset.

We will assume that ei is normally distributed with mean 0 and standard
deviation sigma (variance sigma^2).  Further, we will assume that the
e-terms for different individuals are independent; that is, e1, ..., en
are IID N(0, sigma^2), so that ei and ej are independent for i not equal
to j.  We will assume one measurement per individual.

The coefficients B0, B1, B2, ..., Bp are unknown.  The goal in regression
is to collect enough data to be able to estimate the coefficients and
make inferences about them.

For a given individual, the predictors can be represented as a row
vector, Xi:

          Xi = (1, X1i, X2i, ..., Xpi).

Note that the '1' in this row vector corresponds to the coefficient B0,
the intercept term.

Similarly, the coefficients can be represented by a vector, B.  Again it
is easier to show the transpose of B rather than the column vector B
itself:

          B' = (B0, B1, B2, ..., Bp).

Model [1] above can be represented in matrix terms as:

[2]       Yi = Xi * B + ei,

where the multiplication is matrix multiplication.  Again, this
represents the data for just one observation on one individual.

The entire data set can be represented by matrices.  For example,

          | 1  X11  X21  X31 ... Xp1 |              | Y1 |
          | 1  X12  X22  X32 ... Xp2 |              | Y2 |
          | 1  X13  X23  X33 ... Xp3 |              | Y3 |
    X  =  | .   .    .    .       .  |    and  Y =  | .  |
          | .   .    .    .       .  |              | .  |
          | .   .    .    .       .  |              | .  |
          | 1  X1n  X2n  X3n ... Xpn |              | Yn |

Here n = the number of individuals.  Note that X is an n x (p + 1)
matrix, and Y is an n x 1 column vector.

In terms of these matrices, the model may be written as

[3]       Y = X * B + e,

where e is the n x 1 column vector of error terms.

Coefficient estimates in ordinary regression are determined by finding
the values of B0, B1, B2, ..., Bp which minimize the sum of squares:

          SSQ = Sum(Yi - Xi * B)^2,

where the sum is over the n individuals (i = 1 to n).

The values of B0, B1, ..., Bp which minimize SSQ can be found by first
taking the derivative of SSQ with respect to each of the Bj,
j = 0, 1, ..., p, setting the derivatives equal to 0, and then solving
the resulting system of (p + 1) linear equations.  We will denote the
solution by lower-case letters b0, b1, ..., bp.  The solution satisfies
the following equation:

[4]       (X'*X) * b = X' * Y.

Note that X'*X is a (p + 1) x (p + 1) matrix, and X'*Y is a
(p + 1) x 1 column vector.

The usual solution to [4] is given by:

[5]       b = inv(X'*X) * X' * Y.

[Note: in some cases, the inverse of X'*X does not exist, usually
because the model is overparametrized.  In such cases, parameter
estimates can be obtained by the use of 'generalized inverses'.  We will
not deal with generalized inverses here.  See the textbook of Searle for
this topic.]

The estimated covariance matrix for b can be derived from the expression
above.  It is:

          cov(b) = s^2 * inv(X'*X),

where s^2 is the estimate of the variance sigma^2 of the error term,

          s^2 = (Y'*Y - b'*X'*Y) / (n - p - 1).

Moreover, various sums of squares that would appear in an analysis of
variance table can be written in matrix terms also:

     SSREG = sum of squares due to regression :  b'*X'*Y - n*Ybar^2,

     SSRES = sum of squares residual          :  Y'*Y - b'*X'*Y,

     SSTOT = sum of squares total (corrected) :  Y'*Y - n*Ybar^2,

where Ybar = mean of the Yi, i = 1, 2, ..., n.
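It may help to see equations [4] and [5] and the sums of squares carried
out numerically.  The following is a minimal sketch in Python with numpy,
using simulated data; the sample size, coefficients, and variable names
are chosen here purely for illustration and are not part of the notes.

    import numpy as np

    rng = np.random.default_rng(0)

    n, p = 50, 2                        # n individuals, p predictors
    X1 = rng.normal(size=n)
    X2 = rng.normal(size=n)
    e  = rng.normal(scale=1.0, size=n)  # error terms ei ~ N(0, sigma^2)

    # Simulated outcome with (B0, B1, B2) = (1.0, 2.0, -0.5)
    Y = 1.0 + 2.0 * X1 - 0.5 * X2 + e

    # Design matrix X: a column of 1's followed by the predictors,
    # so X is n x (p + 1) as in the notes.
    X = np.column_stack([np.ones(n), X1, X2])

    # Normal equations [4]: (X'X) b = X'Y, solved as in [5].
    XtX = X.T @ X
    XtY = X.T @ Y
    b   = np.linalg.solve(XtX, XtY)   # numerically safer than forming inv(X'X)

    # Error variance estimate and estimated covariance matrix of b
    s2    = (Y @ Y - b @ XtY) / (n - p - 1)
    cov_b = s2 * np.linalg.inv(XtX)

    # ANOVA sums of squares in matrix form
    Ybar  = Y.mean()
    SSREG = b @ XtY - n * Ybar**2
    SSRES = Y @ Y - b @ XtY
    SSTOT = Y @ Y - n * Ybar**2

    print("b   =", b)
    print("s^2 =", s2)
    print("SSREG + SSRES =", SSREG + SSRES, "  SSTOT =", SSTOT)

Note that SSREG + SSRES reproduces SSTOT, as it must from the three
matrix expressions above.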
From this, one can compute other statistics used in regression:

          R-square = SSREG / SSTOT,

          F = (SSREG / p) / (SSRES / (n - p - 1)).

The F-statistic is used to test the hypothesis:

          H0 : B1 = B2 = B3 = ... = Bp = 0,

for which F would be compared to an F-distribution with degrees of
freedom = (p, n - p - 1).

You can see how all this works in the case of simple linear regression,
for which the model is

          Yi = B0 + B1 * Xi + ei,

where there is only one predictor variable, Xi.  In this case,

          | 1  X1 |              | Y1 |
          | 1  X2 |              | Y2 |
          | 1  X3 |              | Y3 |
    X  =  | .  .  |    and  Y =  | .  | ,   and
          | .  .  |              | .  |
          | .  .  |              | .  |
          | 1  Xn |              | Yn |

               |    n        sum(Xi)  |
    X' * X  =  |                      | .
               | sum(Xi)   sum(Xi^2)  |

The determinant of X' * X is:

          D = n * sum(Xi^2) - {sum(Xi)}^2 .

From the formula for the inverse of a 2 x 2 matrix given in notes.014,
the inverse of X' * X is:

                     |  sum(Xi^2)/D   -sum(Xi)/D |
     inv(X'*X)  =    |                           | .
                     |  -sum(Xi)/D        n/D    |

The estimate b = (b0, b1)' of the vector B = (B0, B1)', as given by
equation [5] above, is:

          b = inv(X'*X) * X' * Y.

Note that here X' * Y = (sum(Yi), sum(Xi * Yi))'.  This implies that

          b1 = [n * sum(Xi * Yi) - sum(Xi) * sum(Yi)] / D.

Dividing the numerator and the denominator by n gives:

          b1 = [sum(Xi*Yi) - sum(Xi)*sum(Yi)/n] / [sum(Xi^2) - sum(Xi)*sum(Xi)/n].

This is the familiar form of the expression for the estimated slope in
simple linear regression.  The corresponding expression for the
intercept is

          b0 = [sum(Xi^2) * sum(Yi) - sum(Xi) * sum(Xi * Yi)] / D.

/home/walleye/john-c/5421/notes.015          Last update: February 5, 2000
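As a closing numerical check, the explicit 2 x 2 formulas for b0 and b1
can be compared with the matrix solution, and R-square and F computed
from the matrix sums of squares.  This is a minimal sketch in Python with
numpy; the simulated data and all variable names are illustrative only.

    import numpy as np

    rng = np.random.default_rng(1)

    n  = 30
    Xi = rng.uniform(0.0, 10.0, size=n)
    Yi = 3.0 + 1.5 * Xi + rng.normal(scale=2.0, size=n)  # simulated data

    # Explicit formulas from the 2 x 2 inverse above
    D  = n * np.sum(Xi**2) - np.sum(Xi)**2
    b1 = (n * np.sum(Xi * Yi) - np.sum(Xi) * np.sum(Yi)) / D
    b0 = (np.sum(Xi**2) * np.sum(Yi) - np.sum(Xi) * np.sum(Xi * Yi)) / D

    # Matrix solution b = inv(X'X) * X' * Y for comparison
    X = np.column_stack([np.ones(n), Xi])
    b = np.linalg.solve(X.T @ X, X.T @ Yi)

    print("explicit:", b0, b1)
    print("matrix  :", b)     # the two pairs should agree

    # R-square and F as defined above, with p = 1 predictor
    XtY   = X.T @ Yi
    Ybar  = Yi.mean()
    SSREG = b @ XtY - n * Ybar**2
    SSRES = Yi @ Yi - b @ XtY
    SSTOT = Yi @ Yi - n * Ybar**2
    R2    = SSREG / SSTOT
    F     = (SSREG / 1) / (SSRES / (n - 1 - 1))

    print("R-square =", R2, "  F =", F)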