              LINEAR TRANSFORMATIONS AND MATRICES [Continued]

                                                     SPH 5421 notes.015

              HOW LINEAR ALGEBRA RELATES TO REGRESSION

Assume that the random variable Y is related linearly to several
predictors X1, X2, ..., Xp, as follows:

[1]       Yi = B0 + B1*X1i + B2*X2i + ... + Bp*Xpi + ei,

where ei is 'measurement error'.  The subscript i denotes the i-th
individual in the dataset.

We will assume that ei is normally distributed with mean 0 and standard
deviation sigma (variance sigma^2).  Further, we will assume that the
e-terms for different individuals are independent; that is, e1, ..., en
are IID N(0, sigma^2), so that ei and ej are independent for i not equal
to j.  We will assume one measurement per individual.

The coefficients B0, B1, B2, ..., Bp are unknown.  The goal in regression
is to collect enough data to be able to estimate the coefficients and
make inferences about them.

For a given individual, the predictors can be represented as a row
vector, Xi:

          Xi = (1, X1i, X2i, ..., Xpi).

Note that the '1' in this row vector corresponds to the coefficient B0,
the intercept term.

Similarly, the coefficients can be represented by a vector, B.  Again it
is easier to show the transpose of B rather than the column vector B
itself:

          B' = (B0, B1, B2, ..., Bp).

Model [1] above can be represented in matrix terms as:

[2]       Yi = Xi * B + ei,

where the multiplication is matrix multiplication.  Again, this
represents the data for just one observation on one individual.

The entire data set can be represented by matrices.  For example,

          | 1  X11  X21  X31 ... Xp1 |              | Y1 |
          | 1  X12  X22  X32 ... Xp2 |              | Y2 |
          | 1  X13  X23  X33 ... Xp3 |              | Y3 |
    X  =  | .   .    .    .       .  |    and  Y =  | .  |
          | .   .    .    .       .  |              | .  |
          | .   .    .    .       .  |              | .  |
          | 1  X1n  X2n  X3n ... Xpn |              | Yn |

Here n = the number of individuals.  Note that X is an n x (p + 1)
matrix, and Y is an n x 1 column vector.

In terms of these matrices, the model may be written as

[3]       Y = X * B + e,

where e is the n x 1 column vector of error terms.

Coefficient estimates in ordinary regression are determined by finding
the values of B0, B1, B2, ..., Bp which minimize the sum of squares:

          SSQ = Sum(Yi - Xi * B)^2,

where the sum is over the n individuals (i = 1 to n).

The values of B0, B1, ..., Bp which minimize SSQ can be found by first
taking the derivative of SSQ with respect to each of the Bj,
j = 0, 1, ..., p, setting the derivatives equal to 0, and then solving
the resulting system of (p + 1) linear equations.  We will denote the
solution by lower-case letters b0, b1, ..., bp.  The solution satisfies
the following equation:

[4]       (X'*X) * b = X' * Y.

Note that X'*X is a (p + 1) x (p + 1) matrix, and X'*Y is a
(p + 1) x 1 column vector.

The usual solution to [4] is given by:

[5]       b = inv(X'*X) * X' * Y.

[Note: in some cases, the inverse of X'*X does not exist, usually
because the model is overparametrized.  In such cases, parameter
estimates can be obtained by the use of 'generalized inverses'.  We will
not deal with generalized inverses here.  See the textbook of Searle for
this topic.]

The estimated covariance matrix for b can be derived from the expression
above.  It is:

          cov(b) = s^2 * inv(X'*X),

where s^2 is the estimate of the variance sigma^2 of the error term,

          s^2 = (Y'*Y - b'*X'*Y) / (n - p - 1).

Moreover, various sums of squares that would appear in an analysis of
variance table can be written in matrix terms also:

     SSREG = sum of squares due to regression :  b'*X'*Y - n*Ybar^2,

     SSRES = sum of squares residual          :  Y'*Y - b'*X'*Y,

     SSTOT = sum of squares total (corrected) :  Y'*Y - n*Ybar^2,

where Ybar = mean of the Yi, i = 1, 2, ..., n.
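It may help to see equations [4] and [5] and the sums of squares carried
out numerically.  The following is a minimal sketch in Python with numpy,
using simulated data; the sample size, coefficients, and variable names
are chosen here purely for illustration and are not part of the notes.

    import numpy as np

    rng = np.random.default_rng(0)

    n, p = 50, 2                        # n individuals, p predictors
    X1 = rng.normal(size=n)
    X2 = rng.normal(size=n)
    e  = rng.normal(scale=1.0, size=n)  # error terms ei ~ N(0, sigma^2)

    # Simulated outcome with (B0, B1, B2) = (1.0, 2.0, -0.5)
    Y = 1.0 + 2.0 * X1 - 0.5 * X2 + e

    # Design matrix X: a column of 1's followed by the predictors,
    # so X is n x (p + 1) as in the notes.
    X = np.column_stack([np.ones(n), X1, X2])

    # Normal equations [4]: (X'X) b = X'Y, solved as in [5].
    XtX = X.T @ X
    XtY = X.T @ Y
    b   = np.linalg.solve(XtX, XtY)   # numerically safer than forming inv(X'X)

    # Error variance estimate and estimated covariance matrix of b
    s2    = (Y @ Y - b @ XtY) / (n - p - 1)
    cov_b = s2 * np.linalg.inv(XtX)

    # ANOVA sums of squares in matrix form
    Ybar  = Y.mean()
    SSREG = b @ XtY - n * Ybar**2
    SSRES = Y @ Y - b @ XtY
    SSTOT = Y @ Y - n * Ybar**2

    print("b   =", b)
    print("s^2 =", s2)
    print("SSREG + SSRES =", SSREG + SSRES, "  SSTOT =", SSTOT)

Note that SSREG + SSRES reproduces SSTOT, as it must from the three
matrix expressions above.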
From this, one can compute other statistics used in regression:

          R-square = SSREG / SSTOT,

          F = (SSREG / p) / (SSRES / (n - p - 1)).

The F-statistic is used to test the hypothesis:

          H0 : B1 = B2 = B3 = ... = Bp = 0,

for which F would be compared to an F-distribution with degrees of
freedom = (p, n - p - 1).

You can see how all this works in the case of simple linear regression,
for which the model is

          Yi = B0 + B1 * Xi + ei,

where there is only one predictor variable, Xi.  In this case,

          | 1  X1 |              | Y1 |
          | 1  X2 |              | Y2 |
          | 1  X3 |              | Y3 |
    X  =  | .  .  |    and  Y =  | .  | ,   and
          | .  .  |              | .  |
          | .  .  |              | .  |
          | 1  Xn |              | Yn |

               |    n        sum(Xi)  |
    X' * X  =  |                      | .
               | sum(Xi)   sum(Xi^2)  |

The determinant of X' * X is:

          D = n * sum(Xi^2) - {sum(Xi)}^2 .

From the formula for the inverse of a 2 x 2 matrix given in notes.014,
the inverse of X' * X is:

                     |  sum(Xi^2)/D   -sum(Xi)/D |
     inv(X'*X)  =    |                           | .
                     |  -sum(Xi)/D        n/D    |

The estimate b = (b0, b1)' of the vector B = (B0, B1)', as given by
equation [5] above, is:

          b = inv(X'*X) * X' * Y.

Note that here X' * Y = (sum(Yi), sum(Xi * Yi))'.  This implies that

          b1 = [n * sum(Xi * Yi) - sum(Xi) * sum(Yi)] / D.

Dividing the numerator and the denominator by n gives:

          b1 = [sum(Xi*Yi) - sum(Xi)*sum(Yi)/n] / [sum(Xi^2) - sum(Xi)*sum(Xi)/n].

This is the familiar form of the expression for the estimated slope in
simple linear regression.  The corresponding expression for the
intercept is

          b0 = [sum(Xi^2) * sum(Yi) - sum(Xi) * sum(Xi * Yi)] / D.

/home/walleye/john-c/5421/notes.015          Last update: February 5, 2000
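As a closing numerical check, the explicit 2 x 2 formulas for b0 and b1
can be compared with the matrix solution, and R-square and F computed
from the matrix sums of squares.  This is a minimal sketch in Python with
numpy; the simulated data and all variable names are illustrative only.

    import numpy as np

    rng = np.random.default_rng(1)

    n  = 30
    Xi = rng.uniform(0.0, 10.0, size=n)
    Yi = 3.0 + 1.5 * Xi + rng.normal(scale=2.0, size=n)  # simulated data

    # Explicit formulas from the 2 x 2 inverse above
    D  = n * np.sum(Xi**2) - np.sum(Xi)**2
    b1 = (n * np.sum(Xi * Yi) - np.sum(Xi) * np.sum(Yi)) / D
    b0 = (np.sum(Xi**2) * np.sum(Yi) - np.sum(Xi) * np.sum(Xi * Yi)) / D

    # Matrix solution b = inv(X'X) * X' * Y for comparison
    X = np.column_stack([np.ones(n), Xi])
    b = np.linalg.solve(X.T @ X, X.T @ Yi)

    print("explicit:", b0, b1)
    print("matrix  :", b)     # the two pairs should agree

    # R-square and F as defined above, with p = 1 predictor
    XtY   = X.T @ Yi
    Ybar  = Yi.mean()
    SSREG = b @ XtY - n * Ybar**2
    SSRES = Yi @ Yi - b @ XtY
    SSTOT = Yi @ Yi - n * Ybar**2
    R2    = SSREG / SSTOT
    F     = (SSREG / 1) / (SSRES / (n - 1 - 1))

    print("R-square =", R2, "  F =", F)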