Prof. Bryan Caplan

bcaplan@gmu.edu

http://www3.gmu.edu/departments/economics/bcaplan

Econ 637

Spring, 1999

Weeks 3-4: The k-Variable Linear Equation

I. Quick Review of Linear Algebra

A. A matrix is a rectangular array of numbers; a vector is a matrix with only 1 column (or only 1 row); a scalar is a matrix with 1 row and 1 column.

B. Two matrices are equal iff they have the same size and the corresponding entries in the two matrices are equal.

C. If A and B are matrices, then their sum is obtained by adding the corresponding entries in each matrix. You can only add matrices of the same size.

D. If A and B are matrices, then their product AB can be obtained as follows: entry (i,j) in matrix AB equals (the first element of row i of A times the first element of column j of B), plus (the second element of row i of A times the second element of column j of B), plus ..., plus (the last element of row i of A times the last element of column j of B). Two matrices can only be multiplied if they are conformable: to get the product AB, the # of columns in A must equal the # of rows in B.

E. I is the "identity matrix" - a matrix with 1's on its diagonal and 0's everywhere else. A*I=A. A’, read "A-transpose" or "A-prime" is simply a matrix in which the 1st column of A is the first row of A’, the 2nd column of A is the second row of A’, etc.

F. A^-1, the inverse of A, is the matrix such that AA^-1=I. Finding the inverse of a big matrix is extremely time consuming for a person, but computers are great at it!

G. The rank of a matrix is its number of linearly independent columns. (A set of columns c1, c2, ..., cn is linearly independent iff k1c1 + k2c2 + ... + kncn = 0 has only 1 solution, with all of the k's=0).

H. A matrix can only be inverted if its rank is equal to its # of columns (aka if it has "full rank"). This will be very important for multiple regression, because a regression of Y on a set of variables that does not have full rank will not have a unique solution. (A short numerical illustration follows.)
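
A minimal numerical sketch of these matrix facts, in Python with the numpy library (the matrices themselves are invented purely for illustration):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    B = np.array([[5.0, 6.0],
                  [7.0, 8.0]])

    print(A + B)          # entrywise sum (only defined for same-size matrices)
    print(A @ B)          # product: entry (i,j) = row i of A "dotted" with column j of B
    print(A.T)            # transpose: the columns of A become the rows of A'
    print(A @ np.eye(2))  # multiplying by the identity I returns A

    print(np.linalg.matrix_rank(A))  # 2 = # of columns, so A has full rank
    print(np.linalg.inv(A) @ A)      # A^-1 times A gives I (up to rounding error)

    # A rank-deficient matrix: the second column is 2 times the first,
    # so the rank is only 1 and no inverse exists.
    C = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
    print(np.linalg.matrix_rank(C))  # 1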

II. Multiple Regression

A. Intuitively: More than one factor often matters in the real world. And people frequently claim that some matter and others don't. Is there any way to extend the bivariate regression to shed light on this? E.g. finding the impact of IQ controlling for education, or finding the impact of spending controlling for deficits.

B. Mathematically: If you can find the "best" statistical fit between one dependent variable and one independent variable, can you find the "best" fit of one dependent variable on any number of independent variables?

C. It is very convenient at this point to switch to matrix notation. Suppose you have N observations of all variables you are interested in. Then just write the dependent variable, Y, as a 1-column matrix (i.e., a vector) with N rows: Y = (Y1, Y2, ..., YN)'. And write each independent variable Xi as a vector with N rows: Xi = (Xi1, Xi2, ..., XiN)'. Note further that we can think of the constant as itself one of the independent variables: the vector of 1's, (1, 1, ..., 1)'.

D. Now we could write a regression equation with k variables (including the constant) as: Y = b1X1 + b2X2 + ... + bkXk + u, where bi is the coefficient for variable i, and u is a vector of disturbance terms.

E. This can be written even more compactly by combining all k columns of independent variables into one big matrix X with N rows and k columns. Similarly, combine all k coefficient scalars into a single (kx1) matrix of coefficients b. Then we can rewrite the above equation as: Y = Xb + u!

F. An example: Y is a student’s grade on the final exam, and it is regressed on a constant, the # of hours the student studied, and their GPA. What do Y, X, and b look like? Are the dimensions right?
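
A sketch of the answer in Python/numpy, with hypothetical data for 5 students (all numbers invented): Y is 5x1, X is 5x3, and b must then be 3x1 so that Xb is 5x1 - the same shape as Y.

    import numpy as np

    Y = np.array([85.0, 72.0, 90.0, 60.0, 78.0])    # final exam grades (N = 5)
    X = np.column_stack([
        np.ones(5),                                 # the constant: a column of 1's
        np.array([10.0, 4.0, 12.0, 2.0, 6.0]),      # hours studied
        np.array([3.5, 3.0, 3.8, 2.4, 3.1]),        # GPA
    ])                                              # X is N x k = 5 x 3

    print(Y.shape, X.shape)   # b is k x 1 = 3 x 1, so Xb is (5x3)(3x1) = 5x1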

  1. The Mathematics of Multiple Regression
    1. Now we will redo the results established for the bivariate case for the k-variate case, enabling us to regress any dependent variable on any set of independent variables we like.
    2. The vector of error terms is e = Y - Xb. Let's choose b to minimize the SSE - which can now be conveniently written as e'e = (Y - Xb)'(Y - Xb) = Y'Y - Y'Xb - b'X'Y + b'X'Xb. (Double-check that e'e is a scalar).
    3. Note that Y'Xb is a scalar, so Y'Xb = b'X'Y, and therefore e'e = Y'Y - 2b'X'Y + b'X'Xb.
    4. Now simply differentiate the above expression wrt b. The derivative of Y'Y wrt b is obviously a vector of 0's, since it does not contain b. The derivative of -2b'X'Y wrt b is just -2X'Y. The derivative of b'X'Xb is 2X'Xb.
    5. Setting these terms equal to a vector of zeros yields: -2X'Y + 2X'Xb = 0.
    6. Rearranging and dividing by 2 gives: X'Xb = X'Y.
    7. Finally, pre-multiply by (X'X)^-1 to solve for b: b = (X'X)^-1X'Y. This is an enormously powerful result. (A numerical check appears after this list.)
    8. The simple vs. the expectations-augmented Phillips curve: an example of multivariate regression.
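A numerical check of the formula b = (X'X)^-1X'Y, as a minimal sketch with simulated data (the "true" coefficients, sample size, and error variance are all invented):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100
    X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
    true_beta = np.array([1.0, 2.0, -0.5])             # assumed "true" coefficients
    Y = X @ true_beta + rng.normal(size=N)             # add iid disturbances

    b = np.linalg.inv(X.T @ X) @ X.T @ Y               # the OLS formula derived above
    print(b)                                           # close to true_beta

    # Numerically it is better practice to solve the normal equations X'Xb = X'Y
    # directly, but the answer is the same:
    print(np.linalg.solve(X.T @ X, X.T @ Y))
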
  2. Bivariate Regression as a Special Case
    1. Double-check that this formula implies the bivariate results: if X consists of just a constant (a column of 1's) and a single variable X, then X'X is the 2x2 matrix with first row (N, ΣX) and second row (ΣX, ΣX^2), and X'Y is the 2x1 vector (ΣY, ΣXY)'. Then b solves X'Xb = X'Y.
    2. Writing these equations out explicitly gives: Nb1 + b2ΣX = ΣY, and b1ΣX + b2ΣX^2 = ΣXY.
    3. Notice that these are identical to our results from last week: just divide the first equation by N to get b1 = Ybar - b2*Xbar (where Ybar and Xbar are the sample means), and substitute this expression for b1 into the second equation.
    4. The single matrix formula b = (X'X)^-1X'Y simplifies all of the work for the bivariate regression - and helps even more as the number of variables increases. (The sketch below double-checks this numerically.)
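The promised double-check, as a short simulated-data sketch: the matrix formula and last week's bivariate formulas (slope = cov(X,Y)/var(X), intercept = Ybar - slope*Xbar) give identical answers.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=50)
    y = 2.0 + 3.0 * x + rng.normal(size=50)            # invented "true" relationship

    X = np.column_stack([np.ones(50), x])
    b = np.linalg.inv(X.T @ X) @ X.T @ y               # (intercept, slope) from the matrix formula

    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # bivariate slope formula
    intercept = y.mean() - slope * x.mean()            # b1 = Ybar - b2*Xbar
    print(b, intercept, slope)                         # the two approaches agree
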
  3. The Full Rank Condition
    1. Mathematically, b = (X'X)^-1X'Y will only have a solution if (X'X) is invertible. And it will only be invertible if (X'X) has full rank. But what does this mean, and is there any intuitive interpretation?
    2. Suppose you put the same variable into your regression twice. Does this make sense?
    3. Suppose you include 3 variables: a constant, a "male" dummy (=1 if male, 0 otherwise), and a "female" dummy (=1 if female, 0 otherwise). Does this make sense? (See the sketch after this list.)
    4. Suppose you include a variable that is always equal to zero. Does this make sense?
    5. Suppose you include both a constant (a vector of 1’s), and a vector of 3’s. Does this make sense?
    6. Suppose you have 10 variables and only 3 observations. Does this make sense?
    7. All are examples of violation of the full rank condition. In general, these are cases when your regression makes no sense - i.e., there cannot be a unique "best" way to fit the data to an equation.
    8. Rule of thumb: violation of the full rank condition never arises by chance. It can only occur if there is conceptual confusion in your initial setup - if one of the columns of X is just a linear combination of some other columns. (Although figuring out your confusion isn’t always easy!)
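A sketch of example 3 (the "male"/"female" dummy case), with hypothetical data: the constant equals the sum of the two dummies, so X does not have full rank and (X'X) cannot be inverted.

    import numpy as np

    male = np.array([1.0, 0.0, 1.0, 0.0, 1.0])       # invented 5-person sample
    female = 1.0 - male
    X = np.column_stack([np.ones(5), male, female])  # constant = male + female

    print(np.linalg.matrix_rank(X))                  # 2, not 3: the full rank condition fails
    print(np.linalg.matrix_rank(X.T @ X))            # also 2, so (X'X)^-1 does not exist
    print(np.linalg.cond(X.T @ X))                   # enormous (or infinite) condition number
    # Calling np.linalg.inv(X.T @ X) here either raises an error or returns
    # meaningless numbers: there is no unique "best" fit.
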
  4. R2 and Multiple Regression
    1. R2, as you recall, measures the "goodness of fit" of an equation. We now extend it to the multivariate case.
    2. Designate demeaned variables by their lower-case counterparts. Then: y = xb + e, so y'y = b'x'xb + e'e (the cross-product term 2b'x'e drops out because x'e = 0 - the regressors are uncorrelated with the residuals).
    3. This is the multivariate version of TSS=ESS+SSE. R2=ESS/TSS=1-(SSE/TSS). The square root of R2 is known as the coefficient of multiple correlation.
    4. Fun fact #1: Adding more variables to an equation cannot decrease the R2. If every non-zero coefficient on a new variable makes the fit worse, you can always just set its coefficient equal to zero to keep the fit the same as before.
    5. Fun fact #2: If you have N observations, you can always get a perfect fit with R2=1 by simply having N explanatory variables. Implications?
    6. Without proof: You must have a constant vector in your regression to be assured that R2 lies between 0 and 1.
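A simulated-data sketch of R2 = 1 - (SSE/TSS) and of fun fact #1: adding a regressor that (by construction) has nothing to do with Y still cannot lower the R2.

    import numpy as np

    def r_squared(X, Y):
        b = np.linalg.solve(X.T @ X, X.T @ Y)   # OLS coefficients
        e = Y - X @ b                           # residuals
        tss = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
        return 1.0 - (e @ e) / tss              # R2 = 1 - SSE/TSS

    rng = np.random.default_rng(2)
    N = 80
    x1 = rng.normal(size=N)
    junk = rng.normal(size=N)                   # unrelated to Y by construction
    Y = 1.0 + 2.0 * x1 + rng.normal(size=N)

    X_small = np.column_stack([np.ones(N), x1])
    X_big = np.column_stack([np.ones(N), x1, junk])
    print(r_squared(X_small, Y), r_squared(X_big, Y))  # the second is never smaller
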
  5. Mean and Variance of b
    1. We maintain the earlier assumption that the disturbance terms are iid N(0, σ²).
    2. b = (X'X)^-1X'Y. Sub in Y = Xβ + u to get: b = β + (X'X)^-1X'u. Then simply apply the expectations operator: E(b) = β + (X'X)^-1X'E(u) = β. b is an unbiased estimator for β (given our assumptions).
    3. In order to derive var(b), use the fact that E(b) = β: var(b) = E[(b - β)(b - β)'].
    4. (Verify that the dimensionality is right: var(b) is a kxk matrix; the diagonals give variances, the off-diagonals give covariances).
    5. E[(b - β)(b - β)'] = E[(X'X)^-1X'uu'X(X'X)^-1] = (X'X)^-1X'E(uu')X(X'X)^-1. Remembering that E(uu') = σ²I, and canceling inverses: var(b) = σ²(X'X)^-1.
    6. This formula allows us to derive the formulas for the variance of the constant and slope coefficient in the bivariate case. That will be left as an exercise.
    7. How do you get from a calculation of var(b) to the SE's? The variance of the first coefficient is the first diagonal term; the variance of the second coefficient is the second diagonal term; etc. To get SEs, just take the square root. (How can you be sure that all of the diagonal terms will be positive? Must the off-diagonals be positive?)
    8. Application: more Phillips curve regressions.
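A Monte Carlo sketch of the two results above (all numbers invented): holding X fixed and drawing u ~ iid N(0, σ²) many times, the OLS b averages out to β and its sampling covariance matrix is close to σ²(X'X)^-1.

    import numpy as np

    rng = np.random.default_rng(3)
    N, sigma = 50, 2.0
    beta = np.array([1.0, 0.5])                            # assumed "true" parameters
    X = np.column_stack([np.ones(N), rng.normal(size=N)])  # held fixed across draws

    draws = np.empty((5000, 2))
    for i in range(5000):
        u = rng.normal(scale=sigma, size=N)                # iid N(0, sigma^2) disturbances
        Y = X @ beta + u
        draws[i] = np.linalg.solve(X.T @ X, X.T @ Y)       # OLS b for this draw

    print(draws.mean(axis=0))                  # approximately beta (unbiasedness)
    print(np.cov(draws.T))                     # approximately sigma^2 (X'X)^-1
    print(sigma**2 * np.linalg.inv(X.T @ X))
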
  6. Estimating σ²
    1. If you know the value of σ², you're already done. However, in real life you have to estimate σ². How can this be done?
    2. Note that e = Y - Xb = Y - X(X'X)^-1X'Y = (I - X(X'X)^-1X')Y. Define M = I - X(X'X)^-1X'; then e = MY. Furthermore, note that MY = M(Xβ + u) = Mu (since MX = 0).
    3. E(e'e) = E(u'M'Mu) = E(u'Mu) (since M is symmetric and idempotent, M'M = M).
    4. Aside: The trace of a matrix, tr(X), is the sum of its diagonal entries.
      1. tr(AB)=tr(BA)
      2. tr(A+B)=tr(A)+tr(B)
    5. Since the trace of a scalar is just itself (it's only got 1 diagonal entry!): E(u'Mu) = E[tr(u'Mu)] = E[tr(Muu')] = tr(M·E(uu')) = σ²tr(M) (since E(uu') = σ²I and M is non-stochastic).
    6. σ²tr(M) = σ²[tr(I_N) - tr(X(X'X)^-1X')] = σ²[tr(I_N) - tr((X'X)^-1X'X)] = σ²[N - tr(I_k)] = σ²(N - k).
    7. Therefore, since E(e'e) = σ²(N - k), an unbiased estimator for σ² is e'e/(N - k). (See the sketch after this list.)
    8. The Gauss-Markov theorem (stated without proof): the OLS estimator is BLUE (best linear unbiased estimator).
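The promised sketch, with simulated data: estimate σ² by e'e/(N - k), plug it into var(b) = σ²(X'X)^-1, and take square roots of the diagonal to get the standard errors.

    import numpy as np

    rng = np.random.default_rng(4)
    N, k, sigma = 200, 3, 1.5                  # invented sample size and error s.d.
    X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
    beta = np.array([1.0, 2.0, -0.5])
    Y = X @ beta + rng.normal(scale=sigma, size=N)

    b = np.linalg.solve(X.T @ X, X.T @ Y)      # OLS coefficients
    e = Y - X @ b                              # residuals
    s2 = (e @ e) / (N - k)                     # unbiased estimate of sigma^2
    var_b = s2 * np.linalg.inv(X.T @ X)        # estimated var(b)
    se = np.sqrt(np.diag(var_b))               # standard errors
    print(b, s2, se)
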
  7. Hypothesis Testing with Multivariate Regressions
    1. The simplest hypotheses to test concern only a single variable. In this case, everything learned from the bivariate case carries over: just set up an appropriate CI for the single variable you're interested in, and use the t-distribution (with N-k degrees of freedom) to get critical values.
    2. Alternately, one can use the F-test with (1, N-k) degrees of freedom; the t-statistic is simply the square root of that F-statistic, so the two tests are equivalent. (The short check below confirms this.)
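A quick check of the equivalence (using the scipy library): the squared 5% two-sided t critical value equals the 5% F(1, N-k) critical value.

    from scipy import stats

    df = 30                              # e.g. N - k = 30
    t_crit = stats.t.ppf(0.975, df)      # two-sided 5% critical value from the t-table
    f_crit = stats.f.ppf(0.95, 1, df)    # 5% critical value from the F(1, df) table
    print(t_crit**2, f_crit)             # the two numbers coincide (about 4.17)
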
  8. Restricted vs. Unrestricted Regression
    1. For hypotheses involving more than 1 variable, it is much easier to use the F-test. There are numerous equivalent versions, so I'll just give you the easiest.
    2. First, do a vanilla regression of the dependent variable on all of the independent variables. Calculate the R2.
    3. Second, do the restricted regression that imposes the conditions you want to test. Calculate the new R2.
      1. If you are testing the hypothesis that some coefficients are zero, just re-run the regression without those variables.
      2. If you are testing the hypothesis that two variables have the same coefficient, just add them together (forcing them to have 1 coefficient).
      3. If you are testing the hypothesis that a variable's coefficient equals an exact #, multiply the variable by that #, then subtract it from both sides of the equation (or combine it with the constant).
    4. Calculate the test statistic: it is equal to [(N-k)/k2]*[ΔR2/(1-R2)], where N is the # of observations, k is the # of variables in the UN-restricted regression, k2 is the # of restrictions imposed on the 2nd regression, R2 is the R2 of the UN-restricted regression, and ΔR2 is the change in the R2's (unrestricted minus restricted).
    5. Compare the test statistic to the appropriate critical value: the F(k2,N-k) distribution. If your test statistic exceeds the critical value, you can reject the hypothesis.
    6. Example: I have 34 years of data on inflation and unemployment, and regress Unemployment on a constant, current inflation, lagged inflation, and lagged unemployment. Suppose I want to test two hypotheses: first, the coefficients on inflation and lagged inflation are equal and opposite; second, the coefficient on lagged unemployment =.8.
    7. First, I estimate the unrestricted equation, finding that its R2=.85. I then regress (unemployment - .8*lagged unemployment) on a constant and (inflation - lagged inflation), with R2=.20. So the test statistic is: [(34-4)/2]*[.65/.15]=65. The 5% critical value for F(2,30) is about 3.32, so this hypothesis is strongly rejected. (The sketch at the end of these notes repeats this calculation.)
    8. In practice, single coefficient tests are by far the most common - mainly because they can be done without the use of a table.
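The Phillips-curve calculation above, repeated as a short sketch (the scipy library is assumed for the critical value):

    from scipy import stats

    N, k, k2 = 34, 4, 2                  # observations, unrestricted variables, restrictions
    r2_u, r2_r = 0.85, 0.20              # unrestricted and restricted R2's
    f_stat = ((N - k) / k2) * ((r2_u - r2_r) / (1 - r2_u))
    print(f_stat)                        # 65.0
    print(stats.f.ppf(0.95, k2, N - k))  # about 3.32, so the hypothesis is rejected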