Prof. Bryan Caplan

http://www.gmu.edu/departments/economics/bcaplan

Econ 345

Fall, 1998

Week 8: Types of Data

- Three Types of Data
- Cross sectional data: You observe each member in your sample ONCE (usually but not necessarily at the same time).
- Time series: You observe each variable once per time period for a number of periods.
- Pooled time series (="panel data"): You observe each member in your sample once per time period for a number of periods.
- Examples of Cross Sectional Data
- Observing the heights and weights of 1000 people.
- Observing the income, education, and experience of 1000 people.
- Observing the profitability of 20 firms.
- Observing the per-capita GDP, population, and real defense spending of 80 nations.
- Examples of Time Series
- Observing U.S. inflation and unemployment from 1961-1995.
- Observing U.S. nominal GDP, real GDP, and M2 from 1960-1992.
- Observing the profitability of one firm over 20 years.
- Observing the daily closing price of gold over 30 years.
- Examples of Pooled Time Series (="panel data")
- Observing the inflation rates and rates of M2 growth of 15 countries over the 1970-1995 period.
- Observing the output and prices of 100 industries over 12 quarters.
- Observing the profitability of 20 firms over 20 years.
- Observing the annual rate of return of 300 mutual funds over the 1960-1997 period.
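To make the three layouts concrete, here is a minimal sketch of how each type of data might be stored. All names and numbers are hypothetical and chosen only to show the shape of each data set: one row per member for cross-sectional data, one value per period for a time series, and one value per (member, period) pair for panel data.

```python
# Hypothetical numbers throughout; only the *shape* of each data set matters.

# Cross-sectional: one row per member, each member observed once.
cross_section = [
    {"person": "A", "height_in": 70, "weight_lb": 160},
    {"person": "B", "height_in": 64, "weight_lb": 130},
    {"person": "C", "height_in": 75, "weight_lb": 210},
]

# Time series: one firm's profitability, one observation per year.
time_series = {1: 10.0, 2: 12.5, 3: 11.0, 4: 13.5}  # year -> profit

# Panel (pooled time series): every firm observed once in every year.
firms = ["firm_1", "firm_2", "firm_3"]
years = [1, 2, 3, 4]
panel = {(f, y): 10.0 + i + 0.5 * y for i, f in enumerate(firms) for y in years}

n_members = len({f for f, _ in panel})
n_periods = len({y for _, y in panel})

print(len(cross_section), "cross-sectional observations")
print(len(time_series), "time series observations")
print(len(panel), "panel observations =", n_members, "members x", n_periods, "periods")
```

By the same counting, observing 20 firms over 20 years gives 20 x 20 = 400 observations.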
- Dummy Variables
- We have already briefly discussed dummy variables. A dummy variable is simply a variable that can either be 0 or 1.
- A dummy variable is used to put "discrete" variables into a regression model. Most variables are continuous - you can earn $5 per hour, $5.07 per hour, $70.83 per hour, etc. But some variables are discrete: you either are or are not male. So if you wanted to look at the effect of "maleness," you could define a dummy variable that = 1 if the person is male, and 0 otherwise.
- Examples of Dummy Variables
- Male = 1 if male; 0 otherwise
- White = 1 if white; 0 otherwise
- War = 1 if a country was at war in a given year; 0 otherwise
- Tall = 1 if a person is taller than 6'; 0 otherwise
- Degree = 1 if a person finished their college degree; 0 otherwise ("sheepskin effect")
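The definitions above can be sketched as a small data-preparation step. The records and field names (`sex`, `height_in`, `finished_degree`) are hypothetical; the only substantive assumption is that 6' equals 72 inches.

```python
# Hypothetical raw records; the field names are illustrative assumptions.
people = [
    {"sex": "male", "height_in": 74, "finished_degree": True},
    {"sex": "female", "height_in": 66, "finished_degree": True},
    {"sex": "male", "height_in": 70, "finished_degree": False},
    {"sex": "female", "height_in": 73, "finished_degree": False},
]

def to_dummy(condition):
    """Turn a yes/no condition into a 0/1 dummy value."""
    return 1 if condition else 0

male = [to_dummy(p["sex"] == "male") for p in people]
tall = [to_dummy(p["height_in"] > 72) for p in people]  # taller than 6' = 72 inches
degree = [to_dummy(p["finished_degree"]) for p in people]

print("Male:  ", male)
print("Tall:  ", tall)
print("Degree:", degree)
```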
- Independent Dummy Variables
- You can use dummy variables as independent variables (on the right-hand side of your equation).
- If you regress a dependent variable on a constant and a dummy variable (say Male), the coefficient on the dummy variable is exactly equal to the *average* difference between men and women. If, with income as the dependent variable, the coefficient is $2000, it shows that men on average make $2000 more than women; if it is -$500, it shows that men on average make $500 less than women.
- If you regress a dependent variable on a dummy AND one or more other variables, then the coefficient on the dummy shows the average difference between e.g. men and women CONTROLLING for the other variables.
- You can put as many dummies as you want onto the right-hand side of your equation, but you have to be careful. You *cannot* put in two dummies that always add up to 1. E.g., you can't put both Male and Female in, because Male+Female=1. (Why not? You need to take the graduate class to find out.) However, you could put both Male and White in, because Male+White doesn't always add up to one.
- Question: Suppose you have 50 states or 100 industries. How many "state dummies" or "industry dummies" can you include in your regression equation?
- Answer: 49 states; 99 industries. Why?
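A small numerical check may help here. The sketch below uses made-up incomes and a hand-written one-regressor OLS helper (`ols_simple` is a hypothetical name, not a library function). It regresses income on a constant and Male, compares the coefficient to the difference in group means, and then checks that Male + Female equals 1 in every row, which is the same as the column of ones behind the constant term.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical incomes: the first three people are men, the last three are women.
income = [30000, 34000, 32000, 29000, 31000, 30000]
male = [1, 1, 1, 0, 0, 0]
female = [1 - m for m in male]

intercept, coef_male = ols_simple(male, income)

n_men = sum(male)
n_women = len(male) - n_men
mean_men = sum(y for y, m in zip(income, male) if m == 1) / n_men
mean_women = sum(y for y, m in zip(income, male) if m == 0) / n_women
mean_gap = mean_men - mean_women

print("coefficient on Male:", coef_male)
print("difference in means:", mean_gap)
print("constant (average for women):", intercept)

# Male + Female is 1 in every row, so the pair duplicates the constant term.
adds_to_one = all(m + f == 1 for m, f in zip(male, female))
print("Male + Female always equals 1:", adds_to_one)
```

The same bookkeeping suggests the answer to the states question: assuming the regression includes a constant, a full set of 50 state dummies would also add up to 1 in every row, so one state has to be left out.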
- Dependent Dummy Variables
- You can also make a dummy variable the dependent variable (on the left-hand side of the equation).
- Example: regress War (=1 if the country was at war, =0 otherwise) on different economic variables to explain why countries would go to war.
- Your prediction will typically not equal either 0 or 1; rather, it will be a number in between 0 and 1. You can interpret this as the conditional probability that a country with the observed characteristics will go to war.
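As a rough illustration, the sketch below regresses a War dummy on a single made-up explanatory variable (called `tension` here, a hypothetical index that is not in the notes) and prints the predictions. The `ols_simple` helper is hand-written for this example.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical data: one observation per country.
tension = [1, 2, 3, 4, 5, 6]  # made-up explanatory variable
war = [0, 1, 0, 0, 1, 1]      # War = 1 if the country was at war; 0 otherwise

intercept, slope = ols_simple(tension, war)
predicted = [intercept + slope * x for x in tension]

for x, w, p in zip(tension, war, predicted):
    print(f"tension={x}  War={w}  predicted probability={p:.3f}")
```

In this made-up sample every prediction lies strictly between 0 and 1, even though the dependent variable itself is only ever 0 or 1. Nothing in the arithmetic forces predictions to stay inside that range for other data.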
- Trend Variables
- Any two variables with similar *trends* (i.e., tendency to go up or down) will generally be highly correlated if you regress one on the other. But this is frequently bogus: a higher price of IBM does not cause a higher price of Exxon, nor does a higher price of Exxon cause a higher price of IBM.
- Question: How can you figure out which variables with similar trends really have a significant relationship with each other?
- Answer: Add a TREND variable. A simple trend just looks like {1, 2, 3, 4, ...}. One simple (though not the easiest) way to do this in EViews is to genr a blank series, then Edit the series manually until it looks like a trend.
- If you regress e.g. the price of IBM on a trend and the price of Exxon, in all likelihood the trend will basically explain everything, and Exxon's price nothing. Exxon mattered just because it had the same general direction as IBM, not because they were really related.
- Similarly, you can regress e.g. real GDP on M2 and a trend. Is there really a relationship between M2 and GDP, or are they just both going up all of the time?
- There are also more complicated trends. You can add a "quadratic trend"={1, 4, 9, 16, 25, ...} if a series is increasing at an increasing rate.
- It is often a good idea to add a trend to a time series regression to double-check your results.
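The IBM/Exxon point can be sketched with made-up numbers. Below, two hypothetical price series both rise over time but have unrelated wiggles around their trends. Rather than running a three-variable regression directly, the sketch swaps in a standard equivalent: it removes the trend from both series first and then regresses the leftovers on each other (the Frisch-Waugh result says this gives the same Exxon coefficient as regressing IBM on a constant, the trend, and Exxon). The helper names are hypothetical.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """What is left of y after regressing it on a constant and x."""
    a, b = ols_simple(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

T = 20
trend = list(range(1, T + 1))              # {1, 2, 3, 4, ...}
quadratic_trend = [t ** 2 for t in trend]  # {1, 4, 9, 16, 25, ...}

# Hypothetical prices: each is a trend plus its own unrelated wiggle.
wiggle_a = [1 if t % 2 == 1 else -1 for t in trend]
wiggle_b = [1 if (t - 1) % 4 < 2 else -1 for t in trend]
ibm = [50 + 2 * t + a for t, a in zip(trend, wiggle_a)]
exxon = [20 + t + b for t, b in zip(trend, wiggle_b)]

# Regression 1: IBM on a constant and Exxon only.
_, naive_coef = ols_simple(exxon, ibm)

# Regression 2: IBM on a constant, a trend, and Exxon,
# computed by detrending both series and regressing the residuals.
ibm_detrended = residuals(trend, ibm)
exxon_detrended = residuals(trend, exxon)
_, controlled_coef = ols_simple(exxon_detrended, ibm_detrended)

print("Exxon coefficient without trend:", round(naive_coef, 3))
print("Exxon coefficient with trend:   ", round(controlled_coef, 3))
```

With these made-up series the Exxon coefficient is large when the trend is left out and close to zero once the trend is included, which is the pattern the notes describe. The `quadratic_trend` series is built the same way and could be used as a regressor when a series rises at an increasing rate.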

Fall, 1997

Week 9: Dummy and Trend Variables