Bryan Caplan

Prof. Bryan Caplan

bcaplan@gmu.edu

http://www3.gmu.edu/departments/economics/bcaplan

Econ 637

Spring, 1999

Weeks 13-14: Applications and Reflections, II

Do the Advanced Topics Have Any Value-Added?

The topics covered before the midterm were all various aspects of OLS estimation.
Since the midterm, however, we have investigated a variety of more advanced topics.
Interesting phenomenon: the advanced topics seem to go in and out of fashion, while OLS retains popularity year after year.
My guess of the reason (other than OLS is easier and requires less study): people who do econometrics gradually get a sense for how easy it is to manipulate OLS. But few use the advanced techniques enough to get a comparable sense of the scope of statistical deceit that they afford.
General rule followed by many:

See if OLS works first.
Then try more advanced techniques to check the sensitivity of your OLS results.

My sense: Far too much time is spent on technical econometric issues, not nearly enough on two far greater problems:

Distinguishing correlation and causation.
Practical and conceptual data problems.

More on Correlation vs. Causation

Econometricians have not totally ignored the problem of distinguishing correlation from causation.
2SLS, 3SLS, MLE, GMM, and other techniques attempt to explicitly cope with simultaneity problems - probably where the danger for conflation of correlation and cause is greatest.
Related problem: over-generalizing. Econometricians use panel data to cope with this problem.
Nevertheless, it remains far from clear that the techniques used are actually successful in distinguishing cause and correlation.

Numerous pieces of evidence in e.g. recent time series literature: E.g. the "price puzzle" and "liquidity puzzle" in the structural VAR literature.
Panel data estimation often seems to "drain out" clear patterns from the data. E.g. "differences in differences" estimation of the causes of European unemployment.

The recent turn to "natural experiments" is almost surely a good one: at least this shifts the focus away from econometric theory onto the real problem. (But how good are the natural experiments anyway? They don't remotely compare to double-blind medical studies).
Other promising trend: "experimental economics." While this approach doesn't seem likely to solve serious macro issues anytime soon, it has been quite helpful for e.g. game theory.
In general, differentiating cause-and-effect from correlation requires not merely experience, but intellectual judgment. The pure econometrics of simultaneous systems openly acknowledges this, and the same principle applies everywhere else.

Many see this as an argument for nihilism or a fancy variant thereof.
But I see it as a strong confirmation of the epistemology of Thomas Reid and the associated "Scottish philosophy of common sense." See e.g. Reid's Essays on the Intellectual Powers of Man.

Subordinating Econometrics to Economic History

As mentioned in the first week, econometrics has had a strong tendency to "crowd out" more traditional empirical approaches, especially economic history.
Economic history and similar non-econometric approaches to empirical economics have been frequently criticized as "unscientific."

Too much room for subjective judgments.
Ideologically driven.
Too anecdotal.

But these charges could easily be made against econometrics as well. Rather than make a blanket rejection of both, why not openly recognize that all methods face these difficulties and try to surmount them?
Moreover, economic history has a two key advantages over econometrics:

Strong focus on causation rather than correlation - and wider range of tools for differentiating.

Ex: Industrial Revolution
Ex: Hoover and the Great Depression

Permits empirical investigation of questions that can't be easily reduced to a data set.

Ex: Credibility and the gold standard.
Ex: Sub-game perfection and predation.

My judgment: the quality of empirical economics would substantially increase if econometrics were viewed as one tool of the economic historian. Economic history more broadly construed would help resolve issues of causation, look at empirical questions not easily forced into econometric mold, etc.

Ex: Friedman and Schwartz
Ex: Bernanke's "Macroeconomics of the Great Depression" (JMCB, Feb 1995).

Just doing more economic history won't eliminate fundamental disagreements between economists. But it is much better to actually argue about fundamental disagreements than to pretend that technical econometrics is the profession's impartial arbiter.

The Bell Curve

TBC has been even more controversial than the Card-Krueger minimum wage study, but both have something in common.

Opponents: Correlation is not causation, empirical methods have severe problems, plus it is immoral to do this kind of work because of the policies it may inspire.
Supporters: These studies provide strong empirical confirmation of theories that many people dogmatically oppose, and that is what science is all about.
Support for the two studies exhibits strong negative correlation.

Central hypothesis of TBC: intelligence - as measured by IQ and similar tests - makes a lot of difference for a wide variety of social outcomes. In particular, it will make a difference even if you control for socio-economic status and a lot of other variables.

Critics focus almost entirely on the chapters on IQ and ethnicity; but these topics only appear in two chapters.
Standard complaint about IQ tests: cultural bias. M&H address this elsewhere in the text.

Important: All of the results from part 2 deliberately exclude all data for non-whites (including Latinos).
Note: Important difference between e.g. "most criminals have low IQ" and "most people with low IQ are criminals." The former is true; the latter is not. (How is that possible?)

Poverty. M&H run a number of logits; the LHS dummy variable =1 if you are "below the poverty line" and 0 otherwise. (Notice: On graphs, other variables are set at their mean values. You need to make an assumption like this for logits and probits due to non-linearity).

Main findings: logit of poverty line on constant, IQ, SES, and Age shows that IQ matters much more than SES, and Age barely matters at all. (p.596; 134).
Additional findings: IQ matters most for people without college degrees, and for non-married mothers. (I.e., if you have a college degree, or you are a married mom, you are almost never under the poverty line).

Schooling. M&H run logits to determine what affects P(drop-out) and P(getting a bachelor's degree).

Main findings: IQ matters a lot more than SES for dropping out and for getting a bachelor's degree. (p.597-99; 149, 152)

Unemployment, idleness, and injury. Run logits of "out of the labor force" and "unemployed" on IQ, SES, intercept, and age of males only.

Main findings: Higher SEs actually has negative impact on labor force participation, and no impact on unemployment. IQ has predicted negative impact. (p.599, 600; 159, 164)

Family matters. Run logits to estimate P(divorce in first 5 years of marriage) and P(illegitimate first birth).

Main findings: IQ makes divorce less likely, higher SES makes divorce more likely. Both IQ and SES decrease the odds of illegitimacy, but IQ matters more. (p.602-604;175, 183)
Additional findings: IQ increases the odds for people without college degrees to get married in the first place, but has no effect in the overall sample.

Ethnic inequalities in relation to IQ: prologue. This is probably what generates all of the bitter controversy about TBC, so note the following:

Non-whites have been excluded from the sample until this point.
M&H take no stand on genetic vs. environmental origins of IQ, but do indicate that both seem to matter in a wide variety of studies.
M&H do however point out that regardless of the genetic/ environmental composition of IQ, there is virtually no study that shows that anything short of adoption at birth can appreciably raise a person's IQ.

Ethnic inequalities in relation to IQ: the setup. Data for whites, Latinos, and blacks are used separately. Battery of tests run using the following sets of independent variables:

Age
Age and IQ
Age and SES
Age, IQ, and SES

Table 2 (p.648-649) summarizes results of repeating earlier questions using the data sets for each ethnic group.

Main findings: Controlling for IQ reduces, equalizes, or sometimes reverses ethnic differences in college graduation, income, poverty rates, unemployment, and welfare dependency.
Additional findings: IQ does little to explain ethnicities' different marriage rates or illegitimacy rates.

TBC: general observations.

As with Card and Krueger, M&H's study has been attacked as fraudulent, dishonest, etc. In neither case is this likely.
At the same time, defenders of M&H have tended to dismiss critics as anti-empirical ideologues.
M&H did spend a great deal of effort trying to anticipate criticisms and account for them. E.g. Limiting initial results to white-only sample. Critics frequently ignored these efforts. (Similarly, C&K tried to take care of some objections in advance, albeit less thoroughly than M&H did).
M&H spend much more effort trying to make their underlying theory convincing than C&K do. Do they succeed?
Crane: "Sometimes taboos have a legitimate social function." Agree or disagree?

Best critical analysis of TBC: Bill Dickens (+co-authors) "Does the Bell Curve Ring True?" Main point: M&H are qualitatively correct, but overstate their quantitative case. (But then: So have all other researchers on e.g. return-to-education!)

"War as a Natural Macro Experiment: Did Fiscal Policy Ever Matter?"

Motivation:

The U.S. has had extremely large defense cuts over the last ten years - a decline from nearly 8% of GDP to barely 4%. After ten years of cuts, unemployment is at a 27-year low, and it is difficult to see a real or even a nominal impact of the cuts.
Discretionary fiscal policy has been widely abandoned, but mostly for political reasons.

Many economists have seen wars as "natural experiments," but only a few papers (by e.g. Barro) have tried to use this approach to isolate the impact of exogenous policy shocks.

Both fiscal and monetary policy are usually expansionary during wars, but there has been a strong tendency to attribute the growth in nominal and real output mainly to fiscal policy. Is it possible that money has been doing all of the expansionary work - during both war and peace - and that fiscal policy alone merely has a compositional effect?

Aim and Strategy:

Use war and war-related variables as instrumental variables to answer the following two questions:

1. Does fiscal policy matter holding monetary policy constant? (Primary question.)

2. How robust are the structural VAR estimates of the impact of monetary policy? (Secondary question.)

Organization:

1. The data

2. Are war-related variables good instruments?

3. The baseline specification and the baseline results

4. Sensitivity tests

5. Conclusion

1. The Data

To get more robust results, estimation uses two distinct data sets:

a. "Narrow" data set: Annual data from 15 relatively industrialized countries over the period from 1881-1988. Most data courtesy of Michael Bordo, supplemented with data from International Historical Statistics and combined with data from the Correlates of War Project.

b. "Broad" data set: Annual data from 69 more heterogeneous countries over the period from 1949-1992. Data comes from merging parts of Annual Data on Nine Economic and Military Characteristics of 78 Nations, 1948-1983, World Military Expenditures and Arms Transfers, 1983-1993, the Correlates of War Project, and International Historical Statistics. (Supplemented with data from International Financial Statistics Yearbook and the Pennworld data set).

Note: Hyperinflation country-years (nominal output growth>100%) are removed.

2. Are War-Related Variables Good Instruments?

a. Simple regression of {percent change of Nominal GDP, percent change of Real GDP, government spending as a fraction of GDP, and money supply growth} on War suggests No. [Tables 1a and 1b]

b. But: if you separate wars fought exclusively on foreign soil (Soil=1) with wars fought partially on domestic soil, and then regress {percent change of Nominal GDP, percent change of Real GDP, government spending as a fraction of GDP, and money supply growth} on {War*Soil and War*(1-Soil)} you find out that (some) war-related variables look like good instruments after all:

In both data sets, there are large declines in output during domestic wars, and small but statistically significant increases during foreign wars.
For each data set there is a subset of wars such that: monetary policy is expansionary, fiscal policy is expansionary, and nominal output growth is high. (Which subset, however, differs).
In the narrow data set, fiscal policy is always strongly expansionary during war; in the broad data set (Table 1c), fiscal policy is unambiguously expansionary during foreign wars, but during domestic wars the increase in military spending is paid for with cuts in nonmilitary spending.

c. Switching from War to Warmonth makes no difference, but adding Casualty rate variable shows that Casualty sometimes seems to matter along with War*Soil and War*(1-Soil).

Therefore: estimation will use both War and Casualty, interacted with Soil as instruments.

3. The Baseline Specification and the Baseline Results

In the structural VAR literature beginning with Sims (1980), identification is typically achieved by imposing restrictions on contemporaneous interactions. E.g. in Bernanke-Blinder (1992), identification is achieved by assuming that within a given time period, there is no causation from policy variables to non-policy variables (alternately, from non-policy to policy).

With annual data this standard identification technique is not credible; fortunately, the war-related variables provide an alternate identification route - so long as wars are truly exogenous. Want to estimate the following system:

(The measure of fiscal policy, F_t, is the change in Gfrac. Why the change? Only changes, not levels, ought to be expansionary. Since lagged war-related variables are included, we still have instruments to shift F - if current wars shift Gfrac up, then current war shifts F up and lagged war shifts F down).

3 Initial Restrictions

Restriction #1: Occurrence of wars is exogenous.

Restriction #2: Neither foreign nor domestic wars have direct impact of nonpolicy variables.

Restriction #3: To account for AS shocks, casualty rates are allowed to directly affect real but not nominal output.

Equations (7), (8), (9), and (10):

Variables:

X - complete vector of country and year dummies

N - % change in nominal GDP

R - % change in real GDP

M - % change in money supply

F - change in G/GDP

(1-Soil)*War - domestic war dummy

Soil*War - foreign war dummy

(1-Soil)*Cas - casualty rate in a domestic war

Soil*Cas - casualty rate in foreign war

Note: Only the first two equations, (7) and (8), are identified. Since these equations give the coefficients of greatest interest (i.e., the impact of exogenous policy shocks), this is not a severe problem. The 2 identified equations of the partially identified system are estimated using 3SLS, with the war-related variables, the country and year dummies, and lags of all variables serving as instruments.

Problem: Initial results are often explosive, so one additional restriction is imposed:

Restriction #4: Policy directly affects only nominal output, not real output. Fits in well with the New Keynesian nominal rigidity literature, though not so well with RBC.

Estimation of (7) and (8) using 3SLS and imposing all FOUR restrictions is known as the "baseline specification."

Estimation of this system yields quite consistent results for both data sets. The estimates are then used to calculate impulse-response functions and ± 2 SE bands for TWO policy experiments:

Policy Experiment #1: Permanently Increasing the Rate of Money Supply Growth by 1%: [Figures 1ab-2ab]

Money matters for both nominal and real output for both narrow and broad data sets.
Real impact of money doesn't totally die out, violating long-run neutrality. The same problem is often encountered in the structural VAR literature, but additional lags and other specifications make the anomaly smaller.

Policy Experiment #2: Permanently Increasing F by 1% (equivalent to e.g. going from G/GDP of 40% to G/GDP of 70% over the course of 30 years). [Figures 3ab-4ab]

Fiscal policy matters for NEITHER nominal nor real output.
Point estimates are actually negative, but one can't reject the null of zero impact.

4. Sensitivity Tests:

a. Alternate Measures of Fiscal Policy.

i. New Keynesian but not RBC theories suggest that increased taxation might obscure expansionary impact of fiscal policy. Results (narrow set only) re-estimated with change in tfrac=(tax collections/GDP) added as a fifth endogenous variable. [Figures 5a-f] SE bands for money get tighter; negative point estimate for spending unchanged; taxes have standard negative point estimate as well. (But can't reject null of zero nominal and real effect for both spending and taxes).

ii. Separating military and nonmilitary spending (broad set only). [Figures 6a-b] Shows statistically significant negative impact of military spending on nominal and real output, and approximately zero impact of nonmilitary spending.

b. Number of Lags.

Switching to 6 lags makes the SE bands smaller, and shrinks the estimated nominal and real impact to more reasonable levels. Qualitative results stay about the same.

c. Choice of Instrumental Variables and Method of System Estimation

Switching from War to Warmonth changes nothing. Eliminating or replacing Casualty variable sometimes makes the results explosive, and generally increases size of SE bands.

Estimating baseline specification using GMM instead of 3SLS increases size of SE bands, but qualitative results and statistical significance unchanged.

5. Conclusion

Main Findings:

a. Fiscal policy seems to have no nominal or real effect.

b. Structural VAR estimates of nominal and real effect of monetary policy are quite robust, since my alternate approach yields similar answers.

Main Strengths of Results:

a. Qualitative and even quantitatively consistent estimates emerge from two very different data sets.

b. Data set robust to a variety of specification changes.

Main Weaknesses of Results:

a. Choice of instruments matters for reasons that are a priori unclear.

b. Both data sets sometimes imply explosive impulse-response functions for some specifications.

Broader Observations:

a. Nominal rigidity models cannot be readily extended to infer impact of fiscal policy, because there is little evidence that fiscal policy even has a nominal effect.

b. Shift in stabilization policy away from fiscal policy and towards monetary has been wise, because money can affect both nominal and real output and fiscal policy can't affect either. No need to rely on political or lag arguments.

An Introduction to Experimental Economics

If empirical economics is unreliable because the data are not gathered under controlled experimental conditions, why not try experimental methods? This approach has become increasingly popular.
How do you set up an economic experiment? Mostly obvious in theory, though tricky in practice.

You want to put human agents in a situation where an economic theory predicts definite outcomes, and see if they actually do what the theory predicts.
You want your artificial environment to be the same for each group of subjects - to make your results comparable, and to permit replication by other researchers.

Main non-obvious aspect: "Induced Value Theory." Monetary incentives are necessary but not sufficient. They have to be monetary incentives mimic those that the economic theory under consideration posits. Moreover, the researcher often has to actually know preferences to see if the economic theory predicts correctly.
Ex: When testing basic supply and demand, don't let people decide their supply and demand curves for themselves. Instead, give people incentives that "induce" the demand and supply curves you want them to have.

Tell demanders their reservation prices and suppliers their marginal costs.
Make demanders' monetary rewards proportional to the difference between reservation prices and price paid; make suppliers' monetary rewards proportional to the difference between marginal cost and price received.
Calculate intersection of S&D. Test whether observed market outcome is the same.
Crucial assumption: "payoff dominance." The monetary rewards have to be large enough to dominate other considerations. If the most a person can win is a penny, it is unreasonable to expect economic theory to work.

Question: How would you set up "induced values" for test of public goods theory?
General findings of experimental economics:

Many theories work well: S&D works, and is robust to extreme elasticities and even to the number of players (for numbers greater than 4 or so).
Evidence on some other theories is more fixed. In public goods experiments, you almost always observe positive levels of cooperation. But cooperation is also well under 100%, and diminishes with repeat play.
Evidence on some other theories is quite negative. Expected utility theory flops along many dimensions. For example, expected utility theory predicts that everyone will be risk-neutral over lottery tickets (as opposed to outcomes), but they aren't. Payoff dominance critique definitely applies to many of these experiments, though.

Experimental Evidence on the 3-Doors Paradox

Experiments frequently play on perceived "cognitive anomalies." Probably the hardest of all of these cognitive anomalies is known as the 3-doors paradox.
The setting: There are three doors. A big prize is behind one of the three. You are asked to pick 1 door. Then, MC opens one empty door that you did not pick. There are now only two doors left: the one you originally picked, and the one the MC failed to open. You now get a second choice: stay or switch. After you make this choice, the MC opens the door you picked. It is has the prize in it, you get the prize. Otherwise you get nothing.
Absolute truth, whether you believe it or not: The expected payoff of switching is fully DOUBLE that of staying. After the door has been opened, there is a 1/3 chance that you pick is right, and a 2/3 chance that the other door is right.
Why? Initially, there is a 1/3 chance that your pick is right. If you switch when you were right initially, you always lose. However, there is a 2/3 chance that your pick was wrong. One of the doors you didn't pick is right, the other is wrong. But the MC only opens empty doors. So if you picked wrong initially, the remaining door HAS to be correct. Thus, if you switch, there is a 1/3 chance you switch from right to wrong, and a 2/3 chance that you switch from wrong to right.
Friedman (1998) reports experimental evidence on the 3-doors paradox. Payoff structure: Ten trials, payoff=$1.00+$.30*#wins. Initial environment: isolated. This is termed "Run1." Note that the marginal payoff of being right is 10 cents (2/3*.30-1/3*30=10).
Results: Switching slightly increases over time, but never even reaches 50%! Theoretical prediction of course is 100% switch rate.
Linear probability model for switch rate (p.940) shows positive time trend and a significant impact of Switchbonus (defined as cumulative earnings for switching - not switching).
Run2: Same subjects given option to play 12 or 15 more rounds. During Run2, subjects receive one or more alternative treatments:

Intense incentives (Intense): Subjects get +$1 if right, -$.50 if wrong. Marginal incentive to switch is now (2/3*100-1/3*50)-(1/3*100-2/3*50)=$.50, five times the initial marginal incentive.
Track record (Track): Each subject fills out table with three columns as they play: Your Payoff, Always Switch Payoff, and Always Stay Payoff.
Written advice (Advice): Subjects given a page with two arguments: One sound argument for switching, one bogus argument for not switching.
Comparative results (Compare): After 6^th period, subjects told about the results for the first 40 subjects: % of switches that won, % of stays that won, and the fact that a majority stayed.

All treatments except Intense made performance closer to theoretical predictions. Intense actually had a negative effect.
But: switch rate never tops 60%.
Friedman's "recipe for pseudo-anomalies."
What about long-run elasticity? What if people were given a lunch break of 90 minutes? Would Intense incentives have improved performance then?
Bottom line: Experiments are a useful addition to the empirical economist's toolkit. Combine experimental data with simple econometrics can also be quite publishable, as the Friedman publication shows...