Current claims about the benefits of using “rigorous” evidence are deeply wrong

I also have a new paper arguing that the current conventional wisdom about “rigorous evidence” in policy making is simply empirically wrong. The conventional wisdom runs as follows. Because non-experimental estimates of causal impact (the local average treatment effect, or LATE) may lack internal validity, one needs to do RCTs. After some number of those have been done, one should carry out a “systematic review” that aggregates the “rigorous” estimates, and policy making should be “evidence based” in the sense of relying on these systematic reviews. The paper shows, with real-world data across countries, that this approach actually produces larger errors in predicting causal impact than if each country simply relied on its own internally biased estimates.


A simple analogy is helpful. Suppose all men lie about their height and claim to be 1 inch taller than they actually are. Then self-reported height is internally biased. One could do a study to produce a “rigorous” estimate of the true height of men and obtain the distribution of true heights, which has a mean of, say, 69 inches (5′9″) and a standard deviation of 3 inches. Now suppose I want to predict Bob’s height. If I know nothing about Bob, then 69 inches is my best guess. But suppose I do have Bob’s self-reported height, and he says he is 6′3″ (75 inches) tall. The conventional wisdom of the “RCTs plus systematic review” approach would tell us to guess 69 inches and to ignore Bob’s self-report altogether, because it is not a “rigorous” estimate of Bob’s height: it is biased and hence not internally valid. But in this case that approach is obvious madness. We should guess not that Bob is 69 inches but that he is 6′2″ (74 inches) tall, and if Fred says he is 5′5″ (65 inches), we should guess not 69 inches but 64 inches.
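The arithmetic of the height analogy can be checked with a small simulation. This is a minimal sketch, assuming true heights are normally distributed with the mean (69 inches) and standard deviation (3 inches) given above and that every man overstates his height by exactly 1 inch; the function names are hypothetical. It compares the RMSE of three predictors: the “rigorous” population mean, the raw self-report, and the self-report minus the known 1-inch bias.

```python
import random


def rmse(errors):
    """Root mean square of a list of prediction errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


def predict_from_self_report(self_report, bias=1.0):
    """Hypothetical helper: correct a self-report for a known, uniform bias."""
    return self_report - bias


def height_rmses(n=100_000, mean=69.0, sd=3.0, bias=1.0, seed=0):
    """Simulate n men and return the RMSE of three ways of predicting true height."""
    rng = random.Random(seed)
    true_heights = [rng.gauss(mean, sd) for _ in range(n)]
    # Everyone claims to be `bias` inches taller than he is.
    reports = [h + bias for h in true_heights]

    # 1. Ignore the self-report and use the "rigorous" population mean.
    rmse_mean = rmse([mean - h for h in true_heights])
    # 2. Use the biased self-report as is.
    rmse_raw = rmse([r - h for r, h in zip(reports, true_heights)])
    # 3. Use the self-report corrected for the known bias.
    rmse_adj = rmse([predict_from_self_report(r, bias) - h
                     for r, h in zip(reports, true_heights)])
    return rmse_mean, rmse_raw, rmse_adj


if __name__ == "__main__":
    print(predict_from_self_report(75))  # Bob: 74.0
    print(predict_from_self_report(65))  # Fred: 64.0
    print(height_rmses())
```

Even the uncorrected self-report (RMSE of 1 inch) beats the population mean (RMSE of about 3 inches), because the bias is smaller than the true spread of heights; correcting for the bias removes the error entirely in this idealized setup.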

The obvious point is that the prediction error across a number of cases depends on the relative magnitude of the true heterogeneity in the LATE across contexts versus the magnitude of the internal bias in a given context. There is no scientifically defensible case for using the mean of the set of “rigorous” estimates as the context-specific estimate of the LATE (the proposed “conventional wisdom”) in the absence of (a) specific and defensible claims about the heterogeneity of the true LATE across contexts (and the available evidence suggests this heterogeneity is large) and (b) specific and defensible claims about the typical magnitude and heterogeneity of the internal bias in the various context-specific estimates (about which we know little).
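The trade-off can be made concrete under a stylized model, which is an assumption for illustration and not the paper’s estimation procedure: true effects vary across contexts with standard deviation σ_τ, each context’s own estimate carries a bias with mean β and standard deviation σ_δ, and the “rigorous” estimates are unbiased but exist for only a few contexts. Then the mean squared error of the pooled “systematic review” mean is roughly σ_τ² plus the sampling variance of the pooled mean, while that of each context’s own estimate is β² + σ_δ². The sketch below simulates both regimes; the function name and all parameter values are hypothetical.

```python
import random


def rmse(errors):
    """Root mean square of a list of prediction errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


def compare_predictors(sd_true, bias_mean, bias_sd, mu=0.2, n_contexts=5_000,
                       n_rigorous=20, rigorous_se=0.05, seed=1):
    """Return (RMSE of pooled "rigorous" mean, RMSE of own biased estimates)."""
    rng = random.Random(seed)
    # True LATE in each context, heterogeneous across contexts.
    true_late = [rng.gauss(mu, sd_true) for _ in range(n_contexts)]
    # Each context's own estimate (e.g. OLS): truth plus a context-specific bias.
    own = [t + rng.gauss(bias_mean, bias_sd) for t in true_late]
    # "Rigorous" estimates exist for only n_rigorous contexts: unbiased, with sampling noise.
    rigorous = [true_late[i] + rng.gauss(0.0, rigorous_se) for i in range(n_rigorous)]
    pooled = sum(rigorous) / n_rigorous  # the "systematic review" average

    rmse_pooled = rmse([pooled - t for t in true_late])
    rmse_own = rmse([o - t for o, t in zip(own, true_late)])
    return rmse_pooled, rmse_own


if __name__ == "__main__":
    # Large true heterogeneity, modest bias: own estimates win.
    print(compare_predictors(sd_true=0.3, bias_mean=0.1, bias_sd=0.1))
    # Little true heterogeneity, large bias: the pooled mean wins.
    print(compare_predictors(sd_true=0.02, bias_mean=0.1, bias_sd=0.3))
```

The particular numbers are invented; what matters is that the ranking of the two approaches flips depending on whether σ_τ is larger or smaller than √(β² + σ_δ²), which is why neither approach can be declared superior without claims about both quantities.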

The paper, which is still a draft and is a homage to Ed Leamer’s classic “Let’s Take the Con out of Econometrics,” illustrates this point with estimates of the private sector learning premium across countries. I show that both the heterogeneity of the estimates across countries and the internal bias are large, and that on net the “rigorous estimates plus systematic review” approach produces a larger RMSE (root mean square error) of prediction than simply using each country’s own OLS estimate (adjusted for students’ household socioeconomic status).