A while back I was exchanging emails with a very savvy economist who works almost exclusively on the USA. I was arguing that “development” should be 80 percent devoted to “national development,” with 80 percent of that going to (inclusive) growth, and only 20 percent to programs that mitigate the consequences of the lack of national development (an argument I have been making for some time, here and here).
My impression is that the fad among philanthropists and others is for “development” to be 20-80: mostly about programs targeted to specific households (e.g. anti-poverty programs) or specific outcomes (e.g. safe water) or both (e.g. safe water for poor households). Only in that allocation of development assistance/effort can RCTs even seem an important part of development, providing “evidence” of “what works” (and even that isn’t really true), as it is obvious that most RCTs are not useful for country-level questions about national development or growth (here) (and here, with new data, in this new blog by Dany Bahar).
He gave a quick and thoughtful response:
Taking your fractional breakdowns, could you fill in these numbers: Absent the randomistas it would have been X:(100-X) development: charity. With them it is 20:80. But the 80 is Z times more effective than the (100-X) would have been because we spend the charity better. So their net impact is…
Here is my emailed response (slightly edited and with more links added):
I am worried my quick and long answers might give you the impression I am not busy and have time weighing on my hands, but it is the refreshing opportunity to discuss these issues with an open-minded economist’s economist that has me procrastinating on other things to respond.
Four big sections.
First, we have to get the causal order straight. RCTs are the child of the shift to 20-80, where the 80 is “programmatic charity work.” The story of that shift is really the politics of Western support for development assistance: something like the two-fold impact of the end of the Cold War (which was a major motivator of “getting to Denmark” national development efforts, to keep the Third World out of the Second by keeping it on the way to the First) and the decade(s) of addressing the debt crisis via structural adjustment (starting in 1982 with Mexico).
There are two key turning points.
One is the 1990 World Development Report on Poverty, which adopted the ‘dollar a day’ poverty line for measuring the total number of people who are poor. That line was, as I argued at the time, ridiculously low and would destroy efforts to pursue national development, but it was adopted because of World Bank politics (the director was nervous that anything but the lowest possible poverty line would make the WB seem like a special-pleading advocacy group). (This narrative is in this paper).
Two was the 2000 Millennium Development Goals, which were a Jeff Sachs brainstorm: to protect foreign aid from becoming part of the “peace dividend,” Western voters needed more “accountability” of aid for concrete results, so the “goals” of development were set up as a handful of goals, on topics very popular with Western voters, all set at a very low bar.
So, for instance, the goal for “education” was “every child complete primary school” (nothing about quality of learning, nothing about middle school or high school, nothing about job skills, nothing about higher education). The goals for health were not about the health system but about specific diseases: goal 4 was child mortality (which is broad but age-specific), goal 5 was maternal health, goal 6 was HIV/AIDS, malaria, and other diseases.
It was only with this new (in 2000) framing, in which “development goals” meant not “India becomes the UK (or Denmark)” but “malaria is reduced” or “every kid completes primary school” or (Goal 1) “lower dollar a day poverty,” and in which foreign aid should be “accountable” for these goals, that the “programmatic charity” view of development really took off.
And, only in that broader context could the “randomista revolution” happen as the RCT could be sold as both (i) an “accountability” device–prove your programs work to get funding and (ii) a learning device of “what works” to cost effectively meet these specific target outcomes.
So, to first order, the shift to 20-80 did not happen in the context of “and with RCTs these can be more effective,” as in 2000 there were still only a handful of RCTs (I helped Esther get the data from Indonesia for her job market paper while I was working with the WB in 1999, so she was still in grad school).
Second, in the pre-randomista revolution era (1990s into 2000s) the question was the relative efficacy of “national development” efforts (mainly growth, but also “system” work in sectors and on governance) versus “programmatic” efforts.
At the time I was working a lot on health and education, so let me give just two examples.
Larry and I did a paper, “Wealthier is Healthier,” that used IV approaches to show that the link from higher income to better child mortality outcomes was causal and pretty big.
Deon Filmer and I did a paper “The Impact of Public Spending on Health: Does Money Matter?” (which has 1240 citations and so it is not obscure) in which we did two things.
a) We showed that the cross-national variation in (well measured) child mortality was almost completely explained by: GDP per capita, the Gini (which is needed due to Jensen’s inequality even without any “direct” inequality effect), women’s stock of schooling, a dummy for Muslim countries (due to excess female mortality in those countries), and regional dummies. The Rsq of that regression is .95 (and since measurement error in child mortality is only on the order of 2.5 to 5 percent of the variance, that means pretty much all of it). The important point here is not just that “growth is important.” The result is that if these correlates are causally prior to health spending (or anything about the health system), which they plausibly are (and we could do IV and all that), then the challenge of finding the impact of exogenous health actions conditional on those factors is that there is almost nothing left to be explained. I know Rsq is old fashioned (and was even at the time, because of its mis-use in model selection) but it actually means something concrete and useful.
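A back-of-the-envelope version of that variance accounting (the Rsq and measurement-error figures are from the text; the decomposition itself is just illustrative arithmetic):

```python
# Illustrative variance accounting for the cross-national child mortality
# regression described above (figures from the text, not re-estimated).
r_squared = 0.95                  # Rsq of the cross-national regression
unexplained = 1 - r_squared       # 5% of variance left unexplained
meas_error_share = (0.025, 0.05)  # measurement error: 2.5% to 5% of variance

# Systematic variance left for health spending/systems to explain:
lo = max(0.0, unexplained - meas_error_share[1])
hi = unexplained - meas_error_share[0]
print(f"Residual systematic variance: {lo:.1%} to {hi:.1%}")
```

That is the sense in which “almost nothing is left to be explained”: after these causally prior correlates, at most about 2.5 percent of the cross-national variance could be attributed to anything else, including the health system.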
b) We then made it clear that we were not doubting that modern medicine worked, nor denying that there were “cost effective” interventions in health (like vaccinations for childhood diseases or Oral Rehydration Therapy). We were making the case that there was a micro-macro paradox. One could show evidence (at the time) that there were feasible interventions with a cost effectiveness of 100 or 1000 dollars per life saved. But if you inferred the “public dollars spent per life saved” from the cross-national data, then (i) it was not robustly statistically different from zero and (ii) the point estimates implied millions of dollars per life saved. (This was backed up later by micro studies of provider behavior showing publicly provided ambulatory care in India did not reach the level of “do no harm” (here and here)).
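The micro-macro paradox is a gap of several orders of magnitude; a sketch using the figures from the text (purely illustrative):

```python
# Back-of-the-envelope version of the micro-macro paradox described above
# (figures from the text; purely illustrative).
micro_cost_per_life = (100, 1_000)  # claimed cost-effectiveness of feasible interventions ($)
macro_cost_per_life = 1_000_000     # order of magnitude implied by the cross-national data ($)

# Gap between the micro claims and the macro outcomes:
gap_lo = macro_cost_per_life / micro_cost_per_life[1]   # 1,000x
gap_hi = macro_cost_per_life / micro_cost_per_life[0]   # 10,000x
print(f"Implied efficiency gap: {gap_lo:,.0f}x to {gap_hi:,.0f}x")
```

A three-to-four order-of-magnitude gap between the best feasible interventions and the realized marginal productivity of public health spending is exactly what makes “just spend more on health” a dubious inference from the micro evidence.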
So the idea that shifting development assistance from “promoting growth” to “spending on health” was better, even for health, was actually not so well proven. It just “seems” more concrete to non-economists.
(And, on the plane to Cambridge last night I read a 2025 report “Taking stock of development assistance for health in the 21st century” written by committed global health advocates (not skeptical economists) which finds very weak support in the aggregate data for an impact of development assistance on health outcomes).
The other example is that Deon Filmer and I did a similar paper on education titled “What do education production functions really show?” in which we looked at micro evidence about the impact of different types of education spending and found that generally there were order-of-magnitude differences in “learning gain per dollar” across various types of expenditure, and that, generally, the marginal gains from things that entered directly into teacher utility (e.g. wages, class size) were orders of magnitude smaller than from things that did not (e.g. learning materials, desks, quality of facilities). This meant, again, that the question “What is the impact of additional spending on learning outcomes?” has the answer “it depends radically (up to three orders of magnitude) on what you spend it on, and that is a political, not a technical, question.” Again, “take money from the growth or national development side and devote it to the education side” is not an obvious plus, even judged on the effect on education (learning) alone.
Third. So the young Turks of randomization came into a world in which (a) “development” within the Western world of development assistance (actual developing countries never bought into this) had been “defined down” (to what I call “kinky development,” as if just getting above some low bar is the actual goal) and (b) the evidence was that, at the margin, additional public spending (and hence most development assistance, if money is fungible) on these specific goals (poverty, primary enrollment, malaria) was far, far from efficient (and it was debatable whether it was even efficacious). (And with the added factor that Gates (and other NGOs a bit) are bigger players.)
(And, within the logic of the world of academic economics there was increasing skepticism of non-randomized ways to identify causal effects as the IV approach was under increasing attack for reasons good and bad).
In that context, the narratives of why and how RCTs would make a difference had to have two elements.
One element was how the evidence would translate into action.
a) the “accountability” story, which is that “independent impact evaluation” would be a tool for a central allocator of funding to evaluate the claims of sector advocates about efficacy and stop funding of interventions that were not effective (or cost effective) and reallocate that spending to interventions which had “rigorous evidence” of being effective. Note that this story of impact of RCTs in development assistance is (and has to be) explicitly a political story about the people who are choosing to allocate expenses and in this story the “independent” (of the actual do-ers) evaluator works for the “policy maker” and takes money away from some people and gives it to other people (or, the policy maker is able to “force” the doers to do something else).
b) the other narrative is the “learning” story in which there are well meaning actors, with resources, who are fungible across approaches (e.g. not wedded to “micro-finance”), who are willing to follow the evidence to cost effective things. This is more a story about (some) philanthropy than a realistic view of either governments (and their agencies) or the development assistance bureaucracies.
The other element is that RCTs would produce better evidence for decision making, which kind of depends on the mechanics of decision making above.
All of this, all the elements of how RCTs would lead to better outcomes, was pretty dubious from the get-go to people actually working in development. That is, for all of the hoopla about “evidence,” there was never any evidence that evidence was the binding constraint to improved efficacy at scale.
Just two points and then time to answer your concrete question.
One, after years and years of RCTs on education, JPAL published a table of the “cost effectiveness” of various interventions on learning that looked exactly like the results of Filmer and Pritchett (1999) from 20 years earlier, showing that some interventions were orders and orders of magnitude more efficacious than others (and, to some extent, even in the same pattern). But instead of concluding, as Filmer and Pritchett did, that this shows the politics of how education spending is allocated matters, and that we need a positive model of why there is so much spending on inputs ineffective at the margin before making “recommendations,” they (JPAL) say this evidence implies the recommendation “do the effective thing.” As if that isn’t completely naive of a positive economics of the existing outcomes: to whom, exactly, is that a “recommendation”?
Two, the idea that RCTs generate useful evidence for decision making in a development context hinges on the relative merits of “internal validity” (bias) and “external validity” (contextual variation): either the RCT has to be done context by context (which is hugely expensive), or one has to believe that evidence “rigorous” in one context is “good enough” for another.
On that, two points. (a) “External validity” claims are logically incoherent if there is large variation in observational estimates (a point I won’t try and articulate here). (b) I have a couple of papers (including one that is recent and really good) showing that the procedure “do RCTs, do a systematic review, take the mean of the RCT estimates as the point estimate for the LATE in your country” produces (much) higher Root Mean Square Error of prediction of the LATE across countries than does “use the OLS estimate from your country.” This is just a statement that, at least across four different phenomena for which we could generate enough estimates, the internal validity problems with OLS were (much) smaller than the variation across countries in the LATE, so using the mean LATE was worse than using the biased country-specific estimate. And, in the paper, I make the point that we as economists have zero basis on which to have any prior about this. That is, the LATE of an intervention is not like the “electron mass,” about which physics has a theory of external validity (“electrons have no hair”), or even the “boiling point of H2O,” which varies by pressure, but according to a formula. In my paper I take the “learning impact of an exogenous move of a child from public to private school.” We do not have a theory that this is constant across countries, and no theory that the selection bias that contaminates OLS is constant across countries (and good evidence that both do vary across countries). So there is no reason why one could or should start from any presumption that an aggregate of rigorous LATE estimates applied across countries (or contexts) is better than using context-specific OLS on observational data.
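A stylized simulation of that comparison (all magnitudes here are hypothetical, chosen only to illustrate the logic: the true LATE varies a lot across contexts while the OLS bias varies less):

```python
import random

random.seed(0)

# Stylized version of the comparison described above: when the true LATE
# varies a lot across contexts, the cross-context mean of "rigorous" RCT
# estimates can predict a given context's LATE worse than that context's
# own biased OLS estimate. All magnitudes here are hypothetical.
n_contexts = 200
late_sd = 1.0   # cross-context variation in the true LATE (large)
bias_sd = 0.3   # cross-context variation in the OLS bias (smaller)

true_late = [random.gauss(0, late_sd) for _ in range(n_contexts)]
ols_est = [t + random.gauss(0, bias_sd) for t in true_late]  # biased but local
mean_late = sum(true_late) / n_contexts                      # pooled "systematic review" mean

def rmse(preds, truths):
    return (sum((p - t) ** 2 for p, t in zip(preds, truths)) / len(truths)) ** 0.5

rmse_pooled = rmse([mean_late] * n_contexts, true_late)  # roughly late_sd
rmse_ols = rmse(ols_est, true_late)                      # roughly bias_sd
print(f"RMSE of pooled RCT mean: {rmse_pooled:.2f}")
print(f"RMSE of local (biased) OLS: {rmse_ols:.2f}")
```

With these (assumed) magnitudes the local, internally biased estimate beats the pooled rigorous one; the empirical claim of the paper is that, for the phenomena examined, the real world looks more like this case than its opposite.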
Fourth, finally, back to your question.
1) We know that the gains from growth accelerations are huge. In Trillions Gained and Lost my co-authors and I estimate the timing and magnitude of growth episodes versus a counter-factual of BAU. So, for instance, we find the total NPV of the gain from India’s growth accelerations (there were two in our dating, one in the early 1990s and another in the early 2000s) was on the order of 2 trillion dollars of income to Indians (people with high marginal utility of money, as the growth was broad-based). What could you have spent on a lottery ticket that probabilistically affected the likelihood of these growth accelerations in order for this to have been a massively high benefit-cost decision? In a paper, Method or Madness, I discuss that the Ford Foundation spent maybe ten million dollars over a number of years to support a think tank (ICRIER in Delhi) that, according to many narrative accounts, helped, with research and advocacy, pave the way for the reforms in India in the early 1990s, which plausibly made the growth acceleration possible. Suppose that only increased the probability of this outcome by 1 percent; well, 1 percent of 2 trillion is a big number.
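The lottery-ticket arithmetic from that paragraph, using the figures given in the text:

```python
# Lottery-ticket arithmetic for the ICRIER example above (figures from the text).
npv_gain = 2_000_000_000_000   # ~$2 trillion NPV of India's growth accelerations
spend = 10_000_000             # ~$10 million of Ford Foundation support for ICRIER
prob_effect = 0.01             # suppose it raised the probability of reform by just 1%

expected_gain = prob_effect * npv_gain   # $20 billion in expectation
benefit_cost = expected_gain / spend     # a 2,000-to-1 benefit-cost ratio
print(f"Expected gain: ${expected_gain:,.0f}; benefit-cost ratio: {benefit_cost:,.0f}:1")
```

Even at a 1 percent probability of mattering, the expected benefit is roughly 2,000 times the spend, which is the sense in which these are cheap lottery tickets with huge upsides.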
Alternatively, the value of the China growth accelerations is even bigger. And Edward Lim, the World Bank country director, tells a narrative of a set of very behind-the-scenes discussions with the Chinese leadership as Deng took power that helped along the decision to launch their (incremental, cross-the-river-a-step-at-a-time) reforms. I did the semi-facetious calculation that even if all of academic economics from Adam Smith to 1978 did nothing but create a body of evidence that persuaded the Chinese leadership to embark on their reforms, it was all worth it.
And the losses from the two-decade-long growth deceleration in Brazil were an NPV of 7.5 trillion. Suppose we as development economists had managed to make the loss only half that large; that would be worth 3.75 trillion. That probably pays for all of the IMF for its entire history as a hugely positive NPV investment.
So the old “80 percent on national development” was some mix of specific growth projects and research/knowledge dissemination/persuasion about things that work to accelerate growth. Even if this is a highly uncertain endeavor (both in whether it generates the right answers and in whether these are taken up), the upside is huge (and avoiding downsides is huge).
2) Against that, the shift from 80-20 to 20-80 of “spend more on programmatic interventions” at BAU efficacy is, at best, a wash and, at worst, worse, because the average incremental gain in outcomes of “spend more” is plausibly quite near zero: not because it cannot work but because the systems that would make it work as a routine matter are highly dysfunctional (and making systems functional is kind of what national development is all about).
3) The incremental improvement of RCTs on the spend within the 80 percent is small, I would say bounded between 0 percent and maybe 2 percent, meaning the average gain in outcomes per dollar spent is higher by at most 2 percent on average. This is both because (i) they have never had a plausible positive (policy or political) model of how “better evidence” would get to “better spend” and (ii) they have not really discovered many generalizable “silver bullet” interventions. (And even in the rare instances when they claim this, as for “teaching at the right level,” the story (which I have been a part of) is much, much more complex than “RCT evidence convinced people so they adopted it.”)
So, on net, the shift from 80-20 on national development (with 80 of that on growth) to 20-80 on “mitigation of the consequences of the lack of national development,” plus RCTs, is a huge loss. Basically it is a shift from low-cost, low-probability lottery tickets with huge upsides to low-probability lottery tickets on small upsides, and adding RCTs on top of that is a high-cost lottery ticket with a very small probability of impact and a low upside anyway (e.g. the Science publication on the giant six-country impact evaluation of the Graduation approach to poverty finds, meh, about the impact of cash transfers or worse; the Niger experiment in Nature with a “psychosocial” component of a cash transfer program finds tiny impacts at very low cost, and hence high “cost effectiveness” only because both are small).
And, to be clear, I don’t need a very high probability of being able to give good advice that causes a growth acceleration (or avoids a loss) in a country, just better advice, because the ticket is big. (Nor am I anywhere here making an argument that depends on right-wing views that “free enterprise” is always the right answer to promoting growth, nor does this depend on any antipathy to taxation to fund effective social programs.)