RCT

What’s the big idea? In the early 2000s a group emerged arguing that important improvements to development, and hence to human well-being, could be achieved through the widespread use of independent impact evaluations (IIE) of development programs and projects using randomized controlled trial (RCT) methods, in which individuals are randomly assigned to “treatment” and “control” groups. I have been arguing, since about that time, that this argument for RCTs in IIE gets one small thing right (that it is hard to recover methodologically sound estimates of project/program causal impact with non-experimental methods) but all the big things wrong.
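As a minimal illustrative sketch (the data, effect size, and variable names below are placeholders of my own, not drawn from any of the papers on this page), the core RCT logic is simply: randomize assignment, then estimate impact as the treatment-control difference in means.

```python
# Minimal illustrative sketch of the RCT logic: randomize assignment,
# then estimate program impact as the treatment-control difference in means.
# Outcomes and the effect size here are simulated for illustration only.
import random
import statistics

random.seed(0)

n = 1000
true_effect = 0.3  # hypothetical program impact on the outcome

# Randomly assign each individual to "treatment" (True) or "control" (False)
assignment = [random.random() < 0.5 for _ in range(n)]

# Simulated outcomes: baseline noise plus the effect for treated individuals
outcomes = [random.gauss(0, 1) + (true_effect if treated else 0.0)
            for treated in assignment]

treated_mean = statistics.mean(y for y, t in zip(outcomes, assignment) if t)
control_mean = statistics.mean(y for y, t in zip(outcomes, assignment) if not t)

# Because assignment is random, this difference is an unbiased estimate of
# the average causal impact -- in this sample and this context.
print(f"Estimated impact: {treated_mean - control_mean:.3f}")
```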

The Debate about RCTs is over: We won, they lost. This 2018 presentation at NYU (full video) summarizes my arguments. Here are versions of these slides (with rolling updates) from presentations at NYU, Oxford (adapted to be about learning about education), Cambridge, and Harvard.

A different tack on the question of RCTs is to ask about the “return on investment” from spending on research. That return depends on a whole series of questions: the magnitude of the gain from knowing the answer if it is adopted, the scope of conditions over which the answer can be reliably applied, the political likelihood that the policy/program/project about which evidence is being generated will be adopted at scale (or discontinued), the extent to which the policy/program/project can be implemented with fidelity, etc. My lecture at the University of Washington (April 2017) (which Joan Gass helped me think through and prepare) took that approach.
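As a purely illustrative, back-of-the-envelope sketch of that logic (the factor names and numbers below are placeholder assumptions of mine, not figures from the lecture), the expected return is roughly multiplicative in those factors, so any one factor being small drags the whole return down.

```python
# Illustrative "return on research" calculation under a simple multiplicative
# model. All factor names and numbers are hypothetical placeholders, not
# estimates from the University of Washington lecture.

def expected_return_on_evaluation(
    gain_if_adopted: float,         # value of the gain from knowing the answer, if acted on
    scope: float,                   # share of settings where the answer reliably applies (0-1)
    prob_policy_change: float,      # probability the results change what is adopted/discontinued at scale (0-1)
    implementation_fidelity: float, # share of the design's impact surviving at-scale implementation (0-1)
    research_cost: float,           # cost of producing the evidence
) -> float:
    """Expected net value of an impact evaluation under a multiplicative model."""
    return gain_if_adopted * scope * prob_policy_change * implementation_fidelity - research_cost

# Even a large potential gain shrinks quickly once each factor is well below one.
print(expected_return_on_evaluation(
    gain_if_adopted=10_000_000,
    scope=0.2,
    prob_policy_change=0.1,
    implementation_fidelity=0.5,
    research_cost=500_000,
))  # -> -400000.0
```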

“Context Matters for Size: Why External Validity Claims and Development Practice Don’t Mix.” 2014. Journal of Globalization and Development, 4(2), pp. 161-197. (with Justin Sandefur).

“Learning from Experiments When Context Matters,” American Economic Review Papers and Proceedings (May 2015). (with Justin Sandefur).

“It’s all about MeE: Using Structured Experiential Learning to Crawl the Design Space” (with Salimah Samji and Jeff Hammer)

“It Pays to be Ignorant:  A Simple Political Economy of Rigorous Program Evaluation.” Journal of Policy Reform, 2002.

We knew fire was hot. (November 2018).

Using “Random” Right: New Insights from IDinsight Team. 12/10/15. Center for Global Development.

Is Your Impact Evaluation Asking Questions That Matter? A Four Part Smell Test.  11/6/2014.  Center for Global Development.

An Homage to the Randomistas on the Occasion of the J-PAL 10th Anniversary: Development as a Faith-Based Activity. 3/10/14. Center for Global Development.

Rigorous Evidence Isn’t. 2/20/14. Building State Capacity, PDIA in Action.

Important in this debate is the extent to which the empirical results from a “rigorous” impact evaluation in one context can be used in another (which in practice combines “external” and “construct” validity). Eva Vivalt has a wonderful paper on this, showing there is not strong evidence of either construct or external validity (as there are large variations in impact estimates within the papers themselves and across papers on the same class of project).
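A hedged toy simulation (not Vivalt’s data or method, and with parameters I have made up for illustration) of what that heterogeneity implies: when true effects vary a lot across contexts, the spread of estimates across studies dwarfs within-study sampling error, so a “rigorous” estimate from one context is a poor guide to another.

```python
# Toy simulation (illustrative only) of cross-context heterogeneity in
# treatment effects versus within-study sampling error.
import random
import statistics

random.seed(1)

mean_effect = 0.2     # hypothetical average effect across contexts
cross_site_sd = 0.3   # hypothetical spread of TRUE effects across contexts
sampling_se = 0.05    # within-study sampling error of each estimate

# Each "study" estimates the true effect in its own context with some noise
true_effects = [random.gauss(mean_effect, cross_site_sd) for _ in range(50)]
estimates = [random.gauss(te, sampling_se) for te in true_effects]

print(f"SD of estimates across contexts: {statistics.stdev(estimates):.2f}")
print(f"Within-study sampling error:     {sampling_se:.2f}")
# When the first number dwarfs the second, the variation is real heterogeneity,
# not noise -- the estimate from one context does not travel to another.
```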

One way you can know that (by 2018) the debate was over is that the practice and advocacy rhetoric of those doing RCTs had changed completely. The basic idea that academics would do “independent” (independent of the implementing organization) “impact evaluation” (all the way to outcomes for beneficiaries), and that this would generate “rigorous evidence” of “what works” that could be summarized in “systematic reviews” and used, more or less “as is,” for guidance by practitioners, is completely gone. IDinsight is a relatively new organization founded on the premise that in order for evaluations to have impact they have to be done with a partner organization (giving up on “independent”) and that evaluation and RCT tools can be used to learn about project design and the steps from “inputs” to “activities” or “activities” to “outputs” (giving up on “impact” as the only evaluation); they frame this as doing “decision focused” rather than “knowledge focused” evaluations (DFE vs KFE). Organizations like J-PAL are also now taking very pragmatic positions, acknowledging that there is a “generalizability puzzle” (and it would be petulant of me to insist it is not a “puzzle” at all, but rather more or less what everyone suspected in advance would be the result of multiple impact evaluations), which completely abandons the claim that there is anything special about “rigorous” evidence beyond the narrow context in which it was generated and the particular element(s) of the design space it evaluated.