Why I won’t sell “best buys”–and why you shouldn’t buy them

Let me start with an analogy.

The famous concert hall Carnegie Hall (note 1) is located in Manhattan on the east side of Seventh Avenue between 57th and 58th Streets, hence between Central Park at 59th and Times Square at 42nd. Suppose I observe someone walk down the east side of Seventh Avenue from Central Park (59th) to Times Square (42nd), and then I stop them and say: “If you are headed for Carnegie Hall you should turn around and walk back up the east side of Seventh Avenue to 57th Street.”

Can that statement of correct directions to Carnegie Hall be considered “advice” or a “recommendation” to the person I stopped? I think not. The prima facie and best interpretation of the person’s behavior is that they were not going to Carnegie Hall at the time. If I made this “recommendation” 10,000 times I would be very surprised if even once the response were: “Gee thanks mister, I was headed to Carnegie Hall but didn’t know how to get there, and I am a little chagrined I walked right past it.”

I don’t think it properly counts as “advice” or a “recommendation” to give people conditional information (“if you want to achieve X, do Y”) if there is no evidence they want to achieve X and, even more so, if the best available evidence from their observed behavior is that they do not (currently) want to achieve X.

Now a personal story about “best buys” in education (or policy advice based on empirical estimates of cost effectiveness).

An early attempt to do “best buys” (or “smart buys”) was the Copenhagen Consensus, an attempt to give expert, evidence-informed recommendations on how best to spend some given amount of money, like $25 billion, to promote human wellbeing. The process was: step 1, choose a variety of domains in which there might be cost-effective spending opportunities (e.g. education, health, corruption, water and sanitation) and hire an expert in each domain to review the available evidence, rank the most cost-effective “interventions” or “actions” with specific estimates, and produce a “challenge” paper. Step 2, the expert chapters would be read by two other experts who would provide comments. Step 3, the chapter authors and discussants in each domain would present their findings and evidence to an expert panel. Step 4, the panel would then produce a “consensus” ranking of the most cost-effective ways to improve human wellbeing (note 2).

I was hired to write the education challenge paper. I wrote a long paper that had an explication of the simple producer theory of maximizing an objective subject to constraints and a review of the literature of empirical estimates of the cost-effectiveness of education inputs and interventions. I then pointed out that if we assumed “normative as positive”–that is, that the normative theory of maximizing producers was also our positive, descriptive theory of the behavior of the producers of education–then this had (at least) four empirical implications, and that all of those were, at least in many countries and instances, rejected by very strong evidence.

In particular, my paper, drawing on my previous work with Deon Filmer, “What education production functions really show,” pointed out that an empirical implication of normative producer theory with a multi-input production function is that the marginal gain in the producer’s objective function per dollar of spending should be equalized across inputs. This implied that if the evidence pointed to one particular input having a very, very high cost-effectiveness in producing a given output (say, some measure of learning gain per year of schooling), then this was prima facie evidence that the producer choosing the input mix was not maximizing that output. Therefore this evidence was evidence against “normative as positive”–against the claim that producers were actually maximizing an education output with their choice of inputs–and therefore one could not, as it was not internally coherent, use that evidence to make “recommendations” on the assumption that the producer was maximizing. (The connection to the analogy is obvious: I cannot stop people who have walked right by Carnegie Hall, give them “recommendations” about how to get to Carnegie Hall, and expect that to change their behavior, as the best interpretation of their behavior is that they were not trying to get to Carnegie Hall at the time.)
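The equalization condition above can be sketched as follows (a minimal restatement of standard producer theory; the notation is mine, not taken from the paper). Suppose the producer chooses inputs $x_1,\dots,x_n$ with prices $p_1,\dots,p_n$ to maximize an output $O$ subject to a budget $B$:

```latex
\[
\max_{x_1,\dots,x_n} O(x_1,\dots,x_n)
\quad \text{subject to} \quad \sum_{i=1}^{n} p_i x_i \le B .
\]
% First-order conditions at an interior optimum: for any two inputs
% i and j with positive spending,
\[
\frac{\partial O / \partial x_i}{p_i} \;=\; \frac{\partial O / \partial x_j}{p_j},
\]
```

that is, the marginal gain in $O$ per dollar is equalized across inputs. A credible empirical finding that one input yields a vastly higher marginal learning gain per dollar than the others is therefore itself evidence against the assumption that the producer is maximizing $O$.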

In my challenge paper I gave reasons why “recommendations” about how to improve education had to be based on a correct positive model of what education producers were actually doing and why, and I made some suggestions of what such a model might look like. In doing so, I explicitly explained why I was therefore not going to provide a list of “cost effective” (“best buy” or “smart buy”) actions or interventions in education, in spite of having presented empirical evidence that often showed there existed highly cost-effective actions.

I submitted my paper. The organizers got back to me and pointed out that I had not provided them with a list of “best buys” to be compared by the panel to other domains. I said yes, I was aware of that; I thought my paper was an excellent guide to what might be done to improve education, but imagining there were discrete, compartmentalizable actions that were “cost effective” and “recommending” those as ways for some outsider to spend money was not a way to improve education. One needed to think about education systems as systems and understand them.

The organizers then pointed out that the Terms of Reference of the output they were paying me X thousand dollars for (I honestly don’t remember X but it was on the order of $10,000) included that I provide them such a list, and that I had already taken half of the payment up front. I acknowledged that, apologized for not having read and interpreted the TOR correctly, and offered not only to forgo the second payment on the contract but also to give back the first half paid in advance. I pointed out that it wasn’t just that I thought the evidence was too weak (not “rigorous” enough); I thought the idea of making recommendations based on evidence and a positive model of the agents/actors to whom you were giving “recommendations,” when the evidence was inconsistent with that positive model, was intellectually incoherent, contradictory, and hence untenable. I would rather give up payments after having done a massive amount of work than have my name associated with things that were so intellectually indefensible. I would not sell them “best buys.”

The final “challenge” paper, I think, remains a great short introduction to the economics of education.

In the end they relented, as they were faced with the prospect of not having “education” as one of their considered domains. But since I had not provided a list, the expert panel’s list (I did not pay that much attention to the overall process) did not, I believe, have any education “interventions” in its top 10. The Copenhagen Consensus was repeated, and in the next round, not surprisingly, they chose a different expert; but, to their credit, I was asked to be a discussant and hence could articulate my objections again (although I went light on the “normative as positive” point).

None of my 2004 objections to the “normative as positive” contradictions in using evidence from studies of cost-effectiveness of individual interventions (no matter how “rigorous” these estimates are) to make “recommendations” have been addressed.

Rather, what has happened often illustrates exactly my points. Three examples, one from Kenya, one from India and one from Nigeria.

Duflo, Dupas and Kremer (2015) did an RCT study estimating the impact of reducing class sizes in early grades in Kenya from very high levels. There was a control group and four treatment arms from two crossed options (2 by 2): (a) the additional teacher was hired either on a regular civil service appointment or on a contract, and (b) the additional classroom either was or was not formed by tracking students on test scores. The results were that the “business as usual” reduction in class size (civil service appointment, non-tracked classrooms) had a very small (not statistically different from zero) impact, whereas the contract-teacher reduction in class sizes produced learning gains in both the tracked and untracked treatment arms.

In a JPAL table showing the “rigorous” evidence about cost effectiveness (on which things like “best buys” or “smart buys” are based) this appears as “contract teachers” being an infinitely cost effective intervention.

Of course, in any normative producer theory the existence of an infinitely cost-effective input should set off loud, klaxon-volume warning bells: “Oooga! Oooga!” This finding is, in and of itself, a rejection of the model that the producer is efficient (as it cannot be the case that the cost-effectiveness of all inputs is being equalized if one of them is infinite). So I cannot maintain as even semi-plausible the positive theory that this producer is maximizing the measured outcome subject to budget constraints. But if that isn’t my positive model, what is? And in a viable positive model of producer behavior, what would be the reaction to the “recommendation” of contract teachers, and what would be the outcome?
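The contradiction can be made concrete with a toy numerical sketch (all numbers and the production function are invented for illustration, not taken from the studies): if one input’s marginal learning gain per dollar far exceeds another’s, then shifting even one dollar between inputs raises measured output, so the observed allocation cannot be output-maximizing.

```python
def learning_output(x_a, x_b):
    """Toy concave 'learning production function' (illustrative only)."""
    return 10 * x_a ** 0.5 + 2 * x_b ** 0.5

# Observed allocation of a $100 budget: almost everything on the
# low-yield input B, almost nothing on the high-yield input A.
x_a, x_b = 1.0, 99.0
baseline = learning_output(x_a, x_b)

# Shift one dollar of spending from input B to input A.
reallocated = learning_output(x_a + 1.0, x_b - 1.0)

# If the producer were maximizing learning, no feasible reallocation
# could raise output; here it does, so "maximizing learning subject to
# the budget" is rejected as a positive model of this producer.
assert reallocated > baseline
print(f"baseline: {baseline:.2f}, after reallocating $1: {reallocated:.2f}")
```

The same logic, pushed to the limit, is what an “infinitely cost-effective” input implies: the producer is leaving learning gains on the table, so some objective other than learning must be driving the observed input mix.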

The reason I used the Kenyan example is that the Kenyan government decided to scale up the reduction in class size using contract teachers. A group of researchers did an RCT of the impact of this scaling. The Kenyan government did not have the capability to scale the program nationwide, so an NGO implemented the program in parts of the country and the government implemented it in other parts. The researchers (Bold, Kimenyi, Mwabu, Ng’ang’a, and Sandefur 2018) found that in the government-implemented scale-up there was zero impact on learning. So an infinitely cost-effective intervention when done by an NGO–a “best buy”–had zero impact when actually scaled by government, and so was not at all a “best buy.”

Another example comes from the state of Madhya Pradesh, India, where the state adopted at scale a “school improvement plan” project based on the experience of a similar approach in the UK. A recent paper by Muralidharan and Singh (2020) reports that the project was implemented, in the narrow sense of compliance: schools did in fact prepare school improvement plans. But there was zero impact on learning (not just “fail to reject zero”: in early results the estimated impact on learning was zero to three decimal places), and the zero impact was consistent with estimates that, other than preparing the school improvement plan, nothing else changed in the behavior of anyone: teachers, principals, supervisors. So whether or not “school improvement plans” were a “best buy” in some other context, they had zero impact at scale in Madhya Pradesh.

A third example is from a (forthcoming–will update) paper by Masooda Bano (2021) looking at the implementation of School Based Management Committees (SBMC) in Nigeria. In a qualitative examination of why SBMCs seem to have little or no impact in the Nigerian context, she finds that those responsible for implementation don’t really believe in SBMCs or want them to succeed, but see going through the motions of doing SBMCs as a convenient form of isomorphism: the donors like it, and the pretense of SBMCs keeps the donors complacent. So whatever evidence there might be that, when well designed and well implemented, SBMCs can be cost effective is irrelevant to the cost-effectiveness of SBMCs in practice in Nigeria.

My point is not just another illustration of the lack of “external validity” of empirical estimates of cost-effectiveness; it is deeper than that. It is the intellectual incoherence of making “recommendations” based on a positive model of producer behavior (that producers are attempting to maximize an outcome subject to constraints) when the empirical estimates themselves are part of the evidence that rejects this positive model.

Let me end with a different analogy of “best buys.”

Suppose I have just read that spinach and broccoli are “cost effective” foods, providing high nutritional content at low prices. I am in the grocery store and see a fellow shopper whose cart is loaded with food that is both bad for you and expensive (e.g. sugared breakfast cereals) and nothing really nutritious. I could then go up to her/him and make a “recommendation,” giving him/her my empirically grounded “smart buy” advice: “Hello stranger, you should buy some broccoli because it is a cost effective source of vitamins.” One can imagine many outcomes from this, but perhaps the least plausible response is: “Gee thanks, fellow stranger, I will now buy some broccoli and integrate this cost effective source of vitamins into my regular food consumption habits.”

Take the analogy a step further and suppose I have an altruistic interest in the health of my fellow shopper, and so I just buy broccoli and spinach and put it into his/her shopping bags for free. Again, one can imagine many outcomes from this action of mine, but I would think the most probable is that some broccoli and spinach gets thrown away.

“Smart buys” is just dumb. Worse than dumb: believing things that are false is very, very common and easy to do (most of us do it most of the time about most topics), but believing things that are internally contradictory (“I believe both A and not A”) takes some additional mental effort to stick to an attractive but dumb idea. As my story illustrates, I personally would give up substantial sums of money rather than have my name associated with this approach. I will not sell “best buys.” Given the poor track record of flogging “best buy” evidence that then does not deliver in implementation in context, you should be wary of buying it.

Note 1) The reason I use directions to Carnegie Hall is the old joke about it. One person stops another on the street and asks: “Do you know how I can get to Carnegie Hall?” The answer: “Practice, practice, practice.”

Note 2) This process was called the Copenhagen Consensus because it was instigated and led by Bjorn Lomborg (who was based at an organization in Copenhagen), and the not-so-hidden agenda was to point out that, on the available evidence about the likely distribution of possible consequences of climate change and the likely costs of avoiding those consequences, one need not be a “climate change denialist” to acknowledge that the world has lots and lots of current and future problems and that action on climate change should be compared/contrasted to other possible uses of scarce resources. Some might discredit the exercise for this reason, but one could note that (a) none of the domain experts in their sector papers had, or were asked to form, any view about climate change, and (b) one can bracket the climate change estimates from the expert panel and the ranking within and across the other domains is unaffected. So whether or not you think climate change was unfairly treated in this process vis-à-vis education or health or nutrition, each of those was treated equally and, as best I could tell, there wasn’t any bias across the other domains.