This blog is just a bit of background about the attached paper, which is still an early draft, circulated for comments.
There is a big debate within the field of development between those who believe that the promotion of “national development” (a fourfold transformation of countries toward a more productive economy, a more capable administration, a more responsive government, and generally more equal treatment of all citizens) will lead to higher wellbeing, and those who think that “national development”, and in particular “economic growth”, is overrated as a means to produce human wellbeing. The alternative is to focus more directly on specific, physical indicators of human wellbeing, with the idea that this “focus” on the “small” can somehow lead to “big” gains.
The attached paper examines indices and data on country-level human wellbeing from the Social Progress Imperative, whose mission statement involves creating a Social Progress Index as part of its advocacy against the use of economic indicators:
We dream of a world in which people come first. A world where families are safe, healthy and free. Economic development is important, but strong economies alone do not guarantee strong societies. If people lack the most basic human necessities, the building blocks to improve their quality of life, a healthy environment and the opportunity to reach their full potential, a society is failing no matter what the economic numbers say.
The Social Progress Index is a new way to define the success of our societies. It is a comprehensive measure of real quality of life, independent of economic indicators.
In the paper I examine the empirical connections between the Social Progress Index, its components, subcomponents, and indicators, and three measures of national development: GDP per capita, state capability, and democracy. One basic finding is that, for the Social Progress Index and its three major components, the relationship between country measures of human wellbeing and national development is very, very strong. Put another way, national development is both empirically necessary (there are no countries with high human wellbeing and low national development) and empirically sufficient (there are no countries with high national development and low human wellbeing).
The paper is much more interesting than just that, as I explore the relationship between the various components of the Social Progress Index and the components of national development (e.g. how much GDP per capita versus state capability matters for access to sanitation versus personal freedom, or for indoor versus outdoor air pollution deaths). This leads to a set of what I argue are interesting but ultimately intuitive findings.
{There is a new edited book about RCTs from Oxford University Press called Randomized Control Trials in the Field of Development. I have a chapter in it and, other than that, it is really excellent, with contributions from Angus Deaton, James Heckman, and Martin Ravallion, chapters on the rhetoric of RCTs and their ethics, and interesting interviews with actual “policy makers” (from France’s development agency and from India) about their views of the value of RCTs. The book’s publication has led me to go back and put into the public domain some things I wrote but had not yet posted, like this (long) post about the weird methodological stance and approach the RCT crowd has adopted.}
Let me start with a discussion of a single paper that I believe illustrates an important methodological point that is, in many ways, at the core of many disputes about the value of RCTs.
The paper is “Bringing Education to Afghan Girls: A Randomized Control Trial of Village-based Schools” by Dana Burde and Leigh Linden. It was published in one of the highest-prestige journals in economics, the American Economic Journal: Applied Economics. I chose this paper because it has sound methods and clear findings and its authors are superb and experienced researchers. That is, nothing I am going to say is a critique of this paper or its authors. I chose a strong paper because the paper is just a vehicle for commentary on the more general intellectual stance, milieu, and approach to “evidence,” and hence the stronger the paper, the more clearly it makes the general methodological point.
Here is the paper’s abstract:
We conduct a randomized evaluation of the effect of village-based schools on children’s academic performance using a sample of 31 villages and 1,490 children in rural northwestern Afghanistan. The program significantly increases enrollment and test scores among all children, but particularly for girls. Girls’ enrollment increases by 52 percentage points and their average test scores increase by 0.65 standard deviations. The effect is large enough that it eliminates the gender gap in enrollment and dramatically reduces differences in test scores. Boys’ enrollment increases by 35 percentage points, and average test scores increase by 0.40 standard deviations.
So. An RCT was done that provided 13 villages (?!) in one region of one country with “village-based” schools in year one (and the remaining villages in year two). The findings were that increasing proximity to schools (reducing the distance to school) increases enrollment for boys and girls, that increased enrollment leads to increased learning, and that the effect was differentially larger for girls.
All of us who have published papers in economics know how incredibly frustrating and difficult that process is. The top journals have very high rejection rates (on top of authors’ self-selection on journal quality in their submission decisions). Top journals reject most papers not because they are unsound or incorrect but because they are “not of general interest” or not sufficiently “important.”
So the key question is: how is a paper based on the treatment of 13 villages in northwestern Afghanistan sufficiently interesting and important to justify publication in a top journal when its findings confirm what everyone already believes (and has believed for a very long time)?
Here are four things one has to feign ignorance of (or at least feign the irrelevance of) in order for this paper to be the kind of interesting and important “contribution to knowledge” one expects in a top journal. Note that I am not saying the authors of this paper were in fact ignorant of these things. They were not, because (a) the authors are intelligent and capable researchers with experience in education and (b) these are facts that pretty much everyone, even non-experts, knows. As I come back to below, one has to work one’s way into a very special mindset to ignore the obvious, but this mindset has, strangely, become popular.
First, one has to feign ignorance of the fact that pretty much every government in the world has, for 50 years or more, based its core education policies on the presumptions that (a) proximity matters for enrollment and attendance decisions and (b) kids learn in school. This paper therefore confirms a belief that has been the foundation of schooling policy for every government in the world for decades and decades. To justify this paper’s showing that “proximity matters” as “new” and “important” knowledge, one has to use feigned ignorance to imagine either that all governments might have been wrong all this time, or that, although they were not wrong, they now know what they already knew in some importantly different way.
Second, one has to feign ignorance of the fact that schooling in the developing world has expanded massively over the last 50 years, accompanied by a massive expansion of schools that dramatically increased proximity. Even a cursory look at the widely available Barro-Lee data on schooling (versions of which have been available for 25 years) shows that average schooling of the working-age population in the developing world has increased massively (from 2.1 years in 1960 to 7.5 years in 2010). It is widely accepted that the increase in the proximity of schools facilitated this expansion of schooling. To justify this new paper as important, publishable, new knowledge, one has to adopt the feigned ignorance view that: “yes, completed schooling has expanded massively in nearly every country in the world and yes, that happened while more and more schools were being built, but we can imagine this might have been an empirical coincidence with no causal connection at all.”
Third, one has to feign ignorance of a massive empirical literature, with literally hundreds (perhaps thousands of papers) showing an empirical association between enrollment and proximity. The overwhelming conclusion of this literature is that proximity matters. How does one justify that a paper that says “proximity matters” is a sufficiently new and interesting finding to justify publication in a top journal? One has to adopt the view that: “Yes, there is a massive empirical literature showing an empirical association between child enrollment and distance to school–but one can imagine that these might all be the result of reverse causation where schools happened to be built where children would have enrolled anyway.”
Fourth, one has to feign ignorance of the law of demand: if something is cheaper, people will consume more of it (with a few exceptions). Proximity reduces travel time and hence the opportunity cost of attending (as well as other “psychic” costs, such as the danger of traveling), so reducing the distance to school makes schooling cheaper. Again, feigned ignorance allows one to ignore the entire previous literature on the demand for schooling. Based on the paper, we have no idea whether the implicit price elasticity of demand for schooling was exactly what the previous literature suggested, or whether the authors were arguing that their evidence showed a higher or lower impact than expected.
So, my reaction to an RCT demonstrating that children in a village in which a community (or village-based) school was established were more likely to attend than those in villages where there was no school is: “Of course. But that cannot, in and of itself, be considered a contribution to knowledge, as literally everyone involved in the economics of education (or, more broadly, in schooling policy, or more broadly still, anyone with just common sense) has already believed that for decades.”
(Parenthetically, one could argue that the paper accepted that this was the general finding but was testing these propositions for Afghanistan, which might have been different. But this hardly suffices to explain publication in a top journal, because (a) if NW Afghanistan had been different and proximity did not matter there, this would hardly be of “general interest” to a top economics journal, and (b) they did not find that Afghanistan was different, except perhaps that proximity mattered more, and differentially more for girls, and neither of these points is established relative to other places.)
But the argument for this paper seems to be that, because the paper reports the results of an RCT, the “knowledge” it adds is unique and special. People, in some sense, shouldn’t have known what they thought they knew. Phrased in (quasi-)Bayesian terms, this is an intellectual stance that people’s “priors” (a) should have been that proximity did not matter, with mass including, or even centered around, zero (or even concentrated on zero), and/or (b) should have had a very large variance around “proximity matters” (perhaps diffuse over a large range).
I call this stance “feigned ignorance” because it is not actually a stance about what people’s priors were or what they should be in actual practice. It is a methodological stance that recommends that “we” academics should act as if our priors are centered on zero unless there exists a very special kind of evidence (a kind called “rigorous” evidence) and/or act as if our variance is very high in the absence of such evidence.
It is only in this “feigned ignorance” methodological mindset that a paper from 13 villages in NW Afghanistan finding proximity matters, kids learn in school, and proximity matters more for girls could be considered generally interesting and important. Only with a very particular stance about belief formation could something that everyone knew be considered new knowledge. This hinges on a belief that there are special methods that have special claims to produce knowledge that allow all previous evidence and knowledge to be ignored entirely.
The reader might already guess that I find this viewpoint wrong. Wrong as a way of forming actual practical beliefs. Wrong as a way of doing disciplinary science. And wrong in ways that have made the development economics research being produced less rather than more useful.
Let me compare the pre- and post-RCT approaches to the question: “Suppose I put village-based schools into villages of NW Afghanistan. What would I expect the impact on enrollment to be?” (The same applies to the questions of the magnitude of learning and the differential impact on girls, so I will focus on proximity.)
The “pre” RCT approach is what I would call “(i) theory based, (ii) sign and bound of bias adjusted, (iii) quality weighted, (iv) relevance weighted, (v) informal Bayesian in mean and variance, review of the literature.”
The typical “pre-RCT” development economist would have (i) had some theory, for example that the demand for schooling depends on choices (perhaps utility maximization, perhaps utility maximization with behavioral biases, perhaps satisficing), so that schooling demand depends on income and effective prices, and that the effective price depends on distance because distance determines travel costs (time, psychic, and risk costs); (ii) started from existing OLS (and other) estimates of the relationship of enrollment to distance and then “sign and bound” adjusted those OLS estimates for the known biases (for example, that schools may have been selectively placed, which would affect the estimates); (iii) weighted the studies for overall quality and precision; (iv) adjusted estimates toward those more relevant to NW Afghanistan (e.g. giving more weight to studies from Pakistan than from Argentina), taking into account features like the mode of travel, the safety of travel, and differential risks to girls; and (v) built all of that into an estimate of the “typical” effect with a mean and a variance, acknowledging that the literature would show substantial heterogeneity, so that forecasts would need a base case plus high and low cases.
Then, if one were building estimates of the expansion of enrollment due to expanding school availability (village-based or other), one would likely have “ground-truthed” the resulting estimates for consistency with other evidence, like the time series of expansion in schools and enrollments, both in the place where construction was going to be done and in other places. For example, if I were estimating the impact of having a school in a village versus not having one, I would compare enrollments in other single-school villages with similar characteristics, and if enrollment there were 60 percent and my model said 95 percent, I might revisit my assumptions.
It is important to stress that the pre-RCT approach was not some slavish use of OLS (or weaker, e.g. cross-tab) estimates. Everyone has known for a very, very long time that “correlation is not causation,” that OLS cannot resolve questions of causal identification, and that standard OLS methods don’t identify structural parameters. The pre-RCT approach tried to “sign and bound” the bias in observational methods. What is the direction of the bias? How big is it likely to be?
If one were doing an ex ante cost-benefit analysis of a program of school construction, one might know that if the enrollment gain is larger than C (for “critical value”) percent, then the project will pass a C-B test at a threshold rate of return. Suppose I use the standard methods for estimating enrollment gains and find that the expected value is Y percent, with Y > C. The first question is whether the bias from observational data would lead Y to be too high or too low (or whether its direction is unknown). If the bias makes Y lower than the truth, then for this decision it doesn’t matter. So “signing” the bias is important, and most theories of why there is a bias imply a sign for it. If the bias makes Y too high, the question is “how much too high?” Suppose Y is twice as high as C; then the bias in Y could be 10 percent or 20 percent or even 50 percent without changing the decision. Efforts to “bound” the bias can therefore be helpful even if they cannot be exact.
The pre-RCT prior distribution of the elasticity of enrollment with respect to distance would be non-zero but context specific. In this “pre-RCT” approach, one study of placing non-formal (community or village) schools in 13 villages among 1,500 children in NW Afghanistan would be “meh” for estimating the impact of school expansion elsewhere (Haiti or Niger or Tanzania or Myanmar). It would be one among literally hundreds of pieces of evidence about the shape of the enrollment-proximity relationship. The usefulness of this study for altering priors about the distance elasticity in other places and times would be completely unknown. It is perfectly possible (perhaps even plausible, since it has been shown to be true for other topics, like the impact of micro-credit) that observational estimates from relevant locations would produce better predictions than cleanly identified estimates from less relevant contexts.
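Steps (iii) through (v) can be made concrete with a small sketch. It assumes each study has already been “sign and bound” adjusted (step ii); the estimates, the quality and relevance weights, and the function name `weighted_review` are hypothetical illustrations, not a procedure the post prescribes. The base case is the weighted mean, and the low and high cases sit one weighted standard deviation either side, which keeps the heterogeneity visible rather than averaging it away.

```python
import math


def weighted_review(estimates, quality, relevance):
    """Quality- and relevance-weighted summary of study estimates.

    estimates: effect estimates already "sign and bound" adjusted (step ii)
    quality, relevance: hypothetical nonnegative weights (steps iii and iv)
    Returns a base case with low and high cases one weighted SD away (step v).
    """
    weights = [q * r for q, r in zip(quality, relevance)]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    # Weighted spread across studies: heterogeneity is reported, not hidden
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, estimates)) / total
    sd = math.sqrt(var)
    return {"low": mean - sd, "base": mean, "high": mean + sd}


# Hypothetical enrollment gains (percentage points) from three studies
summary = weighted_review(
    estimates=[30.0, 45.0, 60.0],
    quality=[1.0, 1.0, 0.5],
    relevance=[0.5, 1.0, 1.0],
)
print(summary)
```

Here the least relevant and the lowest quality studies pull the base case less than the middle study does, which is the informal weighting the paragraph describes.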
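The decision logic can be sketched in a few lines, assuming the upward bias is expressed as the fraction by which Y overstates the truth (so a bias of 0.5 means Y is 1.5 times the truth). The function names and the numbers are illustrative only. Under this convention, with Y twice C the decision survives any upward bias below 100 percent, comfortably covering the 10, 20, and 50 percent cases.

```python
def passes_cost_benefit(y_hat, critical, upward_bias=0.0):
    """True if the project still clears the critical value C.

    y_hat: observational estimate Y of the enrollment gain
    critical: critical value C
    upward_bias: bound on how much Y overstates the truth, as a fraction
        (0.5 means Y = 1.5 * truth). If the bias is signed as downward,
        pass 0.0: the truth is then at least Y.
    """
    worst_case_truth = y_hat / (1.0 + upward_bias)
    return worst_case_truth > critical


def max_tolerable_bias(y_hat, critical):
    """Upward-bias fraction at which the worst-case truth falls to C."""
    return y_hat / critical - 1.0


# Y is twice C (illustrative numbers)
for b in (0.1, 0.2, 0.5):
    print(b, passes_cost_benefit(20.0, 10.0, b))  # True in each case
print(max_tolerable_bias(20.0, 10.0))  # 1.0
```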
How does one get to the situation in which a single small RCT is considered important and interesting?
A key step was to create a climate of extreme skepticism about the possibility of “sign and bound.” One could claim that, although yes, there were many reasons to believe “proximity matters” (e.g. the law of demand), and although yes, there were many estimates of the proximity effect based on observational data, and although yes, these estimates mostly showed a negative effect of distance on enrollment, the “true” impact might still be zero. The true causal impact might be zero because, if there is no way to sign and bound the bias in observational estimates, we can assume that the bias is whatever we feel like believing it is.
This creates at least a methodological stance that (a) one’s informal Bayesian prior “should be” (or at least “could be”) centered on zero (either tightly centered or diffuse) and (b) one’s Bayesian priors could only be affected by “rigorous” evidence.
This meant that, since very few RCTs had been done, any RCT on any topic was a “significant” contribution to the literature–because the previous literature (and reality, and theory) was completely dismissed.
The paper under discussion illustrates this intellectual play perfectly. The fourth paragraph of the paper is: “In this paper, we evaluate a simple intervention entirely focused on access to primary schools. The empirical challenge is the potential endogenous relationship between the school availability and household characteristics. [Footnote 1] Governments, for example, may place schooling either in areas of high demand for education or in areas with low demand for education, in the hopes of encouraging higher participation levels. Either will bias simple cross-sectional estimates of the relationship between access and enrollment.” Footnote 1 is: “Existing research has demonstrated that improved access significantly increases school enrollment in other contexts. See, for example Duflo (2001) and Andrabi, Das and Khwaja (2013).”
It is worth pausing to appreciate just how stunning this is. One can make a vague, hand-waving argument that there might be bias, with no assertion as to whether there actually is bias, what its direction might be, or what its magnitude might be, and “poof,” the “review of the literature” about the effect of proximity is two (!?) papers in a footnote. Once one accepts the methodological stance of extreme skepticism about sign and bound, authors are under no obligation to demonstrate that there actually is bias, or its direction or magnitude. Since all of the existing literature might be tainted, one can conclude it is tainted, and moreover tainted to such a degree that it need not even be mentioned.
There are at least four huge problems with this “cannot sign and bound so we will feign ignorance” stance.
First, it is completely ridiculous as either pragmatism or science. If one were assembling evidence for any pragmatic purpose (say, doing a cost-benefit analysis of a proposed project), the assumption that in the absence of rigorous evidence we should ignore experience, common sense, existing associations, and accepted theory is a non-starter. But even as a scientific stance this has zero credible justification and doesn’t seem to have really been thought through. That is, suppose I have 20 studies that use observational methods (call it “OLS”) to estimate a proximity effect, and these have substantial heterogeneity but are centered on proximity increasing enrollment. To assert, in the face of those studies, a prior centered on zero is an extremely weird assertion. It is an assertion that the bias in each of those studies is exactly what it would need to be to reconcile the OLS point estimate with a zero “true” causal impact. This is not just a set of measure zero, it is a weird set of measure zero. Why would the world be such that the “true” impact is centered on zero (and hence constant across countries) but the bias in OLS (which is also the result of a model of behavior) has heterogeneity, of exactly the magnitude needed to reconcile existing estimates with zero?
A possible response is that it is not so much that the prior is centered on zero but that it is completely diffuse, so it is not “centered” anywhere. This claim is also just weird, as it asks one to accept wildly implausible values of the mean and variance of the OLS bias. To have a diffuse prior in the face of 20 existing studies is again to make a specific claim about the bounds on the OLS bias: one has to accept that the bias in OLS is ridiculously huge (without, of course, any actual attempt to “bound” the bias). The only rationale for this “feigned ignorance” is that it justifies producing a new “rigorous” estimate as a valuable endeavor.
Second, without engaging in “sign and bound,” one cannot have any idea of where RCTs (or other clean identification methods) would actually be useful. For instance, OLS (or simple) comparisons of test scores of private versus public school students nearly always find higher scores in private schools. In this case a “sign and bound” approach leads one to believe that the observed differences (especially without, but even with, controls for observables in OLS) are within the range that could be produced by a plausible degree of selection effects. This “sign and bound” of the private school effect depends on the magnitudes of observed selection (e.g. differences in observables between private and public students), the decomposition of the variance of outcomes (e.g. that scores are highly correlated with student SES), and so on. This is not a “pox on the house of all observational studies” or “feigned ignorance” approach; rather, a focus on a more precise estimate of the causal impact (LATE) emerges from “sign and bound” and careful attention to the existing literature. In contrast, the idea that the OLS estimates of the proximity effect have any bias at all, let alone a large or policy-relevant one, has never had any empirical foundation (and, as the authors themselves say, it is not even clear in which direction the bias would go).
Third, the “feigned ignorance” approach of making the previous literature (and experience, and common sense, and theory) completely disappear avoids the problem that there is no logically coherent way to add a new RCT finding into an existing literature. That is, suppose there had been 40 previous studies of the proximity effect, normed to a common metric (e.g. the percentage point drop in enrollment per additional kilometer of distance at a given distance), and that the mean of the existing studies was a 40 percentage point drop going from zero to 1 kilometer, with a standard deviation across studies of 20 percentage points. Now along comes this RCT from 13 villages in NW Afghanistan. How should it affect our prior distribution of the likely proximity impact for a “typical” (but specific) country or region that isn’t NW Afghanistan? Unless one can answer this question, it is hard to see how one can claim that what happened in these 13 villages deserves publication in a top economics journal.
But the idea that one should center one’s prior of the impact of proximity on enrollment on the new “rigorous” evidence does not stand up to even the mildest scrutiny (as I have pointed out in previously published articles here and here). An OLS estimate from context c can be exactly decomposed into the “true” impact (LATE) and the bias. This means any statement about how I should change my belief about the true impact in context c is necessarily a statement about how I should change my belief about the bias to OLS in context c. So, suppose I claim that the profession’s priors about the impact of proximity should collapse onto (and the same argument applies to the weaker “move toward”) the new rigorous study that reports a consistent estimate of the true impact in 13 villages. Almost certainly, among the previous 40 OLS studies there is one that reports a higher impact and one that reports a lower impact (though again, the argument does not depend on this); call them context h and context l. The “special” role of the RCT study then means I should assume that the OLS bias in context h is positive (because I am shifting my prior of the “true” value in h downwards, which implies I believe the OLS bias made the estimate for h too high) and that the OLS bias in context l is negative. The idea that I should “tighten” my priors for the previous 40 countries toward the RCT value is a claim that the OLS bias in each of those countries took exactly the value needed to reconcile its estimate with the RCT value. Again, this configuration of values of the OLS bias is a weird set of measure zero that has no justification. The only way to get away with such a weird claim is not to make it explicit, but if one claims the RCT paper deserves “special” pride of place because it is “rigorous,” then one is still making this very weird assertion even if one doesn’t know it or say it.
Fourth, the “feigned ignorance” approach to the importance of an RCT study also often ignores other concrete empirical attempts to “sign and bound” using non-RCT methods. For instance, a paper by Deon Filmer examining the proximity effect on enrollment using data from 21 countries (working paper in 2004, published in 2007) uses data-based methods to “sign and bound” the bias from endogenous placement and cites five other attempts to do so. Of those six papers, only one, Duflo’s, was cited. This makes it seem as if the previous literature had not been aware of, or used, other methods to address the potential bias in OLS, and therefore makes the use of an RCT to do so seem much more special and important than it really is. Good science encompasses the previous literature, and even if a new technique has some claim to being better in some ways, the question of “by how much” it makes a difference relative to other attempts to address the potential biases needs to be addressed.
Let me conclude by reiterating that this is not a critique of the authors of this paper, nor of the paper itself in terms of what it reports. I am just using this paper as an example (and there are many others that could be used) for a general critique of an intellectual stance of extreme skepticism that is not justified on any Bayesian or quasi-Bayesian grounds, either as pragmatically useful or as a means of advancing science.
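One conventional, and here purely illustrative, answer is a random-effects update in which the RCT is treated as one more draw from the distribution of contexts. The sketch assumes the 40 earlier studies have negligible within-study error, takes the mean of 40 and the between-study standard deviation of 20 from the paragraph above, and uses a hypothetical RCT estimate of 60 with a standard error of 5; `add_study` is a hypothetical helper. Under these assumptions the pooled mean moves from 40 to roughly 40.5, and the prediction for a new, specific context stays wide, because a single context, however cleanly identified, carries at most the weight of one draw with variance tau squared.

```python
import math


def add_study(prior_mean, prior_var, new_est, tau, new_se):
    """Fold one study into a random-effects estimate of the mean effect.

    prior_mean, prior_var: current estimate of the cross-context mean and
        its sampling variance
    new_est, new_se: the new study's estimate and standard error
    tau: between-context standard deviation of the true effects
    """
    w_prior = 1.0 / prior_var
    # One context informs the cross-context mean only up to tau, so even
    # a perfectly precise study (new_se = 0) gets weight at most 1 / tau^2
    w_new = 1.0 / (tau ** 2 + new_se ** 2)
    post_mean = (w_prior * prior_mean + w_new * new_est) / (w_prior + w_new)
    post_var = 1.0 / (w_prior + w_new)
    return post_mean, post_var


n_prev, mean_prev, tau = 40, 40.0, 20.0
prior_var = tau ** 2 / n_prev          # assumes negligible within-study error
rct_est, rct_se = 60.0, 5.0            # hypothetical RCT result

post_mean, post_var = add_study(mean_prev, prior_var, rct_est, tau, rct_se)
pred_sd = math.sqrt(tau ** 2 + post_var)  # spread for a new, specific context
print(f"pooled mean {post_mean:.2f}, predictive SD {pred_sd:.2f}")
```

The contrast is with the “collapse onto the RCT” stance, which would move the central estimate all the way to 60.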
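The decomposition OLS_c = truth_c + bias_c makes the point mechanical: fixing the truth in every context at one value fixes every bias. The sketch uses three hypothetical OLS estimates and a hypothetical RCT value of 45 (the function name is also hypothetical); setting the assumed truth to zero instead reproduces the “prior centered on zero” stance, in which every bias must equal its OLS estimate.

```python
def implied_biases(ols_estimates, assumed_truth):
    """Bias each OLS estimate must carry if the true impact is assumed_truth.

    Uses OLS_c = truth_c + bias_c with truth_c set to the same value in
    every context c.
    """
    return {ctx: est - assumed_truth for ctx, est in ols_estimates.items()}


# Hypothetical OLS estimates (percentage points) for contexts h, m, l
ols = {"h": 55.0, "m": 40.0, "l": 25.0}
print(implied_biases(ols, 45.0))  # {'h': 10.0, 'm': -5.0, 'l': -20.0}
print(implied_biases(ols, 0.0))   # {'h': 55.0, 'm': 40.0, 'l': 25.0}
```

Context h acquires a positive bias and context l a negative one, each of exactly the size needed, which is the configuration the paragraph calls a weird set of measure zero.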
My 2017 book “Building State Capability” (with Matthew Andrews and Michael Woolcock, available for free download) proposes the idea that there is a “Big Stuck” in state capability. We showed that many of the available cross-national indicators of country-level state capability showed four things (in Table 1):
Many countries still show very low levels of state capability (below 4 on a zero to ten scale).
Very few countries classified as “developing” countries have reached a high level of state capability—above 6.5–and most of those are small states so the total population in high capability states is very small.
There appears to be very little progress in state capability, as the measured growth rates are negative for most countries.
Even among countries with positive growth, growth is mostly slow, and very few countries are on track to achieve high capability.
Given the long lead times in producing a book, even though the book came out in 2017 the country state capability indicators we used only extended to 2012 (now eight years ago) and mostly started in 1996 (when the World Bank’s Worldwide Governance Indicators (WGI) began). I decided to bring that table up to date as of April 2020 using the latest available indicators.
In the course of doing so, I put four different (sets of) indicators on a common 0 to 10 scale and produced graphs showing both the cross-national correlation of those indicators and their evolution over time. This note reports the results of this update of the key table, with an indicator of state capability built from the Worldwide Governance Indicators as the simple average of their Rule of Law, Government Effectiveness, and Control of Corruption scores. I also report exactly the same table calculated from the Quality of Government indicator on the Quality of Government website, which is itself originally from the International Country Risk Guide (ICRG) (all of this is explained below in “methods”).
The table updates the book’s Table 1, showing each country’s WGI State Capability level in 2018 in four categories (fragile, weak, moderate, and strong) and its measured rate of growth over the 1996 to 2018 period in four categories (collapsing, slow negative, slow positive, and rapid positive), where trend growth is the least squares growth rate. The three-letter code for each country is in its cell, sorted from the lowest to highest 2018 WGI SC level within each category so that, for instance, in the “moderate/slow negative growth” category Peru is the lowest, at just over 4, and Botswana is the highest, at just over 6.
To give some intuition for interpreting the rates of growth: the scale is 0 to 10, so a growth rate of .05 implies it would take 20 years to improve (or decline) by 1 point, or 200 years to improve by 10 points. For instance, in Latin America Haiti is roughly a 2, Peru roughly a 4, and Costa Rica roughly a 6. So even if Peru were (just) in the “rapid” growth category, with a growth rate of .05, it would take 40 years to reach Costa Rica’s current level of state capability (an improvement of two points). In Asia, the Philippines is at 4.2 and Malaysia at 6.4, so if the Philippines were just in rapid progress (.05) it would take 44 years to reach Malaysia’s current level. The category of “rapid” growth is not in fact super rapid.
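A minimal sketch of building such a composite, assuming (1) the three raw indicator scores are averaged first and (2) the average is then rescaled so that 0 is the lowest country-year ever observed and 10 the highest, as the Table 1 caption describes. The order of these two steps, the function name, and the toy data for countries “A” and “B” are assumptions for illustration, not the exact procedure used for the table.

```python
def state_capability(rule_of_law, gov_effectiveness, control_corruption):
    """Average three raw indicators, then rescale the average to 0-10.

    Each argument maps (country, year) to a raw score. After rescaling,
    0 is the lowest country-year ever observed and 10 the highest.
    """
    raw = {
        key: (rule_of_law[key] + gov_effectiveness[key] + control_corruption[key]) / 3.0
        for key in rule_of_law
    }
    lo, hi = min(raw.values()), max(raw.values())
    return {key: 10.0 * (value - lo) / (hi - lo) for key, value in raw.items()}


# Toy data for two hypothetical countries; the same scores are reused for
# all three indicators only to keep the example short
toy = {("A", 1996): -1.0, ("A", 2018): -0.5, ("B", 1996): 1.0, ("B", 2018): 1.5}
print(state_capability(toy, toy, toy))
```

Rescaling over all country-years, rather than year by year, is what lets a country’s 0 to 10 score move over time.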
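The trend growth and the table’s cells can be sketched as follows, assuming, as the interpretation in the next paragraph implies, that growth is the least squares slope of the 0 to 10 level on the year, in points per year. The fragile/weak boundary of 2.5 comes from the text below; how values exactly on a boundary are assigned is an assumption, the function names are hypothetical, and the series is synthetic.

```python
def ls_slope(years, levels):
    """Least squares slope of level on year (scale points per year)."""
    n = len(years)
    mean_y, mean_l = sum(years) / n, sum(levels) / n
    sxy = sum((y - mean_y) * (lv - mean_l) for y, lv in zip(years, levels))
    sxx = sum((y - mean_y) ** 2 for y in years)
    return sxy / sxx


def classify(level, growth):
    """Table cell for a 2018 level and 1996-2018 trend growth.

    Thresholds follow the table; treatment of exact boundary values is assumed.
    """
    if level < 2.5:
        capability = "fragile"
    elif level < 4.0:
        capability = "weak"
    elif level <= 6.5:
        capability = "moderate"
    else:
        capability = "strong"
    if growth < -0.05:
        trend = "collapsing"
    elif growth < 0.0:
        trend = "slow negative"
    elif growth <= 0.05:
        trend = "slow positive"
    else:
        trend = "rapid positive"
    return capability, trend


# Synthetic series rising by 0.06 points a year from 3.0 in 1996
years = list(range(1996, 2019))
levels = [3.0 + 0.06 * (y - 1996) for y in years]
g = ls_slope(years, levels)
print(round(g, 3), classify(levels[-1], g))
```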
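The arithmetic here is simply the distance to the target divided by the annual gain. A one-line helper (hypothetical name) reproduces the Peru and Philippines figures, and the Colombia figure given below, assuming growth stays constant:

```python
def years_to_reach(current, target, growth):
    """Years to move from current to target at constant growth (points/year)."""
    return (target - current) / growth


print(round(years_to_reach(4.0, 6.0, 0.05), 1))    # Peru to Costa Rica: 40.0
print(round(years_to_reach(4.2, 6.4, 0.05), 1))    # Philippines to Malaysia: 44.0
print(round(years_to_reach(4.37, 6.5, 0.035), 1))  # Colombia to 6.5: 60.9
```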
Perhaps not surprisingly (as we have added only six years of data) the updated data confirm the main points of the “big stuck.”
First, 65/111 (only developing countries with population over 1 million are included) or 57 percent of all countries are still below a state capability level of 4, with 19 below 2.5 (fragile). While of course all “lines” in this space are arbitrary, countries below 4 have serious capability issues, just below 4 for instance are countries like: Egypt 3.81, Ethiopia 3.86, Zambia 3.82.
Second, only 6 countries are above the threshold for “strong” of 6.5. The level of 6.5 is roughly the lower end of the traditional OECD countries: for instance, Spain in 2018 is at 6.7, as is the Czech Republic, and the high capability OECD countries are well above that (e.g. Denmark at 9.2, 2.5 points ahead of Spain). Of these six, four have a population of less than 10 million, Chile has 16 million, and Korea around 50 million, so fewer than 100 million people in total live in high capability developing countries.
Third, about 56 percent of countries (62/111) have recorded negative growth rates over the period 1996-2018.
Fourth, even among those with positive growth, most have slow growth, with only 9 recording rapid growth (over .05). This implies that even the 40 countries with positive but slow progress gained at most about 1 point (.05 × 22 = 1.1) on a 10 point scale over the 22 year period. Colombia, with annual growth of .035, improved from 3.71 to 4.37, which, relative to most other countries, is impressive progress, but at that pace it would take another 60 years to reach high capability.
Strong capability (l>6.5): 6 countries in total, 5.4% of all developing countries
  Collapsing (g<-.05): none (0)
  Slow negative (-.05<g<0): CHL (1)
  Slow positive (0<g<.05): URY, KOR, SGP (3)
  Rapid positive (g>.05): ARE, HKG (2)
Moderate capability (4<l<6.5)
  Collapsing (g<-.05): TTO, KWT, ZAF, PRI
  Slow negative (-.05<g<0): PER, LSO, BRA, PHL, MNG, PAN, MAR, LKA, SEN, ARG, TUN, THA, IND, BHR, NAM, OMN, CRI, BWA
Table 1: Update of Table 1 from Building State Capability with 1996-2018 data from the Worldwide Governance Indicators. State Capability is the simple average of three WGI indicators (Rule of Law, Government Effectiveness, and Control of Corruption), each on a scale from zero (lowest country ever) to 10 (highest country ever).
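Each of the four “big stuck” findings above is a count over the cells of Table 1. As a hedged illustration (hypothetical country codes and values, not the actual data; the function name is mine), the tallies could be computed from each country’s final-year level and trend using the same thresholds:

```python
def big_stuck_summary(countries):
    """countries: dict mapping a country code to (level in final year, trend per year)."""
    n = len(countries)
    levels = [lv for lv, _ in countries.values()]
    trends = [g for _, g in countries.values()]
    negative = sum(1 for g in trends if g < 0)
    rapid = sum(1 for g in trends if g > 0.05)
    return {
        "pct_below_4": 100 * sum(1 for lv in levels if lv < 4) / n,
        "n_fragile": sum(1 for lv in levels if lv < 2.5),
        "n_strong": sum(1 for lv in levels if lv > 6.5),
        "pct_negative": 100 * negative / n,
        "n_rapid": rapid,
        # Everything else (0 <= g <= .05) is counted as slow positive.
        "n_slow_positive": n - negative - rapid,
    }


# Hypothetical codes and values, for illustration only.
sample = {
    "AAA": (2.1, -0.07),
    "BBB": (3.8, 0.01),
    "CCC": (5.0, -0.02),
    "DDD": (6.8, 0.06),
}
print(big_stuck_summary(sample))
```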
Because the WGI variables are re-normed each year, they are not technically adequate for comparing overall trends over time (though they do compare trends over time of each country relative to the average country in each year). To address this, we turn to a variable called “Quality of Government” (QOG), which is the simple average of three variables from the International Country Risk Guide: Corruption, Bureaucracy Quality, and Law and Order. This variable has data going back to 1984, but we use only the data from 1996 to 2018 (the latest available in May 2020). Reassuringly, this variable is very highly correlated with the WGI State Capability variable: the correlation in 2018 is .944 (Figure 1).
Figure 1: The cross-national correlation between the WGI State Capability and the ICRG QOG variable is very high
The QOG variable is meant to be comparable both across countries and over time, and Figure 2 compares the growth rates 1998-2018 of the WGI SC and QOG variables. The measured trends in the two variables are less correlated than the levels (it is nearly always true that changes are less correlated across countries than levels), but the correlation is still .5 (only countries with over 5 million population are included in this graph). Interestingly, for this set of countries (developing countries with more than 5 million people) the median measured annual growth in QOG is substantially more negative than in WGI SC: -.019 versus -.004. This suggests that the annual re-norming of the WGI, rather than masking a positive trend for developing countries, may well be masking a negative overall trend.
Figure 2: There is a strong correlation between the measured trends in the WGI State Capability variable and the ICRG QOG variable—and the median growth rate is lower for QOG than WGI SC
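The two statistics behind Figure 2, the cross-country correlation of the trends and the median trend of each indicator, can be sketched in plain Python. The function names, the dictionary layout, and all the data below are hypothetical; only the 5 million population filter comes from the text.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_trends(trend_a, trend_b, population, min_pop=5_000_000):
    """Correlate two indicators' trends and report each median, keeping only
    countries present in both with population above min_pop."""
    keep = sorted(c for c in set(trend_a) & set(trend_b)
                  if population.get(c, 0) > min_pop)
    a = [trend_a[c] for c in keep]
    b = [trend_b[c] for c in keep]
    return pearson(a, b), statistics.median(a), statistics.median(b)


# Hypothetical trends (points per year) and populations, for illustration only.
wgi = {"A": -0.01, "B": 0.00, "C": 0.02, "D": -0.03, "E": 0.10}
qog = {"A": -0.03, "B": -0.02, "C": 0.00, "D": -0.05, "E": -0.10}
pop = {"A": 2.0e7, "B": 8.0e6, "C": 1.2e8, "D": 6.0e7, "E": 1.0e6}
r, med_wgi, med_qog = compare_trends(wgi, qog, pop)
print(round(r, 3), round(med_wgi, 4), round(med_qog, 4))
```

In this made-up example the trends are perfectly correlated, yet one indicator’s median is lower, which is the kind of pattern the paragraph describes: high agreement on which countries are improving, disagreement about the overall direction.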
Table 2: The “Big Stuck” table using QOG (Quality of Government) instead of WGI State Capability
The four principal “big stuck” findings also emerge using the QOG data for the 90 developing countries with population over 1 million for which there are both WGI and QOG data.
First, 43/90 (47.8 percent) of these countries have Weak or Fragile Capability. This is lower than with WGI SC mainly because QOG has more countries in the “moderate capability, slow negative growth” cell (24 versus 18, even though QOG covers roughly 20 fewer countries in total), but, as can be seen in Figure 1, many of these countries are just above the threshold of 4.
Second, only 5 countries are above the threshold for “strong” of 6.5.
Third, because the average recorded growth rate in QOG over the 1996 to 2018 period is lower, many more countries show negative trends: 70 of 90 (77.8 percent). These data suggest that the annual re-normalization of the WGI data may prevent it from showing an overall downward trend in state capability.
Fourth, there are only 5 countries with rapid growth: two with high capability already (Indonesia and the United Arab Emirates (ARE)) and three recovering but still at low levels (Iraq, Niger, and Guinea-Bissau).
I also examined the data from the Fund for Peace, using their ranking on “Public Services,” and the data from the Bertelsmann Transformation Index. In doing so, I created a large number of graphs that display the data for each country and the cross-national correlations, in both levels and trends, among the six WGI variables plus the constructed State Capability variable and the other indicators. All of these are available at the links provided.
Figure 3 and Figure 4 show the evolution of the QOG rating for the developing country population-weighted average and for the eight largest developing countries by population: China, India, Indonesia, and Brazil in Figure 3 and Pakistan, Nigeria, Bangladesh, and Mexico in Figure 4. This is to allow the reader to ‘groundtruth’ the ratings against countries they know something about. For instance, the QOG regards India (green dotted line) as well above the average and as having had quite stable ratings over this entire period. The QOG ranking sees Indonesia as having experienced a substantial deterioration from 1996 to a nadir in 2004, followed by a recovery to near 1996 levels by 2009, then some deterioration, and then a large jump in 2018. The QOG regards Brazil (green dash-dotted line) as having experienced a considerable deterioration since 1996 (with some volatility). Having lived and worked in India off and on since 1991, I think “stagnation” of state capability would be regarded as a generous assessment. Again, having lived in Indonesia from 1998 to 2000 and worked there off and on since, I find this general pattern plausible: capability weakening from the “top down” authoritarian capability of the Soeharto era, then being rebuilt by 2010. I have no idea what the QOG thinks happened in 2018; I suspect this is a mistake, but it is in the original data. Brazil I know too little about to have a view.
Figure 3: Evolution of the QOG measure 1996-2018, developing country population-weighted average and four largest countries
Figure 4: Evolution of the QOG measure 1996-2018, developing country population-weighted average and next four largest countries
Appendix I: Data and methods
There are four primary sources of data; here I describe each and the transformations made to each variable. Both the raw data and the programs that transform the data will be posted on the web site.
Worldwide Governance Indicators
First, to make the data annual, I interpolated between the early years when the data are only available every other year (e.g. I filled in 1997 as the average of 1996 and 1998).
Second, I transformed each of the six WGI variables to a scale of 0 to 10 by taking the worst ever recorded value for any country any year and the best for any country any year and re-scaling zero to the worst and 10 to the best.
Third, I created a variable called “State Capability” as the average of Rule of Law, Government Effectiveness, and Control of Corruption. I feel the other three (Voice and Accountability, Political Stability, and Regulatory Quality) are conceptually distinct from capability. Note that this variable is not exactly scaled 0 to 10, as no country was at 0 (or at 10) in all three variables in the same year.
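The three steps above (fill the alternate early years, rescale each variable from its worst-ever to its best-ever value, and average the three capability variables) can be sketched in Python as follows. This is an illustrative reimplementation under my own assumptions about data layout (nested dictionaries keyed by country and year), not the GAUSS programs themselves; the function names and numbers are hypothetical. The rescaling is applied to each WGI variable’s panel separately before averaging.

```python
def fill_alternate_years(series):
    """series: dict year -> value. Fill a missing year lying between two observed
    years as the average of its neighbours (e.g. 1997 from 1996 and 1998)."""
    filled = dict(series)
    for year in range(min(series) + 1, max(series)):
        if year not in series and year - 1 in series and year + 1 in series:
            filled[year] = (series[year - 1] + series[year + 1]) / 2
    return dict(sorted(filled.items()))


def rescale_0_10(panel):
    """panel: dict country -> dict year -> raw score (higher is better).
    The worst value for any country in any year maps to 0, the best to 10."""
    values = [v for series in panel.values() for v in series.values()]
    worst, best = min(values), max(values)
    return {c: {y: 10 * (v - worst) / (best - worst) for y, v in s.items()}
            for c, s in panel.items()}


def state_capability(rule_of_law, gov_effectiveness, control_corruption):
    """Average the three rescaled panels for each country-year they share."""
    sc = {}
    for c in rule_of_law:
        common = (set(rule_of_law[c])
                  & set(gov_effectiveness.get(c, {}))
                  & set(control_corruption.get(c, {})))
        sc[c] = {y: (rule_of_law[c][y] + gov_effectiveness[c][y]
                     + control_corruption[c][y]) / 3 for y in sorted(common)}
    return sc


# Hypothetical raw scores, for illustration only.
print(fill_alternate_years({1996: 1.0, 1998: 3.0, 1999: 3.5}))
print(rescale_0_10({"AAA": {1996: -2.0, 1997: 0.0}, "BBB": {1996: 2.0, 1997: 1.0}}))
```

Because the average is taken after each variable is rescaled, the resulting State Capability score stays within 0 to 10 but, as noted above, need not reach either end.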
Quality of Government
I downloaded this variable from the Quality of Government Institute web site (https://qog.pol.gu.se/), which provides cross-national, over-time data on hundreds and hundreds of variables. One of those is called (confusingly, if you ask me) Quality of Government and is the average of the International Country Risk Guide’s ratings since 1984 of Law and Order, Bureaucracy Quality, and Corruption. (Only this average is provided because the original ICRG data are (or have been) proprietary and expensive.)
I only use data from 1996 to 2018 as (a) this is the only period for which the WGI overlaps and (b) there are some questions about the long-run comparability.
This variable is scaled 0 to 1 and I rescale 0 to 10 in the same way as above.
Fund for Peace: Public Services
The Fund for Peace produces a Fragile States Index based on 12 indicators (https://fragilestatesindex.org/data/). One of those indicators is “Public Services” and I used that indicator, which is described as:
The Public Services Indicator refers to the presence of basic state functions that serve the people. On the one hand, this may include the provision of essential services, such as health, education, water and sanitation, transport infrastructure, electricity and power, and internet and connectivity. On the other hand, it may include the state’s ability to protect its citizens, such as from terrorism and violence, through perceived effective policing. Further, even where basic state functions and services are provided, the Indicator further considers to whom – whether the state narrowly serves the ruling elites, such as security agencies, presidential staff, the central bank, or the diplomatic service, while failing to provide comparable levels of service to the general populace – such as rural versus urban populations. The Indicator also considers the level and maintenance of general infrastructure to the extent that its absence would negatively affect the country’s actual or potential development.
These data are available from 2006 to 2019. I re-normed the raw data (in which high numbers are bad) to the 0 to 10 scale, so that, as with the other indicators, higher is better.
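The text does not spell out the exact re-norming formula. One natural reading, assumed here, is the same worst-ever/best-ever rescaling used for the WGI, with the direction flipped because high raw FSI numbers are bad. A sketch under that assumption (function name and data hypothetical):

```python
def rescale_0_10_reversed(panel):
    """panel: dict country -> dict year -> raw score where HIGH is BAD.
    The worst (highest) value for any country in any year maps to 0,
    the best (lowest) to 10, so that higher is better on the new scale."""
    values = [v for series in panel.values() for v in series.values()]
    best, worst = min(values), max(values)
    return {c: {y: 10 * (worst - v) / (worst - best) for y, v in s.items()}
            for c, s in panel.items()}


# Hypothetical raw Public Services scores, for illustration only.
raw = {"AAA": {2006: 9.0, 2019: 8.0}, "BBB": {2006: 1.0, 2019: 5.0}}
print(rescale_0_10_reversed(raw))
```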
Bertelsmann Transformation Index
This index of country performance has three components: political transformation, economic transformation, and governance index. The governance index has five elements: the level of difficulty, steering capability, resource efficiency, consensus building, and international cooperation. As indicators of state capability I only focus on two of those: steering capability and resource efficiency, each of which has three sub-components. The sub-components of steering capability are: prioritization, implementation, and policy learning and of resource efficiency are: efficient use of assets, policy coordination, and anti-corruption.
The indicator I use in the graphs is the average of steering capability and resource efficiency, each of which is normed to 0 to 10 before they are combined (the combined indicator is not itself re-normed, so it does not have a minimum of exactly 0 and a maximum of exactly 10).
As can be seen in the graphs below, these four indicators are reasonably highly correlated across countries in levels, some more highly (e.g. WGI: SC and ICRG: QOG at .944) and some less so (e.g. FSI: PS and BTI: SC+RE at only .584). The correlations in trends over time are even lower (again, these graphs exist), and the BTI and FSI only exist since 2006, so they have a relatively short time series.
The BTI shows a much higher trend rate of improvement than the other indicators, for reasons I do not understand, so I am not sure I trust it; it would produce a more favorable view of the “big stuck” than the others.
All of the transformations from raw data, calculations, and the production of the graphs were done in GAUSS. In this folder, the raw (Excel) versions of the downloaded data and all of the programs are available and should allow replication.
Appendix II: Descriptive graphs produced
In the course of producing this update I produced descriptive graphs of seven different types. Here I describe each type of graph and show a single example (some graphs were produced for each of 177 countries, so they cannot all fit in a single document). All of the graphs will be made available online on the Building State Capability web site.
Graphs showing the time series evolution of each of the six WGI variables
These graphs show the data over time for each of the six WGI variables, with an overall trend line, the trend of the last 10 years, and years to a high level (>6.5) at either the overall or the recent trend line. All graphs have a vertical axis from 0 to 10 so that levels and trends can be visually compared across the variables and across countries. The link has a folder with a downloadable graph for each of the 177 WGI countries, like the one shown below for Tunisia.
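The “years to a high level” figure on these graphs can be sketched as follows: fit a least squares line over the full period and over the last 10 years, then divide the remaining gap to 6.5 by each slope. Projecting from each line’s fitted value in the final year (rather than from the last observation) is my assumption, as are the function names and the example data.

```python
def ls_fit(years, values):
    """Return (slope, fitted value at the last year) of a least squares line."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
             / sum((t - mean_t) ** 2 for t in years))
    fitted_last = mean_v + slope * (years[-1] - mean_t)
    return slope, fitted_last


def years_to_high(years, values, high=6.5, recent=10):
    """Years to reach `high` along the overall trend and along the trend of the
    last `recent` observations, projecting from each line's fitted final value."""
    out = {}
    spans = {"overall": (years, values),
             "recent": (years[-recent:], values[-recent:])}
    for label, (ys, vs) in spans.items():
        slope, level = ls_fit(ys, vs)
        if level >= high:
            out[label] = 0.0
        elif slope <= 0:
            out[label] = float("inf")
        else:
            out[label] = (high - level) / slope
    return out


# Hypothetical country: flat at 3.0 through 2008, then rising .1 a year.
years = list(range(1996, 2019))
values = [3.0 if t <= 2008 else 3.0 + 0.1 * (t - 2008) for t in years]
print(years_to_high(years, values))
```

For a country whose progress has recently accelerated, as in this made-up series, the recent trend projects a much shorter path to 6.5 than the overall trend does, which is why the graphs report both.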
Graphs showing the cross-national correlations of the six WGI variables and the derived State Capability variable
The second type has seven graphs that show the cross-national correlation in 2018 between one of the WGI variables and all other six and reports the correlation between the variables. Shown below is the example graph showing the correlation between Voice and Accountability, the other five WGI variables and State Capability (which is the average of Rule of Law, Control of Corruption and Government Effectiveness and hence has a high correlation with those three measures by construction). These seven graphs are available here.
Graphs of all seven WGI variables showing the association across variables of the trends
This set of graphs shows the cross-national association of the trends in each of the WGI variables with each of the six other WGI variables. Each point is a country, plotted by the least squares estimated growth rates of the two variables. Shown here is the graph for Rule of Law against all six other WGI variables, which shows the extent to which the trends are associated across the variables. These seven graphs (each with six panels) are available here.
Showing the level and trend in the seven WGI variables across countries
These graphs show, for each of the seven WGI variables, a cross-national graph with the level of the indicator on the horizontal axis (either in 1996 or in 2018) and the trend on the vertical axis. If the starting point is 1996, this potentially shows “convergence” (if countries that start with low levels tend to grow more rapidly). Since there are many countries, the graphs can get very cluttered, so some of the graphs show only countries above a population threshold (like 1 million, 5 million, or 10 million) for ease of identifying countries (the graph shown is only for countries over 10 million and so excludes some interesting countries that had massive governance changes over this period, like Rwanda and Georgia). These graphs have lines dividing the levels and growth rates at exactly the thresholds used in Table 1 and hence are the exact graphic counterpart of the table (when 2018 is the endpoint). Shown is the graph for state capability, which is exactly Table 1 above except that the table is limited to developing countries and the axes are flipped (in Table 1 levels are vertical and growth rates horizontal). These seven graphs (for each population threshold), each with two panels, are available here.
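Reading the 1996-start graphs as a convergence check amounts to asking whether the trend falls as the initial level rises. A minimal sketch of that check, a least squares slope of trend on initial level, is below; it is my own illustration (hypothetical names and numbers), not a result from the data.

```python
def ols_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def convergence_slope(initial_levels, trends):
    """Slope of trend on initial level: negative means countries starting low
    tended to improve faster (convergence); positive means divergence."""
    return ols_slope(initial_levels, trends)


# Hypothetical 1996 levels and 1996-2018 trends, for illustration only.
levels_1996 = [2.0, 4.0, 6.0, 8.0]
trends = [0.04, 0.02, 0.00, -0.02]
b = convergence_slope(levels_1996, trends)
print(round(b, 4), "convergence" if b < 0 else "no convergence")
```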
Trends in four different indicators of state capability
As described above in “methods,” I explored four different sets of indicators of aspects of the broad concept of “state capability”: (i) the Worldwide Governance Indicators, which have six indicators; (ii) the Quality of Government indicator, which is built from the ICRG ranking of three indicators; (iii) the Fund for Peace rankings of state fragility, which have twelve indicators, of which I focus on the “public services” indicator; and (iv) the Bertelsmann Transformation Index, which has a number of indicators and sub-indicators, of which I focus on the rankings of “Steering Capability” and “Resource Efficiency.”
This set of graphs shows the evolution over time of each of these four indicators (whichever are available) for each of the 177 countries with WGI data. Because the data are available for different periods, the graphs show trends for the 1996-2006 period (WGI and QOG only), for the period from 2006 until the end of the data (between 2018 and 2020) for all four variables, and for the overall period (WGI and QOG only). The country example (of the 177 available) is Mexico. These 177 graphs are available here.
Cross section of all four different sources of state capability
These graphs show, for each indicator, the cross-national association with each of the other three indicators, and the correlation.
The example shown here is the correlation of the BTI average of Steering Capability and Resource Efficiency with the FSI indicator of public services. These 12 graphs are available here.
The association of the trends in the four indicators of state capability
For each of the four variables, these graphs show the cross-national association between its trend (over the available time period) and the trends of the other three variables over the same period. Each graph also shows the median trend growth of the two indicators.
The example shown is the trends of the FSI Public Services indicator from 2006 to 2019 with the WGI State Capability variable from 2006 to 2018 (this is for countries with over 5 million in population). These graphs (both for all countries and those over 5 million) are available here.
When I was in Australia in January 2020 (before travel stopped in 2020), I spoke at the annual Australasian Aid Conference of the Development Policy Centre at ANU, organized (in part) by my friend Stephen Howes.
My topic, a subject I am grappling with and trying to find a better way of modeling and expressing, is “pre-mature load bearing.” This is probably the worst named of the concepts from my 2017 book Building State Capability with Matthew Andrews and Michael Woolcock (the notion of “isomorphic mimicry,” which is not original to us, is much catchier). Pre-mature load bearing is the idea that if one puts too much weight too soon on a not fully constructed bridge, or a not fully healed broken bone, or on a weak organization, the result is a collapse. We propose the idea of pre-mature load bearing as one way of explaining why, even after decades and decades of development efforts to “build institutions” or “create good governance” or “improve public sector management,” the organizations of the state in many, many countries remain weak.
In this presentation I discuss the possibility (with some mild evidence) that it is precisely the pressure to adopt “best practice” laws, regulations, policies, and programs in countries with weak public sector organizations (and weak background systems in which those organizations live) that actually leads to terrible outcomes. The “theory of change” often (implicitly) adopted is that “better laws drive better practices.” This (again implicitly) assumes a dynamic between laws and practices that is uniform–so small improvements in laws produce small improvements in practices and big improvements in laws produce big improvements in practices.
However, we are all aware of many everyday phenomena in which dynamics are non-linear (though of course we don’t explicitly think of them that way). For instance, we all know that if we want to move a steel object with a magnet we have to keep the magnet close. Small movements in the magnet produce small movements in the object, but big sudden movements in the magnet won’t move the object. The force of magnetism declines non-linearly with distance.
The example I use in the talk (which is in the slides) is a rubber band. If you put a rubber band around your fingers connecting your left hand and right hand, then moving your right hand creates a force on your left hand. As your right hand gets further and further away, the pressure on your left hand to move gets larger and larger. But if you move your right hand too far from your left, the rubber band snaps and there is no more pressure on your left hand.
My hypothesis is that it is possible that “good law destroys the rule of law.” That is, organizations have a set of capabilities that are embedded in their actual practices. Many countries are under pressure to, and are otherwise politically attracted to, making big pronouncements that adopt “best practice” laws and regulations (in taxation, in environmental regulation, in land use permits, in basic education). They assume that they can achieve Denmark’s (or Canada’s, or Australia’s, etc.) practices by adopting Denmark’s de jure policies (laws, regulations, etc.). However, it is possible that if the prescribed practices are too far from existing organizational capability, these new laws create too much pressure on the agents of the organization to deviate from the de jure. If this is the case, then the de jure and de facto diverge, and the left hand might know what the right hand is doing but doesn’t really care. Once one gets into these negative dynamics of “pre-mature load bearing,” the “good” (but not achievable) laws actually make organizational practices worse and create pressures that prevent incremental improvements in practices.
This of course feeds the idea of “strategic incrementalism”: that the way one gets to good laws is by first getting to good practices, and then enshrining these already accepted and (mostly) followed de facto “good enough” practices into the de jure policies.
Of course, it is hard to get good evidence on the gap between law and practice (as making this gap invisible is a very large part of what organizations do: bureaucracies often exist to fail without blame). But in a paper, some co-authors and I show empirical results suggesting that countries with weak enforcement organizations that make de jure regulations (about building permits) stronger actually end up with weaker de facto compliance.
Here are the slides. I also did the podcast Good Will Hunters with Rachel Mason Nunn; I was episode 67, so it is a pretty long-running series. This is also a variant of a talk I gave earlier at the Center for International Development’s conference, of which there is a video. A version of this talk I gave at the conference “From Politics to Power” at Manchester University is also on video; the advantage of that video is that it explains the pictures of the cat named Duke, which is a funny, and instructive, story.
John Halstead and Hauke Hillebrandt have an interesting new paper posted on the Effective Altruism Forum. I should disclose that it draws quite a bit on some of my research, so I am likely biased. But it does raise and argue some important points about effective altruism.
The “new philanthropy” (by which I mean mainly the philanthropy of the new fortunes, largely in tech) has generated a lot of interesting thought and debate. The general idea of “effective altruism,” with its focus on getting beyond just “warm glow,” has a lot to be said for it. As a professional economist, I am a big fan of the challenge to prove that proposed “charitable” projects actually are better than cash (e.g. Blattman and Niehaus 2014). This is a hard standard, as the overhead costs of delivering projects (particularly if they were costed at the opportunity cost of the work done) are often very high and the incremental benefits over cash often low (or non-existent).
However, this debate about “which type of intervention/project/program is the most cost effective” is limited, but one hopes not limiting. These types of interventions are still mostly “linear” in costs and benefits. Suppose giving a bicycle to a specific girl (targeted perhaps by a certain age, in a certain region (perhaps by distance from school), of a certain household income/socio-economic status) is a cost effective way of raising the likelihood that the girl attends school (as suggested by Muralidharan and Prakash 2017, based on a program of that type in Bihar, India). Beyond a certain scale (one reason cash is often cost effective is that programs have large overheads on small numbers of beneficiaries), this impact is (roughly) linear in costs (each girl getting a new bicycle requires buying a new bicycle) and (roughly) linear in benefits (each girl benefits by the same amount; if anything, one would expect that if targeting were effective the marginal benefits would decline).
However, one thing the creators of the new fortunes understand is the non-linearity of costs, and maybe of benefits. That is, all producers of software know that the marginal cost of an additional user is next to zero. Moreover, the value to an additional user of a given product might increase with the number of users. If that can be turned into higher marginal revenue per additional user, then one has (the possibility of) an enormous fortune, as margins (marginal revenue over marginal cost) increase with the number of users. These non-linear economics create “winner take all” dynamics in sub-segments: something like 90 percent of all searches are done on Google, and across its four platforms Facebook reports 6 billion users (not all different individuals, as some use multiple platforms). The economics of “infrastructure” (and what I would take to define the term) is that it often provides the “club good” elements of delivering a service, which are non-rival (until congestion externalities set in) but excludable.
Far and away the most important “club good” in the world today is the national development of the country you live in. What I mean by “national development” is the progress in the four-fold transformation of a country in having high economic productivity, a responsive state, a capable administration (of both state and non-state organizations), and equality of treatment of citizens. (Practically) all indicators of human well-being (income, poverty, health (infant mortality, malnutrition), schooling and education, safety) are very strongly predicted by national development. For instance, a country’s level of headcount consumption/income poverty is completely predicted by the consumption of the median (typical) household.
This is why it is kind of puzzling that people whose private fortunes were generated by non-linearity would spend so much time debating which is the best (cost-effective) linear way to give away those fortunes: cash versus non-cash? Bicycles versus conditional cash transfers? Business training versus loans/financing? Giving away shoes? (just kidding, that was obviously dumb as altruism, but maybe super smart as corporate marketing).
Since human well-being is strongly determined by one’s access to the excludable club good of national development, there are two obvious ways to promote human well-being: (a) reduce the numbers excluded from moving to places with good “club goods” (reduce barriers to labor/personal mobility) or (b) promote national development, improving the quality of the club goods that people who live in poor countries can access without moving.
There are four arguments against a focus by philanthropists on national development:
No one knows how to, or can know how to, promote national development, it just is what it is due to deep determinants.
While someone might know how to promote national development the instruments available to us, as private philanthropists, cannot be used effectively to promote those things that promote national development.
While there might be ways for us, as philanthropists, to effectively promote national development, we cannot do so in ways in which the positive benefits of those actions can be reliably attributed to us, so we cannot get credit for what we did.
Engaging in national development versus linear privately organized transfers might bring higher benefits, but it also brings much higher reputational (and other) risks to us of being engaged with national governments (or other actors).
All of these are arguments to explore with analysis and evidence.
However, one important point is that “we don’t know what to do” is, by itself, an insufficient argument against investing in actions that would facilitate higher (and more stable) rates of broad-based growth in poor countries. After all, the obvious response to “we don’t know what to do” is to fund and engage in research and learning to learn what to do. The wildly popular agenda for better causal identification in impact evaluation is premised on the idea that “we don’t currently know what to do” (otherwise, why spend millions on research?). So the argument against philanthropic engagement in promoting economic productivity (one aspect of national development) has to be not just that “we don’t know what to do now” but also that “there is no set of learning activities or research that could improve our knowledge of what to do that passes a cost effectiveness test (of gains in the value of useful knowledge versus expense).” That is a possible argument, but a much, much tougher one to make.
That is, in the allocation of funding for physics research, I could argue that no new research into technologies for faster-than-light travel of human beings should be funded because our best available physics theories say it is impossible. I could argue against research into changing the rest mass of an electron on the basis that, in our best available theories, it is a universal constant and changing it is impossible. Neither of those is true of economic growth. It is certainly not constant over time for countries: we see massive accelerations and decelerations of economic growth. And it is not the case that our best available theories say it is impossible to influence growth; we have seen leaders and elites of countries change strategies and accelerate growth, and change strategies and induce economic disasters.
A perhaps useful analogy is the decision a philanthropist concerned about the well-being of African-Americans in the USA would have faced in the early 1960s. The United Negro College Fund was founded in 1944 and gave scholarships to individuals and supported historically black colleges and universities. Suppose (and I don’t doubt it) that this linear funding opportunity was cost effective. This would be a very attractive investment. Against that, there was a new organization, founded in 1957, the Southern Christian Leadership Conference, that was engaged in advocacy around civil rights. There are lots and lots of reasons why support to the SCLC was risky: maybe it was impossible to change civil rights legislation in America over any reasonable time horizon, maybe this particular organization didn’t have a correct “theory of change,” maybe funding this organization would expose me to reputational and other risks from its strategy and tactics. Moreover, there was no way to bring reliable “scientific” evidence to the UNCF vs. SCLC decision. I think (and this isn’t my academic area) that, ex post, having been an early funder of the SCLC would be the equivalent of being early venture capital into Google or Facebook, but way better of course.
I think there should be a strong presumption that a large philanthropic portfolio in the development space should not be allocated 100 percent to proven cost-effective linear interventions and to identifying and proving the effectiveness of new innovative linear interventions. Just as with financial portfolios, the right allocation depends on the magnitude, the horizon, and the individual’s risk tolerance, but it seems to me that a large share should be devoted to non-linear, potentially transformative agendas in national development (including economic growth). At the very least, this debate is interesting and important.
My friend Jeff Hammer has helped create something both very cool and very useful: a set of video recordings of 100 households in India that interactively show 360 degree views of the exterior, interior, kitchen, water source, and other features of each home and household. Each of the 100 homes was chosen because, based on the usual kind of household survey, it was at a given percentile of the income distribution. So one can see, visually arrayed, the poor and the rich of India, one of the world’s largest countries.
I sometimes say that, for all its hassles and expense, I have to travel because my imagination is both too powerful and not powerful enough. My imagination is too powerful as it can conjure up powerful and persuasive images based on words I use like: poor, rich, middle class. My imagination is not powerful enough to actually get it right without seeing it. This site gives you sight without having to travel.
There are two points about the use of words like “rich” and “poor” that this site makes very powerfully.
The first is Dani Rodrik’s point from many years ago that “the poor of the rich” have much higher incomes than “the rich of the poor”; he asks people whether they would rather be at the 90th percentile in a poor country or the 10th percentile in a rich country. When people talk about the “rich” of India they often envision Mukesh Ambani’s house in Mumbai. And yes, that is one very rich person.
But he is the 100th percentile “rich,” not the 99th, 95th, or 90th percentile “rich.” Here are pictures taken from the 100 Homes site of the exterior, water source and kitchen of a household in Allahabad, Uttar Pradesh, that is “rich” in the sense of being at the 90th percentile of the income distribution in India.
The “statistical” rich in India are very poor by rich-country standards. This 90th percentile household has measured consumption of $8.47 per person per day (in purchasing-power-adjusted dollars), which is higher than the highest poverty line the World Bank reports, P$5.50. But the US “guideline” poverty line for a family of 4 in 2019 was $17.63 per person per day. The “rich” Indian household would need roughly twice its income not to be poor in the USA.
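As a quick check of that comparison, here is a minimal sketch that recovers the per-person daily US line from an annual figure. It assumes (this number is not in the text) the 2019 HHS poverty guideline of $25,750 a year for a family of four in the 48 contiguous states, and, like the text, treats US dollars and P$ as directly comparable.

```python
# Assumption: 2019 HHS poverty guideline for a family of four,
# 48 contiguous states (this figure is not stated in the text).
US_GUIDELINE_FAMILY_OF_4 = 25_750  # US$ per year
HOUSEHOLD_SIZE = 4
DAYS_PER_YEAR = 365

# Per-person daily US poverty line implied by the annual guideline.
us_line_per_person_day = US_GUIDELINE_FAMILY_OF_4 / HOUSEHOLD_SIZE / DAYS_PER_YEAR

# Measured consumption of the 90th-percentile Indian household (P$ per person per day).
india_p90_per_person_day = 8.47

# How many times higher consumption would need to be to reach the US line.
required_multiple = us_line_per_person_day / india_p90_per_person_day

print(f"US line: ${us_line_per_person_day:.2f} per person per day")
print(f"Required multiple: {required_multiple:.2f}x")
```

Under that assumption the line comes out at about $17.64 per person per day (within a cent of the $17.63 quoted) and the required multiple at about 2.08, i.e. roughly twice.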
So next time someone says that something (like, say, economic growth) benefited “the rich” in India I hope this site can inform your imagination about whether that meant this 90th percentile household or the Ambanis.
The second important point the 100 Homes site makes, powerfully and visually, is that trying to divide Indian households into “poor” and “not poor” means making distinctions among households that are, for all practical purposes, indistinguishable. In its “lessons” section the site shows three pairs of households and asks you to guess which is richer (on the standard measure of spending per person per day). I got two of three wrong.
By the current poverty line in India, 24 percent of Indian households are “poor.” Here are pictures of two homes (exterior and kitchen): one household is poor (14th percentile) and one is “not poor” (30th percentile). Which is which?
The point is not whether you happen to get it right or wrong; the point is that it is hard. It is hard because the distinction between “poor” and “non-poor” tries to separate households that are really, for all intents and purposes (economically, socially and politically), the same.
The “middle class” periodically gets attention. Here is the household at the 51st percentile in Jodhpur, Rajasthan, and their kitchen. Signs of incipient prosperity: a stone house, a cook stove, a daughter in school. But hardly what “middle class” might mean in the UK (where Michael McIntyre associates “middle class” with shopping at Waitrose) or the USA (where one thinks suburban house, two cars, two kids, cat, dog and all that).
The 100 Homes project is a wonderful resource. It allows us to go beyond the “X$ a day” statistics and get a glance into what poor, middle class, and rich mean in concrete and visible terms in India today.
The debate at the Australian Aid conference is now available as a podcast. But I wanted to point out the second reason debating about RCTs is fun: the examples that advocates of RCTs use are often self-refuting as to their importance.
For instance, in the debate the main proponent of the idea that RCTs were an important innovation in development economics used the example of TOMS shoes. He said (roughly; I am just trying to summarize what he said and have no first-hand knowledge of this topic) that the owner or CEO of TOMS shoes had been in a developing country and seen kids without shoes, and so had decided to donate a pair of shoes for every pair sold. Then, after some time, they had done an RCT of the impact. He said that from the RCT they had learned that the shoe donations had little or no impact. Moreover, he (the RCT proponent) said the RCT taught them they were giving the wrong kind of shoe and, if they were going to give shoes, they should give sneakers, not loafers. This, he said, was a good example of the way in which RCTs contribute to development and development economics.
The table below shows, from standard World Bank sources, the headcount poverty rates at P$5.5 and P$1.9 per day among the world’s 30 most populous countries, sorted by the absolute number of people poor at the P$5.5 poverty line. I (and a number of other economists) argue for much higher poverty lines (“who is not poor”) for measuring global poverty, more like P$10 or P$15, so let’s take P$5.5 as a “split the difference” between low-bar and high-bar poverty. (Even if we take the penurious “dollar a day” line, updated for inflation, it is roughly the same set of countries, with just more weight on the poorest large countries.)
These countries cumulatively have about 2.76 billion people living below P$5.5 a day, and hence, if one were going to address global poverty, one would have to do so by addressing it in these countries.
Conversely, if your development issue/tool/learning is not addressing the important development concerns of these countries it is not really an important item on the development agenda.
Now, just imagine you had an opportunity to make a presentation to the leadership of any of these developing countries (where leadership could be political, intellectual, civil society). And here is your pitch: “Development economics has an important new tool, RCTs, and with that tool a good example of what we have learned about development is that giving away free shoes doesn’t really work, and secondarily, if you do give free shoes anyway, give the kinds of shoes that kids really want to wear.”
I can imagine two responses.
One. “Really? That is what you have to offer? This is what you think development is about? You have come to our country (India/Indonesia/Nigeria/Ethiopia/etc.) and what you think we care about, our vision of the future of our nation and people, our goals, dreams and ideals, our vision of national development, hinges on the effectiveness of the charity of an American shoe company? Is there any way you could have been more condescending to us, the leadership of this country, than taking up our time talking about giving away shoes?”
Two, suspicions about your own judgment, in two senses. First, how did you personally come to be spending your time talking about something so trivial? Second, in what way did you think economics or development economics “learned” from this RCT? Standard economics (from theory and long empirical experience with the impact of in-kind transfers) would have expected the impact of a pair of free shoes to be roughly their resale value, which, even for a poor person, is likely small: even for a household at the P$1.9 per day poverty line, annual income is P$2777, so a pair of shoes is a small increment. So what is new, in a Bayesian sense? Moreover, standard economics (from theory and empirics) would have suggested that if one were going to give something in-kind as charity, choosing something the household wanted would be best (if only to reduce the losses from the transaction costs of resale).
Country Name | Total population (millions) | Headcount, P$5.5 per day | Headcount, P$1.9 per day | People poor, P$5.5 per day (millions) | People poor, P$1.9 per day (millions) | Rank by people poor at P$1.9 per day
--- | --- | --- | --- | --- | --- | ---
Total | 5093 | | | 2764.4 | 506.7 |
India | 1311 | 82.3% | 13.4% | 1078.6 | 176.0 | 1
China | 1376 | 27.2% | 0.7% | 374.4 | 10.0 | 11
Indonesia | 257 | 67.0% | 7.2% | 172.1 | 18.5 | 7
Nigeria | 182 | 90.0% | 47.8% | 163.8 | 86.9 | 2
Pakistan | 188 | 78.0% | 5.2% | 146.6 | 9.8 | 12
Bangladesh | 160 | 84.8% | 15.2% | 135.8 | 24.2 | 5
Ethiopia | 99 | 85.5% | 27.0% | 84.6 | 26.7 | 4
Congo, Democratic Republic of | 77 | 97.1% | 72.3% | 74.8 | 55.7 | 3
Philippines | 100 | 64.2% | 8.3% | 64.2 | 8.3 | 13
Egypt, Arab Republic of | 91 | 61.9% | 1.3% | 56.3 | 1.2 | 20
Tanzania | 53 | 91.0% | 40.7% | 48.2 | 21.6 | 6
Mexico | 127 | 37.9% | 3.3% | 48.1 | 4.2 | 15
Brazil | 207 | 19.4% | 3.4% | 40.1 | 7.0 | 14
Kenya | 46 | 86.6% | 37.3% | 39.8 | 17.2 | 8
Myanmar | 53 | 67.6% | 6.4% | 35.8 | 3.4 | 16
Uganda | 39 | 87.4% | 39.2% | 34.1 | 15.3 | 9
South Africa | 54 | 57.1% | 18.9% | 30.8 | 10.2 | 10
Vietnam | 93 | 31.6% | 2.3% | 29.3 | 2.1 | 19
Sudan | 40 | 59.1% | 7.7% | 23.6 | 3.1 | 17
Iraq | 36 | 56.0% | 2.2% | 20.2 | 0.8 | 22
Colombia | 48 | 28.7% | 4.5% | 13.8 | 2.2 | 18
Morocco | 34 | 30.0% | 0.9% | 10.2 | 0.3 | 23
Iran, Islamic Republic of | 79 | 11.8% | 0.4% | 9.4 | 0.3 | 24
Turkey | 78 | 11.5% | 0.3% | 9.0 | 0.2 | 26
Peru | 31 | 24.3% | 3.6% | 7.5 | 1.1 | 21
Thailand | 67 | 7.1% | 0.0% | 4.8 | 0.0 | 29
Argentina | 43 | 8.1% | 0.6% | 3.5 | 0.3 | 25
Ukraine | 44 | 7.8% | 0.1% | 3.4 | 0.1 | 28
Malaysia | 30 | 2.9% | 0.0% | 0.9 | 0.0 | 30
Korea, Republic of | 50 | 1.2% | 0.2% | 0.6 | 0.1 | 27
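The derived columns of the table follow from simple arithmetic: people poor equals total population times the headcount rate, and the rank column orders countries by people poor at P$1.9 per day. A minimal sketch using four rows as printed (the selection of rows is just for illustration):

```python
# A few rows from the table: (country, population in millions,
# headcount at P$5.5/day, headcount at P$1.9/day), values as printed.
rows = [
    ("India", 1311, 0.823, 0.134),
    ("China", 1376, 0.272, 0.007),
    ("Nigeria", 182, 0.900, 0.478),
    ("Congo, Democratic Republic of", 77, 0.971, 0.723),
]

# People poor (millions) = total population x headcount rate,
# stored as (poor at P$5.5, poor at P$1.9).
poor = {
    name: (pop * hc55, pop * hc19)
    for name, pop, hc55, hc19 in rows
}

# Rank countries by the number of people poor at P$1.9 per day, largest first.
ranked_19 = sorted(poor, key=lambda name: poor[name][1], reverse=True)

for name in ranked_19:
    n55, n19 = poor[name]
    print(f"{name}: {n55:.1f}m poor at P$5.5, {n19:.1f}m at P$1.9")
```

The recomputed counts differ slightly from the table (for example about 1079.0 rather than 1078.6 million for India) because the printed populations and headcount rates are themselves rounded; the ordering (India, Nigeria, DRC, then China) matches the table’s ranks of 1, 2, 3 and 11.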
As in the previous entry: if I had chosen the TOMS shoes give-away impact evaluation as an example of an RCT, of what RCTs can do and why they are a contribution, I would have been rightly criticized for concocting a straw man. But I didn’t.
I was part of a debate about the value of RCTs in development and development economics at ANU’s Australian AID conference. The moderator suggested I defend the title of my paper “Randomizing Development: Method or Madness” versus a proponent who argued they had “revolutionary” impact. I did this with vigor, arguing that RCTs were too often focused on narrow, individuated, targeted programs due to methodological limitations in generating a clean experiment with adequate statistical power on bigger issues of broad importance.
After our opening statements, in response to questions from the audience, the RCT proponent offered, as a positive example of their usefulness, an RCT about giving away free shoes. Apparently some shoe company CEO decided that giving kids free shoes would be a good idea to help “the poor” (and market his product). After some time of giving away shoes he did an RCT and found, surprise, surprise, that giving a kid a free pair of shoes didn’t make a big (or any statistically identifiable) difference. The RCT advocate said that they perhaps did learn how to give free shoes better: give sneakers instead of loafers (or vice versa, or who cares). This, he said, was a good example of the benefits RCTs bring to development and development economics.
Imagine going to any developing country in the world (India, Indonesia, Haiti, Nigeria, Ethiopia, Egypt) and telling that country’s economists and economic policy thought leaders (or political leaders or civil society leaders or just the typical woman or man in the street):
“Development economics has this great new tool that is going to improve development. What we have learned with this great new method is that when a Western shoe company markets its product by associating it with beneficial philanthropy, by giving away shoes, it doesn’t work unless they give the kinds of shoes kids are actually going to wear.”
I imagine a couple of possible (related) responses.
“This is madness. How could you have come to believe that anywhere in our top 10, nay, top 20, nay, top 50 concerns about development and human well-being in our country was the question of whether (or how) to give away free shoes?”
“Do you not see how condescending to the aspirations of our people and our country this is? To reduce our desire for national development to questions of how Western corporations should design their “charity” to be effective marketing of shoes via buffing their corporate image because this type of “intervention” fits your method is beyond insulting.”
Or perhaps they would be polite and respond with bewildered silence.
If I had used this example of an RCT I think I would have legitimately been criticized as being snarky and sarcastic and attacking a straw man by picking the worst possible instance of an RCT to attack. But I didn’t. This isn’t a straw man. This is sincerely what at least one prominent proponent of the “revolutionary” impact of RCTs thinks is a good example of the contribution of RCTs to development.
Here is the link to the YouTube version of the speech I gave in Lahore.
I am adding this in part because the other day I was going to visit my grandson in Oakland and was running late, so my daughter was showing him videos of his grandpa from the internet. Being under two years old, he didn’t really get the mathy parts but did say “Red grandpa” or “Blue grandpa,” so here is a video of “blue grandpa.”
Yogi Berra famously said:
“It is tough to make predictions, especially about the future.”
However, there is one set of predictions about the future of the world’s advanced economies that is almost certainly right, is historically unprecedented, has dramatic implications, and yet is mostly ignored: the demographic forecasts in the UN Population Division’s 2019 World Population Prospects.
The UN’s seven (sub)regions of advanced economies (North America, Australia and New Zealand, the four regions of Europe, namely East, West, North and South, and Japan) are headed into a historically unprecedented demographic transition into much, much older populations.
As fertility in these regions has dropped below replacement, population growth has slowed. But the overall growth of the population is not the main economic issue: it is the change in the age composition of the population. Over the next decades the number of people who are old will continue to grow and the number of people who are young and of “labor force” age will shrink. Hence the ratios of the prime labor force aged (25 to 64) to the “retirement” aged (65 plus) populations will fall to levels never before observed.
Figure 1 shows for these seven regions the evolution of the ratio of prime labor force aged (PLFA) to retirement aged (RA) population using actual estimates from 1950 to 2020 and the UN Zero Migration (ZM) scenario for predictions from 2020 to 2050 (the two are the same for 2020). The Zero Migration scenario illustrates most clearly the demographic implications of lower fertility and extended longevity.
Figure 1: Ratios of prime labor force aged (25-64) to retirement aged populations (65 plus) without migration are forecast to fall to historically unprecedented levels in every advanced economy region
The lowest PLFA/RA ratio in 1950 was 5. Before 2000 the lowest of any region in the world was 3.28 (roughly the same ratio in 2000 for North, West, and South Europe and Japan).
In 2020 this ratio is lower in every one of these regions than it was in any region in the world in 2000.
The aggregate “advanced region” ratio in 2020 (2.76) is less than half what it was in 1960 (5.66).
In 2015 Japan’s ratio fell below 2: fewer than two people of prime work force age for every retirement aged person. In the Zero Migration scenario every advanced economy region is forecast to be below 2 by 2050.
It is far from obvious that these ratios of work force to aged population can sustain the existing schemes of public and private pensions and social security for the aged and the implied health expenditures. All of these schemes rose to maturity in a historical era in which “pay as you go” support for these schemes was based on favorable demographics of large ratios of workers to retirees and growing populations (and growing numbers in formal sector employment and rising wages of the employed).
Over the next few decades the aged population continues to grow whereas the growth of the younger population turns sharply negative.
Figure 2 illustrates the evolution of the PLFA and RA populations separately. In the UN WPP Zero Migration scenario the 65 plus population is forecast to grow by 95 million from 2020 to 2050, from 246 million to 340 million (an annual growth rate of 1.09%). In contrast, the 25-64 aged population is forecast to fall by 121 million people (an annual growth rate of -0.65%). So, while the total population falls by only 0.25 percent a year, a phenomenon that might seem small, ignorable, and not a crisis or cause for action, the growth differential between the aged and younger populations is massive, leading to the very large changes in the ratios.
Figure 2: Way more old and way fewer work force aged leaves a massive gap in the workforce aged population needed to keep work force aged to retirement aged ratios constant
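The growth rates quoted for Figure 2 can be checked with a compound annual growth calculation. This is a sketch under one stated assumption: the 2020 prime-aged population is taken to be the 2050 projection of 558 million (given later in the text) plus the 121 million decline, i.e. 679 million.

```python
def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two population levels."""
    return (end / start) ** (1 / years) - 1

YEARS = 30                       # 2020 to 2050
aged_2020, aged_2050 = 246, 340  # 65 plus, millions (Zero Migration)
plfa_2050 = 558                  # 25-64 in 2050, millions
plfa_2020 = plfa_2050 + 121      # assumption: 2020 level implied by the 121 million decline

aged_rate = annual_growth(aged_2020, aged_2050, YEARS)
plfa_rate = annual_growth(plfa_2020, plfa_2050, YEARS)

print(f"65+ growth: {aged_rate:.2%} a year")
print(f"25-64 growth: {plfa_rate:.2%} a year")
print(f"PLFA/RA ratio: {plfa_2020 / aged_2020:.2f} in 2020, "
      f"{plfa_2050 / aged_2050:.2f} in 2050")
```

With these rounded inputs the rates come out at about 1.08% and -0.65% a year; the small difference from 1.09% presumably reflects rounding of the population figures. The implied aggregate ratio falls from 2.76 in 2020 to about 1.64 in 2050.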
With these raw forecasts I do a very simple counterfactual calculation. Suppose one wanted to keep the ratio of prime work force aged population to retirement aged population at its current level of 2.76. What would the population aged 25 to 64 have to be to keep that ratio constant? That is simple arithmetic: each additional older person would need 2.76 PLFA people, so there would need to be 262 million (=95*2.76) more PLFA people in 2050 than in 2020.
But, given the demographics without migration, there are not going to be more labor force aged people; there are going to be massively fewer PLFA people, 121 million fewer.
So the PLFA population of these advanced regions consistent with a “stable ratio (PLFA/RA)” would have to be 941 million, whereas on demographics alone there are going to be 558 million. The gap is 383 million people.
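The counterfactual above can be written out step by step. A minimal sketch using only the figures in the text (the 2020 prime-aged level of 679 million is derived as 558 + 121):

```python
RATIO_2020 = 2.76          # PLFA per retirement-aged person in 2020
aged_increase = 95         # growth in the 65+ population, 2020-2050 (millions)
plfa_2050_projected = 558  # Zero Migration projection for ages 25-64 (millions)
plfa_decline = 121         # projected fall in ages 25-64, 2020-2050 (millions)

# 2020 prime-aged population implied by the projection and its decline.
plfa_2020 = plfa_2050_projected + plfa_decline

# Each additional older person needs RATIO_2020 more prime-aged people.
extra_needed = aged_increase * RATIO_2020
plfa_2050_stable = plfa_2020 + extra_needed

# Shortfall between the stable-ratio level and the projection.
gap = plfa_2050_stable - plfa_2050_projected

print(f"Extra PLFA needed: {extra_needed:.0f}m")
print(f"Stable-ratio PLFA in 2050: {plfa_2050_stable:.0f}m")
print(f"Gap: {gap:.0f}m")
```

Written this way, the 383 million gap is simply the 262 million additional prime-aged people needed plus the 121 million by which that population is projected to shrink.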
There are three ways in which 383 million is a massive number.
First, the entire 2020 population of North America (USA plus Canada) is 368 million and the entire population of West and North Europe is 302 million. So the “prime aged population gap” of the advanced regions is larger than the total (all ages) population of huge advanced regions like North America or West and North Europe.
Second, and this is adding a hypothetical to a counterfactual, suppose that gap were filled by migration. This would imply that 40 percent of the total PLFA population (=383/941) would be the result of migration.
Third, according to the UN estimates of the stocks of migrants by destination and origin, in 2017 there were only 119 million migrants from “less developed” countries in “more developed” countries. This means that if the gap were to be made up by migrants from “less developed” regions (and of course, since each of the advanced regions is losing labor force aged population, movement among them cannot change the overall ratios), the stock of migrants in more developed countries from less developed countries would have to increase by more than three times its 2017 level (383 million more on top of 119 million) over the next 30 years.
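The migration arithmetic can be sketched the same way. The simplification here is an assumption, not something the text states: every new migrant is counted as prime-aged and the existing migrant stock is held fixed, so the stock grows by exactly the gap.

```python
# Figures from the text, in millions of people.
gap = 383                 # prime-aged (25-64) shortfall in 2050
plfa_stable_2050 = 941    # PLFA needed in 2050 to hold the ratio at 2.76
migrant_stock_2017 = 119  # migrants from less to more developed countries

# Share of the 2050 prime-aged population that migration would have to supply.
migrant_share = gap / plfa_stable_2050

# Assumption: all new migrants are prime-aged and the existing stock is
# unchanged, so the stock simply grows by the gap.
required_stock = migrant_stock_2017 + gap
increase_multiple = gap / migrant_stock_2017          # new migrants vs. 2017 stock
stock_multiple = required_stock / migrant_stock_2017  # 2050 stock vs. 2017 stock

print(f"Share from migration: {migrant_share:.0%}")
print(f"Increase: {increase_multiple:.1f}x the 2017 stock; "
      f"total stock: {stock_multiple:.1f}x")
```

This gives a migration share of about 41 percent (the 40 percent in the text), additional migrants equal to about 3.2 times the 2017 stock, and a total stock of roughly 4.2 times its 2017 level.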
Let me conclude with the four points from the opening paragraph.
One, while lots of things about the future are hard to predict, the demographics of the next few decades are not one of them. Everyone who is going to be 65 in 2050 is already alive and 35 years old today, so they are neither counterfactual nor hypothetical people; the only question is predicting their survival, and actuarial tables have been quite stable and predictable. So, compared to predicting political trends (who saw the collapse of the Soviet Union?), economic trends (who saw the timing or magnitude of the 2008 financial crisis?) or social trends (who saw gay marriage coming so quickly in the USA?), demography is very predictable.
Two, these trends are historically unprecedented. Humankind has never seen entire national and regional populations choose fertility rates so low that the demographic pyramid inverts (more old than young) and total populations fall so dramatically for such extended periods. The advanced economies of the world are moving into completely uncharted demographic territory.
Three, the implications of these demographic trends are likely to be dramatic (although the implications are hard to predict because these conditions have never existed before) for the economy, for fiscal balances, for politics, for society at large.
Four, so far these coming demographic changes are largely ignored. As everyone has parents and grandparents, and everyone ages (at the same rate, in fact: one year per year), the importance of these demographic shifts for the lives of typical individuals in these advanced economies is likely larger than that of any of the other issues about the foreseeable future that do chew up column inches, articles, international conferences, and media attention. My daughter is 35 years old and very concerned about how climate change will affect her child. I think what demography implies for her (and her cohort) when she is 65 in 2050 is at least as pressing an issue, but it has no traction at all.