Abstract: The evidence supporting GiveWell’s top cause area, SMC (Seasonal Malaria Chemoprevention), is much weaker than it appears and would benefit from high-quality replication. Specifically, GiveWell’s assertion that every $5,000 spent on SMC saves a life is a stronger claim than the literature warrants, on three grounds: 1) the effect size is small and imprecisely estimated; 2) co-interventions delivered simultaneously pose a threat to external validity; and 3) the research lacks the quality markers of the replication/credibility revolution. I conclude by arguing that any replication of SMC should meet the standards of rigor and transparency set by GiveDirectly, whose evaluations clearly demonstrate contemporary best practices in open science.
1. Introduction: the evidence for Seasonal Malaria Chemoprevention
GiveWell currently endorses four top charities, with first place going to the Malaria Consortium, a charity that delivers Seasonal Malaria Chemoprevention (SMC). GiveWell provides more context on its Malaria Consortium – Seasonal Malaria Chemoprevention page and its Seasonal Malaria Chemoprevention intervention report. That report is built around a Cochrane review of seven randomized controlled trials (Meremikwu et al. 2012). GiveWell discounts one of those studies (Dicko et al. 2008) for technical reasons and includes an additional trial published later (Tagbor et al. 2016) in its evidence base.
No new research has been added since then, and GiveWell’s SMC report was last updated in 2018. It appears as though GiveWell treats the question of “does SMC work?” as effectively settled.
I argue that GiveWell should revisit its conclusions about SMC and should fund and/or oversee a high-quality replication study on the subject. While there is very strong evidence that SMC prevents the majority of malaria episodes, “including severe episodes” (Meremikwu et al. 2012, p. 2), GiveWell’s estimate that every $5,000 of SMC saves a life in expectation is shaky on three grounds related to research quality: 1) the underlying effect size is small relative to the sample size and statistically imprecise; 2) SMC is often tested in places receiving other interventions, which threatens external validity because we don’t know which set of interventions best maps onto the target population; and 3) the evidence comes from studies that are pre-credibility revolution, and therefore lack quality controls such as detailed pre-registration, open code and data, and sufficient statistical power.
2. Three grounds for doubting the relationship between SMC and mortality
2.1 The effect size is small and imprecisely estimated
Across an N of 12,589, Meremikwu et al. record 10 deaths in the combined treatment groups and 16 in the combined control groups. Removing the one study that GiveWell discounts and adding the later trial it includes, we arrive at 10 deaths for treatment and 15 for control. As the authors note, “the difference was not statistically significant” (p. 12), “and none of the trials were adequately powered to detect an effect on mortality…However, a reduction in death would be consistent with the high quality evidence of a reduction in severe malaria” (p. 4).[1]
Overall, the authors conclude, SMC “probably prevents some deaths,” but “[l]arger trials are necessary to have full confidence in this effect” (p. 4).
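To make the imprecision concrete, here is a back-of-envelope risk ratio with a 95% confidence interval, computed from the adjusted pooled counts. Two caveats: the review does not report the treatment/control split behind the combined N, and the adjusted evidence base has a slightly different total N than the seven Cochrane trials, so the sketch below assumes equal arms and reuses the original N. Neither simplification changes the qualitative picture.

```python
# Back-of-envelope risk ratio for the adjusted pooled mortality counts.
# Assumptions: equal-sized arms and the original combined N; neither is
# reported exactly, and neither materially changes the answer.
import numpy as np
from scipy import stats

deaths_t, deaths_c = 10, 15
n_t = n_c = 12_589 // 2

rr = (deaths_t / n_t) / (deaths_c / n_c)
se_log_rr = np.sqrt(1 / deaths_t - 1 / n_t + 1 / deaths_c - 1 / n_c)
z = stats.norm.ppf(0.975)
lo, hi = np.exp(np.log(rr) + np.array([-1, 1]) * z * se_log_rr)
print(f"risk ratio ~ {rr:.2f}, 95% CI ~ ({lo:.2f}, {hi:.2f})")
# ~0.67, with a CI from roughly 0.30 to 1.48: compatible with anything
# from a large protective effect to outright harm.
```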
As a benchmark, a recent study on deworming (N = 14,172) estimates that deworming saves 18 lives per 1,000 childbirths, versus roughly 0.4 per 1,000 for SMC (five fewer deaths across 12,589 children).
GiveWell forthrightly acknowledges this "limited evidence" on its SMC page, and explains why it believes SMC reduces mortality to a larger degree than the assembled studies directly suggest. This is laudably transparent, but the question is foundational to all of GiveWell's subsequent analyses of SMC. Especially given the organization's strong funding position, GiveWell should devote resources to bolstering that limited evidence through replication.
2.2 It’s unclear which studies map directly to the target population
Of the seven studies analyzed by Meremikwu et al., four test SMC in settings where both treatment and control samples are already receiving anti-malaria interventions. Two studies test SMC along with “home-based management of malaria (HMM)” while two others test SMC “alongside ITN [insecticide treated nets] distribution and promotion” (p. 9).
GiveWell’s SMC intervention report notes that SMC + ITN trials found “similar proportional reduction in malaria incidence to trials which did not promote ITNs.” This finding is useful and interesting, but it does not self-evidently help us estimate the effect of SMC alone on mortality, which is the basis of GiveWell’s cost-benefit analyses. To make the leap between the four studies that include co-interventions and those that don’t, we need an additional identifying assumption about external validity, such as:
- any interaction effect between SMC and co-interventions is negligible or negative, which makes these estimates minimally or downwardly biased;
- the target population will have a mix of people receiving ITNs, HMM, or neither, and, therefore, we should aggregate these studies to mirror the target population.
GiveWell does not take a position on this. The SMC intervention report says that the organization has “not carefully considered whether Malaria Consortium’s SMC program is operating in areas where ITN coverage is being expanded.” It does not mention HMM.
If we look only at the two studies that estimate the relationship between SMC alone and mortality, we see that five children died in the combined control group (N = 1,139), while four died in the combined treatment group (N = 1,122). Every death of a child is a tragedy, but this difference is not a strong basis for determining where the marginal dollar is most likely to save a life, and we are always triaging.
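As a rough check (one that ignores clustering and between-study differences), a Fisher's exact test on these pooled counts makes the same point:

```python
# Fisher's exact test on the pooled counts from the two SMC-only trials.
# Rough check only: it ignores clustering and between-study differences.
from scipy.stats import fisher_exact

table = [
    [4, 1_122 - 4],  # treatment arm: died, survived
    [5, 1_139 - 5],  # control arm: died, survived
]
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio ~ {odds_ratio:.2f}, p ~ {p_value:.2f}")
# The p-value is nowhere near any conventional threshold: on their own,
# these counts cannot distinguish SMC from no effect on mortality.
```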
This issue merits more careful attention than it currently receives from GiveWell. At a minimum, the SMC intervention page might be amended to note GiveWell’s position on the relationship between co-interventions and external validity. More broadly, an SMC replication could have multiple treatment arms to tease out the effects of both SMC and SMC + co-interventions.
2.3 The provided studies lack the quality markers of the credibility revolution
As Andrew Gelman puts it, “What has happened down here is the winds have changed;” as recently as 2011, “the replication crisis was barely a cloud on the horizon.” In the ten years since Meremikwu et al. was published – as well as in the six years since Tagbor et al. (2016) – we’ve learned a lot about what good research looks like. We’ve also learned, as described in a recent essay by Michael Nielsen and Kanjun Qiu, that studies meeting contemporary best practices – detailed pre-registration, “large samples, and open sharing of code, data and other methodological materials” – are systematically more likely to replicate successfully.
The studies cited by GiveWell in support of SMC do not clearly meet these criteria.
- While the original trials are large enough to detect an effect on the incidence of malaria, for effects on mortality “the trials were underpowered to reach statistical significance” (Meremikwu et al. 2012, p. 2); the sketch after this list gives a sense of the sample sizes an adequately powered trial would require.
- The code, data, and materials are not publicly available (as far as I can tell).
- These studies were indeed preregistered (e.g. here and here), but not in ways that would meaningfully constrain researcher degrees of freedom.[2]
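To see just how far the original samples were from adequacy, here is a standard two-proportion sample-size calculation. The mortality rates are the back-of-envelope figures implied by the pooled counts in section 2.1, not numbers reported by the review:

```python
# Sample size needed to detect the mortality difference implied by the
# pooled counts, via the standard two-proportion formula. The rates are
# back-of-envelope figures, not numbers reported by the review.
from scipy.stats import norm

p_control = 15 / 6_294  # ~0.24% mortality over the trial period
p_treat = 10 / 6_294    # ~0.16%
alpha, power = 0.05, 0.80

z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
n_per_arm = (z_a + z_b) ** 2 * (
    p_control * (1 - p_control) + p_treat * (1 - p_treat)
) / (p_control - p_treat) ** 2
print(f"~{n_per_arm:,.0f} children per arm")
# Roughly 50,000 children per arm -- about eight times the combined N
# of all seven trials. None of them could have been powered for this.
```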
This isn’t to say they won’t replicate if we run them again using contemporary best practices. But given the literally hundreds of millions of dollars at stake, let’s verify rather than assume.
3. Conclusion
One of the unsettling conclusions of the replication revolution is that when studies implement stringent quality standards, they’re more likely to produce null results.[3]
As it happens, GiveDirectly, formerly one of GiveWell’s top charities, already has evaluations that meet the highest standards of credibility. Haushofer and Shapiro (2016), for instance, have a meticulous pre-registration plan, large sample sizes, and publicly available code and data; they also “hired two graduate students to audit the data and code” for reproducibility (p. 1977). A subsequent evaluation by the same authors found more mixed results: some positive, enduring changes but also some negative spillovers within treated communities. But GiveDirectly’s evaluations were more likely to surface null and contradictory findings precisely because they were so carefully done.
GiveWell argues that it only recommends charities that are at least 10X as effective as cash. Right now, that comparison is confounded by large differences in research quality between GiveDirectly’s evaluations and those supporting SMC.
GiveWell can remedy this by funding an equally high-quality replication for SMC – and then, ideally, for each of its top cause areas.
Thanks to Alix Winter and Daniel Waldinger for comments on an early draft.
Thanks for your entry!
Thanks for these thoughts!
A question: How large do you expect the effects of such a replication to be? Maybe you could estimate "a study of cost x would lead to a change in effect size of y with probability z" for some instances of x, y, z. That would help to estimate whether the study would, in expectation, be worth more than one life saved per 5,000 dollars.
And an observation: I think it would be very difficult to get ethical approval for such a study. SMC is (according to current knowledge) an amazing intervention. Any controlled trial would require a control group that does not receive SMC, nor other interventions that could act as confounding factors. Think about it... you'd expect the study to cause ~10 additional preventable child deaths in the control group, just so it can measure an effect! It might be more feasible to make comparison studies between different types of SMC, but of course these don't directly answer your question.
I made an attempt to estimate the cost-effectiveness of replicating research on deworming in a previous post. There's especially large uncertainty in deworming's effect size, so I doubt you'd get as big an effect for SMC. But I think a similar Bayesian modeling approach could work for this!
Thanks, I look forward to checking it out! I haven't really followed the worm wars since around 2015 (I was in grad school at the time and a professor in the department wrote something about it that I liked a lot: http://www.columbia.edu/~mh2245/w/worms.html) and I would enjoy jumping back in, time permitting... but I actually just came down with covid so I think it's time to take a rest 😃
Isn't that an objection to any RCT of treatments that have been shown to work in some contexts?
Yes, absolutely.
As far as I can tell, that type of RCT indeed is not being done. I don't know much about research on SMC specifically, but GiveWell reports the following quote from Christian Lengeler, author of the Cochrane review of insecticide-treated bed nets:
That's fascinating; the norm in economics is extremely different, and I had never heard of this one. What is the boundary between a necessary replication and something that would be considered unethical?
Hi, and thanks for giving this a close read!
I considered providing an estimate like the one you suggest, but shied away for two reasons:
- I am not a subject matter expert and I don’t have a good sense of what the effect size would be. As GiveWell notes, across all seven studies, mortality in both groups is lower than you would expect, so there’s some disconnect between theory and empirics here that I/we lack context on;
- the expected value of a new finding hinges on equilibrium effects that I can’t really get a handle on. Let’s say that GiveWell finds smaller effects than they expect and then shifts a different charity to be #1. Is that intervention’s evidence really solid, or should that intervention also be closely re-examined and then replicated? I do not know; if I had had more time, I would have liked to do this type of analysis for the other three interventions as well.
My hope is that if I help point GiveWell in the right direction, people who are more experienced at cost-benefit analysis can take it from there. My comparative advantage is reading RCTs and meta-analyses.
As to the ethical concerns — that depends on whether the control group is likely to have received an anti-malaria treatment in the absence of an intervention, i.e. the point I made in section 2. If everybody is receiving bed nets anyway, let's study that population.
That seems fair. I agree that my request for an estimate is a big, maybe even unreasonable, request.
I asked because I am wondering if there really is enough reason to doubt the results of existing SMC trials. If I understand your post correctly, your main worry is not about actual errors in the trials; we don't have concrete reasons to believe they are wrong. Indeed, the trials provide high-quality evidence that SMC reduces malaria cases, including severe cases.
Your worries seem to be that (1) studies are underpowered to quantify reduction in malaria deaths. I'm not sure if that is a big problem, given that there are clear causal links between malaria cases and malaria deaths. (2) The trials did not follow the new best practices that we've identified since they were published. This indeed makes the trials less reliable than we would wish for, but I'm not sure whether the problem extends to a meta-analysis of seven trials.
For all these reasons, I keep wondering: how strongly do you really believe these results are wrong? And by how much? Even some rough answer would be OK here... and I'm sure it would also help GiveWell when they evaluate this post.
Hi Sjlver,
I've been thinking about this and I think you're right, I do believe that running this replication trial passes a cost-benefit test, and I should try to explain why.
I think there's a 50% chance that a perfectly done SMC replication would find mortality effects that are statistically indistinguishable from a null, for two reasons: 1) the documented empirical effects are strange and don't gel with our underlying theory of malaria; 2) our theory also conflicts with the repeated observation that people living in extreme poverty don't seem to take malaria as seriously as outsiders do, which is prima facie evidence that we're misunderstanding something big.
I know this was all very approximate for a cost-benefit analysis, but IMO, we need a stronger basis for our assumptions about effect sizes than we currently have before we can be more specific.
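That said, to at least sketch the shape of the calculation, here is the kind of expected-value-of-information arithmetic I have in mind. Every input is a hypothetical placeholder, not an estimate:

```python
# Expected-value-of-information sketch. Every number here is a
# hypothetical placeholder chosen for illustration, not an estimate.
replication_cost = 15e6  # hypothetical cost of a well-powered mortality trial
money_moved = 100e6      # hypothetical SMC funding the result would steer
p_null = 0.50            # my rough guess that the replication finds ~no effect
cpl_alt = 7_500          # hypothetical cost per life of the next-best option

# If SMC's mortality effect holds up, the replication changes nothing.
# If it doesn't, redirecting the money buys lives it currently wouldn't.
expected_lives_gained = p_null * money_moved / cpl_alt
lives_forgone_by_cost = replication_cost / 5_000  # at the claimed SMC rate
print(f"{expected_lives_gained:,.0f} expected lives gained vs "
      f"{lives_forgone_by_cost:,.0f} forgone by funding the study")
# ~6,667 vs ~3,000 under these made-up inputs. The point is only that
# plausible numbers can make the replication pass a cost-benefit test.
```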
putative because I'm pretty sure it doesn't come from Human Challenge Trials, i.e. malaria was not actually the thing randomly assigned. FWIW I don't think that that trial would pass a cost-benefit test.
"The length of follow-up for the included trials varied from six months to two years; with one year being most common" (Meremikwu et al p. 8)
I appreciate the thoughts! I'm going to think about this more thoroughly... but here's a quick guess about the low death numbers:
These trials involved measuring malaria prevalence in children. Presumably, children with a positive result would then get medication or be referred to a health center. Malaria is a curable disease, so this approach would save lives. Unfortunately, it's also quite likely that the child would not receive appropriate treatment in the absence of a diagnosis, due to lack of knowledge of the parents, distance to health facilities, etc.
Anyway, it's just a quick guess. Might be worth checking if the studies describe what happened to children with positive test results.
Looks like I can confirm this. Relevant passages from Cissé et al (2006):
The study was designed to measure malaria, not deaths:
Children with positive malaria tests received treatment:
I'll still think more about this... but here we have at least a lead towards better understanding of low death numbers in SMC trials.
Thank you for looking into it! Definitely interesting. To recap:
So all in all, a confusing situation. And given the high stakes, I suggest that GiveWell taps a team with expertise in both the subject matter and RCTs to design and run an intervention that maps directly onto the target population.
Two postscripts:
Thanks for the thoughts!
I think we are getting closer to the core of your question here: the relationship between cases of malaria (or severe malaria more specifically) and deaths. I think that it would indeed be good to know more about the circumstances under which children die from malaria, and how this is affected by various kinds of medical care.
The question might partially touch upon SMC. Besides preventing malaria cases, it could also have an effect on severity (I'm thinking of Covid vaccines as an analogy). That said, the case for SMC (as I understand it) is that it's an excellent way to prevent malaria infections. This is what the RCTs measure, and this is where its value comes from.
To answer the question, I believe it would be more helpful to do research into malaria as an illness, rather than doing an SMC trial replication. I continue to think that the evidence base for SMC is good enough. You have doubts since "most published research findings are false", but "most published research findings" might be the wrong reference class here:
You also ask about the settings in which SMC is rolled out. There is no specific answer here, since SMC is often rolled out for entire countries or regions, aiming to fully cover all eligible children. More than 30 million children received SMC last year. In their cost-effectiveness analysis, GiveWell looks at interventions by country and takes a number of relevant factors into account, such as the "mortality rate from malaria for 3-59 month olds".
In general, malaria fatality (deaths per case) is trending downwards a bit, due to factors such as better access to medical care, better diagnosis, better education of parents, and certainly many others. It could make sense to make this explicit when doing a cost-effectiveness analysis.
I'd expect GiveWell to be mindful about these things and to have thought of the most-relevant factors. I don't think additional RCTs would lead to large changes here.
Regarding the post-script about AMF: We are fortunate to have a board of trustees and leaders that think a lot about high-level questions and trends, both those closer to AMF's work (e.g., resistance to insecticides used in nets) and those more peripheral (e.g., the impact of new vaccines). There is also good and regular communication between GiveWell and AMF. As for myself, the day-to-day preoccupations are often much more mundane ;-)
Thanks as always for your careful and helpful read! I was just telling someone yesterday that this exchange is a positive reflection on the EA community and ethos — as a comparison point, it’s been way more constructive and collaborative than any of my experiences with academic peer review.
It sounds like I haven’t changed your mind on the core subject and that’s totally understandable. I speculate that this is something of a (professional) culture difference — the academics I discussed this essay with all started nodding along with the general idea the moment I mentioned “uncertainty about external validity” 😃
And thanks for the insight into AMF, y’all do great work.
The Right-Fit Evidence group provides good resources related to this post. They publish guidance on what types of evidence implementers should collect to demonstrate and monitor the impact of their programs.
Notably, different types of evidence are ideal depending on the stage of a program. In the beginning, when there is lots of uncertainty about an intervention, a randomized controlled trial is great. At a later stage, when the program is scaling to many recipients, it is more important to monitor the program and ensure that the implementation is done well.
In the case of SMC, millions of children receive treatments. A wealth of monitoring data is collected, much more than could be obtained in an RCT. Even though that data isn't randomized or controlled, its quantity might make up for these deficits and allow us to determine whether SMC works with sufficient confidence.
More information can be downloaded from the Right-Fit Evidence website. And here's an introduction to their framework.
Thanks, this is very useful and new to me! (I briefly consulted/worked for IPA in 2015-2016.)