bruce

2262 karmaJoined Oct 2021

Bio

Doctor from NZ, independent researcher (grand futures / macrostrategy) collaborating with FHI / Anders Sandberg. Previously: Global Health & Development research @ Rethink Priorities.

Feel free to reach out if you think there's anything I can do to help you or your work, or if you have any Qs about Rethink Priorities! If you're a medical student / junior doctor reconsidering your clinical future, or if you're quite new to EA / feel uncertain about how you fit in the EA space, have an especially low bar for reaching out.

Outside of EA, I do a bit of end of life care research and climate change advocacy, and outside of work I enjoy some casual basketball, board games and good indie films. (Very) washed up classical violinist and Oly-lifter.

All comments in personal capacity unless otherwise stated.

Posts
6

Sorted by New

bruce's Quick takes

bruce

· 2y ago · 1m read

126

Articles about recent OpenAI departures

bruce

· 5mo ago · 2m read

Historical Global Health R&D “hits”: Development, main sources of funding, and impact

Rethink Priorities

· 1y ago · 3m read

Better weather forecasting: Agricultural and non-agricultural benefits in low- and lower-middle-income countries

Rethink Priorities

· 1y ago · 4m read

Our research process: an overview from Rethink Priorities’ Global Health and Development team

Rethink Priorities

· 2y ago · 8m read

109

How effective are prizes at spurring innovation?

Rethink Priorities

· 2y ago · 103m read

Comments
105

StrongMinds should not be a top-rated charity (yet)

bruce2y76

Thanks for writing this post!

I feel a little bad linking to a comment I wrote, but the thread is relevant to this post, so I'm sharing in case it's useful for other readers, though there's definitely a decent amount of overlap here.

TL; DR

I personally default to being highly skeptical of any mental health intervention that claims to have ~95% success rate + a PHQ-9 reduction of 12 points over 12 weeks, as this is is a clear outlier in treatments for depression. The effectiveness figures from StrongMinds are also based on studies that are non-randomised and poorly controlled. There are other questionable methodology issues, e.g. surrounding adjusting for social desirability bias. The topline figure of $170 per head for cost-effectiveness is also possibly an underestimate, because while ~48% of clients were treated through SM partners in 2021, and Q2 results (pg 2) suggest StrongMinds is on track for ~79% of clients treated through partners in 2022, the expenses and operating costs of partners responsible for these clients were not included in the methodology.

(This mainly came from a cursory review of StrongMinds documents, and not from examining HLI analyses, though I do think "we’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money" seems a little overconfident. This is also not a comment on the appropriateness of recommendations by GWWC / FP)

(commenting in personal capacity etc)

Edit:
Links to existing discussion on SM. Much of this ends up touching on discussions around HLI's methodology / analyses as opposed to the strength of evidence in support of StrongMinds, but including as this is ultimately relevant for the topline conclusion about StrongMinds (inclusion =/= endorsement etc):

StrongMinds should not be a top-rated charity (yet)
- Comments (1, 2) about outsider perception of HLI as an advocacy org
- Comment about ideal role of an org like HLI, as well as trying to decouple the effectiveness of StrongMinds with whether or not WELLBYs / subjective wellbeing scores are valuable or worth more research on the margin.
- Twitter exchange between Berk Özler and Johannes Haushofer, particularly relevant given Özler's role in an upcoming RCT of StrongMinds in Uganda (though only targeted towards adolescent girls)
Evaluating StrongMinds: how strong is the evidence? and the comment section. In particular:
- Thread 1
- Thread 2
James Snowden's analysis of household spillovers
GiveWell's Assessment of Happier Lives Institute’s Cost-Effectiveness Analysis of StrongMinds
Comments in the post: The Happier Lives Institute is funding constrained and needs you!
- Greg claims "study registration reduces expected effect size by a factor of 3"
- Topline finding weighted 13% from StrongMinds RCT, where d = 1.72
- "this is a very surprising mistake for a diligent and impartial evaluator to make"
- Greg commits to: "donat[ing] 5k USD if the [Baird] RCT reports an effect size greater than d = 0.4 - 2x smaller than HLI's estimate of ~ 0.8, and below the bottom 0.1% of their monte carlo runs."
- Comment thread on discussion being harsh and "epistemic probation"
- James and Alex push back on some claims they consider to be misleading.
Learning from our mistakes: how HLI plans to improve
Update on the Baird RCT

Brainstorming ways to make EA safer and more inclusive

bruce2y23

While I agree that both sides are valuable, I agree with the anon here - I don't think these tradeoffs are particularly relevant to a community health team investigating interpersonal harm cases with the goal of "reduc[ing] risk of harm to members of the community while being fair to people who are accused of wrongdoing".

One downside of having the bad-ness of say, sexual violence^[1]be mitigated by their perceived impact,(how is the community health team actually measuring this? how good someone's forum posts are? or whether they work at an EA org? or whether they are "EA leadership"?) when considering what the appropriate action should be (if this is happening) is that it plausibly leads to different standards for bad behaviour. By the community health team's own standards, taking someone's potential impact into account as a mitigating factor seems like it could increase the risk of harm to members of the community (by not taking sufficient action with the justification of perceived impact), while being more unfair to people who are accused of wrongdoing. To be clear, I'm basing this off the forum post, not any non-public information

Additionally, a common theme about basically every sexual violence scandal that I've read about is that there were (often multiple) warnings beforehand that were not taken seriously.

If there is a major sexual violence scandal in EA in the future, it will be pretty damning if the warnings and concerns were clearly raised, but the community health team chose not to act because they decided it wasn't worth the tradeoff against the person/people's impact.

Another point is that people who are considered impactful are likely to be somewhat correlated with people who have gained respect and power in the EA space, have seniority or leadership roles etc. Given the role that abuse of power plays in sexual violence, we should be especially cautious of considerations that might indirectly favour those who have power.

More weakly, even if you hold the view that it is in fact the community health team's role to "take the talent bottleneck seriously; don’t hamper hiring / projects too much" when responding to say, a sexual violence allegation, it seems like it would be easy to overvalue the bad-ness of the immediate action against the person's impact, and undervalue the bad-ness of many more people opting to not get involved, or distance themselves from the EA movement because they perceive it to be an unsafe place for women, with unreliable ways of holding perpetrators accountable.

That being said, I think the community health team has an incredibly difficult job, and while they play an important role in mediating community norms and dynamics (and thus have corresponding amount of responsibility), it's always easier to make comments of a critical nature than to make the difficult decisions they have to make. I'm grateful they exist, and don't want my comment to come across like an attack of the community health team or its individuals!

(commenting in personal capacity etc)

^{^}
used as an umbrella term to include things like verbal harassment. See definition here.

CEA/EV + OP + RP should engage an independent investigator to determine whether key figures in EA knew about the (likely) fraud at FTX

bruce2y75

If this comment is more about "how could this have been foreseen", then this comment thread may be relevant. I should note that hindsight bias means that it's much easier to look back and assess problems as obvious and predictable ex post, when powerful investment firms and individuals who also had skin in the game also missed this.

TL;DR:
1) There were entries that were relevant (this one also touches on it briefly)
2) They were specifically mentioned
3) There were comments relevant to this. (notably one of these was apparently deleted because it received a lot of downvotes when initially posted)
4) There has been at least two other posts on the forum prior to the contest that engaged with this specifically

My tentative take is that these issues were in fact identified by various members of the community, but there isn't a good way of turning identified issues into constructive actions - the status quo is we just have to trust that organisations have good systems in place for this, and that EA leaders are sufficiently careful and willing to make changes or consider them seriously, such that all the community needs to do is "raise the issue". And I think looking at the systems within the relevant EA orgs or leadership is what investigations or accountability questions going forward should focus on - all individuals are fallible, and we should be looking at how we can build systems in place such that the community doesn't have to just trust that people who have power and who are steering the EA movement will get it right, and that there are ways for the community to hold them accountable to their ideals or stated goals if it appears to, or risks not playing out in practice.

i.e. if there are good processes and systems in place and documentation of these processes and decisions, it's more acceptable (because other organisations that probably have a very good due diligence process also missed it). But if there weren't good processes, or if these decisions weren't a careful + intentional decision, then that's comparatively more concerning, especially in context of specific criticisms that have been raised,^[1] or previous precedent. For example, I'd be especially curious about the events surrounding Ben Delo,^[2] and processes that were implemented in response. I'd be curious about whether there are people in EA orgs involved in steering who keep track of potential risks and early warning signs to the EA movement, in the same way the EA community advocates for in the case of pandemics, AI, or even general ways of finding opportunities for impact. For example, SBF, who is listed as a EtG success story on 80k hours, has publicly stated he's willing to go 5x over the Kelly bet, and described yield farming in a way that Matt Levine interpreted as a Ponzi. Again, I'm personally less interested in the object level decision (e.g. whether or not we agree with SBF's Kelly bet comments as serious, or whether Levine's interpretation as appropriate), but more about what the process was, how this was considered at the time with the information they had etc. I'd also be curious about the documentation of any SBF related concerns that were raised by the community, if any, and how these concerns were managed and considered (as opposed to critiquing the final outcome).

Outside of due diligence and ways to facilitate whistleblowers, decision-making processes around the steering of the EA movement is crucial as well. When decisions are made by orgs that bring clear benefits to one part of the EA community while bringing clear risks that are shared across wider parts of the EA community,^[3] it would probably be of value to look at how these decisions were made and what tradeoffs were considered at the time of the decision. Going forward, thinking about how to either diversify those risks, or make decision-making more inclusive of a wider range stakeholders^[4], keeping in mind the best interests of the EA movement as a whole.

(this is something I'm considering working on in a personal capacity along with the OP of this post, as well as some others - details to come, but feel free to DM me if you have any thoughts on this. It appears that CEA is also already considering this)

If this comment is about "are these red-teaming contests in fact valuable for the money and time put into it, if it misses problems like this"

I think my view here (speaking only for the red-teaming contest) is that even if this specific contest was framed in a way that it missed these classes of issues, the value of the very top submissions^[5] may still have made the efforts worthwhile. The potential value of a different framing was mentioned by another panelist. If it's the case that red-teaming contests are systematically missing this class of issues regardless of framing, then I agree that would be pretty useful to know, but I don't have a good sense of how we would try to investigate this.

^{^}
This tweet seems to have aged particularly well. Despite supportive comments from high-profile EAs on the original forum post, the author seemed disappointed that nothing came of it in that direction. Again, without getting into the object level discussion of the claims of the original paper, it's still worth asking questions around the processes. If there was were actions planned, what did these look like? If not, was that because of a disagreement over the suggested changes, or the extent that it was an issue at all? How were these decisions made, and what was considered?
^{^}
Apparently a previous EA-aligned billionaire ?donor who got rich by starting a crypto trading firm, who pleaded guilty to violating the bank secrecy act
^{^}
Even before this, I had heard from a primary source in a major mainstream global health organisation that there were staff who wanted to distance themselves from EA because of misunderstandings around longtermism.
^{^}
This doesn't have to be a lengthy deliberative consensus-building project, but it should at least include internal comms across different EA stakeholders to allow discussions of risks and potential mitigation strategies.
^{^}
e.g. A critical review of GiveWell's 2022 cost-effectiveness model, Methods for improving uncertainty analysis in EA cost-effectiveness models, and
Biological Anchors external review

Winners of the EA Criticism and Red Teaming Contest

bruce2y47

As requested, here are some submissions that I think are worth highlighting, or considered awarding but ultimately did not make the final cut. (This list is non-exhaustive, and should be taken more lightly than the Honorable mentions, because by definition these posts are less strongly endorsed by those who judged it. Also commenting in personal capacity, not on behalf of other panelists, etc):

Bad Omens in Current Community Building
I think this was a good-faith description of some potential / existing issues that are important for community builders and the EA community, written by someone who "did not become an EA" but chose to go to the effort of providing feedback with the intention of benefitting the EA community. While these problems are difficult to quantify, they seem important if true, and pretty plausible based on my personal priors/limited experience. At the very least, this starts important conversations about how to approach community building that I hope will lead to positive changes, and a community that continues to strongly value truth-seeking and epistemic humility, which is personally one of the benefits I've valued most from engaging in the EA community.

Seven Questions for Existential Risk Studies
It's possible that the length and academic tone of this piece detracts from the reach it could have, and it (perhaps aptly) leaves me with more questions than answers, but I think the questions are important to reckon with, and this piece covers a lot of (important) ground. To quote a fellow (more eloquent) panelist, whose views I endorse: "Clearly written in good faith, and consistently even-handed and fair - almost to a fault. Very good analysis of epistemic dynamics in EA." On the other hand, this is likely less useful to those who are already very familiar with the ERS space.

Most problems fall within a 100x tractability range (under certain assumptions)
I was skeptical when I read this headline, and while I'm not yet convinced that 100x tractability range should be used as a general heuristic when thinking about tractability, I certainly updated in this direction, and I think this is a valuable post that may help guide cause prioritisation efforts.

The Effective Altruism movement is not above conflicts of interest
I was unsure about including this post, but I think this post highlights an important risk of the EA community receiving a significant share of its funding from a few sources, both for internal community epistemics/culture considerations as well as for external-facing and movement-building considerations. I don't agree with all of the object-level claims, but I think these issues are important to highlight and plausibly relevant outside of the specific case of SBF / crypto. That it wasn't already on the forum (afaict) also contributed to its inclusion here.

I'll also highlight one post that was awarded a prize, but I thought was particularly valuable:

Red Teaming CEA’s Community Building Work
I think this is particularly valuable because of the unique and difficult-to-replace position that CEA holds in the EA community, and as Max acknowledges, it benefits the EA community for important public organisations to be held accountable (and to a standard that is appropriate for their role and potential influence). Thus, even if listed problems aren't all fully on the mark, or are less relevant today than when the mistakes happened, a thorough analysis of these mistakes and an attempt at providing reasonable suggestions at least provides a baseline to which CEA can be held accountable for similar future mistakes, or help with assessing trends and patterns over time. I would personally be happy to see something like this on at least a semi-regular basis (though am unsure about exactly what time-frame would be most appropriate). On the other hand, it's important to acknowledge that this analysis is possible in large part because of CEA's commitment to transparency.

Ozzie Gooen's Quick takes

bruce2mo6

I think that certain EA actions in ai policy are getting a lot of flak.
Also, I suspect that the current EA AI policy arm could find ways to be more diplomatic and cooperative

Would you be happy to expand on these points?

The Happier Lives Institute is funding constrained and needs you!

bruce2mo6

It sounds like you're interpreting my claim to be "the Baird RCT is a particularly good proxy (or possibly even better than other RCTs on group therapy in adult women) for the SM adult programme effectiveness", but this isn't actually my claim here; and while I think one could reasonably make some different, stronger (donor-relevant) claims based on the discussions on the forum and the Baird RCT results, mine are largely just: "it's an important proxy", "it's worth updating on", and "the relevant considerations/updates should be easily accessible on various recommendation pages". I definitely agree that an RCT on the adult programme would have been better for understanding the adult programme.

(I'll probably check out of the thread here for now, but good chatting as always Nick! hope you're well)

The Happier Lives Institute is funding constrained and needs you!

bruce2mo19

Yes, because:

1) I think this RCT is an important proxy for StrongMinds (SM)'s performance 'in situ', and worth updating on - in part because it is currently the only completed RCT of SM. Uninformed readers who read what is currently on e.g. GWWC^[1]/FP^[2]/HLI website might reasonably get the wrong impression of the evidence base behind the recommendation around SM (i.e. there are no concerns sufficiently noteworthy to merit inclusion as a caveat). I think the effective giving community should have a higher bar for being proactively transparent here - it is much better to include (at minimum) a relevant disclaimer like this, than to be asked questions by donors and make a claim that there wasn't capacity to include.^[3]

2) If a SM recommendation is justified as a result of SM's programme changes, this should still be communicated for trust building purposes (e.g. "We are recommending SM despite [Baird et al RCT results], because ...), both for those who are on the fence about deferring, and for those who now have a reason to re-affirm their existing trust on EA org recommendations.^[4]

3) Help potential donors make more informed decisions - for example, informed readers who may be unsure about HLI's methodology and wanted to wait for the RCT results should not have to go search this up themselves or look for a fairly buried comment thread on a post from >1 year ago in order to make this decision when looking at EA recommendations / links to donate - I don't think it's an unreasonable amount of effort compared to how it may help. This line of reasoning may also apply to other evaluators (e.g. GWWC evaluator investigations).^[5]

^{^}
GWWC website currently says it only includes recommendations after they review it through their Evaluating Evaluators work, and their evaluation of HLI did not include any quality checks of HLI's work itself nor finalise a conclusion. Similarly, they say: "we don't currently include StrongMinds as one of our recommended programs but you can still donate to it via our donation platform".
^{^}
Founders Pledge's current website says:
We recommend StrongMinds because IPT-G has shown significant promise as an evidence-backed intervention that can durably reduce depression symptoms. Crucial to our analysis are previous RCTs
^{^}
I'm not suggesting at all that they should have done this by now, only ~2 weeks after the Baird RCT results were made public. But I do think three months is a reasonable timeframe for this.
^{^}
If there was an RCT that showed malaria chemoprevention cost more than $6000 per DALY averted in Nigeria (GDP/capita * 3), rather than per life saved (ballpark), I would want to know about it. And I would want to know about it even if Malaria Consortium decided to drop their work in Nigeria, and EA evaluators continued to recommend Malaria Consortium as a result. And how organisations go about communicating updates like this do impact my personal view on how much I should defer to them wrt charity recommendations.
^{^}
Of course, based on HLI's current analysis/approach, the ?disappointing/?unsurprising result of this RCT (even if it was on the adult population) would not have meaningfully changed the outcome of the recommendation, even if SM did not make this pivot (pg 66):
Therefore, even if the StrongMinds-specific evidence finds a small total recipient effect (as we present here as a placeholder), and we relied solely on this evidence, then it would still result in a cost-effectiveness that is similar or greater than that of GiveDirectly because StrongMinds programme is very cheap to deliver.
And while I think this is a conversation that has already been hashed out enough on the forum, I do think the point stands - potential donors who disagree with or are uncertain about HLI's methodology here would benefit from knowing the results of the RCT, and it's not an unreasonable ask for organisations doing charity evaluations / recommendations to include this information.
^{^}
Based on Nigeria's GDP/capita * 3
^{^}
Acknowledging that this is DALYs not WELLBYs! OTOH, this conclusion is not the GiveWell or GiveDirectly bar, but a ~mainstream global health cost-effectiveness standard of ~3x GDP per capita per DALY averted (in this case, the ~$18k USD PPP/DALY averted of SM is below the ~$7k USD PPP/DALY bar for Uganda)

The Happier Lives Institute is funding constrained and needs you!

bruce2mo41

My view is that HLI^[1], GWWC^[2], Founders Pledge^[3], and other EA / effective giving orgs that recommend or provide StrongMinds as an donation option should ideally at least update their page on StrongMinds to include relevant considerations from this RCT, and do so well before Thanksgiving / Giving Tuesday in Nov/Dec this year, so donors looking to decide where to spend their dollars most cost effectively can make an informed choice.^[4]

^{^}
Listed as a top recommendation
^{^}
Not currently a recommendation, (but to included as an option to donate)
^{^}
Currently tagged as an "active recommendation"
^{^}
Acknowledging that HLI's current schedule is "By Dec 2024", though this may only give donors 3 days before Giving Tuesday.

Kaya Guides Pilot Results

bruce4mo12

Congratulations on the pilot!

I just thought I'd flag some initial skepticism around the claim:

Our estimates indicate that next year, we will become 20 times as cost-effective as cash transfers.

Overall I expect it may be difficult for the uninformed reader to know how much they should update based on this post (if at all), but given you have acknowledged many of these (fairly glaring) design/study limitations in the text itself, I am somewhat surprised the team is still willing to make the extrapolation from 7x to 20x GD within a year. It also requires that the team is successful with increasing effective outreach by 2 OOMs despite currently having less than 6 months of runway for the organisation.^[1]

I also think this pilot should not give the team "a reasonable level of confidence that [the] adaptation of Step-by-Step was effective" insofar as the claim is that charitable dollars here are cost competitive with top GiveWell charities / have good reason to believe you will be 2x top GiveWell charities next year) (though perhaps you just meant from an implementation perspective, not cost-effectiveness). My current view is that while this might be a reasonable place to consider funding for non-EA funders (or e.g. specifically interested in mental health or mental health in India), I'd hope that the EA community who are looking to maximise impact through their donations in the GHD space would update based on higher evidentiary standards than what has been provided in this post, which IMO indicates little beyond feasibility and acceptability (which is still promising and exciting news, and I don't want to diminsh this!)

I don't want this to come across as a rebuke of the work the team is trying to do - I am on the record for being excited about more people doing work that use subjective wellbeing on the margin, and I think this is work worth doing. But I hope the team is mindful that continued overconfident claims in this space may cause people to negatively update and less likely to fund this work in future, and for totally preventable communication-related decisions, and not because wellbeing approaches are bad/not worth funding in principle.

^{^}
A very crude BOTEC based only on the increased time needed for the 15min / week calls with 10,000 people indicates something like 17 additional guides doing the 15min calls full time, assuming they do nothing but these calls every day. The increase in human resources to scale up to reaching 10,000 people are of course much more intensive than this, even for a heavily WhatsApp based intervention.

10000 * 0.25 * 6 * 0.27 / 40 / 6 = 16.875
(number reached * hours per week * weeks * retention / hours per week / week)

Ben Stevenson's Quick takes

bruce4mo7

Hey Ben! A few quick Qs:

Did the team consider a paid/minimum wage position instead of an unpaid one? How did it decide on the unpaid positions?
Is the theory of change for impact here mainly an "upskill students/early career researchers" thing, or for the benefits to RP's research outputs?
What is RP's current policy on volunteers?
Does RP expect to continue recruiting volunteers for research projects in the future?

bruce

Bio

Posts 6

Comments105

Posts
6

Comments
105