Bio

I currently work with the CE/AIM-incubated charity ARMoR on research distillation, quantitative modelling and general org-boosting, supporting policy advocacy for market-shaping tools to incentivise innovation and ensure access to antibiotics, helping combat AMR.

I previously did AIM's Research Training Program, was supported by an FTX Future Fund regrant and later Open Philanthropy's affected grantees program, and before that I spent 6 years doing data analytics, business intelligence and knowledge + project management in various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA and changing my mind about becoming a physicist. I've also initiated some local priorities research efforts, e.g. a charity evaluation initiative with the moonshot aim of reorienting my home country Malaysia's giving landscape towards effectiveness, albeit with mixed results.

I first learned about effective altruism circa 2014 via A Modest Proposal, Scott Alexander's polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since, although my relationship to it has changed quite a bit; I related to Tyler's personal story (which unsurprisingly also references A Modest Proposal as a life-changing polemic):

I thought my own story might be more relatable for friends with a history of devotion – unusual people who’ve found themselves dedicating their lives to a particular moral vision, whether it was (or is) Buddhism, Christianity, social justice, or climate activism. When these visions gobble up all other meaning in the life of their devotees, well, that sucks. I go through my own history of devotion to effective altruism. It’s the story of [wanting to help] turning into [needing to help] turning into [living to help] turning into [wanting to die] turning into [wanting to help again, because helping is part of a rich life].

How others can help me

I'm looking for "decision guidance"-type roles, e.g. applied prioritization research.

How I can help others

Do reach out if any of the above piques your interest :)

Comments

Topic contributions

(Just adding the FrontierMath/GPQA and ARC-AGI charts you mentioned for my own benefit, and others)

[Chart: "o Series Performance"]
[Chart from r/OpenAI: "OpenAI's new model, o3, shows a huge leap in the world's hardest math benchmark"]

The LW tag is useful here: it says More Dakka is "the technique of throwing more resources at a problem to see if you get better results". 

I like David Manheim's A Dozen Ways to Get More Dakka; copying it over to reduce the friction of link-clicking:

So if you’re doing something, and it isn’t working well enough, here’s a dozen ways to generate more dakka, and how each could apply if you’re a) exercising, or b) learning new mathematics.

A Dozen Ways

  1. Do it again.
    1. Instead of doing one set of repetitions of the exercise, do two.
    2. If you read the chapter once, read it again.
  2. Use more.
    1. If you were lifting 10 pounds, lift 15.
    2. If you were doing easy problems, do harder ones.
  3. Do more repetitions.
    1. Instead of 10 repetitions, do 15.
    2. If you did 10 problems on the material, do 15.
  4. Increase intensity.
    1. Do your 15 repetitions in 2 minutes instead of 3.
    2. If you were skimming or reading quickly, read more slowly.
  5. Schedule it.
    1. Exercise at a specific time on specific days. Put it on your calendar, and set reminders.
    2. Make sure you have time scheduled for learning the material and doing problems.
  6. Do it regularly.
    1. Make sure you exercise twice a week, and don’t skip.
    2. Make sure you review what you did previously, on a regular basis.
  7. Do it for a longer period.
    1. Keep exercising for another month.
    2. Go through another textbook, or find more problem sets to work through.
  8. Add types.
    1. In addition to push-ups, do bench presses, chest flyers, and use resistance bands.
    2. In addition to the problem sets, do the chapter review exercises, and work through the problems in the chapter on your own.
  9. Expand the repertoire.
    1. Instead of just push-ups, do incline push-ups, loaded push-ups, and diamond push-ups.
    2. Find (or invent!) additional problem types; try to prove things with other methods, find different counter-examples or show why a relaxed assumption means the result no longer holds, find pre-written solutions and see if you can guess next steps before reading them.
  10. Add variety.
    1. Do leg exercises instead of just chest exercises. Do cardio, balance, and flexibility training, not just muscle building.
    2. Do adjacent types of mathematics, explore complex analysis, functional analysis, and/or harmonic analysis.
  11. Add feedback.
    1. Get an exercise coach to tell you how to do it better.
    2. Get someone to grade your work and tell you what you’re doing wrong, or how else to learn the material.
  12. Add people.
    1. Have the whole team exercise. Find a group, gym, or exercise class.
    2. Collaborate with others in solving problems. Take a course instead of self-teaching. Get others to learn with you, or teach someone else to solidify your understanding.

Here is a link to archived webpage captures of the article to bypass the paywall. 

On the more practical side, there's froolow's A critical review of GiveWell's 2022 cost-effectiveness model. GiveWell's CEA spreadsheets are now a lot better in many ways than they were back then, when they had the same kinds of model design and execution issues I used to see in my previous day job managing spreadsheet-based dashboards of management metrics at a fast-growing company full of very bright but inexperienced young analysts. This part in particular resonated with my daily pain as a relative 'non-genius' versus my peers (to borrow froolow's term):

It is fairly clear that the GiveWell team are not professional modellers, in the same way it would be obvious to a professional programmer that I am not a coder (this will be obvious as soon as you check the code in my Refactored model!). That is to say, there’s a lot of wasted effort in the GiveWell model which is typical when intelligent people are concentrating on making something functional rather than using slick technique. A very common manifestation of the ‘intelligent people thinking very hard about things’ school of model design is extremely cramped and confusing model architecture. This is because you have to be a straight up genius to try and design a model as complex as the GiveWell model without using modern model planning methods, and people at that level of genius don’t need crutches the rest of us rely on like clear and straightforward model layout. However, bad architecture is technical debt that you are eventually going to have to service on your model; when you hand it over to a new member of staff it takes longer to get that member of staff up to speed and increases the probability of someone making an error when they update the model.

Angelina Li's Level up your spreadsheeting (longer version: Level up your Google Sheets game) is great too, and much more granular. I would probably recommend their resource to most folks for spreadsheeting in general, and yours for CBAs more specifically.

On the "how to think about modelling better more broadly" side, Methods for improving uncertainty analysis in EA cost-effectiveness models, also by froolow, is one I think about often. I don't have a health economics background, so this argument shifted my perspective:

Uncertainty analysis is a major omission from most published EA models and seems to me like the proverbial ‘hundred dollar bill on the sidewalk’ – many of the core EA debates can be informed (and perhaps even resolved) by high-quality uncertainty analysis and I believe this could greatly improve the state of the art in EA funding decisions.

The goal of this essay is to change the EA community’s view about the minimal acceptable standard for uncertainty analysis in charity evaluation. To the extent that I use the GiveWell model as a platform to discuss broader issues of uncertainty analysis, a secondary goal of the essay is to suggest specific, actionable insights for GiveWell (and other EA cost-effectiveness modellers) as to how to use uncertainty analysis to improve their cost-effectiveness model.

This contributes to a larger strategic ambition I think EA should have, which is improving modelling capacity to the point where economic models can be used as reliable guides to action. Economic models are the most transparent and flexible framework we have invented for difficult decisions taken under resource constraint (and uncertainty), and in utilitarian frameworks a cost-effectiveness model is an argument in its own right (and debatably the only kind of argument that has real meaning in this framework). Despite this, EA appears much more bearish on the use of economic models than sister disciplines such as Health Economics. My conclusion in this piece is that there is scope for a paradigm shift in EA modelling which will improve decision-making around contentious issues.

This too, further down (this time emphasis mine): 

There is probably no single ‘most cost-effective use of philanthropic resources’. Instead, many people might have many different conceptions of the good which leads them to different conclusions even in a state of perfect knowledge about the effectiveness of interventions [1]. From reading the forums where these topics come up I don't think this is totally internalised - if it was totally internalised people would spend time discussing what would have to be true about morality to make their preferred EA cause the most cost-effective, rather than arguing that it is the actual best possible use of resources for all people [2].

Insofar as the GiveWell model is representative, it appears that resolving 'moral' disagreements (e.g. the discount rate) is likely to be higher impact than resolving 'factual' disagreements (e.g. the effectiveness of malaria nets at preventing malaria). This is not unusual in my experience, but it does suggest that the EA community could do more to educate people around these significant moral judgements given that those moral judgements are more 'in play' than they are in Health Economics. Key uncertainties which drive model outputs include:

  • What should the discount rate for life-years and costs be? (And should it be the same for both?)
  • What is the ratio at which we would trade life-years for consumption-doublings?
  • How could we strengthen our assumptions about charity level adjustments?
  • How risk-averse should we be when donating to a charity with both upside and downside risk?
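
To make the idea concrete, here's a minimal Monte Carlo sketch of what such uncertainty analysis looks like in practice (my own toy example, not froolow's or GiveWell's): a made-up cost-per-DALY model where parameters like the discount rate are treated as distributions rather than point estimates, and the output is reported as a range. All numbers and distribution choices below are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy cost-per-DALY model. Every value and distribution here is a made-up
# placeholder, not an actual GiveWell or froolow input.
cost_per_treatment = rng.lognormal(mean=np.log(5.0), sigma=0.2, size=n)  # $ per person treated
dalys_averted = rng.beta(a=2, b=50, size=n)                              # DALYs averted per person treated
discount_rate = rng.uniform(0.0, 0.05, size=n)                           # a 'moral' parameter, varied rather than fixed
years_of_benefit = 10

# Average discount factor over the benefit horizon, per simulation draw.
t = np.arange(years_of_benefit)
discount_factor = (1 / (1 + discount_rate[:, None]) ** t).mean(axis=1)

cost_per_daly = cost_per_treatment / (dalys_averted * discount_factor)

# Report the spread of the output, not just a point estimate.
print(f"median cost per DALY: ${np.median(cost_per_daly):,.0f}")
print(f"90% interval: ${np.percentile(cost_per_daly, 5):,.0f} to ${np.percentile(cost_per_daly, 95):,.0f}")
```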

I do a lot of modelling in my job, and I have to say this is the best tacit knowledge piece I've read on modelling in a while (the MC gsheet template is a nice bonus too). Bookmarked for (I expect) frequent future reference. Thanks Richard. 

A while back John Wentworth wrote the related essay What Do GDP Growth Curves Really Mean?, where he pointed out that you wouldn't be able to tell AI takeoff was boosting the economy just by looking at GDP growth data, because of the way real GDP is calculated (emphasis mine):

I sometimes hear arguments invoke the “god of straight lines”: historical real GDP growth has been incredibly smooth, for a long time, despite multiple huge shifts in technology and society. That’s pretty strong evidence that something is making that line very straight, and we should expect it to continue. In particular, I hear this given as an argument around AI takeoff - i.e. we should expect smooth/continuous progress rather than a sudden jump.

Personally, my inside view says a relatively sudden jump is much more likely, but I did consider this sort of outside-view argument to be a pretty strong piece of evidence in the other direction. Now, I think the smoothness of real GDP growth tells us basically-nothing about the smoothness of AI takeoff. Even after a hypothetical massive jump in AI, real GDP would still look smooth, because it would be calculated based on post-jump prices, and it seems pretty likely that there will be something which isn’t revolutionized by AI. At the very least, paintings by the old masters won’t be produced any more easily (though admittedly their prices could still drop pretty hard if there’s no humans around who want them any more). Whatever things don’t get much cheaper are the things which would dominate real GDP curves after a big AI jump.

More generally, the smoothness of real GDP curves does not actually mean that technology progresses smoothly. It just means that we’re constantly updating the calculations, in hindsight, to focus on whatever goods were not revolutionized. On the other hand, smooth real GDP curves do tell us something interesting: even after correcting for population growth, there’s been slow-but-steady growth in production of the goods which haven’t been revolutionized.
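
To make the mechanism concrete, here's a toy two-good calculation (numbers mine, and a big simplification of how chained real GDP is actually computed): valuing output at post-jump prices means the non-revolutionized good dominates the index, so the index barely moves even when one good's output explodes.

```python
# Toy illustration (made-up numbers, not Wentworth's): real GDP valued at
# post-jump prices barely registers the revolutionized good, because its
# price has collapsed along with the cost of producing it.
qty_before  = {"A": 1.0,    "B": 1.0}    # A gets revolutionized by AI; B (e.g. old-master paintings) doesn't
qty_after   = {"A": 1000.0, "B": 1.0}    # 1000x more of good A
price_after = {"A": 0.1,    "B": 100.0}  # post-jump prices: A is now ~1000x cheaper

real_gdp_before = sum(price_after[g] * qty_before[g] for g in qty_before)  # 100.1
real_gdp_after  = sum(price_after[g] * qty_after[g] for g in qty_after)    # 200.0

print(real_gdp_after / real_gdp_before)  # ~2x "growth", despite a 1000x jump in good A's output
```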

I do agree with your remark that

well-chosen economic indices might track “AI capabilities” in a sense more directly tied to the social and geopolitical implications of AI we actually care about for some purposes.[4] Badly chosen economic indices might not.

but for the GDP case I don't actually have any good alternative suggestions, and am curious if others do.

Curious if you happen to have written this up since?

I like this; it feels like a more EA-flavored version of Gwern's My Ordinary Life: Improvements Since the 1990s.

I'm thinking of all of his cost-effectiveness writings on this forum.

In 2011, GiveWell published the blog post Errors in DCP2 cost-effectiveness estimate for deworming, which made me lose a fair bit of confidence in DCP2 estimates (and by extension DCP3): 

we now believe that one of the key cost-effectiveness estimates for deworming is flawed, and contains several errors that overstate the cost-effectiveness of deworming by a factor of about 100. This finding has implications not just for deworming, but for cost-effectiveness analysis in general: we are now rethinking how we use published cost-effectiveness estimates for which the full calculations and methods are not public.

The cost-effectiveness estimate in question comes from the Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation. This report provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of deworming – contained a crucial typo: the published figure was $3.36-$6.92 per DALY, but the correct figure is $336-$692 per DALY. (This figure appears, correctly, on page 46 of the DCP2.) ... 
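
(Just as a back-of-envelope on how a ~100x shift can arise: if five multiplicative errors compound, each only needs to average roughly 2.5x. This is purely illustrative; the actual errors GiveWell documents were not equal-sized.)

```python
# Back-of-envelope only: the five actual DCP2 errors (documented in GiveWell's
# post) were not equal-sized. This just shows how modest multiplicative errors
# compound to a ~100x overstatement.
stated, corrected = 3.41, 326.43
overall_factor = corrected / stated       # ~95.7x overstatement
per_error = overall_factor ** (1 / 5)     # ~2.5x per error, if five errors compounded evenly
print(f"{overall_factor:.1f}x overall, i.e. ~{per_error:.2f}x per error on average")
```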

I agree with their key takeaways, in particular (emphasis mine)

  • We’ve previously argued for a limited role for cost-effectiveness estimates; we now think that the appropriate role may be even more limited, at least for opaque estimates (e.g., estimates published without the details necessary for others to independently examine them) like the DCP2’s.
  • More generally, we see this case as a general argument for expecting transparency, rather than taking recommendations on trust – no matter how pedigreed the people making the recommendations. Note that the DCP2 was published by the Disease Control Priorities Project, a joint enterprise of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau, which was funded primarily by a $3.5 million grant from the Gates Foundation. The DCP2 chapter on helminth infections, which contains the $3.41/DALY estimate, has 18 authors, including many of the world’s foremost experts on soil-transmitted helminths.

That said, my best guess is that such spreadsheet errors probably don't change your bottom-line finding that charity cost-effectiveness really does follow a power law; in fact I expect the worst cases to be actively harmful (e.g. PlayPump International), i.e. negative DALYs/$. My prior essentially comes from 80K's How much do solutions to social problems differ in their effectiveness? A collection of all the studies we could find, which finds:

There appears to be a surprising amount of consistency in the shape of the distributions.

The distributions also appear to be closer to lognormal than normal — i.e. they are heavy-tailed, in agreement with Berger’s findings. However, they may also be some other heavy-tailed distribution (such as a power law), since these are hard to distinguish statistically.

Interventions were rarely negative within health (and the miscellaneous datasets), but often negative within social and education interventions (10–20%) — though not enough to make the mean and median negative. When interventions were negative, they seemed to also be heavy-tailed in negative cost effectiveness.

One way to quantify the interventions' spread is to look at the ratio between the mean of the top 2.5% and the overall mean and median. Roughly, we can say:

  • The top 2.5% were around 20–200 times more cost effective than the median.
  • The top 2.5% were around 8–20 times more cost effective than the mean.

Overall, the patterns found by Ord in the DCP2 seem to hold to a surprising degree in the other areas where we’ve found data. 
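
(As a quick sanity check on those ratios, here's a small simulation sketch under an assumed lognormal; the sigma is arbitrary rather than fitted to 80K's data, but a moderately heavy tail already lands in the quoted ranges.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample 'cost-effectiveness' values from a lognormal; sigma is illustrative,
# not estimated from any of the 80K datasets.
x = rng.lognormal(mean=0.0, sigma=1.5, size=1_000_000)

top = x[x >= np.quantile(x, 0.975)].mean()                    # mean of the top 2.5%
print(f"top 2.5% mean / median: {top / np.median(x):.0f}x")   # ~40x (within the quoted 20-200x)
print(f"top 2.5% mean / mean:   {top / x.mean():.0f}x")       # ~13x (within the quoted 8-20x)
```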

Regarding your "future work I'd like to see" section, maybe Vasco's corpus of cost-effectiveness estimates would be a good starting point. His quantitative modelling spans nearly every category of EA interventions, his models are all methodologically aligned (since it's just him doing them), and they're all transparent too (unlike the DCP estimates).
