Jonathan Harris, PhD | jonathan@total-portfolio.org
Total Portfolio Project's goal is to help altruistic investors prioritize the most impactful funding opportunities, whether that means grants, investing to give, or impact investments. Projects we've completed range from theoretical research (like in this post), to advising on high-impact investment deals, to strategic research on balancing giving now versus later for new major donors.
I wouldn't put the key point here down to 'units'. I would say the aggregate units GiveWell tends to use ('units of value' and lives saved) and those GIF uses (person-years of income-equivalent, "PYI") are very similar. I think any differences in terms of these units are going to be more about subjective differences in 'moral weights'. Moral weight differences aside, I'd expect the same analysis using GiveWell vs GIF units to deliver essentially the same results.
The point you're bringing up, and that Ken discusses as 'Apples vs oranges', is that the analysis is different. GiveWell's Top Charity models are about the outputs your dollar can buy now and the impacts those outputs will generate (e.g., over a few years for nets, and a few more for cash). As part of its models, GIF projects outputs (and impacts) out to 10 years. Indeed, this is necessary when looking at early-stage ventures: most of their impact lies in the future, and these organizations do not have to be cost-effective in their first year to be worth funding. If you were considering funding LLINs in 2004, or even earlier, you would most likely want to do a projection rather than just consider the short-term impacts.
Of course, as has been repeatedly discussed in these comments, when projected impact is part of the model, how much contribution to assign to the original donors becomes a big issue. But I believe it is possible to agree on a consistent method for doing this. And once you have that, this really becomes more of an 'Apples vs apples' comparison.
For example, you might decide that a dollar to a malaria net charity buys a certain number of nets right now but has limited impact on the future growth of that charity. In that case, the current GiveWell estimates wouldn't need to be modified even if you're including impact projections.
My current understanding of the state of play around projected / forecast impact is:
Thank you for engaging with this discussion, Ken!
It's great to have these clarifications in your own words. As you highlight, there are many important and tricky issues to grapple with here. I think we're all excited about the innovative work you're doing and eager to learn more as you're able to publish more information.
Actually, they are more of a grant fund than an impact investment fund. I've updated the post to clarify this. Thanks for bringing it up.
One might call them an 'investing for impact' fund - making whatever investments they think will generate the biggest long-term impact.
The reported projections aren't adjusted for counterfactuals (or additionality, contribution, funging, etc.). I wonder if the fact that we're mostly talking about GIF grants vs GiveWell grants changes your worry at all?
For my part, I'd be excited to see more grant analyses (in addition to impact investment analyses) explicitly account for counterfactuals. I believe GiveWell does make some adjustments for funging, though I'm uncertain if they are comprehensive enough.
I'm torn on this post: while I agree with the overall spirit (that EAs can do better at cooperation and counterfactuals, and be more prosocial), I think it makes some strong claims/assumptions that I disagree with. I find it problematic that these assumptions are stated as if they were facts.
First, EA may be better at "internal" cooperation than other groups, but cooperation is hard and internal EA cooperation is far from perfect.
Second, there's the idea that correctly assessed counterfactual impact is hyperopic. Nope: hyperopic assessments are just a sign of not getting your counterfactual right.
Third, there's the idea that Shapley values are the solution. I like Shapley values, but only within the narrow constraints for which they are well specified. That is, environments where cooperation should inherently be possible: when all agents agree on the value that is being created (a toy computation is sketched below). In general, you need an approach that can handle cooperative and adversarial environments and everything in between. I'd call that general approach counterfactual impact. I see another commenter has noted Toby's old comments about this, and I'll second that.
Finally, economists may do more counterfactual reasoning than other groups, but that doesn't mean they have it all figured out. Ask your average economist to quickly model a counterfactual and it could easily end up just as myopic or hyperopic. The solution is really to get all analysts better trained in heuristics for reasoning about counterfactuals in a way that is prosocial. To me, that is where you end up if you try to implement philosophies like Toby's global consequentialism. But we need more practical work on things like this, not repetitive claims about Shapley values.
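To make the Shapley point concrete, here's a toy computation (the players and payoffs are made up; the key requirement is a single value function over coalitions that all agents accept):

```python
# Illustrative Shapley value computation for a toy cooperative game.
# Players and payoffs are hypothetical; the point is that the method
# presumes one agreed-upon value function over all coalitions.
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(orders) for p, t in totals.items()}

# Toy example: two funders with synergy when both fund. All agents must
# agree on this value function for Shapley values to be well specified.
v = {frozenset(): 0, frozenset({"A"}): 4, frozenset({"B"}): 6,
     frozenset({"A", "B"}): 14}.get

print(shapley_values(["A", "B"], v))  # {'A': 6.0, 'B': 8.0}
```

The whole construction runs off that agreed `value` input; in adversarial settings, where agents value the outcomes differently, that input simply doesn't exist.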
I'm writing quickly and hope this comes across in the right spirit. I do find the strong claims in this post frustrating to see, but I welcome that you raised the topic.
A comment and then a question. One problem I've encountered in trying to explain ideas like this to a non-technical audience is that the standard rationales for 'why softmax' are either a) technical or b) unconvincing, or even condescending about its value as a decision-making approach. Indeed, the 'Agents as probabilistic programs' page you linked to introduces softmax as "People do not always choose the normatively rational actions. The softmax agent provides a simple, analytically tractable model of sub-optimal choice." The 'Softmax demystified' page offers relatively technical reasons (smoothing is good, flickering is bad) and an unsupported claim (that it is good to pick lower-utility options some of the time). Implicitly, this gives presentations of ideas like this the flavor of "trust us, you should use this because it works in practice, even if it has origins in what we think is irrational or can't justify". And, to be clear, I say that as someone who's on your side, trying to think of how to share these ideas with others. I think there is probably a link between what I've described above and Michael Plant's point (3).
So, I'm wondering if 'we can do better' in justifying softmax (and similar approaches). What is the most convincing argument you've seen?
I feel like the holy grail would be an empirical demonstration that an RL agent develops softmax-like properties across a range of realistic environments, and/or a theoretical argument for why this should happen.
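For readers following along, here's a minimal sketch of the decision rule in question (the standard softmax definition; the utilities and temperatures are made-up illustrations):

```python
# Softmax action selection: choose actions with probability proportional
# to exp(utility / temperature). Utilities here are arbitrary examples.
import numpy as np

def softmax_policy(utilities, temperature=1.0):
    """Return P(action) proportional to exp(utility / temperature)."""
    z = np.asarray(utilities, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability; doesn't change the result
    p = np.exp(z)
    return p / p.sum()

utilities = [1.0, 0.9, 0.2]
print(softmax_policy(utilities, temperature=0.01))  # essentially argmax
print(softmax_policy(utilities, temperature=10.0))  # nearly uniform
```

The temperature parameter interpolates between deterministic argmax and uniform random choice, and "why pick lower-utility options at all?" is exactly the part that the standard rationales leave unjustified.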
Yeah, it seems we do have a semantic difference here. But, how you're using 'raw impact units' makes sense to me.
Nice, clear examples! I feel inspired by them to sketch out what I think the "correct" approach would look like. With plenty of room for anyone to choose their own parameters.
Let's simplify things a bit. Say the first round is as described above and its purpose is to fund the organization to test its intervention. Then let's lump all future rounds together and say they total $14m and fund the implementation of the intervention if the tests are successful. That is, $14m of funding in the second round, assuming the tests are a success, produces 14m units of impact.
The 14m is what I would call the org's potential 'Gross Impact', with no adjustment for the counterfactual. We need to adjust for what would otherwise happen without the org to get its potential 'Enterprise Impact' (relative to the counterfactual).
For one, yes, the funders would have invested their money elsewhere. So the org will only have a positive Enterprise Impact if it is more cost-effective than the funders' alternatives. I think the 'generous-to-GiveWell option' is more extreme than it might appear at first glance. It's not only assuming that the funders would otherwise donate in line with GiveWell (GW). It's also assuming that they are somehow suckered into donating to this less effective org, despite being GW donors.
A more reasonable assumption, in my view, is that the org only gets funding if its cost-effectiveness is above its funders' bar. It also seems likely to me that the org, if successful, will be able to attract funders who are not GW donors. There are plenty of funders whose preferences are not that aligned with cost-effectiveness, and as long as those preferences line up with this hypothetical org, it could get non-GW funding. Indeed, in GW's model for Malaria Consortium, they appear to assume the Global Fund is 2-3x less effective than GW spending and that domestic governments are 6-10x less effective. Furthermore, if the org is able to adopt a for-profit operating model, it could get commercial funding with relatively little impact given up in the counterfactual.
As an example, let's say GW top charities produce 1 unit of impact per dollar and the org's second-round funders typically make grants that are 10x less effective than GW. The counterfactual impact of the funders' alternative grants would then be 1.4 million units. So, based on this consideration, the potential Enterprise Impact = 14 million - 1.4 million = 12.6 million units of impact.
Another consideration is that if the org didn't exist (or even though it does), another org may have developed that solves the same problem for the same beneficiaries. Let's say the probability of an alternative org replicating the Enterprise Impact is 21% (just an illustrative number). Adjusting for this consideration makes the potential Enterprise Impact (1 - 21%) * 12.6 million = 10 million units of impact.
Next, we need to go from potential Enterprise Impact to expected Enterprise Impact. That is, we need to account for the probability that the org is successful after the first-round tests. Let's say 10% - a fairly standard early-stage success rate. That makes the expected Enterprise Impact 1 million units.
Now we can look at the impact of GIF's funding. That is, how did their decision to fund the $1m in the first round change the expected Enterprise Impact?
This will depend on a combination of how much potential interest there was from other funders and how much the organization can scale with more funding (e.g., improving the statistical power of its intervention test, testing in more locations, ...). At one extreme, all other funders may be non-cost-effective donors who would only participate if GIF led the round, in which case I'd say GIF's $1m enabled the entire $2m round. At the other extreme, the org may only really have needed $1m, with plenty of other funders willing to step in, in which case GIF's Investor Impact would be close to zero.
For this example, let's say there was some potential for other funding but it was far from guaranteed, that the typical cost-effectiveness of the other funders' alternatives would be low, and that there were some diminishing returns to scale in the round. Altogether, suppose this means that GIF's 50% of the round has a contribution, versus the counterfactual, equal to 30% of the expected Enterprise Impact. That is, an Investor Impact of 300k units.
To summarize, the whole stream of calculations is (14m - 1.4m) * (1 - 0.21) * 10% * 30% ≈ 300k. That is, 0.3 units of impact per dollar, or 0.3x GW (according to my illustrative assumptions).
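If it helps, here's the same chain of arithmetic as a short script (all inputs are my illustrative assumptions from above, not GIF's actual figures):

```python
# The full stream of calculations from the example. Every input is an
# illustrative assumption stated in the text, not a real GIF number.
gross_impact = 14e6        # units of impact if the scaled-up intervention succeeds
alt_effectiveness = 0.10   # second-round funders' alternatives: 10x less effective than GW
replacement_prob = 0.21    # chance another org would have replicated the impact
success_prob = 0.10        # chance the org succeeds after the first-round tests
gif_contribution = 0.30    # share of expected Enterprise Impact enabled by GIF's $1m
gif_dollars = 1e6

counterfactual_funding = gross_impact * alt_effectiveness                     # 1.4m units
potential_enterprise = (gross_impact - counterfactual_funding) * (1 - replacement_prob)  # ~10m
expected_enterprise = potential_enterprise * success_prob                     # ~1m units
investor_impact = expected_enterprise * gif_contribution                      # ~300k units

print(investor_impact / gif_dollars)  # ~0.3 units of impact per dollar, i.e. 0.3x GW
```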
Based on GIF's published methodology and Ken's comments here, I believe GIF's reported numbers for this example would be something like 14m * 10% * 50% = 700k, or 0.7x GW. Given they actually reported 3x GW, to calibrate this example to their reporting I'd scale up the (14m * 10%) part by a factor of 4.3 = 3/0.7. This can be interpreted as the actual scale or duration of the orgs GIF funds being greater than in my example, or their average probability of success being higher than 10%.
With that calibration, my illustrative estimate of GIF's effectiveness would be 4.3 times my original value of 0.3x: that is, about 1.3x GW.
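And the calibration step, under the same caveats (my reading of GIF's approach, calibrated to their reported figure):

```python
# Calibrating the illustrative example to GIF's reported 3x GW figure.
# My version of GIF's calculation: 14m * 10% * 50% = 700k units per $1m.
gif_style_estimate = 14e6 * 0.10 * 0.50 / 1e6  # 0.7 units per dollar, i.e. 0.7x GW
calibration = 3.0 / gif_style_estimate          # ~4.3
print(0.3 * calibration)                        # ~1.3x GW after my counterfactual adjustments
```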
The only "real" number here is the calibration to being 3x GW according to my version of GIF's calculations. The point of the 1.3x result is just to illustrate how I would adjust for the relevant counterfactuals. Relative to my version of GIF's calculation, my calculation includes a 71% = (14m - 1.4m)/ 14m * (1 - 0.21) multiplier, that translates Gross Impact into Enterprise Impact, and a 60% = 30% /50% multiplier, that gets us to the final Investor Impact. With these values, that I consider highly uncertain but plausible, the effectiveness of GIF would be above GW top charities.
For emphasis, I'm not claiming GIF is or is not achieving this effectiveness. I'm just seeking to illustrate that it is plausible. And, if someone were to do an independent analysis, I'd expect the results to shape up along the lines of the approach I've outlined here.