Aidan Whitfield🔸

Researcher @ Giving What We Can

199 karmaJoined Nov 2024Working (0-5 years)

Message

Posts
1

Sorted by New

177

GWWC's 2024 evaluations of evaluators

Giving What We Can🔸

· 5mo ago · 7m read

Comments
14

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸4mo3

Hi Rosie, thanks for sharing your thoughts on this! It’s great to get the chance to clarify our decision-making process so it’s more transparent, in particular so readers can make their own judgement as to whether or not they agree with our reasoning about FP GHDF. Some one my thoughts on each of the points you raise:

We agree there is a positive track record for some of FP GHDF’s grants and this is one of the key countervailing considerations against our decision not to rely on FP GHDF in the report. Ultimately, we concluded that the instances of ‘hits’ we were aware of were not sufficient to conclude that we should rely on FP GHDF into the future. Some of our key reasons for this included:
1. These ‘hits’ seemed to fall into clusters for which we expect there is a limited supply of opportunities, e.g., several that went on to be supported by GiveWell were AIM-incubated charities. This means, we expect these opportunities to be less likely to be the kinds of opportunities that FP GHDF would fund on the margin with additional funding
2. We were not convinced that these successes would be replicated in the future under the new senior researcher (see our crux relating to consistency of the fund).
Ultimately, what we are trying to do is establish where the next dollar can be best spent by a donor. We agree it might not be worth it for a researcher to spend as much time on small grants, but this by itself should not be a justification for us to recommend small grants over large ones (agree point 3 can be a relevant consideration here though).
We agree that the relative value donors place on supporting early stage and riskier opportunities compared to more established orgs could be a crux here. However, we still needed a bar against which we could assess FP GHDF (i.e., we couldn’t have justifiably relied on FP GHDF on the basis of this difference in worldview, independent of the quality of FP GHDF’s grantmaking). As such, we tried to assess whether FP GHDF grant evaluations convincingly demonstrated that opportunities met their self-stated bar. As we have acknowledged in the report, just because we don’t think the grant evaluations convincingly show opportunities meet the bar, doesn’t mean they really don’t (e.g., the researcher may have considered information not included in the grant evaluation report). However, we can only assess on the basis of the information we reviewed.
Regarding our focus on the BOTECs potentially being misplaced, I want to be clear that we did review all of these grant evaluations in full, not just the BOTECs. If we thought the issues we identified in the BOTECs were sufficiently compensated for by reasoning included in the grant evaluations more generally this would have played a part in our decision-making. I think assessing how well the BOTECs demonstrate opportunities surpass Founders Pledge’s stated bar was a reasonable evaluation strategy because: a) As mentioned above, these BOTECs were highly decision relevant — grants were only made if BOTECs showed opportunities to surpass 10x GiveDirectly and we know of no instances where an opportunity scored above 10x GiveDirectly and would not have been eligible for FP GHDF funding. b) The BOTECs are where many of the researcher’s judgements are made explicit and so can be assessed. At least for the three evaluations we reviewed in detail, a significant fraction of the work in the grant evaluation was justifying inputs to the BOTECs. On the other point raised here, it is true that not all of the concerns we had with the BOTECs were errors. Some of our concerns related to inputs that seemed (to us) optimistic and were, in our view, insufficiently justified considering the decision-relevant effect they had on the overall BOTEC. While not errors, these made it more difficult for us to justifiably conclude that the FP GHDF grants were in expectation competitive with GiveWell.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo3

Thanks for the comment! While we think it could be correct that the quality of evaluations differs between our recommendations in different cause areas, my view is that the evaluating evaluators project applies pressure to increase the strength of evaluations across all cause areas. In our evaluations we communicate areas where we think evaluators can improve. Because we are evaluating multiple options in each cause area, if in future evaluations we find one of our evaluators has improved and another has not, then the latter evaluator is less likely to be recommended in future, which provides an incentive for both evaluators to improve their processes over time.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo3

Thanks for your comment, Huw! I think Michael has done a great job explaining GWWC’s position on this, but please let us know if we can offer any clarifications.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo3

Thanks so much for your comment, Karolina! We are looking forward to re-evaluating AWF next year.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo5

Thanks for the comment, Matt! We are very grateful for the transparent and constructive engagement we have received from you and Rosie throughout our evaluation process.

I did want to flag that you are correct in anticipating that we do not agree that with the three differences in perspectives that you note here nor do we think our approach implies we do agree:

1) We do not think a grant can only be identified as cost-effective in expectation if a lot of time is spent making an unbiased, precise estimate of cost-effectiveness. As mentioned in the report, we think a rougher approach to BOTECing intended to demonstrate opportunities meet a certain bar under conservative assumptions is consistent with a GWWC recommendation. When comparing the depth of GiveWell’s and FP’s BOTECs we explicitly address this:

[This difference] is also consistent with FP’s stated approach to creating conservative BOTECs with the minimum function of demonstrating opportunities to be robustly 10x GiveDirectly cost-effectiveness. As such, this did not negatively affect our view of the usefulness of FPs BOTECs for their evaluations.

Our concern is that, based on our three spot checks, it is not clear that FP GHDF BOTECs do demonstrate that marginal grants in expectation surpass 10x GiveDirectly under conservative assumptions.

2) We would not claim that CEAs should be the singular determinant of whether a grant is made. However, considering that CEAs seem decisive in GHDF grant decisions (in that grants are only made from the GHDF when they are shown by BOTEC to be >10x GiveDirectly in expectation), we think it is fair to assess these as important decision-making tools for the FP GHDF as we have done here.

3) We do not think maximising calculated EV in the case of each grant is the only way to maximise cost-effectiveness over the span of a grantmaking program. We agree some risk-neutral grantmaking strategies should be tolerant to some errors and ‘misses’, which is why we checked three grants, rather than only checking one. Even after finding issues in the first grant we were still open to relying on FP GHDF if these seemed likely to be only occurring to a limited extent, but in our view their frequency across the three grants we checked was too high to currently justify a recommendation.

I hope these clarifications make it clear that we do think evaluators other than GiveWell (including FP GHDF) could pass our bar, without requiring GiveWell levels of certainty about individual grants.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo1

Thanks for your comment, Caroline! We are excited to continue hosting THL as a supported program on the GWWC platform, so donors can continue supporting your important work.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo4

Thanks for the comment! I first want to highlight that in our report we are specifically talking about institutional diet change interventions that reduce animal product consumption by replacing institutional (e.g., school) meals containing animal products with meals that don’t. This approach, which constitutes the majority of diet change programs that ACE MG funds, doesn’t necessarily involve convincing individuals to make conscious changes to their consumption habits.

Our understanding of a common view among the experts we consulted is that diet change interventions are generally not competitive with promising welfare asks in terms of cost-effectiveness, but that some of the most promising institutional diet change interventions plausibly could be. For example, I think some of our experts would have considered the grant ACE MG made to the Plant-Based Universities campaign worth funding. Reasons for this include:

The organisers have a good track record
The ask is for a full transition to plant-based catering, which reduces small animal replacement concerns
The model involves training students to campaign, meaning the campaign can reach more universities than the organisation could by going school-to-school themselves

As noted in the report, not all experts agreed that the institutional diet change interventions were on average competitive with the welfare interventions ACE MG funded. However, as you noted, this probably has a fairly limited impact on how cost-effective ACE MG is on the margin, not least because these grants made up a small fraction of ACE MG’s 2024 funding.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo4

Hi Vasco, thanks for the comment! I should clarify that we are saying that we expect marginal cost-effectiveness of impact-focused evaluators to change more slowly than marginal cost-effectiveness for charities. All else equal, we think size is plausibly a useful heuristic. However, because we are looking at the margin, both the program itself and its funding situation can change, and as THL hasn’t been evaluated for how it allocates funding on the margin or starts new programs, but just on the quality of its marginal programs at the time of evaluation, there is a less robust signal there than there is for EA AWF, which we did evaluate on the basis of how it allocates funding on the margin. I hope that makes sense!

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo3

Thanks for the comment — we appreciate the suggestions!

With respect to your first suggestion, I want to clarify that our goal with this project is to identify evaluators that recommend among the most cost-effective opportunities in each cause area according to a sufficiently plausible worldview. This means among our recommendations we don’t have a view about which is more cost-effective, and we don’t try to rank the evaluators that we don’t choose to rely on. That said, I can think of two resources that might somewhat address your suggestion:

This section on our 2024 evaluating evaluators page explains which programs have changed status following our 2024 evaluations and why
In the other supported programs section of our donation platform, we roughly order the programs based on our preliminary impression of which might be most interesting to impact-focused donors in each cause area. To do this we take into account factors like if we’ve previously recommended them and if they are currently recommended by an impact-focused evaluator.

With respect to your second suggestion, while we don’t include a checklist as such, we try to include the major areas for improvement in the conclusion section of each report. In future we might consider organising these more clearly and making them more prominent.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸5mo4

Thanks for the comment! We didn’t look deeply into the SADs framework as part of our evaluation, as we didn’t think this would have been likely to change our final decision. It is possible we will look into this more in future evaluations. I currently expect use of this framework to be substantially preferable to a status quo where there is not a set of conceptually meaningful units for comparing animal welfare interventions.

On ACE’s additional funds discounts, our understanding is that the 0% discount for low uncertainty is not a typo. ACE lists their uncertainty categories and the corresponding discounts under Criterion 2 on their evaluation criteria page.

Aidan Whitfield🔸

Posts 1

Comments14

Posts
1

Comments
14