Open Philanthropy issued a request for proposals for "benchmarking LLM agents on consequential real-world tasks".
At least two of the grants went to professors who are developing agents (advancing capabilities).
$1,045,620 grant
From https://www.openphilanthropy.org/grants/princeton-university-software-engineering-llm-benchmark/
Open Philanthropy recommended a grant of $1,045,620 to Princeton University to support a project to develop a benchmark for evaluating the performance of Large Language Model (LLM) agents in software engineering tasks, led by Assistant Professor Karthik Narasimhan.
From Karthik Narasimhan's LinkedIn: "My goal is to build intelligent agents that learn to handle the dynamics of the world through experience and existing human knowledge (ex. text). I am specifically interested in developing autonomous systems that can acquire language understanding through interaction with their environment while also utilizing textual knowledge to drive their decision making."
$547,452 grant
From https://www.openphilanthropy.org/grants/carnegie-mellon-university-benchmark-for-web-based-tasks/
Open Philanthropy recommended a grant of $547,452 to Carnegie Mellon University to support research led by Professor Graham Neubig to develop a benchmark for the performance of large language models conducting web-based tasks in the work of software engineers, managers, and accountants.
Graham Neubig is one of the co-founders of All Hands AI, which is developing OpenDevin.
All Hands AI's mission is to build AI tools to help developers build software faster and better, and do it in the open.
Its flagship project is OpenDevin, an open-source software development agent that can autonomously solve software development tasks end-to-end.
Webinar
In the webinar where the RFPs were announced, Max Nadeau said (minute 19:00): "a lot of the time when you construct the benchmark you're going to put some effort into making the capable LLM agent that can actually demonstrate accurately what existing models are capable of, but for the most part we're imagining, for both our RFPs, the majority of the effort is spent on performing the measurement as opposed to like trying to increase performance on it".
Open Philanthropy was already aware that these grants would fund the development of agents and addressed this concern in the same webinar (minute 21:55).
Thanks for your thorough comment, Owen.
And do the amounts ($1M and $0.5M) seem reasonable to you?
As a point of reference, Epoch AI is hiring a "Project Lead, Mathematics Reasoning Benchmark"; that role pays ~$100k for a 6-month contract.