Also available on LessWrong.
Preceded by: Encultured AI, Part 1: Enabling New Benchmarks
Followed by: Announcing Encultured AI
In the preceding post, we talked about how our plan with Encultured is to enable new existential safety benchmarks for AI. In this post, we'll talk about involving humans and human data in those benchmarks. Many of the types of benchmarks we want to enable are made more useful if we can involve humans in them. For example, testing whether an AI system can align its values with another agent is especially interesting if that other agent is a human being.
So, we want a way to get lots of humans engaging with our platform. At first, we thought we’d pay humans to engage with the platform and generate data. In considering this, we wanted to make the process of engagement not-too-annoying for people, both so that it wouldn’t make their lives worse, and so that we wouldn’t have to pay them too much to engage. But then we thought: why not go a bit further, and provide something people intrinsically value? I.e., why not provide a service?
Out of the gate, we thought: what’s a service where people might not mind lots of experiments happening? A few possibilities came to mind for what we could build:
- Agriculture solutions. Agriculture is relatively geopolitically stabilizing (or non-destabilizing) as an AI application area, because powerful nations don’t get especially nervous when they find out another superpower is getting better at agriculture (as opposed to, say, aerospace and defense). So, this seems like an area where we’d like to enable progress, including safety testing. However, we didn’t see great ways of engaging lots of human users in a tool for agriculture, so this area didn’t seem like a great source of data about human values, and we decided not to focus on it.
- A social media tool. Social media is ripe for lots of experiments with language models, which are exploding in popularity right now. However, this area wasn’t a good fit for us, mainly because the concepts we want our benchmarks to explore, such as soft embodiment, are not easily represented on social media today. The ‘metaverse’ will probably evolve to make this easier as time goes on, but we don’t want to wait for that.
- A therapy or coaching tool. This would involve a lot of sensitive data-handling, which might be worth the effort to manage, except that — like with social media — this area wouldn’t allow us to engage with safety testing for assisting physical entities (people!) in a physically embodied context.
- Education or tutoring software. This is an area where it feels hard to grow our user base in an “aligned” way; the people who pay for education (parents, states) are not the people who use it most (people aged 5 to 25). Also, progress in AI-based education is not obviously geopolitically stabilizing, because State A could view State B’s progress in it as an enabling mechanism for centralized propaganda. Lastly, education tools aren’t easily amenable (at the back end) to our benchmark ideas for assisting physically embodied agents.
- A healthcare solution. Healthcare is an area we care deeply about. And, if we made products like prosthetics or other wearables, we’d be dealing directly with the wellbeing of real-world people, and grappling with many of the concepts we think are most important for AI safety/alignment benchmarking, including the assistance of physically embodied persons. Moreover, progress in AI solutions for healthcare is probably relatively geopolitically stabilizing, i.e., powerful countries aren’t particularly scared of each other getting better at healthcare. So, this area came close to being our top choice, except for the fact that bureaucracy around privacy laws makes the healthcare industry a difficult data source (although groups like OpenMined are working hard to change this).
- A video game platform. Here’s where we got excited. Video games are a playground for all sorts of activities — including simulations of physically embedded agents. They aren’t particularly geopolitically destabilizing, they’re a great way to engage lots of people, and video game companies can be extremely well-aligned with delivering a positive experience for their users. Specifically, users pay for what they find fun, and don’t pay for what they don’t. Also, we find “fun theory” to be a pretty compelling perspective on how to explore and serve human values.
| Service Area | Can Grow Safely?* | Good Source of Training Data? | Relatively Geopolitically Stabilizing? | Enables “physical assistance” benchmarks? |
| --- | --- | --- | --- | --- |
| Agriculture | Possibly | Not great | ✅ Yes | ✅ Yes |
| Social Media | Possibly | ✅ Yes | Not especially | No |
| Education | Harder | ✅ Yes | Not especially | No |
| Therapy | Possibly | Tough† | Not clear to us | No |
| Healthcare | ✅ Yes | Tough† | ✅ Yes | ✅ Yes |
| Video Games | ✅ Yes | ✅ Yes | ✅ Can be | ✅ Yes (via simulation) |
* i.e., we think we can safely grow the company by following market incentives and still end up with something aligned with our goals.
† i.e., tough in today’s data privacy climate.