
I recently graduated with a CS degree and have been freelancing in "content evaluation" while figuring out what’s next. The work involves paid tasks aimed at improving LLMs, which generally fall into a few categories:

  • Standard Evaluations: Comparing two AI-generated responses and assessing them based on criteria like truthfulness, instruction-following, verbosity, and overall quality. Some tasks also involve evaluating how well the model uses information from provided PDFs.
  • Coding Evaluations: Similar to standard evaluations but focused on code. These tasks involve checking responses for correctness, documentation quality, and performance issues.
  • "Safety"-Oriented Tasks: Reviewing potentially adversarial prompts and determining whether the model’s responses align with safety guidelines, such as refusing harmful requests like generating bomb instructions.
  • Conversational Evaluations: Engaging with the model directly, labeling parts of a conversation (e.g., summarization or open Q&A), and rating its responses based on simpler criteria than the other task types.

Recently, I have been questioning the ethics of this work. The models I work with are not cutting-edge, but improving them could still contribute to AI arms race dynamics. The platform is operated by Google, which might place more emphasis on safety compared to OpenAI, though I do not have enough information to be sure. Certain tasks, such as those aimed at helping models distinguish between harmful and benign responses, seem like they could be geared towards applying RLHF and are conceivably net-positive. Others, such as comparing model performance across a range of tasks, might be relevant to interpretability, but I am less certain about this.

Since I lean utilitarian, I have considered offsetting potential harm by donating part of my earnings to AI safety organizations. At the same time, if the work is harmful enough on balance, I would rather stop altogether. Another option would be to focus only on tasks that seem clearly safety-related or low-risk, though this would likely mean earning less, which could reduce prospective donations.

I have no idea how to construct a concrete gears-level picture of how (if at all) my work influences eventual transformative AI. I'm unsure how much refining current models' coding capabilities accelerates timelines, whether some tasks are net-positive, whether these impacts can be easily offset, and so on. I'd also estimate that I'm around the 30th percentile of coding workers, which suggests the counterfactual (a more skilled worker replacing me) would accelerate capabilities more than my continuing would. But it's difficult to act on the belief that I'm morally compelled to do a bad job.


Any thoughts in this area would be greatly appreciated.

2 Answers

I think replaceability is very high, so the counterfactual impact is minimal. That said, I see very little possibility that helping with RLHF for compliance with their "safety" guidelines benefits safety more than it accelerates the capabilities race, so any impact you do have is negative.

I don't know how ethical it is in absolute terms, but compared to the most likely counterfactual (someone else taking the job who is not aligned with EA and less concerned about AI risk), I think it's better if you do it.
