
tae 🔸



Please accept my delayed gratitude for the comprehensive response! The conversation with my colleagues continues. The original paper and this response have become pretty central to my thinking about alignment.

How about reducing the number of catered meals while increasing support for meals outside the venue? Silly example: someone could fill a hotel room with Soylent so that everyone can grab liquid meals and go chat somewhere, sort of a "baguettes and hummus" vibe. Or, as @Matt_Sharp pointed out, we could reserve nearby restaurants. No idea whether these exact plans are feasible, but I can imagine similarly scrappy solutions going well if planned by actual logistics experts.

Thanks so much for your work and this information!

I'm having an ongoing discussion with a couple of professors and a PhD candidate in AI about "The Alignment Problem from a Deep Learning Perspective" by @richard_ngo, @Lawrence Chan, and @SoerenMind. They are skeptical of "3.2 Planning Towards Internally-Represented Goals," "3.3 Learning Misaligned Goals," and "4.2 Goals Which Motivate Power-Seeking Would Be Reinforced During Training." Here's my understanding of some of their questions:

  1. The argument for power-seeking during deployment depends on the model being able to detect the change from the training distribution to the deployment distribution. Wouldn't this require keeping track of the distribution seen so far, which would require memory of some sort, which is very difficult to implement in the SSL+RLHF paradigm? (For a concrete picture of what such distribution tracking could involve, see the sketch after this list.)
  2. What is the status of the model after the SSL stage of training? 
    1. How robust could its goals be?
    2. Would a model be able to know:
      1. what misbehavior during RLHF fine-tuning would look like?
      2. that it would be able to better achieve its goals by avoiding misbehavior during fine-tuning?
    3. Why would a model want to preserve its weights? (Sure, instrumental convergence and all, but what's the exact mechanism here?)
  3. To what extent would all these phenomena (situationally-aware reward hacking, misaligned internally-represented goals, and power-seeking behaviors) show up in current LLMs (say, GPT-4) vs. current agentic LLM-based systems (say, AutoGPT) vs. different future systems?
    1. Do we get any evidence for these arguments from the fact that existing LLMs can adopt goal-directed personas?
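
To make question 1 concrete, here is a minimal, hypothetical sketch (mine, not from the paper) of what "keeping track of the distribution seen so far" could mean mechanically: a detector maintains running statistics over the inputs it has observed and flags inputs that deviate too far from them. The point is only that this requires persistent state across inputs; whether anything comparable could arise inside the SSL+RLHF paradigm is exactly what's in question. The class name, threshold, and toy data below are illustrative assumptions, not anything from the paper.

```python
# Hypothetical illustration only: a stateful out-of-distribution detector.
# It needs memory (a running mean/covariance) that persists across inputs,
# which is the kind of state the question above doubts a stateless
# SSL+RLHF-trained model would have.
import numpy as np


class RunningDistributionTracker:
    """Tracks a running estimate of the input-feature distribution."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # sum of outer products of deviations

    def update(self, x: np.ndarray) -> None:
        # Welford-style online update of the mean and (unnormalized) covariance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)

    def is_out_of_distribution(self, x: np.ndarray, threshold: float = 9.0) -> bool:
        # Flag inputs whose squared Mahalanobis distance from the running
        # distribution exceeds the (illustrative) threshold.
        if self.n < 2:
            return False
        cov = self.m2 / (self.n - 1) + 1e-6 * np.eye(len(self.mean))
        delta = x - self.mean
        return float(delta @ np.linalg.solve(cov, delta)) > threshold


# Toy usage: feed in "training-like" inputs, then probe with a shifted input.
tracker = RunningDistributionTracker(dim=4)
rng = np.random.default_rng(0)
for _ in range(1000):
    tracker.update(rng.normal(0.0, 1.0, size=4))
print(tracker.is_out_of_distribution(rng.normal(0.0, 1.0, size=4)))  # usually False
print(tracker.is_out_of_distribution(np.full(4, 8.0)))               # True
```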

> I'm guessing this has been discussed in the animal welfare movement somewhere

Yep, The Sexual Politics of Meat by Carol J. Adams is the classic I'm aware of.

I recorded the rough audio and passed it along to the audio editor, but I haven’t heard back since then :(

Hi! I relate so much to you. I'm seven years older than you and I'm pretty happy with how my life is going, so although I'm no wise old sage, I think I can share some good advice.

I've also been involved in EA, Buddhism, veganism, minimalism, sustainable fashion, etc. from a young age, plus I was part of an Orthodox Christian community as a teenager (as I assume you are, since you're in Greece).

So, here's my main advice. 

The philosophies of EA, Buddhism, etc. are really really morally demanding. Working from the basic principles of these philosophies, it is difficult to find reasons to prioritize your own wellbeing; there are only pragmatic reasons such as "devote time and money to your own health so that you can work more effectively to help others". Therefore, if you predominantly engage in these communities through the philosophy, you will be exhausted. 

So, instead of going down internet rabbit holes and reading serious books, engage with the people in these communities. Actual EAs goof around at parties and write stories. Actual Buddhists have silly arguments at nice restaurants and go on long treks through the mountains. While good philosophies are optimized to be hard to argue with, good communities are optimized to be healthy and sustainable.

I'm guessing you don't have strong EA and Buddhist communities near you, though. Same here. In that case, primarily engage in other communities instead. When I was your age (ha that sounds ridiculous), I was deeply involved in choir. Would highly recommend! Having fun is so important to balance out the philosophies that can consume your life if you let them. 

In non-EA, non-Buddhist communities, it might feel like you're the only one who takes morality seriously, and that can be lonely. Personally, I gravitate toward devout religious friends, because they're also trying to confront selfishness. Just make sure you don't go down depressing rabbit holes together.

Of course, there are nice virtual EA and Buddhist communities too. They can't fully replace in-person communities, though. Also, people in virtual communities are more likely to only show their morally intense side. 

I hope this helps! You're very welcome to DM me about anything. I'll DM you first to get the conversation going.

P.S. You've got soooo much time to think about monasticism, so there's no reason to be concerned about the ethics of it for now, especially since the world could change so much by the time we retire! Still, just for the philosophical interest of it, I'm happy to chat about Buddhist monasticism if you like. Having lived at a monastery for several months and written my undergrad thesis on a monastic text, I've got some thoughts :)

General information about people in low-HDI countries to humanize them in the eyes of the viewer.

Similarly for animals (though not "humanizing" per se!). Spreading awareness that pigs, for example, act like dogs may be a strong catalyst for caring about animal welfare. We'd need to consult an animal welfare activism expert.

My premise here: it is valuable for EAs to viscerally care about others (in addition to cleverly working toward a future that sounds neat).

I'll just continue my anecdote! As it happens, the #1 concern that my friend has about EA is that EAs work sinisterly hard to convince people to accept the narrow-minded longtermist agenda. So the sheer frequency of the ads itself increases his skepticism about the integrity of the movement. (Another manifestation of this pattern is that many AI safety researchers see AI ethics researchers as straight-up wrong about what matters in the broader field of AI, and therefore as people to be convinced rather than collaborated with.)

(Edit: the above paragraph is an anecdote, and I'm speaking generally in the following paragraphs)

I think it is quite fair for someone with EA tendencies, who is just hearing of EA for the first time through these ads, to form a skeptical first impression of a group that invests heavily in selling an unintuitive worldview. 

I strongly agree that it's a good sign if a person investigates such things instead of writing them off immediately, since that indicates a willingness to take unusual ideas seriously. However, the mental habit of openness and curiosity is itself unusual, and it's often developed through EA involvement; we can't expect everyone to come in with full-fledged EA virtues.

Sure! Thank you very much for your, ahem, forethought about this complicated task. Please pardon the naive post about a topic that you all have worked hard on already :)
