
Over the last few years, progress has been made in estimating the density of intelligent life in the universe (e.g., Olson 2015, Sandberg 2018, Hanson 2021). Some progress has also been made in using these results to update longtermist macrostrategy, but it remains partial and stops short of its potential (Finnveden 2019, Olson 2020, Olson 2021, Cook 2022). In particular, this work at best hints at the meaty part of the implications and leaves half of the job almost untouched: comparing the expected utility produced by different Space-Faring Civilizations (SFCs). In this post, we hint at the possible macrostrategic implications of these works: a possible switch for the longtermist community from decreasing X-risks (including increasing P(Alignment)[1]) to increasing P(Alignment | Humanity creates an SFC).

Sequence: This post is part 1 of a sequence investigating the longtermist implications of alien Space-Faring Civilizations. Each post aims to be standalone.

Summary

We define two hypotheses: 

  • Civ-Saturation Hypothesis: Most resources will be claimed by Space-Faring Civilizations (SFCs) regardless of whether humanity creates an SFC[2].
  • Civ-Similarity Hypothesis: Humanity's Space-Faring Civilization would produce utility[3] similar to other SFCs. 

If these hypotheses hold, this could shift longtermist priorities away from reducing pure extinction risks and toward specifically optimizing P(Alignment | Humanity creates an SFC)[1]. This means that rather than focusing broadly on preventing misaligned AI and extinction, longtermists might need to prioritize strategies that specifically increase the probability of alignment conditional on humanity creating an SFC. Macrostrategy updates include the following:

  • (i) Significantly deprioritizing extinction risks, such as nuclear weapon and bioweapon risks.
  • (ii) Somewhat deprioritizing AI Safety agendas that mostly increase P(Humanity creates an SFC) but do little to increase P(Alignment | Humanity creates an SFC).
  • (iii) Giving more weight to previously neglected AI Safety agendas. E.g., a "Plan B AI Safety" agenda that would focus on decreasing P(Humanity creates an SFC | Misalignment), for example, by implementing (active & corrigible) preferences against space colonization in early AI systems.

The Civ-Saturation Hypothesis

Will Humanity's SFC grab marginal resources? The Civ-Saturation Hypothesis posits that, when making decisions, we should assume that most of the resources Humanity's SFC would otherwise grab will eventually be claimed by some SFC, regardless of whether Humanity's SFC exists.

Plausibly low marginal resources under EDT. The validity of this hypothesis can be studied using models estimating the frequency of Space-Faring Civilizations (SFCs) in the universe (Sandberg 2018, Finnveden 2019, Olson 2020, Hanson 2021, Snyder-Beattie 2021, Cook 2022). The validity will also depend on which decision theory we use and on our credence in these theories. As soon as we put some credence on evidential decision theories and on our actions being correlated with those of our exact copies[4], we may have to put significant weight on the Civ-Saturation Hypothesis. We will produce a first quantitative evaluation of this hypothesis in a later post.
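As a rough intuition pump (not the quantitative evaluation itself, which comes in a later post), here is a minimal sketch with entirely made-up numbers for the prior over worlds and for how many exact copies of us each world contains. Under EDT-style reasoning, worlds containing more copies of us receive proportionally more decision weight, which can push the decision-relevant probability of saturation well above the naive prior.

```python
# Toy sketch (hypothetical numbers): decision-relevant weight of "saturated" worlds
# under EDT-style reasoning, where worlds containing more exact copies of us
# receive proportionally more decision weight.

# Each world model: naive prior credence, and relative number of exact copies of us
# (saturated worlds host many more civilizations, hence many more copies).
worlds = {
    "empty (we are alone)":        {"prior": 0.5, "copies": 1},
    "sparse (few SFCs)":           {"prior": 0.3, "copies": 10},
    "saturated (SFCs everywhere)": {"prior": 0.2, "copies": 1000},
}

# Decision weight of a world ~ prior * number of copies whose decisions correlate with ours.
total = sum(w["prior"] * w["copies"] for w in worlds.values())
for name, w in worlds.items():
    weight = w["prior"] * w["copies"] / total
    print(f"{name}: naive prior = {w['prior']:.2f}, decision weight = {weight:.2f}")

# With these made-up numbers, the saturated world ends up with ~0.98 of the
# decision weight despite a naive prior of only 0.2.
```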

Hinting at longtermist macrostrategic implications

What is the impact of human ancestry on an SFC's expected utility? For simplicity, let’s assume the Civ-Saturation Hypothesis is 100% true. How much counterfactual value Humanity creates then depends entirely on the utility Humanity’s SFC creates relative to all SFCs. Are other SFCs going to create more or less utility per unit of resources than Humanity’s SFC? I.e., how different are U(SFC) and U(SFC | Human-ancestry)? Little progress has been made on this question. For reference, see quotes from (Finnveden, 2019)[5], (Brian Tomasik, 2015)[6], (Brauner and Grosse-Holz, 2019)[7], and (Anthony DiGiovanni, 2021)[8] in the footnotes. Most discussions stop after a few of the following arguments.

  • Under moral anti-realism, Humanity's SFC is more likely to produce higher utility, since we should expect less convergence between moral values and we are more likely to carry out our own precise values.
  • There may be convergence among goals, especially between ancestors of SFCs.
  • Human moral values may depend on the biological structure of our brain or on contingent cultural features.
  • It is plausible that humans are more compassionate than other Intelligent Civilizations since humans are somewhat abnormally compassionate among Earthly animals.
  • Humanity's SFC may cause more suffering than other SFCs because we are sentient, and conflicts may target our values.
  • Humanity's SFC may create more utility because we are sentient and thus more likely to create a sentient SFC.
  • Humans are likely biased towards valuing themselves.

For clarity, I am not endorsing these arguments. I am listing arguments found in existing discussions. 

No existing work directly studies this precise question in depth. Some related work exists, but it mostly looks at the moral values of aliens or of alien SFCs, much more rarely at those of the alien ancestors of SFCs, and not at the relative expected utility of Humanity’s SFC versus other SFCs. I will introduce novel object-level arguments about this question in a later post.

A priori, the expected utility of Humanity's SFC is not special. For now, let’s assume we know nothing about how conditioning on Human-ancestry impacts the utility produced by an SFC; then U(SFC) ~ U(SFC | Human-ancestry). This assumption is similar to using the Principle of Mediocrity. What would be the macrostrategic longtermist implications in that case?

  • Reducing pure extinction risks is much less valuable. Increasing P(Humanity creates an SFC) would have much less longtermist value, and nuclear and bio X-risk reduction agendas would have a reduced priority, though their neartermist justifications would remain.
  • Longtermists should optimize P(Alignment | Humanity creates an SFC). Concerning AI Safety, from the point of view of impartial longtermists[9], increasing P(Alignment | Humanity creates an SFC) would replace the commonly used target of increasing P(Alignment AND Humanity creates an SFC). Longtermist AI Safety agendas would need to be re-evaluated against this new target (see the toy numerical sketch after this list).

    • Some existing AI Safety agendas may increase P(Alignment AND Humanity creates an SFC) while not increasing P(Alignment | Humanity creates an SFC) as much, or even, if we are unlucky, while reducing it. For example, such agendas may mainly prevent early AIs and uses of AI from destroying both Humanity's potential and AIs' potential at the same time.

    • Other currently neglected agendas may increase P(Alignment | Humanity creates an SFC) while not increasing P(Alignment AND Humanity creates an SFC). Those include agendas aiming at decreasing P(Humanity creates an SFC | Misalignment). An example of intervention in such an agenda is overriding instrumental goals for space colonization and replacing them with an active desire not to colonize space. This defensive preference could be removed later, conditional on achieving corrigibility.
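To make the difference between the two targets concrete, here is a minimal numerical sketch with entirely made-up probabilities (the baseline numbers and intervention effects are hypothetical, chosen only for illustration). It compares a stylized intervention that mostly raises P(Humanity creates an SFC) with one that mostly raises P(Alignment | Humanity creates an SFC); under full Civ-Saturation, only the second target tracks counterfactual value.

```python
# Toy sketch with made-up numbers: comparing the conjunctive target
# P(Alignment AND SFC) with the conditional target P(Alignment | SFC).

baseline = {"p_sfc": 0.6, "p_align_given_sfc": 0.3}

# Hypothetical intervention A: mostly raises the chance that Humanity creates an SFC.
intervention_a = {"p_sfc": 0.8, "p_align_given_sfc": 0.3}

# Hypothetical intervention B: mostly raises the chance of alignment given an SFC.
intervention_b = {"p_sfc": 0.6, "p_align_given_sfc": 0.4}

def targets(world):
    p_and = world["p_sfc"] * world["p_align_given_sfc"]  # P(Alignment AND SFC)
    p_cond = world["p_align_given_sfc"]                   # P(Alignment | SFC)
    return p_and, p_cond

for name, world in [("baseline", baseline), ("A", intervention_a), ("B", intervention_b)]:
    p_and, p_cond = targets(world)
    print(f"{name}: P(Align AND SFC) = {p_and:.2f}, P(Align | SFC) = {p_cond:.2f}")

# Both interventions raise the conjunctive target from 0.18 to 0.24, but only
# intervention B raises the conditional target (0.30 -> 0.40), which is the target
# that matters if the marginal resources grabbed are ~0 under Civ-Saturation.
```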

The Civ-Similarity Hypothesis

Is Human ancestry neutral, positive, or negative? The implications hinted at above are only plausible if U(SFC) ~ U(SFC | Human-ancestry). We formulate this requirement as a hypothesis. The Civ-Similarity Hypothesis posits that the expected utility efficiency, per unit of resources, of Humanity's future SFC is similar to that of other SFCs.

How could this hypothesis be valid? Two main considerations contribute in that direction:

  • High uncertainty about the future may flatten expected utilities. We may not know enough about how conditioning on Human (or other) ancestry impacts the value of the long-term future produced by an SFC.
  • SFCs are rare, and creating them may be very constrained, i.e., convergent evolution and strong selection. Selection mechanisms and convergent evolutionary processes may drastically reduce the space of possible characteristics an SFC’s ancestors can have.

How could this hypothesis be invalid?

  • We may know enough to predict significant differences in expected utilities. We may already have enough information to say that Humanity's SFC will be abnormal in some specific ways relative to other SFCs. If, additionally, we are confident in how these abnormalities impact the long-term utility of Humanity's SFC, then we should be able to conclude that our future SFC is significantly higher or lower utility than other SFCs.
  • We may only care about our precise values, and we may succeed at aligning our future SFC. We may consider that only our own precise values are valuable (e.g., no moral uncertainty). If, additionally, the distribution of alien moral values is much more diffuse than that of humans, even after conditioning on ancestors that first create an SFC, and if, finally, we are confident enough in how our values impact the long-term utility produced by SFCs (e.g., we think we will succeed at alignment), then we should conclude that the hypothesis is invalid.

In later posts, we will look deeper into evaluating the Civ-Similarity Hypothesis and the tractability of making further progress there. We will see that a lot can be said regarding this hypothesis.

The Existence Neutrality Hypothesis

A third hypothesis, the conjunction of the previous two. This third and last hypothesis is simply the conjunction of the first two. The Existence Neutrality Hypothesis posits that influencing Humanity's chance of creating an SFC produces little value compared to increasing the quality of the SFC we would eventually create, conditional on creating one. Note that this hypothesis somewhat contradicts Nick Bostrom's astronomical waste argument.

Whispers of plausible importance. A few discussions of the implications of the existence of alien SFCs (including the Existence Neutrality Hypothesis) are already available online but, to my knowledge, never led to a proper assessment of these questions. For reference, in the footnotes, you can find relevant quotes from (Brian Tomasik 2015)[10], (Jan M. Brauner and Friederike M. Grosse-Holz, 2018)[11], (Anthony DiGiovanni, 2021)[8], (Maxwell Tabarrok, 2022)[12], (MacAskill, 2023)[13], (Toby Ord's answer to MacAskill 2023)[14], (Jim Buhler, 2023)[15], and (Magnus Vinding 2024)[16].

Context

Evaluating the Neutrality Hypothesis - Introductory Series. This post is part of a series introducing a research project for which I am seeking funding: Evaluating the Neutrality Hypothesis. This project includes evaluating both the Civ-Saturation and the Civ-Similarity Hypotheses and their longtermist macrostrategic implications. This introductory series hints at preliminary research results and looks at the tractability of making further progress in evaluating these hypotheses.

Next: A first evaluation of the Civ-Saturation Hypothesis. Over the next few posts, we will introduce a first evaluation of the Civ-Saturation Hypothesis, starting by reviewing existing SFC density estimates and the models producing them, and by clarifying what Civ-Saturation means for which possible worlds we should bet on.

Plan of the sequence

(Introduction)

  • (1) Longtermist implications of alien Space-Faring Civilizations - Introduction

(A first pass at evaluating the Civ-Saturation Hypothesis)

  • (2) Space-Faring Civilization density estimates and models - Review
  • (3) Decision-Relevance of worlds and ADT implementations
  • (4) Formalizing Civ-Saturation concepts and metrics
  • (5) Should we bet on worlds saturated with Space-Faring Civilizations? - A first-pass evaluation

(Object-level arguments about the Civ-Similarity Hypothesis and its tractability)

  • (6) Selection Pressures on Space-Faring Civilization Shapers - Preliminary Insights
  • (7) High-level reasons for optimism in studying the Existence Neutrality Hypothesis

(Introducing the research project & implications)

  • (8) Evaluating the Existence Neutrality Hypothesis - A research project
  • (9) Macrostrategic Implications of the Existence Neutrality Hypothesis

Acknowledgments

Thanks to Tristan Cook, Magnus Vinding, Miles Kodama, and Justis Mills for their excellent feedback on this post and ideas. Note that this research was done under my personal name and that this content is not meant to represent any organization's stance. 

  1. ^

    By increasing P(Alignment), I mean increasing the probability that the SFC Humanity would create is aligned with some kind of ideal moral value (e.g., CEV), and has the ability to optimize it strongly. This requires some degree of success at both technical alignment and AI governance.

  2. ^

    The hypothesis is specifically about what we should bet on when making decisions. Its extended version is: When making decisions, we should bet that most resources will be claimed by Space-Faring Civilizations (SFCs) regardless of whether humanity creates an SFC.

  3. ^

    Expected utility per unit of resource grabbed.

  4. ^

    Exact copies are agents that are exactly equivalent to you: the positions of all the particles composing them are identical to those composing you. They are perfect copies of you living in different parts of the world (e.g., the multiverse).

  5. ^

    Quote: “How much one should value Earth-originating and alien civilisations is very unclear. If you accept moral anti-realism, one reason to expect aliens to be less valuable than Earth-originating civilisations is that humans are more likely to share your values, since you are a human. However, there might be some convergence among goals, so it’s unclear how strong this effect is.” (Finnveden 2019)

  6. ^

    Quote: “If we knew for certain that ETs would colonize our region of the universe if Earth-originating intelligence did not, then the question of whether humans should try to colonize space becomes less obvious. As noted above, it's plausible that humans are more compassionate than a random ET civilization would be. On the other hand, human-inspired computations might also entail more of what we consider to count as suffering because the mind architectures of the agents involved would be more familiar. And having more agents in competition for our future light cone might lead to dangerous outcomes.” (Brian Tomasik 2015)

  7. ^

    Quote: "We may however assume that our reflected preferences depend on some aspects of being human, such as human culture or the biological structure of the human brainfn-48. Thus, our reflected preferences likely overlap more with a (post-)human civilization than alternative civilizations. As future agents will have powerful tools to shape the world according to their preferences, we should prefer (post-)human space colonization over space colonization by an alternative civilization." (Jan M. Brauner and Friederike M. Grosse-Holz, 2019)

  8. ^

    Quote: "Arguments on this point will very likely not be robust; on any side of the debate, we are left with speculation, as our data consists of only one sample from the distribution of potentially space-colonizing species (i.e., ourselves).[51] On the side of optimism about humans relative to aliens, our species has historically displayed a capacity to extend moral consideration from tribes to other humans more broadly, and partly to other animals. Pessimistic lines of evidence include the exponential growth of factory farming, genocides of the 19th and 20th centuries, and humans’ unique degree of proactive aggression among primates (Wrangham, 2019).[52] Our great uncertainty arguably warrants focusing on increasing the quality of future lives conditional on their existence, rather than influencing the probability of extinction in either direction.

    It does seem plausible that, by evolutionary forces, biological nonhumans would care about the proliferation of sentient life about as much as humans do, with all the risks of great suffering that entails. To the extent that impartial altruism is a byproduct of cooperative tendencies that were naturally selected (rather than “spandrels”), and of rational reflection, these beings plausibly would care about as much as humans do about reducing suffering. If, as suggested by work such as that of Henrich (2020), impartial values are largely culturally contingent, this argument does not provide a substantial update against +ERR if our prior view was that impartiality is an inevitable consequence of philosophical progress.[53] On the other hand, these cultures that tend to produce impartial values may themselves arise from convergent economic factors.[54] Brauner and Grosse-Holz’s mathematical model also acknowledges the following piece of weak evidence against +ERR in this respect: intelligent beings with values orthogonal to most humans’ (or most philosophically deliberative humans’) would tend not only to create less value in the future, but also less disvalue. Given the arguments in section 2.2 for the simplicity of disvalue, however, this difference may not be large." (Anthony DiGiovanni, 2021)

  9. ^

    More precisely, from the point of view of impartial longtermists who also, at least, care for the impact of their exact copies (or believe in stronger forms of EDT).

  10. ^

    Quote: "If another species took over and built a space-faring civilization, would it be better or worse than our own? There's some chance it could be more compassionate, such as if bonobos took our place. But it might also be much less compassionate, such as if chimpanzees had won the evolutionary race, not to mention killer whales. On balance it's plausible our hypothetical replacements would be less compassionate, because compassion is something humans value a lot, while a random other species probably values something else more. The reason I'm asking this question in the first place is because humans are outliers in their degree of compassion. Still, in social animals, various norms of fair play are likely to emerge regardless of how intrinsically caring the species is. Simon Knutsson pointed out to me that if human survivors do recover from a near-extinction-level catastrophe, or if humans go extinct and another species with potential to colonize space evolves, they'll likely need to be able to cooperate rather than fighting endlessly if they are to succeed in colonizing space. This suggests that if they colonize space, they will be more moral or peaceful than we were. My reply is that while this is possible, a rebuilding civilization or new species might curb infighting via authoritarian power structures or strong ingroup loyalty that doesn't extend to outgroups, which might imply less compassion than present-day humans have." (Brian Tomasik 2015)

  11. ^

    Quote: "If humanity goes extinct without colonizing space, some kind of other beings would likely survive on earthfn-47. These beings might evolve into a non-human technological civilization in the hundreds of millions of years left on earth and eventually colonize space. Similarly, extraterrestrials (that might already exist or come into existence in the future) might colonize (more of) our corner of the universe, if humanity does not.

    In these cases, we must ask whether we prefer (post-)human space colonization over the alternatives. Whether alternative civilizations would be more or less compassionate or cooperative than humans, we can only guess. We may however assume that our reflected preferences depend on some aspects of being human, such as human culture or the biological structure of the human brain. Thus, our reflected preferences likely overlap more with a (post-)human civilization than alternative civilizations. As future agents will have powerful tools to shape the world according to their preferences, we should prefer (post-)human space colonization over space colonization by an alternative civilization." (Jan M. Brauner and Friederike M. Grosse-Holz, 2018)

  12. ^

    Quote: "The base rate of formation of intelligent or morally valuable life on earth and in the universe is an essential but unknown parameter for EA Longtermist philosophy. Longtermism currently assumes that this rate is very low which is fair given the lack of evidence. If we find evidence that this rate is higher, then wide moral circle Longtermists should shift their efforts from shielding humanity from as much existential risk as possible, to maximizing expected value by taking higher volatility paths into the future." (Maxwell Tabarrok, 2022)

  13. ^

    Quote: "I think one could reasonably hold, for example, that the probability of a technologically-capable species evolving, if Homo sapiens goes extinct, is 90%, that non-Earth-originating alien civilisations settling the solar systems that we would ultimately settle is also 90%, and that such civilisations would have similar value to human-originating civilisation.

    (They also change how you should think about longterm impact. If alien civilisations will settle the Milky Way (etc) anyway, then preventing human extinction is actually about changing how interstellar resources are used, not whether they are used at all.)

    And I think it means we miss out on some potentially important ways of improving the future. For example, consider scenarios where we fail on alignment. There is no “humanity”, but we can still make the future better or worse. A misaligned AI system that promotes suffering (or promotes something that involves a lot of suffering) is a lot worse than an AI system that promotes something valueless. " (MacAskill 2023)

  14. ^

    Quote: "You are right that the presence or absence of alien civilisations (especially those that expand to settle very large regions) can change things. I didn't address this explicitly because (1) I think it is more likely that we are alone in the affectable universe, and (2) there are many different possible dynamics for multiple interacting civilisations and it is not clear what is the best model. But it is still quite a plausible possibility and some of the possible dynamics are likely enough and simple enough that they are worth analysing." (Toby Ord's answer to MacAskill 2023)

  15. ^

    Quote: "Hanson (2021) and Cook (2022) estimate that we should expect to eventually “meet” (grabby) alien AGIs/civilizations – just AGIs, from here on – if humanity expands, and that our corner of the universe will eventually be colonized by aliens if humanity doesn’t expand. 

    This raises the following three crucial questions:

    1. What would happen once/if our respective AGIs meet? Values handshakes (i.e., cooperation) or conflict? Of what forms?
    2. Do we have good reasons to think the scenario where our corner of the universe is colonized by humanity is better than that where it is colonized by aliens? Should we update on the importance of reducing existential risks?[1]
    3. Considering the fact that aliens might fill our corner of the universe with things we (dis)value, does humanity have an (inter-civilizational) comparative advantage in focusing on something the grabby aliens will neglect?" (Jim Buhler, 2023)
  16. ^

    Quote: "Impartial AI safety would plausibly give strong consideration to our potential impact on other cosmic agents, whereas AI safety that exclusively prioritizes, say, human survival or human suffering reduction would probably not give it strong consideration, if indeed any consideration at all. So the further we diverge from ideals of impartiality in our practical focus, the more likely we may be to neglect our potential impact on other cosmic agents." (Magnus Vinding 2024)

Comments

The validity of this hypothesis can be studied using models estimating the frequency of Space-Faring Civilizations (SFCs) in the universe (Sandberg 2018, Finnveden 2019, Olson 2020, Hanson 2021, Snyder-Beattie 2021, Cook 2022). The validity will also depend on which decision theory we use and on our credence in these theories.

I'm very hesitant about making moral decisions concerning the donations of potentially millions of dollars based on something so speculative. I think it's too far down the EA crazy train to prioritise different causes based on the density of alien civilisations. It's probably more speculative than the simulation hypothesis (which, if true, significantly increases the likelihood that you are the only sentient being in this universe), but we don't make moral decisions based on that.

I get that there's been a lot of work on this and that we can make progress on it (I know, I'm an astrobiologist), but I'm sure there are so many unknown unknowns associated with the origin of life, development of sentience, and spacefaring civilisation that we just aren't there yet. The universe is so enormous and bonkers and our brains are so small - we can make numerical estimates sure, but creating a number doesn't necessarily mean we have more certainty.

How much counterfactual value Humanity creates then depends entirely on the utility Humanity’s spacefaring civilisation creates relative to all spacefaring civilisations. 

I've got a big moral circle (all sentient beings and their descendants), but it does not extend to aliens because of cluelessness

I think you're posing a post-understanding of consciousness question. Consciousness might be very special or it might be an emergent property of anything that synthesises information, we just don't know. But it's possible to imagine aliens with complex behaviour similar to us, but without evolving the consciousness aspect, like superintelligent AI probably will be like. For now, the safe assumption is that we're the only conscious life, and I think it's very important that we act like it until proven otherwise. 

So for now, I'm quite confident that if we're thinking about the moral utility of spacefaring civilisation, we should at least limit our scope to our own civilisation, more specifically, our own sentience and its descendants (I personally prefer to limit that scope even further to the next few thousand years, or just our Solar System to reduce the ambiguity a bit - longtermism still stands strong with this huge limitation). I think the main value of looking into the potential density of aliens in the universe is helping us figure out what our own future might look like. Even if humans only colonise the Solar System because alien SFCs colonise the galaxy, that's still 10^27 potential future lives (1.2 sextillion over the next 6000 years; future life equivalents based on the Solar System's carrying capacity; as opposed to 100 trillion if we stay on Earth till its destruction). We can control and predict that to an extent, and there's enough ambiguity and cluelessness already associated with how to make human civilisation's future in space good in the context of AI - but we can at least make some concrete decisions (e.g. work by Simon Institute & CLR).

 

Very interesting post though! Lots to think about and I can see that this could be the most important moral consideration... maybe... I look forward to your series and I definitely think it's worthwhile to try and figure out what that consideration might be. 

I've got a big moral circle (all sentient beings and their descendants), but it does not extend to aliens because of cluelessness

...

I'm quite confident that if we're thinking about the moral utility of spacefaring civilisation, we should at least limit our scope to our own civilisation

I agree that the particular guesses we make about aliens will be very speculative/arbitrary. But "we shouldn't take the action recommended by our precise 'best guess' about XYZ" does not imply "we can set the expected contribution of XYZ to the value of our interventions to 0". I think if you buy cluelessness — in particular, the indeterminate beliefs framing on cluelessness — the lesson you should take from Maxime's post is that we simply aren't justified in saying any intervention with effects on x-risk is net-positive or net-negative (w.r.t. total welfare of sentient beings).

I somewhat agree with your points. Here are some contributions, and pushbacks:

I get that there's been a lot of work on this and that we can make progress on it (I know, I'm an astrobiologist), but I'm sure there are so many unknown unknowns associated with the origin of life, development of sentience, and spacefaring civilisation that we just aren't there yet. The universe is so enormous and bonkers and our brains are so small - we can make numerical estimates sure, but creating a number doesn't necessarily mean we have more certainty.

Something interesting about these hypotheses and implications is that they get stronger the more uncertain we are, as long as one uses some form of EDT (e.g., CDT + exact copies). The less we know about how conditioning on Human ancestry impacts utility production, the closer the Civ-Similarity Hypothesis is to being correct. The broader our distribution over the density of SFCs in the universe, the closer the Civ-Saturation Hypothesis is to being correct. This seems true as long as you account for the impact of correlated agents (e.g., exact copies) and they exist. For the Civ-Similarity Hypothesis, this comes from the application of the Mediocrity Principle. For the Civ-Saturation Hypothesis, this comes from the fact that we have orders of magnitude more exact copies in saturated worlds than in empty worlds.

I think you're posing a post-understanding of consciousness question. Consciousness might be very special or it might be an emergent property of anything that synthesises information, we just don't know. But it's possible to imagine aliens with complex behaviour similar to us, but without evolving the consciousness aspect, like superintelligent AI probably will be like. For now, the safe assumption is that we're the only conscious life, and I think it's very important that we act like it until proven otherwise. 

Consciousness is indeed one of the arguments pushing the Civ-Similarity Hypothesis toward lower values (humanity being more important), and I am eager to discuss its potential impact. Here are several reasons why the update from consciousness may not be that large:

  • Consciousness may not be binary. In that case, we don't know whether humans are low, medium, or high consciousness; I only know that I am not at zero. We should then likely assume we are average. The relevant comparison is then no longer between P(humanity is "conscious") and P(aliens creating SFCs are "conscious") but between P(humanity's consciousness > 0) and P(aliens-creating-SFC's consciousness > 0).
  • If human consciousness is a random fluke with no impact on behavior (otherwise it could be selected for or against), then we have no reason to think that aliens will create more or less conscious descendants than us. Consciousness needs to have a significant impact on behavior to change the chance that (artificial) descendants are conscious. But the larger the effect of consciousness on behaviors, the more likely consciousness is to be a result of evolution/selection.
  • We don't understand much about how the consciousness of SFC creators would influence the consciousness of (artificial) SFC descendants. Even if Humans are abnormal in being conscious, it is very uncertain how much that changes how likely our (artificial) descendants are to be conscious.

I am very happy to get pushback and to debate the strength of the "consciousness argument" on Humanity's expected utility.

Thanks for your reply, lots of interesting points :)

Consciousness may not be binary. In that case, we don't know whether humans are low, medium, or high consciousness; I only know that I am not at zero. We should then likely assume we are average. The relevant comparison is then no longer between P(humanity is "conscious") and P(aliens creating SFCs are "conscious") but between P(humanity's consciousness > 0) and P(aliens-creating-SFC's consciousness > 0).

I particularly appreciate that reframing of consciousness. I think it's probably both binary and continuous though. Binary in the sense that you need a "machinery" that's capable of producing consciousness i.e. neurons in a brain seem to work. And then if you have that capable machinery, you then have the range from low to high consciousness, like we see on Earth. If intelligence is related to consciousness level as it seems to be on Earth, then I would expect that any alien with "capable machinery" that's intelligent enough to become spacefaring would have consciousness high enough to satisfy my worries (though not necessarily at the top of the range). 

So then any alien civilisation would either be "conscious enough" or "not conscious at all", conditional on (a) the machinery of life being binary in its ability to produce a scale of consciousness and (b) consciousness being correlated with intelligence.

So I'm not betting on it. The stakes are so high (a universe devoid of sentience) that I would have to meet and test the consciousness of aliens with a 'perfect' theory of consciousness before I updated any strategy towards reducing P(ancestral-human SFC), even if there's an extremely high probability of the Civ-Similarity Hypothesis being true.

"Some existing AI Safety agendas may increase P(Alignment AND Humanity creates an SFC) while at the same time not increasing as much or even, if unlucky, reducing P(Alignment | Humanity creates an SFC). For example, such agendas may significantly prevent early AIs and AI usages from destroying, at the same time, the potential of Humanity and AIs. "

This is compressing a complicated line of thought into such a small number of words that I find it impossible to understand. 

Sorry if that's not clear.

Are the reformulations in the initial summary helping? The second bullet point is the most relevant.
 

  • (i) Significantly deprioritizing extinction risks, such as nuclear weapon and bioweapon risks.
  • (ii) Somewhat deprioritizing AI Safety agendas that mostly increase P(Humanity creates an SFC) but do little to increase P(Alignment | Humanity creates an SFC).
  • (iii) Giving more weight to previously neglected AI Safety agendas. E.g., a "Plan B AI Safety" agenda that would focus on decreasing P(Humanity creates an SFC | Misalignment), for example, by implementing (active & corrigible) preferences against space colonization in early AI systems.

No, I still don't understand. 

I don't get it either. Can you maybe run us through 2 worked examples for bullet point 2? Like what is someone currently doing (or planning to do) that you think should be deprioritised? And presumably, there might be something that you think should be prioritised instead? 

 

I'm imagining here that you want to deprioritise an AI safety regime if it is focusing on making AIs that create technology that can be used for spacefaring civilisation, but aren't aligned? That wouldn't be an AI safety regime would it? That's just creating AI that wants to leave Earth

Other currently neglected agendas may increase P(Alignment | Humanity creates an SFC) while not increasing P(Alignment AND Humanity creates an SFC). Those include agendas aiming at decreasing P(Humanity creates an SFC | Misalignment). An example of intervention in such an agenda is overriding instrumental goals for space colonization and replacing them with an active desire not to colonize space. This defensive preference could be removed later, conditional on achieving corrigibility.

What's the difference between "P(Alignment | Humanity creates an SFC)" and "P(Alignment AND Humanity creates an SFC)"? 

Well, at a technical level the first is a conditional probability and the second is an unconditional probability of a conjunction. So the first is to be read as "the probability that alignment is achieved, conditional on humanity creating a spacefaring civilization" whilst the second is "the probability that the following happens: alignment is solved and humanity creates a spacefaring civilization". If you think of probability as a space, where the likelihood of an outcome = the proportion of the space it takes up, then:

-the first is the proportion of the region of probability space taken up by humanity creating a space-faring civilization in which alignment also occurs.

-the second is the proportion of the whole of probability space in which both alignment occurs and humanity creates a space-faring civilization.

But yes, knowing that does not automatically bring real understanding of what's going on. Or at least for me it doesn't. Probably the whole idea being expressed would be better written up much more informally, focusing on a concrete story of how particular actions taken by people concerned with alignment might surprisingly be bad or suboptimal.

Thanks David, that makes sense :)

What's the difference between "P(Alignment | Humanity creates an SFC)" and "P(Alignment AND Humanity creates an SFC)"? 

I will try to explain it more clearly. Thanks for asking.

P(Alignment AND Humanity creates an SFC) = P(Alignment | Humanity creates an SFC) x P(Humanity creates an SFC)

So the difference is that when you optimize for P(Alignment | Humanity creates an SFC), you no longer optimize for the term P(Humanity creates an SFC), which was included in the conjunctive probability.
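As a toy illustration with made-up numbers: suppose P(Humanity creates an SFC) = 0.6 and P(Alignment | Humanity creates an SFC) = 0.3, so the conjunction is 0.18. An intervention that only raises P(Humanity creates an SFC) to 0.8 brings the conjunction to 0.24 while leaving the conditional at 0.3; an intervention that only raises the conditional to 0.4 also brings the conjunction to 0.24. Both look equally good under the conjunctive target, but if the Civ-Saturation Hypothesis holds, the extra resources the first intervention helps Humanity's SFC grab would have been claimed by other SFCs anyway, so only the second intervention translates into counterfactual longtermist value.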
 

Can you maybe run us through 2 worked examples for bullet point 2? Like what is someone currently doing (or planning to do) that you think should be deprioritised? And presumably, there might be something that you think should be prioritised instead? 

Bullet point 2 is: (ii) Somewhat deprioritizing AI Safety agendas that mostly increase P(Humanity creates an SFC) but do little to increase P(Alignment | Humanity creates an SFC).

Here are speculative examples. The degree to which their priorities should be updated is to be debated. I only claim that they may need to be updated conditional on the hypotheses being significantly correct.

  • AI Misuse reduction: If the paths to impact (PTIs) are (a) to prevent extinction through misuse and chaos, (b) to prevent the loss of alignment power resulting from a more chaotic world, and (c) to provide more time for Alignment research, then it is plausible that the PTI (a) would become less impactful.
  • Misaligned AI Control: If the PTIs are (c) as above, (d) to prevent extinction through controlling early misaligned AIs trying to take over, (e) to control misaligned early AIs to make them work on Alignment research, and (f) to create fire alarms (note: which somewhat contradicts the path (b) above), then it is plausible that the PTI (d) would be less impactful, since these early misaligned AIs may have a higher chance of not creating an SFC after taking over (e.g., they don't survive destroying humanity or don't care about space colonization).
    • Here is another, vaguer, diluted effect: if an intervention, like AI control, increases P(Humanity creates an SFC | Early Misalignment), then this intervention may need to be discounted more than if it only increased P(Humanity creates an SFC). Changing P(Humanity creates an SFC) may have no impact when the hypotheses are significantly correct, but increasing P(Humanity creates an SFC | Misalignment) is net negative, and Early Misalignment and (Late) Misalignment may be strongly correlated.
  • AI evaluations: The reduced impact of (a) and (d) may also affect the overall importance of this agenda.

These updates are, at the moment, speculative.
