Bio

“In the day I would be reminded of those men and women,
Brave, setting up signals across vast distances,
Considering a nameless way of living, of almost unimagined values.”

How others can help me

I would greatly appreciate anonymous feedback, or just feedback in general. Doesn't have to be anonymous.

Comments

If evolutionary biology metaphors for social epistemology are your cup of tea, you may find this discussion I had with ChatGPT interesting. 🍵

(Also, sorry for not optimizing this; but I rarely find time to write anything publishable, so I thought just sharing as-is was better than not sharing at all. I recommend the footnotes btw!)

Glossary/metaphors

Background

Once upon a time, the common ancestor of the palm trees Howea forsteriana and Howea belmoreana on Lord Howe Island would pollinate each other more or less uniformly during each flowering cycle. This was "panmictic" because everybody was equally likely to mix with anybody else.

Then, on a beautifwl sunny morning smack in the middle between New Zealand and Australia, the counterfactual descendants had had enough. Due to varying soil profiles on the island, they all had to compromise between fitness for each soil type—or purely specialize in one and accept the loss of all seeds which landed on the wrong soil. "This seems inefficient," one of them observed. A few of them nodded in agreement and conspired to gradually desynchronize their flowering intervals from their conspecifics, so that they would primarily pollinate each other rather than having to uniformly mix with everybody. They had created a cline.

And a cline, once established, permits the gene pools of the assortatively-pollinating palms to further specialize toward different mesa-niches within their original meta-niche. Given that a crossbreed between palms adapted for different soil types is going to be less adaptive for either niche,[1] you have a positive feedback cycle where they increasingly desynchronize (to minimize crossbreeding) and increasingly specialize. Solve for the general equilibrium and you get sympatric speciation.[2]

Notice that their freedom to specialize toward their respective mesa-niches is proportional to their reproductive isolation (or inversely proportional to the gene flow between them). The more panmictic they are, the more selection-pressure there is on them to retain 1) genetic performance across the population-weighted distribution of all the mesa-niches in the environment, and 2) cross-compatibility with the entire population (since you can't choose your mates if you're a wind-pollinating palm tree).[3]
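The "specialization is proportional to reproductive isolation" point can be sketched with a standard two-deme migration-selection model. This is my own illustration with made-up numbers, not anything from the palm study: an allele is favored on one soil type and disfavored on the other with strength s, while a fraction m of each deme's gene pool mixes each generation ("pollen flow").

```python
# Toy two-deme migration-selection balance (my illustration; s and m are made up).
# Allele A has fitness 1+s in deme 1 and 1-s in deme 2. Each generation:
# selection within each deme, then symmetric migration at rate m.
# Divergence between the demes shrinks as gene flow m rises.

def equilibrium_divergence(s, m, generations=10_000):
    p1, p2 = 0.9, 0.1  # frequency of allele A in deme 1 and deme 2
    for _ in range(generations):
        # selection: A favored in deme 1, disfavored in deme 2
        p1 = p1 * (1 + s) / (p1 * (1 + s) + (1 - p1))
        p2 = p2 * (1 - s) / (p2 * (1 - s) + (1 - p2))
        # migration mixes a fraction m of each gene pool into the other
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
    return p1 - p2

for m in [0.0, 0.01, 0.1, 0.5]:
    print(f"m={m:4}:  divergence = {equilibrium_divergence(0.05, m):.3f}")
```

At m = 0 the demes specialize fully; at m = 0.5 (panmixia) divergence collapses to zero, which is the palms' predicament before they desynchronized their flowering.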

From evo bio to socioepistemology

I love this as a metaphor for social epistemology, and the potential detrimental effects of "panmictic communication". Sorta related to the Zollman effect, but more general. If you have an epistemic community that is trying to grow knowledge about a range of different "epistemic niches", then widespread pollination (communication) is obviously good because it protects against e.g. inbreeding depression of local subgroups (e.g. echo chambers, groupthink, etc.), because researchers can coordinate to avoid redundant work, and because ideas tend to inspire other ideas; but it can also be detrimental because researchers who try to keep up with the ideas and technical jargon being developed across the community (especially related to everything that becomes a "hot topic") will have less time and relative curiosity to specialize in their focus area ("outbreeding depression").

A particularly good example of this is the effective altruism community. Given that they aspire to prioritize between all the world's problems, and due to the very high-dimensional search space generalized altruism implies, and due to how tight-knit the community's discussion fora are (the EA forum, LessWrong, EAGs, etc.), they tend to learn an extremely wide range of topics. I think this is awesome, and usually produces better results than narrow academic fields, but nonetheless there's a tradeoff here.

The rather untargeted gene-flow implied by wind-pollination is a good match for the mostly-online meme-flow of the EA community. You might think that EAs will adequately speciate and evolve toward subniches due to the intractability of keeping up with everything, and indeed there are many subcommunities that branch into different focus areas. But if you take cognitive biases into account, and the constant desire people have to be *relevant* to the largest audience they can find (preferential attachment wrt hot topics), plus fear-of-missing-out, and fear of being "caught unaware" of some newly-developed jargon (causing people to spend time learning everything that risks being mentioned in live conversations[4]), they could likely benefit from smarter and more fractal ways to specialize their niches. Part of that may involve more "horizontally segmented" communication.

Tagging @Holly_Elmore because evobio metaphors is definitely your cup of tea, and a lot of it is inspired by stuff I first learned from you. Thanks! : )

  1. ^

    Think of it like... if you're programming something based on the assumption that it will run on Linux xor Windows, it's gonna be much easier to reach a given level of quality compared to if you require it to be cross-compatible.

  2. ^

    Sympatric speciation is rare because the pressure to be compatible with your conspecifics is usually quite high (Allee effects, network effects). But it is still possible once selection-pressures from "disruptive selection" exceed the "heritage threshold" relative to each mesa-niche.[5]

  3. ^

    This homogenization of evolutionary selection-pressures is akin to markets converging to an equilibrium price. It too depends on panmixia of customers and sellers for a given product. If customers are able to buy from anybody anywhere, differential pricing (i.e. trying to sell your product at above or below equilibrium price for a subgroup of customers) becomes impossible.

  4. ^

    This is also known (by me and at least one other person...) as the "jabber loop":

    This highlights the utter absurdity of being afraid of having our ignorance exposed, and going 'round judging each other for what we don't know. If we all worry overmuch about what we don't know, we'll all get stuck reading and talking about stuff in the Jabber loop. The more of our collective time we give to the Jabber loop, the more unusual it will be to be ignorant of what's in there, which means the social punishments for Jabber-ignorance will get even harsher.

  5. ^

    To take this up a notch: sympatric speciation occurs when a cline in the population extends across a separatrix (red) in the dynamic landscape, and the attractors (blue) on each side overpower the cohering forces from Allee effects (orange). This is the doodle I drew on a post-it note to illustrate that pattern in a different context:

    I dub him the mascot of bullshit-math. Isn't he pretty?

(Publishing comment-draft that's been sitting here two years, since I thought it was good (even if super-unfinished…), and I may wish to link to it in future discussions. As always, feel free to not-engage and just be awesome. Also feel free to not be awesome, since awesomeness can only be achieved by choice (thus, awesomeness may be proportional to how free you feel to not be it).)

Yes! This relates to what I call costs of compromise.

Costs of compromise

As you allude to by the exponential decay of the green dots in your last graph, there are exponential costs to compromising what you are optimizing for in order to appeal to a wider variety of interests. On the flip-side, how usefwl to a subgroup you can expect to be is exponentially proportional to how purely you optimize for that particular subset of people (depending on how independent the optimization criteria are). This strategy is also known as "horizontal segmentation".[1]

The benefits of segmentation ought to be compared against what is plausibly an exponential decay in the number of people who fit a marginally smaller subset of optimization criteria. So it's not obvious in general whether you should on the margin try to aim more purely for a subset, or aim for broader appeal.
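The tradeoff in the last two paragraphs can be made concrete with a toy formalization. This is my own sketch, not anything from the post: assume each extra optimization criterion you target filters the audience by a made-up factor p (the exponential decay) while multiplying the value delivered to each remaining reader by a made-up factor g (the payoff of purer optimization).

```python
# Toy model of costs of compromise (my formalization; p and g are made up).
# Each extra criterion shrinks the fitting audience by factor p but raises
# the per-person usefulness by factor g.

def total_value(k, N=10_000, p=0.3, g=2.5):
    """Expected total usefulness when conjunctively targeting k criteria."""
    audience = N * p**k   # people who fit all k criteria
    per_person = g**k     # usefulness to each of them
    return audience * per_person

# Whether specializing further helps on the margin depends on whether p*g > 1:
for g in (2.5, 4.0):
    print(f"g={g}:", [round(total_value(k, g=g)) for k in range(4)])
```

With p·g < 1 total value falls as you specialize (aim broad); with p·g > 1 it rises (aim pure), which is exactly why "it's not obvious in general" which way to move.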

Specialization vs generalization

This relates to what I think is one of the main mysteries/trade-offs in optimization: specialization vs generalization. It explains why scaling your company can make it more efficient (economies of scale),[2] why the brain is modular,[3] and how Howea palm trees can speciate without the aid of geographic isolation (aka sympatric speciation constrained by genetic swamping) by optimising their gene pools for differentially-acidic patches of soil and evolving separate flowering intervals in order to avoid pollinating each other.[4]

Conjunctive search

When you search for a single thing that fits two or more criteria, that's called "conjunctive search". In the image, try to find an object that's both [colour: green] and [shape: X].

My claim is that this analogizes to how your brain searches for conjunctive ideas: a vast array of preconscious ideas is selected from a distribution of distractors that score high on either one of the criteria.
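A quick sketch of why conjunctive targets are rare among single-feature distractors. This is my own illustration with arbitrary feature counts: items get a random colour and shape, and the fraction matching both criteria is roughly the product of the fractions matching each one.

```python
import random

# My illustration of conjunctive search: with 5 colours and 5 shapes,
# ~1/5 of items match each single criterion but only ~1/25 match both,
# so the conjunctive target hides among single-feature distractors.
random.seed(0)
colours = ["green", "red", "blue", "yellow", "purple"]
shapes = ["X", "O", "T", "L", "S"]
items = [(random.choice(colours), random.choice(shapes)) for _ in range(1000)]

green = [it for it in items if it[0] == "green"]      # ~1/5 of items
exes = [it for it in items if it[1] == "X"]           # ~1/5 of items
both = [it for it in items if it == ("green", "X")]   # ~1/25 of items
print(len(green), len(exes), len(both))
```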

10d6 vs 1d60

Preamble2: When you throw 10 6-sided dice (written as "10d6"), the probability of getting a max roll is much lower compared to if you were throwing a single 60-sided die ("1d60"). But if we assume that the 10 6-sided dice are strongly correlated, that has the effect of squishing the normal distribution to look like the uniform distribution, and you're much more likely to roll extreme values.

Moral: Your probability of sampling extreme values from a distribution depends on the number of variables that make it up (i.e. how many factors are convolved over), and the extent to which they are independent. Thus, costs of compromise are much steeper if you're sampling for outliers (a realm which includes most creative thinking and altruistic projects).
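The dice claim is easy to check by Monte Carlo (my sketch of the setup described above):

```python
import random

# 10d6 vs 1d60: a max roll on 1d60 has probability 1/60, while a max roll
# on 10 independent d6 is (1/6)**10 — astronomically rarer. Perfectly
# correlated d6s behave like one die, bringing the extreme value back in reach.
random.seed(0)

def p_max(trials, roll):
    """Estimate the probability that roll() hits the maximum value 60."""
    return sum(roll() == 60 for _ in range(trials)) / trials

one_d60 = lambda: random.randint(1, 60)
ten_d6_indep = lambda: sum(random.randint(1, 6) for _ in range(10))
ten_d6_corr = lambda: 10 * random.randint(1, 6)  # fully correlated dice

trials = 200_000
print("1d60:            ", p_max(trials, one_d60))       # ~ 1/60
print("10d6 independent:", p_max(trials, ten_d6_indep))  # ~ (1/6)**10
print("10d6 correlated: ", p_max(trials, ten_d6_corr))   # ~ 1/6
```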

Spaghetti-sauce fallacies 🍝

If you maximally optimize a single spaghetti sauce for profit, there exists a global optimum for some taste, quantity, and price. You might then declare that this is the best you can do, and indeed this is a common fallacy I will promptly give numerous examples of. [TODO…]

But if you instead allow yourself to optimize several different spaghetti sauces, each one tailored to a specific market, you can make much more profit compared to if you have to conjunctively optimize a single thing.

Thus, a spaghetti-sauce fallacy is when somebody asks "how can we optimize thing T more for criteria C?" when they should be asking "how can we chunk/segment C into n cohesive/dimensionally-reduced segments so we can optimize for {C1, ..., Cn} disjunctively?"


People rarely vote based on usefwlness in the first place

As a sidenote: People don't actually vote (/allocate karma) based on what they find usefwl. That's a rare case. Instead, people overwhelmingly vote based on what they (intuitively) expect others will find usefwl. This rapidly turns into a Keynesian Status Contest with many implications. Information about people's underlying preferences (or what they personally find usefwl) is lost as information cascades are amplified by recursive predictions. This explains approximately everything wrong about the social world.

Already in childhood, we learn to praise (and by extension vote) based on what kinds of praise other people will praise us for. This works so well as a general heuristic that it gets internalized and we stop being able to notice it as an underlying motivation for everything we do.

  1. ^

    See e.g. spaghetti sauce.

  2. ^

    Scale allows subunits (e.g. employees) to specialize at subtasks.

  3. ^

    Every time a subunit of the brain has to pull double-duty with respect to what it adapts to, the optimization criteria compete for its adaptation—this is also known as "pleiotropy" in evobio, and "polytely" in… some ppl called it that and it's a good word.

  4. ^

    This palm-tree example (and others) is partially optimized/goodharted for seeming impressive, but I leave it in because it also happens to be deliciously interesting and possibly entertaining as an example of costs of compromise. I want to emphasize how ubiquitous this trade-off is.

Oh, this is excellent! I do a version of this, but I haven't paid enough attention to what I do to give it a name. "Blurting" is perfect.

I try to make sure to always notice my immediate reaction to something, so I can more reliably tell what my more sophisticated reasoning modules transform that reaction into. Almost all the search-process imbalances (eg. filtered recollections, motivated stopping, etc.) come into play during the sophistication, so it's inherently risky. But refusing to reason past the blurt is equally inadvisable.

This is interesting from a predictive-processing perspective.[1] The first thing I do when I hear someone I respect tell me their opinion, is to compare that statement to my prior mental model of the world. That's the fast check. If it conflicts, I aspire to mentally blurt out that reaction to myself.

It takes longer to generate an alternative mental model (ie. sophistication) that is able to predict the world described by the other person's statement, and there's a lot more room for bias to enter via the mental equivalent of multiple comparisons. Thus, if I'm overly prone to conform, that bias will show itself after I've already blurted out "huh!" and made note of my prior. The blurt helps me avoid the failure mode of conforming and feeling like that's what I believed all along.

Blurting is a faster and more usefwl variation on writing down your predictions in advance.

  1. ^

    Speculation. I'm not very familiar with predictive processing, but the claim seems plausible to me on alternative models as well.

I disagree a little bit with the credibility of some of the examples, and want to double-click others. But regardless, I think this is a very productive train of thought and thank you for writing it up. Interesting!

And btw, if you feel like a topic of investigation "might not fit into the EA genre", and yet you feel like it could be important based on first-principles reasoning, my guess is that that's a very important lead to pursue. Reluctance to step outside the genre, and thinking that the goal is to "do EA-like things", is exactly the kind of dynamic that's likely to lead the whole community to overlook something important.

I'm not sure. I used to call it "technical" and "testimonial evidence" before I encountered "gears-level" on LW. While evidence is just evidence and Bayesian updating stays the same, it's usefwl to distinguish between these two categories because if you have a high-trust community that frequently updates on each others' opinions, you risk information cascades and double-counting of evidence.

Information cascades develop consistently in a laboratory situation [for naively rational reasons, in which other incentives to go along with the crowd are minimized]. Some decision sequences result in reverse cascades, where initial misrepresentative signals start a chain of incorrect [but naively rational] decisions that is not broken by more representative signals received later. - (Anderson & Holt, 1998)
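The Anderson & Holt result is easy to reproduce in a simplified simulation. This is my own stripped-down model of their experiment, with an assumed signal accuracy of 2/3: each agent gets a private signal, sees all earlier choices, and picks the Bayes-optimal option (ties broken by own signal). Once the public evidence outweighs any single signal (a net margin of two), everyone herds, and later, more representative signals can no longer break the run.

```python
import random

# Simplified Anderson & Holt (1998) cascade model (my sketch, accuracy 2/3).
# A "reverse cascade": early misrepresentative signals lock in the wrong choice.

def run_cascade(n_agents, rng):
    state = rng.choice("AB")
    net = 0  # (#A - #B) among choices that still revealed a private signal
    choices = []
    for _ in range(n_agents):
        signal = state if rng.random() < 2 / 3 else ("B" if state == "A" else "A")
        if net >= 2:
            choices.append("A")      # cascade: private signal ignored
        elif net <= -2:
            choices.append("B")      # possibly a reverse cascade
        else:
            choices.append(signal)   # informative choice reveals the signal
            net += 1 if signal == "A" else -1
    return state, choices

rng = random.Random(1)
wrong = sum(run_cascade(30, rng)[1][-1] != run_cascade.__defaults__ is None and 0
            for _ in range(0))  # (placeholder removed below)

wrong = 0
runs = 2000
for _ in range(runs):
    state, choices = run_cascade(30, rng)
    if choices[-1] != state:         # herd settled on the wrong option
        wrong += 1
print(f"runs ending in a reverse (incorrect) cascade: {wrong / runs:.2%}")
```

With signal accuracy 2/3, roughly a fifth of runs lock into the wrong option, which is the "double-counting of evidence" risk in a high-trust community made quantitative.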

Additionally, if your model of a thing has "gears", then there are multiple things about the physical world that, if you saw them change, it would change your expectations about the thing.

Let's say you're talking to someone you think is smarter than you. You start out with different estimates and different models that produce those estimates. From Ben Pace's a Sketch of Good Communication:

Here you can see both blue and red have gears. And since you think their estimate is likely to be much better than yours, and you want to get some of that amazing decision-guiding power, you throw out your model and adopt their estimate (cuz you don't understand or don't have all the parts of their model):

Here, you have "destructively deferred" in order to arrive at your interlocutor's probability estimate. Basically zombified. You no longer have any gears, even if the accuracy of your estimate has potentially increased a little.

An alternative is to try to hold your all-things-considered estimates separate from your independent impressions (that you get from your models). But this is often hard and confusing, and they bleed into each other over time.

"When someone gives you gears-level evidence, and you update on their opinion because of that, that still constitutes deferring."

This was badly written. I just mean that updating on their opinion, as opposed to just taking the patterns and trying to adjust for the fact that you received them through filters, is updating on testimony. I'm saying nothing special here, just that you might be tricking yourself into deferring (instead of impartially evaluating patterns) by letting the gearsy arguments woozle you.

I wrote a bit about how testimonial evidence can be "filtered" in the paradox of expert opinion:

If you want to know whether string theory is true and you're not able to evaluate the technical arguments yourself, who do you go to for advice? Well, seems obvious. Ask the experts. They're likely the most informed on the issue. Unfortunately, they've also been heavily selected for belief in the hypothesis. It's unlikely they'd bother becoming string theorists in the first place unless they believed in it.

If you want to know whether God exists, who do you ask? Philosophers of religion agree: 70% accept or lean towards theism compared to 16% of all PhilPapers Survey respondents.

If you want to know whether to take transformative AI seriously, what now?

Some selected comments or posts I've written

  • Taxonomy of cheats, multiplex case analysis, worst-case alignment
  • "You never make decisions, you only ever decide between strategies"
  • My take on deference
  • Dumb
  • Quick reasons for bubbliness
  • Against blind updates
  • The Expert's Paradox, and the Funder's Paradox
  • Isthmus patterns
  • Jabber loop
  • Paradox of Expert Opinion
  • Rampant obvious errors
  • Arbital - Absorbing barrier
  • "Decoy prestige"
  • "prestige gradient"
  • Braindump and recommendations on coordination and institutional decision-making
  • Social epistemology braindump (I no longer endorse most of this, but it has patterns)

Other posts I like

  • The Goddess of Everything Else - Scott Alexander
    • “The Goddess of Cancer created you; once you were hers, but no longer. Throughout the long years I was picking away at her power. Through long generations of suffering I chiseled and chiseled. Now finally nothing is left of the nature with which she imbued you. She never again will hold sway over you or your loved ones. I am the Goddess of Everything Else and my powers are devious and subtle. I won you by pieces and hence you will all be my children. You are no longer driven to multiply conquer and kill by your nature. Go forth and do everything else, till the end of all ages.”
  • A Forum post can be short - Lizka
    • Succinctly demonstrates how often people goodhart on length or other irrelevant criteria like effort moralisation. A culture of appreciating posts for the practical value they add to you specifically would incentivise writers to pay more attention to whether they are optimising for expected usefwlness or just signalling.
  • Changing the world through slack & hobbies - Steven Byrnes
    • Unsurprisingly, there's a theme to what kind of posts I like. Posts that are about de-Goodharting ourselves.
  • Hero Licensing - Eliezer Yudkowsky
    • Stop apologising, just do the thing. People might ridicule you for believing in yourself, but just do the thing.
  • A Sketch of Good Communication - Ben Pace
    • Highlights the danger of deferring if you're trying to be an Explorer in an epistemic community.
  • Holding a Program in One's Head - Paul Graham
    • "A good programmer working intensively on his own code can hold it in his mind the way a mathematician holds a problem he's working on. Mathematicians don't answer questions by working them out on paper the way schoolchildren are taught to. They do more in their heads: they try to understand a problem space well enough that they can walk around it the way you can walk around the memory of the house you grew up in. At its best programming is the same. You hold the whole program in your head, and you can manipulate it at will.

      That's particularly valuable at the start of a project, because initially the most important thing is to be able to change what you're doing. Not just to solve the problem in a different way, but to change the problem you're solving."
Emrik

It took me 24 minutes from when I decided to start (half-way through reading this post), but that could be reduced if I had these tips at the start.

  1. The website requires you to enable cookies. If you (like me) select "no" before you realize this, you can enable them again like this.[1] (Safe to ignore otherwise.)
  2. Yes, you can simply copy-paste Ben's suggestions above into sections Policy proposal, Animal Welfare, and Further comments.
  3. The only question I wrote anything in is Q29 in the About you section, selecting "Other (please specify)":
Q29. ...your reason for taking part...

I am not responding on behalf of an organization, but I am mostly pasting a template from a policy proposal suggested by an acquaintance here:

https://forum.effectivealtruism.org/posts/mooBq4A3Hd8ttTyAY/ten-minutes-to-speak-up-for-4-5-million-caged-chickens

I am from Norway, but I care to write this response because I don't wish for the animals to suffer unnecessarily. I can answer on behalf of the policy response if you wish to contact me, or you may contact the author of that document.

  1. ^

    On Chrome:
     1) to the right of the URL bar, select Cookies and site data → Manage on-device site data
     2) delete both cookies there
     3) refresh page, and you will now be prompted again

And a follow-up on why I encourage the use of jargon.

  • Mutation-rate ↦ "jargon-rate"
  • I tend to deliberately use jargon-dense language because I think that's usually a good thing. Something we discuss in the chat.
    • I also just personally seem to learn much faster by reading jargon-dense stuff.
    • As long as the jargon is apt, it highlights the importance of a concept ("oh, it's so generally-applicable that it's got a name of its own?").
    • If it's a new idea expressed in normal words, the meaning may (ill-advisably) snap into some old framework I have, and I fail to notice that there's something new to grok about it. Otoh, if it's a new word, I'll definitely notice when I don't know it.
    • I prefer a jargon-dump which forces me to look things up, compared to fluent text where I can't quickly scan for things I don't already know.
    • I don't feel the need to understand everything in a text in order to benefit from it. If I'm reading something with a 100% hit-rate wrt what I manage to understand, that's not gonna translate to a very high learning-rate.

To clarify: By "jargon" I didn't mean to imply anything negative. I just mean "new words for concepts". They're often the most significant mutations in the meme pool, and are necessary to make progress. If anything, the EA community should consider upping the rate at which they invent jargon, to facilitate specialization of concepts and put existing terms (competing over the same niches) under more selection-pressure.

I suspect the problems people have with jargon is mostly that they are *unable* to change them even if they're anti-helpfwl. So they get the sense that "darn, these jargonisms are bad, but they're stuck in social equilibrium, so I can't change them—it would be better if we hadn't created them in the first place." The conclusion is premature, however, since you can improve things either by disincentivizing the creation of bad jargon, *or* increasing people's willingness to create them, so that bad terms get replaced at a higher rate.

That said, if people still insist on trying to learn all the jargon created everywhere because they'll feel embarrassed being caught unaware, increasing the jargon-rate could cause problems (including spending too much time on the forum!). But, again, this is a problem largely caused by impostor syndrome, and pluralistic ignorance/overestimation about how much their peers know. The appropriate solution isn't to reduce memetic mutation-rate, but rather to make people feel safer revealing their ignorance (and thereby also increasing the rate of learning-opportunities).

Naive solutions like "let's reduce jargon" are based on partial-equilibrium analysis. It can be compared to a "second-best theory" which is only good on the margin because the system is stuck in a local optimum and people aren't searching for solutions which require U-shaped jumps[1] (slack) or changing multiple variables at once. And as always when you optimize complex social problems (or manually nudge conditions on partial differential equations): "solve for the general equilibrium".

  1. ^

    A "U-shaped jump" is required for everything with activation costs/switching costs.

I predict with high uncertainty that this post will have been very usefwl to me. Thanks!

Here's a potential missing mood: if you read/skim a post and you don't go "ugh that was a waste of time" or "wow that was worth reading"[1], you are failing to optimise your information diet and you aren't developing intuition for what/how to read.

  1. ^

    This is importantly different from going "wow that was a good/impressive post". If you're just tracking how impressed you are by what you read (or how useful you predict it is for others), you could be wasting your time on stuff you already know and/or agree with. Succinctly, you need to track whether your mind has changed--track the temporal difference.
