I'm a data scientist at the Humane and Sustainable Food Lab, working toward a bright future for human and nonhuman animals. I'm also interested in statistics, math, multi-agent RL, AI governance, and worldview bargaining.
Thanks Toby! Great question: outcome_2 isn't included because it would over-adjust our estimate for veganuary_2. By design, outcome_2 occurs after (or at the same time as) veganuary_2. If it occurs after, outcome_2 will "contain" part of the effect of veganuary_2 (and in the real world, this contained effect may be larger than the effect on outcome_3, given attenuation over time). If we include outcome_2, our model will adjust for the now-updated outcome_2 and "control away" most or all of the effect when estimating the effect on outcome_3. On the other hand, including activism_2 would successfully adjust for any inter-wave activism exposure.
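To make the over-adjustment concrete, here's a minimal simulation (not the study's actual model; all variable names, effect sizes, and noise terms below are made up) showing how conditioning on a post-exposure variable like outcome_2 absorbs the part of veganuary_2's effect that flows through it:

```python
# Hypothetical illustration of over-adjustment: adjusting for a variable measured
# after the exposure "controls away" the portion of the effect it transmits.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

activism_2 = rng.normal(size=n)                       # inter-wave activism exposure (confounder)
veganuary_2 = 0.5 * activism_2 + rng.normal(size=n)   # exposure of interest
outcome_2 = 0.8 * veganuary_2 + rng.normal(size=n)    # measured after (or with) the exposure
outcome_3 = (0.4 * veganuary_2 + 0.3 * outcome_2      # later outcome
             + 0.2 * activism_2 + rng.normal(size=n))

def ols_coefs(y, cols):
    """OLS coefficients for y ~ intercept + cols (intercept dropped from output)."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

# Adjusting for activism_2 only: recovers the total effect of veganuary_2
# (roughly 0.4 + 0.8 * 0.3 = 0.64 in this toy setup).
print(ols_coefs(outcome_3, [veganuary_2, activism_2]))

# Also adjusting for outcome_2: the veganuary_2 coefficient shrinks toward its
# direct effect (~0.4), i.e. the effect transmitted through outcome_2 is controlled away.
print(ols_coefs(outcome_3, [veganuary_2, activism_2, outcome_2]))
```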
There are then two directly related follow-up questions:
This is interesting and directly relevant to inferring the timing of events from measurements. In this study, the outcome was prospective (e.g., what is your current consumption?), while the predictors were both prospective and retrospective (e.g., what happened in the last six months?). For question 1, outcome_2 occurs after the retrospective predictors but not after the prospective ones, so we have a reverse-causation problem for some of the predictors. For question 2, given the framing of the survey questions (how much activism were you exposed to in the last six months?), it's not possible to determine whether activism_2 or veganuary_2 occurred first, so we would again have over-adjustment in many of the models.
In an ideal scenario, we would adjust for all potential confounders measured immediately prior to the exposure. In practice, though, there's a tug-of-war between temporal precedence and ruling out alternative explanations: as soon as you measure extremely close to the exposure, it becomes unclear where the confounding control ends and where the exposure begins.
I hope that helps and feel free to follow up!
Coming into this, I expected GPT to write clear, grammatically correct responses that were neither conceptually cohesive nor logically consistent.
Following this idea, the first thing I analyzed was the logical connections between the topics chosen in GPT's responses.
Overall, I think GPT3 performs best at the Robin-Hanson-esque task 2, followed by task 1, then the historical accident task 3. While most of the responses for task 2 were logically inconsistent or nonsensical, a few were consistent and even a little insightful. The three I think are best, in ascending order, are:
We pretend that the education system is about preparing students for the future, but it's more about preparing them for standardized tests. If we cared about student success, we would focus more on experiential learning and critical thinking skills.
We pretend that the beauty industry is about helping people feel good about themselves, but it's more about promoting unrealistic beauty standards. If we cared about self-esteem, we would focus more on inner beauty and self-acceptance.
We pretend that the economy is about providing for people's needs, but it's more about maximizing profits for corporations. If we cared about people's well-being, we would prioritize a more equitable distribution of wealth and resources.
Each of these responses seems to have a logical starting point, but the lack of further specificity leads to uninsightful answers. The response on the education system, while common knowledge and not novel, seems somewhat insightful in mentioning critical thinking. The rest of the responses were quite poor, the notable reasons being: choosing an alternative path (A) that didn't relate to Y (e.g., providing rehabilitation as a way of finding justice), mismatching the subject of the clause experiencing Y with the subject of the clause experiencing Z (e.g., mentioning consumers' goals, then companies' goals), and general vagueness (e.g., "use social media differently").
Task 1 seems hit and miss.
If you've never had a disagreement with a friend, you're not expressing your opinions honestly.
If you've never had a flat tire, you're not driving enough.
Some are good but obvious, some novel but useless, and others nonsensical.
For task 3, the main issue seems to be a lack of specificity. So while the connections between the chosen topics exist, the broader point made in a response is weak. The distinction between culture and non-culture also seems lost on GPT3: making "data-driven decisions" isn't a phenomenon specific to any one culture.
Overall, I was surprised by GPT3's logically consistent responses. Then again, given enough tries, monkeys on typewriters... GPT3 still doesn't have any conceptual understanding of words, as exemplified by the lack of cohesion in the content behind its reasonably clear grammatical form.
These are impressive responses from GPT3 in the sense that many people would find a number of the responses insightful on first reading. It's unclear whether it could come up with completely original ideas, but with further advancements, that could soon be a possibility.
Thanks for the in-depth questions! You're right, and this is another limitation. Even in cases where there is no inter-wave activism, I should make it clear that the estimates are only truly causal if we adjust for all relevant confounders, which is unlikely in practice. So the results we get are associations, just less biased ones (i.e., causal only under certain assumptions).
The main way we address this issue is through the sensitivity analysis, since it gives a sense of how much unmeasured confounding (from a variable not collected, or one not collected at a fine enough granularity, as you pointed out) would be required to overturn significance. In our case, a moderate amount would be needed, so the estimates are likely at least directionally consistent.
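For anyone curious what this kind of check can look like in practice, here's an illustrative sketch using the E-value approach of VanderWeele & Ding (2017); note that both the method shown and the numbers plugged in below are stand-ins, not necessarily our exact analysis or our actual estimates.

```python
# Illustrative E-value style sensitivity analysis (VanderWeele & Ding 2017).
# The point estimate and CI limit below are placeholders, not the study's results.
import math

def e_value(rr: float) -> float:
    """Minimum strength of association (risk-ratio scale) an unmeasured confounder
    would need with both exposure and outcome to fully explain away `rr`."""
    rr = 1.0 / rr if rr < 1 else rr            # work on the >= 1 side of the null
    return rr + math.sqrt(rr * (rr - 1.0))

point_estimate = 1.30   # hypothetical risk ratio for the exposure
ci_lower = 1.05         # hypothetical CI limit closest to the null

print(f"E-value (point estimate): {e_value(point_estimate):.2f}")
# If the CI limit is on the null's side, confounding can't be ruled out at all (E-value = 1).
if ci_lower > 1:
    print(f"E-value (CI limit): {e_value(ci_lower):.2f}")
else:
    print("E-value (CI limit): 1.00")
```

Reading it off: the higher the E-value, the stronger an unmeasured confounder would have to be (in its association with both exposure and outcome) to overturn the result, which is what I mean by "a moderate amount would be needed."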