I also conduct research on the generalizability issue, but from a different perspective. In my view, any attempt to measure effect heterogeneity (and by extension, research generalizability) is scale dependent. It is very difficult to tease apart genuine effect heterogeneity from the appearance of heterogeneity due to using an inappropriate scale to measure the effects.
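To make the scale dependence concrete, here is a minimal sketch with made-up numbers (the two populations and their risks are hypothetical, not taken from any study or from my paper). The same exposure doubles the risk in both groups, so the risk ratio suggests no heterogeneity, while the risk difference and the odds ratio both suggest that the effect differs between groups.

```python
# Hypothetical risks: (risk if unexposed, risk if exposed) in two populations.
# These numbers are invented purely for illustration.
populations = {
    "A": (0.10, 0.20),
    "B": (0.40, 0.80),
}


def risk_difference(p0, p1):
    """Absolute change in risk due to exposure."""
    return p1 - p0


def risk_ratio(p0, p1):
    """Relative change in risk due to exposure."""
    return p1 / p0


def odds_ratio(p0, p1):
    """Ratio of the odds of the outcome, exposed versus unexposed."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Compute every effect measure in every population.
effects = {
    name: {
        "risk_difference": risk_difference(p0, p1),
        "risk_ratio": risk_ratio(p0, p1),
        "odds_ratio": odds_ratio(p0, p1),
    }
    for name, (p0, p1) in populations.items()
}

for name, measures in effects.items():
    print(name, {k: round(v, 3) for k, v in measures.items()})
```

Whether these two populations count as "heterogeneous" depends entirely on which of the three scales is chosen, which is the difficulty described above.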
To get around this, I have constructed a new scale for measuring effects, which I believe is more natural than the alternative measures. My work on this is available on arXiv at https://arxiv.org/abs/1610.00069 . The paper has been accepted for publication in the journal Epidemiologic Methods, and I plan to post a full explanation of the idea here and on Less Wrong when it is published (presumably a couple of weeks from now).
I would very much appreciate feedback on this work, and as always, I operate according to Crocker's Rules.
The Meta-Research Innovation Center at Stanford (METRICS) is hiring post-docs for 2016/2017. The full announcement is available at http://metrics.stanford.edu/education/postdoctoral-fellowships. Feel free to contact me with any questions; I am currently a post-doc in this position.
METRICS is a research center within Stanford Medical School. It was set up to study the conditions under which the scientific process can be expected to generate accurate beliefs, for instance about the validity of evidence for the effect of interventions.
METRICS was founded by Stanford Professors Steve Goodman and John Ioannidis in 2014, after GiveWell connected them with the Laura and John Arnold Foundation, which provided the initial funding. See http://blog.givewell.org/2014/04/23/meta-research-innovation-centre-at-stanford-metrics/ for more details.
(I tried posting this as a separate article, but as a new user I don't have enough karma. For now it is going in the open thread; if people think this should get more visibility, I'd be happy to move it once I have sufficient karma.)
Thank you! I will think about whether I can come up with a catchier name for future publications (and about whether the benefits outweigh the costs of rebranding).
If anyone has suggestions for a better name (for an effect measure that intuitively measures the probability that the exposure switches a person's outcome state), please let me know!