1/ Is the replication crisis a replication crisis or a problem of testimony? A new AMR paper by me and Not-on-Twitters (NoTs) Andy King & Tim Simcoe reframes the problem. @aomConnect https://doi.org/10.5465/amr.2018.0421
2/ Frequentist statistics imply that results are replicable. By definition! A p-value (or CI, SE, etc.) is a statement about how frequently we would expect to see a result in repeated samples from the same population. It's called "Frequentist" for a reason!
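A minimal sketch of what that frequency claim means (toy numbers, not from the paper): a 95% CI covers the true population mean in roughly 95% of repeated samples, but only because the sampling scheme and the test were fixed in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 50, 10_000

covered = 0
for _ in range(reps):
    # Draw a fresh random sample from the same population each time
    x = rng.normal(mu, sigma, n)
    se = x.std(ddof=1) / np.sqrt(n)
    lo, hi = x.mean() - 1.96 * se, x.mean() + 1.96 * se
    covered += (lo <= mu <= hi)

# Close to 0.95 only because sampling and the test were specified up front
print(covered / reps)
```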
3/ But frequency claims are hard to justify, because they require strict pre-specification. VERY strict. And they require random samples from a population. So any non-prespecified paper using frequency stats is making unjustified claims of replicability.
4/ This is a testimony problem. We aren't describing our work in a justifiable way. In philosophy-of-testimony speak, most empirical papers are not "veridical" https://en.wiktionary.org/wiki/veridicality
5/ There are two solutions to this problem. Either use better methods to make frequency claims justifiable or make claims that match the way we conduct tests. Let's consider the first approach:
6/ Making frequency claims veridical: You have to pre-specify the precise statistical test you will use *before* collecting any data. This means sampling plan, measurement plan, specific regressions, and how coefficients will be interpreted.
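A minimal sketch of what such a pre-specification could look like if written down as code (every field, name, and variable here is hypothetical, not from the paper): the sampling plan, measures, regression, and decision rule are frozen before any data arrive, and the analysis then runs exactly that plan.

```python
# A toy, hypothetical pre-registration: everything the test needs is fixed
# before data collection, so the eventual p-value keeps its frequency meaning.
PREREG = {
    "population": "US manufacturing firms, 2010-2019",      # made-up sampling frame
    "sample": {"method": "simple random sample", "n": 500},
    "outcome": "log_revenue",
    "regressors": ["treatment", "firm_age", "employees"],
    "model": "OLS with HC1 standard errors",
    "decision_rule": "reject H0 if two-sided p < 0.05 on 'treatment'",
}

def run_preregistered_analysis(df, plan=PREREG):
    """Run exactly the pre-specified regression; no post-hoc changes allowed."""
    import statsmodels.formula.api as smf
    formula = f"{plan['outcome']} ~ " + " + ".join(plan["regressors"])
    fit = smf.ols(formula, data=df).fit(cov_type="HC1")
    return fit.pvalues["treatment"]
```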
7/ Unless a question is very well defined, and a population very well understood, that’s impractical. We can only do this after a lot of exploratory work that helps us understand which questions to ask.
8/ Aside: One way around this for certain types of questions/data is splitting samples. Whatever you find in the first sample pre-specifies the analysis in the second sample. Replication is similar, just across authors.
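A minimal sample-splitting sketch (synthetic data, my own illustration rather than the paper's): explore freely on the first half to pick a specification, then run that now pre-specified test exactly once on the held-out half.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, just to make the sketch runnable
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "y": rng.normal(size=400),
    "x": rng.normal(size=400),
    "z1": rng.normal(size=400),
    "z2": rng.normal(size=400),
})

explore, confirm = df.iloc[:200], df.iloc[200:]

# Exploration half: try candidate specifications freely
candidates = ["y ~ x", "y ~ x + z1", "y ~ x + z1 + z2"]
best = max(candidates, key=lambda f: smf.ols(f, data=explore).fit().rsquared_adj)

# Confirmation half: run the now pre-specified test exactly once
pval = smf.ols(best, data=confirm).fit().pvalues["x"]
print(best, pval)
```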
9/ Bayesians might say: “Bayes!” But no free lunch here. Bayesian posteriors are statements about a sample, not a population. This is a much narrower statement. Bayesian stats tell me about the likelihood of a model in my sample; they do NOT tell me anything...
...about the likelihood in another sample. We never know to what degree Bayesian stats are over-fit. That information is not in the data. But we can use more information to figure out whether we might believe the model is the best explanation.
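A minimal conjugate-update sketch of that point (toy numbers, assuming a normal likelihood with known variance): the posterior is a statement conditional on the one sample actually observed, and nothing in it says how things would look in samples we did not draw.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10.0, 2.0, 30)           # the one sample we actually observed
sigma2 = 4.0                             # assume known variance for simplicity

# Conjugate normal-normal update with prior mu ~ N(0, 100)
prior_mean, prior_var = 0.0, 100.0
post_var = 1.0 / (1.0 / prior_var + len(x) / sigma2)
post_mean = post_var * (prior_mean / prior_var + x.sum() / sigma2)

# This is P(mu | this sample, this model). It says nothing about how the
# same interval would behave in other samples we never drew.
print(post_mean, np.sqrt(post_var))
```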
12/ But inference to the best explanation (IBE) requires defining "Best". Best is subjective, and any credible application I've seen requires subjectively weighting non-statistical information (i.e., "context").
13/ But at the end of the day, an explanation applies to a sample, not a population. Generalization from a sample is theory. All general statements we make are theoretical. Hume's problem of induction is severe. https://plato.stanford.edu/entries/induction-problem/
14/ We've been relying on frequentist stats to make those general statements. But that's been wrong! This means that most empirical papers are theory papers. Provocation: @aomConnect! AMJ is a theory journal. https://twitter.com/brentdg2/status/1159549567039184899
15/ So, if what we are doing is IBE, we should say so! This is our way out of the problem. But again, no free lunch! IBE cannot make claims to general truth! Rather, IBE is always the researcher's best explanation for their data.
16/ Making weaker claims, stating clearly that our papers are only the best stories for the patterns in our data and that applying them to other settings requires non-statistical reasoning, may seem like a massive retreat.
17/ Maybe it is, but we couldn't find another way out without changing what frequentist and Bayesian stats mean. And we were unable to figure out a way to do that. IDK, maybe if we were statisticians?
19/ A corollary of this is that without prespecification, a paper should never present formal hypotheses. Rather, papers should be written in ways that make how we reasoned explicit. If we discovered a relationship and then tried to understand this, that is useful,...
...and a noble way forward. We should be transparent with the reader about how we reasoned, so that the reader can evaluate whether they agree with us or not.
21/ More transparent reasoning can be done by evaluating a much larger set of potential models in what we call epistemic mapping. This will also allow the reader to evaluate whether the models you've chosen match their priors. https://www.dropbox.com/s/uv6adsn5v8p06fi/Screen%20Shot%202020-05-18%20at%2016.51.57.png?dl=0
22/ This is subtle! Maps shift the burden of judgment in part to the reader. Making sense of maps is harder, because the reader has to choose which model they think is best, or decide that they have no idea and it is uncertain.
24/ starbility in R: https://twitter.com/aakaash_rao/status/1257359460093362181?s=20
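Not the starbility package itself, just a minimal Python analogue of such a map (synthetic data, hypothetical variable names): estimate the focal coefficient under every combination of candidate controls and lay the results out for the reader to judge.

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for a real study
rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "c1": rng.normal(size=n),
    "c2": rng.normal(size=n),
    "c3": rng.normal(size=n),
})
df["y"] = 0.3 * df["x"] + 0.5 * df["c1"] + rng.normal(size=n)

# One row per specification: every subset of the candidate controls
rows = []
controls = ["c1", "c2", "c3"]
for k in range(len(controls) + 1):
    for subset in itertools.combinations(controls, k):
        formula = "y ~ x" + "".join(f" + {c}" for c in subset)
        fit = smf.ols(formula, data=df).fit()
        lo, hi = fit.conf_int().loc["x"]
        rows.append({"spec": formula, "coef_x": fit.params["x"], "lo": lo, "hi": hi})

# The "map": the focal estimate under every specification, for the reader to judge
print(pd.DataFrame(rows).sort_values("coef_x").to_string(index=False))
```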
25/ Still an open question as to how to use these maps. And while maps only tell us about a sample, they at least help force the researcher to be more transparent.
26/ I believe that maps are great, but at the end of the day, I also believe that the only way out of the reliability problem is to make much more modest claims in our work. We can't change fundamental stats. And we lose credibility when we keep trying. /END.