Digest: Why Most Published Research Findings Are False



This article aims to provide some of the derivations from Ioannidis' 2005 article: "Why Most Published Research Findings Are False" which exposed what has since been termed "The Replication Crisis."

The issue begins with the subject of $p$-values, which measure the probability of a study finding a positive result, assuming the presented hypothesis is false. The conventional threshold for a strong $p$-value is $0.05$, which is commonly taken to mean that only a regrettable 5% of published findings are false.

Before diving into the derivations, some examples:

Example 1

Suppose we represent all possible hypotheses that can be tested with a more manageable 100,000 hypotheses. Let's allow a generous 50:50 true:false split for this set as well as a statistical power of 80%.

| | true (t) | false (f) | total |
|---|---|---|---|
| positive result $+$ | 40k | 2.5k | 42.5k |
| negative result $-$ | 10k | 47.5k | 57.5k |

Here, the $p$-value threshold $= \alpha = P(+ \vert \text{f})$, where $+$ is a positive result and $\text{f}$ is a false hypothesis. The statistical power $= P(+ \vert \text{t})$, and the Positive Predictive Value $\text{PPV} = P(\text{t} \vert +) = \frac{\text{true positives}}{\text{all positives}} = \frac{40k}{42.5k} \approx 0.94$, which is pretty satisfactory given our generous values.

Example 2

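Once again, we'll take 100,000 hypotheses, but now with a 10:90 true:false split for this set as well as a statistical power of 80%. Filling out the table we get:

| | true (t) | false (f) | total |
|---|---|---|---|
| positive result $+$ | 8k | 4.5k | 12.5k |
| negative result $-$ | 2k | 85.5k | 87.5k |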


Here, $\text{PPV} = \frac{8k}{12.5k} = 0.64$, which is significantly worse than the assumed 95%, and that is for a positive study without publication bias, cheating, etc., which are covered below.
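As a quick sanity check, the two examples can be reproduced in a few lines of Python. This is a minimal sketch, assuming the $0.05$ threshold from above; the function names `confusion_counts` and `ppv_from_counts` are illustrative and do not come from Ioannidis' article.

```python
def confusion_counts(n_hypotheses, prior_true, power, alpha):
    """Expected counts for the 2x2 table of (result, truth).

    prior_true is the assumed share of hypotheses that are true,
    power is P(+ | t), and alpha is P(+ | f).
    """
    n_true = n_hypotheses * prior_true
    n_false = n_hypotheses - n_true
    true_pos = power * n_true        # true hypotheses found positive
    false_pos = alpha * n_false      # false hypotheses found positive
    false_neg = n_true - true_pos    # true hypotheses missed
    true_neg = n_false - false_pos   # false hypotheses correctly rejected
    return true_pos, false_pos, false_neg, true_neg


def ppv_from_counts(true_pos, false_pos):
    """Positive Predictive Value: true positives over all positives."""
    return true_pos / (true_pos + false_pos)


# Example 1: 50:50 split. Example 2: 10:90 split.
example_1 = confusion_counts(100_000, 0.5, 0.8, 0.05)
example_2 = confusion_counts(100_000, 0.1, 0.8, 0.05)

print(round(ppv_from_counts(*example_1[:2]), 2))  # 0.94
print(round(ppv_from_counts(*example_2[:2]), 2))  # 0.64
```

Only the share of true hypotheses changes between the two calls, which is what drives the drop in PPV.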

Before getting much further, it will be useful to define a glossary:

| Symbol | Expression | Meaning |
|---|---|---|
| $p$ | $P(+ \vert \text{f})$ | probability of a study finding a positive result, given that the hypothesis is false |
| $\text{PPV}$ | $P(\text{t} \vert +) = \frac{\text{true positives}}{\text{all positives}}$ | Positive Predictive Value |
| $R$ | $\frac{P(\text{t})}{P(\text{f})}$ | the pre-study odds of the hypothesis being tested |
| $\varTheta$ | $= R = \frac{P}{1 - P}$ | an alternate expression of probability, e.g. 10:90 odds: $\varTheta = \frac{10\%}{100\% - 10\%}$ |
| $P(\text{f})$ | $1 - P(\text{t})$ | complement rule |
| $\alpha$ | $P(+ \vert \text{f})$ | Type I Error |
| $\beta$ | $P(- \vert \text{t})$ | Type II Error |
| $P(\text{t} \vert +)$ | $\frac{P(\text{t}) \cdot P(+ \vert \text{t})}{P(+)}$ | Bayes Rule |
| $P(\text{t} \land +)$ | $P(+ \vert \text{t}) \cdot P(\text{t})$ | Product Rule |
| $u$ | | bias factor influenced by $p$-hacking, conflict of interest, competitive publication motivations, etc. |

Table 1

Now we can recreate the general table for all such examples above and derive their values:

| | $\text{t}$ | $\text{f}$ | Total |
|---|---|---|---|
| $+$ | $\frac{c(1 - \beta)R}{R+1}$ | $\frac{c \alpha}{R+1}$ | $\frac{c(R+\alpha-\beta R)}{R+1}$ |
| $-$ | $\frac{c \beta R}{R+1}$ | $\frac{c(1-\alpha)}{R+1}$ | $\frac{c(1-\alpha + \beta R)}{R+1}$ |
| Total | $\frac{cR}{R+1}$ | $\frac{c}{R+1}$ | $c$, the number of relationships tested |


Starting with the top left cell which represents:

$$P(+ \land \text{t}) = \underbrace{P(+ \vert \text{t})}_{\text{\S}} \cdot \underbrace{P(\text{t})}_{\frac{R}{R+1}}$$

$\quad \text{\S} 1.1: \beta = P(- \vert \text{t})$

$\quad \text{\S} 1.2: P(+ \vert \text{t}) + P(- \vert \text{t}) = 1$

$\quad \text{\S} 1.3: P(+ \vert \text{t}) + \beta = 1$

$\quad \text{\S} \therefore P(+ \vert \text{t}) = 1 - \beta$

$$= (1 - \beta)\left(\frac{R}{R+1}\right)$$

Multiplying this probability by the $c$ relationships tested gives the cell's count, $\frac{c(1 - \beta)R}{R+1}$.

Similarly, for the top-middle cell:

$$P(+ \land \text{f}) = \underbrace{P(+ \vert \text{f})}_{\alpha} \cdot \underbrace{P(\text{f})}_{1 - \frac{R}{R+1}}$$

$$= \alpha \left(1 - \frac{R}{R+1}\right) = \frac{\alpha}{R+1}$$

which, multiplied by $c$, gives the cell's count, $\frac{c \alpha}{R+1}$.

So the fraction of all positives (the top-right cell) that are true positives is:

$$\frac{\text{true positives}}{\text{all positives}} = \frac{\frac{c(1 - \beta)R}{R+1}}{\frac{c(R + \alpha - \beta R)}{R+1}} = \frac{(1 - \beta) R}{R + \alpha - \beta R} = \underbrace{P(\text{t} \vert +)}_{\text{want this bad boi to be high}}$$

in terms of Type I error, Type II error, and pre-study odds.
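The closed form above can be checked against the two worked examples. The following is a minimal sketch; the function name `ppv` and the argument order are assumptions made for illustration.

```python
def ppv(R, alpha, beta):
    """P(t | +) from pre-study odds R, Type I error alpha, Type II error beta."""
    return (1 - beta) * R / (R + alpha - beta * R)


# Example 1: 50:50 split means pre-study odds R = 1.
print(round(ppv(1.0, 0.05, 0.2), 2))      # 0.94

# Example 2: 10:90 split means pre-study odds R = 10/90.
print(round(ppv(10 / 90, 0.05, 0.2), 2))  # 0.64
```

Note that the count $c$ cancels, so the result depends only on $R$, $\alpha$, and $\beta$.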

When is a Study More Likely to be True than False?

$$P(\text{t} \vert +) > \frac{1}{2}$$

$$\rArr \frac{(1 - \beta) R}{R + \alpha - \beta R} > \frac{1}{2}$$

$$\rArr 2\left(\frac{(1 - \beta) R}{R + \alpha - \beta R}\right) > 2\left(\frac{1}{2}\right)$$

$$\rArr 2(1 - \beta)R > R + \alpha - \beta R$$

$$\rArr R(1 - \beta) > \alpha \iff (\text{pre-study odds})(\text{statistical power}) > p\text{-value}$$

Some fields of study have inherently small $R$ or $(1 - \beta)$ values, which makes this inequality hard to satisfy.
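To get a feel for the inequality $R(1 - \beta) > \alpha$, it can be rearranged into the smallest pre-study odds a field needs. This is a rough sketch; `more_likely_true_than_false` and `minimum_pre_study_odds` are hypothetical helper names, and the power values are chosen only for illustration.

```python
def more_likely_true_than_false(R, alpha, beta):
    """True when a positive finding has P(t | +) > 1/2."""
    return R * (1 - beta) > alpha


def minimum_pre_study_odds(alpha, beta):
    """Pre-study odds R at which P(t | +) is exactly 1/2."""
    return alpha / (1 - beta)


# With 80% power, odds above roughly 1:16 are enough.
print(round(minimum_pre_study_odds(0.05, 0.2), 4))  # 0.0625

# With only 20% power, odds above roughly 1:4 are needed.
print(round(minimum_pre_study_odds(0.05, 0.8), 4))  # 0.25

print(more_likely_true_than_false(0.05, 0.05, 0.2))  # False
```

The last call illustrates a field with small $R$: even at 80% power, a positive result there is more likely false than true.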

What Happens if we Introduce Bias?

$P(+ \land \text{f}) = P(+ \vert \text{f}) \cdot P(\text{f}) \underset{\text{bias}}{\longrightarrow}$ negative study results become positive with probability $u$

This can alter our outcome in two cases:

$$\text{1. } P(\underbrace{+}_{\text{for any reason}} \land \text{t})$$

$$= \underbrace{P(+ \vert \text{t})}_{1 - \beta} \cdot \underbrace{P(\text{t})}_{\frac{R}{R+1}} + u \cdot \underbrace{P(- \vert \text{t})}_{\text{Type II Error: } \beta} \cdot P(\text{t})$$

$$= (1 - \beta)\left(\frac{R}{R+1}\right) + u \beta \frac{R}{R+1}$$

$$= \frac{(1 - \beta) R + u \beta R}{R+1}$$

$$\text{2. } P(\underbrace{+}_{\text{for any reason}} \land \text{f})$$

$$= P(+ \vert \text{f}) \cdot \underbrace{P(\text{f})}_{1 - P(\text{t})} + u \cdot P(- \vert \text{f}) \cdot P(\text{f})$$

$$= \alpha \left(1 - \frac{R}{R+1}\right) + u(1 - \alpha)\left(1 - \frac{R}{R+1}\right)$$

Note that the truth or falsehood of a hypothesis has to be independent of the decision-making; otherwise it would impair judgement, disallowing us from applying the Product Rule.

$$= \frac{\alpha + u(1 - \alpha)}{R+1}$$
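Dividing the biased true positives from case 1, $(1 - \beta)\frac{R}{R+1} + u \beta \frac{R}{R+1}$, by the sum of both cases gives a PPV under bias. The sketch below assumes that combination; the function name `ppv_with_bias` and the sample value $u = 0.3$ are illustrative choices, not figures from the article.

```python
def ppv_with_bias(R, alpha, beta, u):
    """P(t | +) when a fraction u of would-be negatives is reported positive."""
    # Case 1: positives (for any reason) on true hypotheses.
    true_pos = ((1 - beta) * R + u * beta * R) / (R + 1)
    # Case 2: positives (for any reason) on false hypotheses.
    false_pos = (alpha + u * (1 - alpha)) / (R + 1)
    return true_pos / (true_pos + false_pos)


unbiased = ppv_with_bias(1.0, 0.05, 0.2, u=0.0)
biased = ppv_with_bias(1.0, 0.05, 0.2, u=0.3)
print(unbiased > biased)  # True
```

Setting `u=0.0` recovers the unbiased expression $\frac{(1 - \beta)R}{R + \alpha - \beta R}$, which is a useful consistency check.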

The Issue of Incorrect pre-publication $p$-values

Research efforts do not occur in isolation. Several teams may be independently and competitively working on the same hypotheses over and over and over again without adjusting their $p$-values.



This means that the chance of at least one study reporting a positive result grows as the experiments are repeated:

$$P(\underbrace{+_{> 0}}_{\text{with } n \text{ studies}} \land \text{t})$$

$$= \underbrace{P(+_{> 0} \vert \text{t})}_{\text{\S}} \cdot P(\text{t})$$

$\quad \text{\S}2.1:$ summing the probability of every outcome with at least one positive, $\displaystyle\sum^{n} P_n$, could be ... a lot of probabilities

$\quad \text{\S}2.2:$ so take the complement instead, $(1 - P(-_{\forall n} \vert \text{t})) \cdot P(\text{t})$, where $P(-_{\forall n} \vert \text{t})$ is the probability that all $n$ studies are negative

$\quad \text{\S}2.3: (1 - \beta^n) \cdot P(\text{t})$, treating the $n$ studies as independent

$$= \frac{R(1 - \beta^n)}{R+1}$$

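Meaning that with each subsequent, competing trial, it becomes more likely that some team reports a positive result by chance, so the likelihood that your own sufficiently small $p$-value reflects a genuine relationship typically decreases.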
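The effect of repetition can be sketched numerically. The true-positive term below is the $\frac{R(1 - \beta^n)}{R+1}$ derived above; the false-positive term $\frac{1 - (1 - \alpha)^n}{R+1}$ is an assumption made here by applying the same complement argument to false hypotheses with independent studies, and the function name `ppv_after_n_studies` is hypothetical.

```python
def ppv_after_n_studies(R, alpha, beta, n):
    """P(t | at least one positive) when n independent teams test the hypothesis."""
    # At least one of n studies is positive, given a true hypothesis.
    true_pos = R * (1 - beta ** n) / (R + 1)
    # Assumption: same complement argument for a false hypothesis,
    # where every study is negative with probability (1 - alpha).
    false_pos = (1 - (1 - alpha) ** n) / (R + 1)
    return true_pos / (true_pos + false_pos)


for n in (1, 5, 10):
    print(n, round(ppv_after_n_studies(1.0, 0.05, 0.2, n), 2))
```

With $n = 1$ this reduces to the single-study PPV, and under these assumed values the PPV falls as $n$ grows.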