For a decade, science has faced a replication crisis: the results of many important studies are difficult or impossible to reproduce. For example, in 2015 the Open Science Collaboration published a paper replicating 100 psychology studies and found that many of the replications produced weaker evidence for the original findings. A study published in Science that repeated 18 economics experiments soon followed, again finding that up to one-third could not be reproduced. The question remains whether health economics faces a reproducibility crisis and, if so, what we should do about it.
To fully understand the reproducibility crisis, one must look at the incentives facing authors trying to publish scientific articles. It is only human nature to regard results perceived as positive or statistically significant as telling a better story than negative or non-significant results. A common manifestation is p-hacking, which arises when researchers search for and selectively report effects that fall below a statistical significance threshold such as 0.05. A recent analysis of over 21,000 hypothesis tests published in 25 leading economics journals shows that this is a problem, particularly in studies employing instrumental variables and difference-in-differences methods. Hence, published results may not provide a reliable evidence base for further research or policy analysis, and replication studies are a way to test this.
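To make the mechanism concrete, the minimal simulation below (an illustrative sketch, not part of the analysis cited above) shows how selective reporting inflates false positives: every true effect is zero, yet a study that tests 20 independent outcomes and reports only the smallest p-value will clear the 0.05 threshold most of the time. The number of outcomes, sample sizes and choice of test are arbitrary assumptions made purely for illustration.

```python
# Illustrative sketch of p-hacking via selective reporting (hypothetical numbers).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def smallest_p_value(n_outcomes=20, n_per_arm=50):
    # Both arms are drawn from the same distribution, so every true effect is zero.
    treatment = rng.normal(size=(n_outcomes, n_per_arm))
    control = rng.normal(size=(n_outcomes, n_per_arm))
    # One t-test per outcome; the "hacked" study reports only the best result.
    p_values = stats.ttest_ind(treatment, control, axis=1).pvalue
    return p_values.min()

n_studies = 1_000
share_significant = np.mean([smallest_p_value() < 0.05 for _ in range(n_studies)])
print(f"Share of null studies able to report p < 0.05: {share_significant:.0%}")
```

Even though no real effect exists, a large share of these simulated studies can report a "significant" finding, which is exactly the distortion that replication is designed to expose.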
Beyond formal hypothesis testing, the widespread use of crucial health economic results, such as EQ-5D value sets, means reproducibility is likely to be extremely important; such studies become inputs to hundreds of other analyses. It is perhaps not surprising that one of the only replications conducted in health economics has been of the EQ-5D-5L value set for England, albeit in less-than-ideal circumstances. Rather than a one-off, replication should be seen as integral to the development of foundational health economic tools, such as value sets and disease simulation models, that are critical to so much research.
The question now for the discipline is how we can promote and facilitate replication and avoid the pitfalls of p-hacking.
In 2015, editors of health economics journals undertook an initiative aimed at reducing p-hacking: they issued a statement reminding referees to accept studies that ‘have potential scientific and publication merit regardless of whether such studies’ empirical findings do or do not reject null hypotheses’. This appears to have had some impact, but health economics faces its own unique challenges. There is often pressure to demonstrate that an intervention is cost-effective by showing that it falls below a predefined cost-per-QALY threshold, which produces what could be termed cost-effectiveness threshold hacking.
One approach, undertaken by the Mount Hood Diabetes Challenge Network, has been to run comparable scenarios through various health economic diabetes simulation models. A recent challenge compared 12 different Type 2 diabetes computer models that separately simulated the impact of a range of treatment interventions on quality-adjusted life years (QALYs). The variation in outcomes across models was substantial, with up to a six-fold difference in the incremental QALYs attributed to the same intervention by different simulation models (see figure). These findings suggest that the choice of simulation model can significantly affect whether a therapy is deemed cost-effective. When combined with threshold hacking, this means that many economic evaluations are likely to be more grounded in advocacy than science.
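The arithmetic below shows why such model variation matters. The incremental cost, the QALY gains and the cost-per-QALY threshold are hypothetical numbers chosen only to illustrate how a six-fold spread in incremental QALYs can flip a cost-effectiveness verdict; they are not taken from the Mount Hood results.

```python
# Hypothetical illustration: the same intervention evaluated with two simulation
# models whose incremental QALY estimates differ six-fold.
def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    return incremental_cost / incremental_qalys

threshold = 20_000        # hypothetical willingness-to-pay threshold per QALY
incremental_cost = 6_000  # hypothetical extra cost of the new therapy

for model, qaly_gain in {"Model A": 0.6, "Model B": 0.1}.items():
    ratio = icer(incremental_cost, qaly_gain)
    verdict = "cost-effective" if ratio <= threshold else "not cost-effective"
    print(f"{model}: ICER = {ratio:,.0f} per QALY -> {verdict}")
```

With identical trial inputs, one model yields an ICER well below the threshold and the other one well above it, so the choice of model alone determines the recommendation.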

To create greater transparency, the Mount Hood Diabetes Challenge Network has also created specific guidelines for reporting economic evaluations that use diabetes simulation models, to enable replication, and has developed a simulation model registry. The registry is designed to encourage those developing models to provide documentation in one place and to report results on a set of reference simulations. Modelling groups are also encouraged to update these simulations each time the model changes, providing a benchmark for comparing different simulation models and for tracking how models evolve. Health economic model registries can potentially improve the science of economic evaluation in the same way that clinical trial registries have improved the conduct and reporting of randomised controlled trials in medicine.
Finally, those conducting experiments or using quasi-experimental methods can now submit registered reports. This initiative started in psychology journals, with the idea that authors submit their protocol to a journal for peer review before undertaking the study. If the registered report is accepted after review, the journal commits to publishing the full study regardless of whether the results are statistically significant. Registered reports are therefore a way of avoiding the pitfalls of p-hacking and publication bias.
More than 300 journals now allow registered reports, but uptake by economics journals has been slow. Quality of Life Research and Oxford Open Economics are the only options for those undertaking health economics experiments. Hopefully, these initiatives – along with an online petition signed by more than 145 health economists – will encourage other health economics journals to provide this option in future.
Embracing registered reports and developing health economic model registries are two important ways to strengthen health economics as a science.