Data sharing and the cost of error

The world’s highest impact factor medical journal, the New England Journal of Medicine (NEJM), seems to have been doing some soul searching. After publishing an editorial early in 2016 insinuating that researchers requesting trial data for re-analysis were “research parasites”, it has released a series of articles on the topic of data sharing. Four articles were published in August: two in favour and two less so. This month another three articles have been published on the same topic. And the journal is sponsoring a challenge to re-analyse data from a previous trial. We reported earlier in the year on a series of concerns at the NEJM, and these new steps are welcome in addressing them. However, while the articles consider questions of fairness in sharing data from large, long, and difficult trials, little has been said about the potential costs to society of unremedied errors in data analysis. The costs of not sharing data can be large, as the long-running saga over the controversial PACE trial illustrates.

The PACE trial was a randomised, controlled trial assessing a number of treatments for chronic fatigue syndrome, including graded exercise therapy and cognitive behavioural therapy. After publication of the trial results in 2011, however, a number of concerns were raised about the conduct of the trial, its analysis, and its reporting. These included a change in the definitions of ‘improvement’ and ‘recovery’ midway through the trial. Other researchers sought access to the trial data for re-analysis, but their requests were rebuffed with what a judge later described as ‘wild speculations’. The data were finally released and recently re-analysed. The new analysis revealed what many had suspected: the interventions in the trial had little benefit. Nevertheless, the recommended treatments for chronic fatigue syndrome had changed as a result of the trial. (STAT has the whole story here.)

A cost-effectiveness analysis was published alongside the PACE trial. The results showed that cognitive behavioural therapy (CBT) was cost-effective compared to standard care, as was graded exercise therapy (GET). Quality of life was measured in the trial using the EQ-5D, and costs were also recorded, making the calculation of incremental cost-effectiveness ratios straightforward. Costs were higher in all the intervention groups. The table reporting QALY outcomes is reproduced below:


At face value the analysis seems reasonable. But in light of the problems with the trial, including that neither the objective measures of patient health, such as walking tests and step tests, nor labour market outcomes showed much sign of improvement or recovery, these data seem less convincing. In particular, the statistically significant difference in QALYs, “After controlling for baseline utility, the difference between CBT and SMC was 0.05 (95% CI 0.01 to 0.09)”, may well just be a type I error. A re-analysis of these data is warranted (although gaining access may still prove difficult).
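To get a sense of how strong that headline result is, the implied standard error and p-value can be backed out from the reported confidence interval. This is only a rough sketch assuming a symmetric normal approximation to the interval; the trial's actual model will differ:

```python
import math

# Reported: QALY difference between CBT and SMC of 0.05, 95% CI 0.01 to 0.09
diff, lo, hi = 0.05, 0.01, 0.09

# Under a normal approximation, the CI half-width equals 1.96 standard errors
se = (hi - lo) / (2 * 1.96)

# Two-sided p-value for the null hypothesis of no difference,
# using the normal CDF via math.erf
z = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(f"SE ~= {se:.3f}, z ~= {z:.2f}, p ~= {p:.3f}")
```

A p-value in the region of 0.01–0.02 is modest evidence at best, and offers little protection against the kinds of analytical choices described above.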

If there was in fact no real benefit from the new treatments, then benefits have been forgone elsewhere in the healthcare system. If we assume the NHS achieves £20,000/QALY at the margin (contentious, I know!), then the health service loses 0.05 QALYs for each patient with chronic fatigue syndrome put on the new treatment. The prevalence of chronic fatigue syndrome may be as high as 0.2% among adults in England, which represents approximately 76,000 people. If all of them were switched to the new, ineffective treatments, the opportunity cost could be as much as 3,800 QALYs.
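The back-of-envelope arithmetic above can be made explicit. In this sketch the adult population figure is an assumption chosen to be consistent with the ~76,000 patients quoted; only the 0.05 QALY difference, 0.2% prevalence, and £20,000/QALY threshold come from the text:

```python
# Back-of-envelope opportunity cost from the figures in the text
qaly_loss_per_patient = 0.05   # reported QALY difference, assumed spurious
prevalence = 0.002             # ~0.2% of adults in England
adults_england = 38_000_000    # assumed round figure implying ~76,000 patients

patients = adults_england * prevalence
total_qalys_lost = patients * qaly_loss_per_patient
monetary_equivalent = total_qalys_lost * 20_000  # valued at £20,000/QALY

print(f"{patients:,.0f} patients, {total_qalys_lost:,.0f} QALYs forgone, "
      f"~£{monetary_equivalent:,.0f}")
```

Valued at the £20,000/QALY threshold, 3,800 QALYs corresponds to roughly £76m of health benefit forgone.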

The key point is that analytical errors have costs if the analyses go on to change recommended treatments. Averaged over a national health service, these costs can become substantial. Researchers may worry about publication prestige or fairness in using other people’s hard-won data, but the bigger issue is the wider cost of letting an error go unchallenged.


What’s going on at the New England Journal of Medicine?

Editorial policies differ between the top medical journals. Some take a ‘crusading’ view and campaign on contemporary health issues. The BMJ falls into this camp, although this has sometimes led it to take political positions that might be contrary to the evidence. Nevertheless, the editorial agenda of the BMJ is clear; readers know what they are backing. The NEJM, on the other hand, seems to have adopted a more opaque position.

On the face of it, the NEJM seems to hold the position that if a randomised controlled trial (RCT) has been conducted and published, then that is the last word on the matter. Some recent examples illustrate this. Ben Goldacre and colleagues on the COMPARE project had their letters to the NEJM dismissed; the letters expressed concerns about trials that had not reported the primary outcomes specified in their protocols, or had reported different outcomes. The New York Times reports on potential flaws, or even misconduct, in a mega-trial of Xarelto, an anticlotting drug over which the manufacturers are currently being sued; the NEJM, which published the trial, dismissed the relevance of the claims and defended the trial. And in a recent, controversial editorial, the NEJM appeared to endorse the view that researchers who re-analyse trial data from other studies are ‘research parasites’.

This view is not unique to the NEJM. It reflects a broader belief that RCTs are definitive, and that research in the top impact factor journals is more definitive still. But scientists are fallible, and RCTs can be flawed and present biased results. For example, in a study of RCTs published in the top four medical journals, 95% had some missing data, with a median dropout of 9%, and in many cases adequate missing data methods were not used. Publication should not be the final stage of a piece of research but part of an ongoing process.

Part of the problem may lie with the false dichotomy imposed by hypothesis testing and statistical significance: a treatment either works or it does not; it is either safe or it is not. But, for the most part, these tests ask only whether the observed data would be unlikely under the null hypothesis. Other sources of uncertainty, such as missing data, inadequate allocation concealment, or a lack of double blinding, are not taken into account. Moreover, the researchers could have chosen any number of different tests or comparisons, and if that choice is contingent on the data, the p-value is potentially rendered meaningless.
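How badly data-contingent test choices can distort a p-value is easy to demonstrate by simulation. The sketch below uses arbitrary illustrative parameters, not anything drawn from PACE: when a trial with no true effect offers ten outcome definitions to choose from, the chance that at least one comparison looks ‘significant’ at the 5% level is far higher than 5%:

```python
import random
import statistics

random.seed(1)

def any_significant(n=50, n_outcomes=10):
    """Simulate n_outcomes independent null outcomes for two groups of
    size n; return True if any comparison reaches |z| > 1.96."""
    for _ in range(n_outcomes):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        diff = statistics.mean(a) - statistics.mean(b)
        se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
        if abs(diff / se) > 1.96:
            return True
    return False

trials = 2000
rate = sum(any_significant() for _ in range(trials)) / trials
# With 10 independent chances per trial, roughly 1 - 0.95**10, i.e. about
# 40% of null trials, yield at least one 'significant' finding.
print(f"Share of null trials with a 'significant' result: {rate:.0%}")
```

In practice the inflation is harder to quantify, because the set of analyses a researcher might have run is rarely visible in the published paper.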

Results from RCTs are used to make important clinical and policy decisions. Scrutiny and debate are essential to ensure that the best decisions are made, including an appropriate representation of the uncertainty surrounding a decision. The trust endowed by a high impact factor should bring with it a responsibility to ensure that well-founded critical or dissenting views on published research are appropriately represented. RCTs should be subject to as much scrutiny as any other form of research. Vioxx should serve as an important reminder of this.