A good illustration of the muddles that p-values can get us in appeared recently on HealthNewsReview.com. HealthNewsReview examines and debunks the often hyped-up claims about medicines that appear in the media. But last week they “called BS” on a claim on Novartis’ website for the drug Everolimus. Novartis claimed that in a recent trial Everolimus demonstrated benefits that were “not statistically significant but clinically meaningful.” HealthNewsReview writes:
When results aren’t statistically significant, researchers can’t be sufficiently confident that any benefit they observed is real. Such findings are considered speculative until confirmed by other studies.
Sometimes, a result that was initially “not significant” might well reach the threshold of significance in a bigger study group with more patients, which is what this promotional material seems to anticipate.
And they quote a biostatistician further on:
A result that is statistically insignificant is not meaningful, period. Thus, we cannot say a result is statistically insignificant and clinically meaningful at the same time.
A null hypothesis significance testing (NHST) framework asks whether the data are compatible with a model in which the coefficient on the treatment is exactly zero. For the Everolimus trial, the t-statistic did not clear the magical 1.96 threshold, and so it has been concluded that either the effect was exactly zero or the trial had insufficient power. Hence the result is “not meaningful”.
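To make this concrete, here is a minimal Python sketch using made-up numbers: an estimated treatment effect of 0.30 with a standard error of 0.20, on an arbitrary clinical scale. Nothing in it comes from the Everolimus trial. It uses a normal approximation via the standard library's `statistics.NormalDist` rather than a t distribution, and the function names `summarise_estimate` and `power_two_sided` are hypothetical helpers written for illustration. The test statistic falls short of 1.96, yet the 95% interval runs from slightly below zero to roughly 0.69, and the test would reject the null only about one time in three even if the true effect were as large as the estimate.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def summarise_estimate(estimate: float, se: float, level: float = 0.95):
    """Normal-approximation z statistic, two-sided p-value and confidence interval."""
    z = estimate / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    crit = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return z, p_value, (estimate - crit * se, estimate + crit * se)


def power_two_sided(true_effect: float, se: float, alpha: float = 0.05) -> float:
    """Probability that a two-sided z-test rejects the null if the true effect is true_effect."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_effect / se
    return STD_NORMAL.cdf(shift - crit) + STD_NORMAL.cdf(-shift - crit)


# Hypothetical numbers chosen for illustration; they are NOT from the Everolimus trial.
estimate, se = 0.30, 0.20
z, p_value, (lower, upper) = summarise_estimate(estimate, se)
print(f"z = {z:.2f}, p = {p_value:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print(f"power if the true effect equals the estimate: {power_two_sided(estimate, se):.2f}")
```

In other words, failing to clear 1.96 in a case like this says more about the size of the study relative to the effect than about whether the effect is zero.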
This is where the problems of NHST become obvious. Everolimus is an mTOR inhibitor, a class of drugs under active development for the treatment of cancer. Hyperactivation of mTOR signalling in cancer has been widely observed and various preclinical trials have shown promising results (see more here). So why one should expect it to have an effect of exactly zero, I can’t say.
Perhaps more importantly, this is where the irrelevance of inference rears its head. A decision to use Everolimus has to be made. It cannot be deferred, and the only reason we use other treatments at this point in time is an accident of history. As we discussed recently, all that matters for these decisions is the (posterior) mean net benefit. In the US, admittedly, cost information has been outlawed because of those pernicious “death panels”, so decisions are made on the basis of “comparative effectiveness”. But even in this case, the above comments on Everolimus do not follow this logic; they seem to imply that (i) in the absence of statistical significance we have learned nothing from the data to inform our decision, and (ii) we should only choose to implement technologies that have demonstrated statistical significance. If these claims are taken as true, then we have no choice but to conflate clinical and statistical significance, since apparently we cannot conclude that something is clinically significant unless it is also statistically significant. This goes against all sound advice.
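The decision logic described above can be sketched in Python under some loudly stated assumptions: a normal prior on the treatment effect, a normal likelihood with a known standard error (so the posterior is also normal), and net benefit that is linear in the effect. The prior, the willingness-to-pay value `wtp` and the `incremental_cost` are all hypothetical, as are the helper names; none of it is Everolimus data. Setting `incremental_cost` to zero mimics an effectiveness-only comparison of the kind described for the US.

```python
from dataclasses import dataclass


@dataclass
class Normal:
    mean: float
    sd: float


def update_normal(prior: Normal, estimate: float, se: float) -> Normal:
    """Conjugate normal-normal update of a treatment effect, treating se as known."""
    prior_precision = 1 / prior.sd ** 2
    data_precision = 1 / se ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior.mean + data_precision * estimate)
    return Normal(post_mean, post_var ** 0.5)


def expected_net_benefit(effect: Normal, wtp: float, incremental_cost: float) -> float:
    """Expected incremental net monetary benefit; linear in the effect, so only its mean matters."""
    return wtp * effect.mean - incremental_cost


def significance_rule(estimate: float, se: float, crit: float = 1.96) -> bool:
    """Adopt only if the z statistic clears the conventional threshold."""
    return estimate / se > crit


# All inputs are hypothetical illustrations.
prior = Normal(mean=0.0, sd=1.0)  # sceptical prior centred on no effect
post = update_normal(prior, estimate=0.30, se=0.20)
enb = expected_net_benefit(post, wtp=50_000, incremental_cost=10_000)
print(f"posterior mean effect = {post.mean:.3f}, expected net benefit = {enb:,.0f}")
print("adopt under net-benefit rule:", enb > 0)
print("adopt under significance rule:", significance_rule(0.30, 0.20))
```

With these inputs the net-benefit rule recommends adoption while the significance rule does not. Note that the posterior standard deviation `post.sd` never enters the adoption decision itself.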
I write this without detailed knowledge of the original Everolimus trial. It may well be flawed. Industry-funded research can be biased. Indeed, it is the design and conduct of the trial that should be the basis for a reasonable critique of its findings, not its statistical significance.
HealthNewsReview is typically an excellent reviewer of claims derived from often flawed studies. Their conflation of statistical and clinical significance here, though, is by no means unique; many regulatory agencies around the world make the same conflation. This just goes to show how the p-value can continue to distract from sound decision making in health care.
I think it should be mentioned here that the main problem with a failure to reject the null is that it is difficult to interpret unless one knows the power of the test. See https://statlect.com/glossary/null-hypothesis