Poor statistical communication means poor statistics

Statistics is a broad and complex field. For a given research question, any number of statistical approaches could be taken. In an article published last year, researchers asked 61 analysts, working in 29 teams, to use the same dataset to address the question of whether referees were more likely to give dark-skinned players a red card than light-skinned players. The teams came back with a wide range of analyses and answers. Each analysis had its advantages and disadvantages, and I’m sure each analyst would have defended their work. However, as many statisticians and economists may well know, the merit of an approach is not the only factor that matters in its adoption.

There has, for decades, been criticism of the misunderstanding and misuse of null hypothesis significance testing (NHST). P-values have been a common topic on this blog. Despite this, NHST remains the predominant paradigm for most statistical work. If used appropriately this needn’t be a problem, but if it were being used appropriately it wouldn’t be used nearly as much: p-values can’t perform the inferential role many expect of them. It’s not difficult to understand why things are this way: most published work uses NHST, we teach students NHST so that they can understand the published work, students become researchers who use NHST, and so on. Part of statistical education involves teaching the arbitrary conventions that have gone before, such as that p-values are ‘significant’ if below 0.05 or that a study is ‘adequately powered’ if power is above 80%. One of the most pernicious consequences of this is that these heuristics become a substitute for thinking. The presence of these key figures is expected, and their absence is often met with a request from reviewers and other readers for their inclusion.

I have argued on this blog and elsewhere for a wider use of Bayesian methods (and less NHST), and I try to practise what I preach. For an ongoing randomised trial I am involved with, I adopted a Bayesian approach to design and analysis. Instead of the usual power calculation, I conducted a Bayesian assurance analysis (Anthony O’Hagan has written some good articles on assurance for those wanting more information). I’ll try to summarise the differences between ‘power’ and ‘assurance’ calculations by attempting to define them, which is actually quite hard!

Power calculation. If we were to repeat a trial infinitely many times, what sample size would we need so that in x% of trials the assumed data-generating model produces data that fall in the α% most extreme quantiles of the distribution of data that would be produced by the same data-generating model, but with one parameter set to exactly zero (or any equivalent hypothesis)? Typically, we set x% to be 80% (power) and α% to be 5% (the statistical significance threshold).
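
To make the definition concrete, here is a minimal sketch for the simplest case, assuming a two-arm trial comparing means with a known common standard deviation, equal arm sizes, a single fixed true difference and a two-sided z-test. The function names and the example inputs (a true difference of half a standard deviation) are illustrative assumptions, not details of the trial described here. The simulation function takes ‘repeat the trial many times’ literally, while the analytic function gives the same long-run fraction in closed form.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def analytic_power(n_per_arm: int, delta: float, sigma: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test for a difference in means (known sigma)."""
    se = sigma * (2 / n_per_arm) ** 0.5          # SE of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # rejection cut-off computed under delta = 0
    shift = delta / se                           # centre of the test statistic under the assumed model
    # Probability that the statistic lands in either rejection tail.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def simulated_power(n_per_arm, delta, sigma, alpha=0.05, reps=2000, seed=42):
    """Repeat the hypothetical trial `reps` times and count 'significant' results."""
    rng = random.Random(seed)
    se = sigma * (2 / n_per_arm) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
        treated = [rng.gauss(delta, sigma) for _ in range(n_per_arm)]
        z = (sum(treated) / n_per_arm - sum(control) / n_per_arm) / se
        hits += abs(z) > z_crit                  # True counts as 1
    return hits / reps


def sample_size_for_power(delta, sigma, alpha=0.05, target=0.80, max_n=100_000):
    """Smallest per-arm sample size whose analytic power reaches `target`."""
    for n in range(2, max_n + 1):
        if analytic_power(n, delta, sigma, alpha) >= target:
            return n
    return None


n = sample_size_for_power(delta=0.5, sigma=1.0)
print(n, round(analytic_power(n, 0.5, 1.0), 3), simulated_power(n, 0.5, 1.0))
```

With these inputs the search stops at 63 per arm, in line with the familiar normal-approximation formula. Note that every input is a single assumed value.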

Assurance calculation. For a given data-generating model, what sample size do we need so that there is an x% probability that we will be (100 − α)% certain that the parameter is positive (or any equivalent choice)?
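
A matching minimal sketch of an assurance calculation, under the same simplifying assumptions (two arms, normal outcomes, known common standard deviation). It adds two illustrative choices that are not taken from the trial: a normal ‘design’ prior expressing uncertainty about the true difference, and a vague normal ‘analysis’ prior used to compute the posterior. A simulated trial counts as a success if the posterior probability that the difference is positive is at least 1 − α. Because everything here is normal, assurance has a closed form; the Monte Carlo version simulates the same quantity directly.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def posterior(d, se, prior_mean, prior_sd):
    """Normal-normal update for the true difference given an observed difference d."""
    precision = 1 / prior_sd**2 + 1 / se**2
    mean = (prior_mean / prior_sd**2 + d / se**2) / precision
    return mean, precision**-0.5


def trial_succeeds(d, se, prior_mean, prior_sd, alpha):
    """Success if P(true difference > 0 | data) >= 1 - alpha."""
    m, s = posterior(d, se, prior_mean, prior_sd)
    return STD_NORMAL.cdf(m / s) >= 1 - alpha


def assurance_mc(n_per_arm, sigma, design_mean, design_sd,
                 prior_mean=0.0, prior_sd=10.0, alpha=0.05, sims=20_000, seed=1):
    """Monte Carlo assurance: draw the true effect, then the data, then check success."""
    rng = random.Random(seed)
    se = sigma * (2 / n_per_arm) ** 0.5
    wins = 0
    for _ in range(sims):
        delta = rng.gauss(design_mean, design_sd)  # uncertain true effect (design prior)
        d = rng.gauss(delta, se)                   # simulated observed difference in means
        wins += trial_succeeds(d, se, prior_mean, prior_sd, alpha)
    return wins / sims


def assurance_exact(n_per_arm, sigma, design_mean, design_sd,
                    prior_mean=0.0, prior_sd=10.0, alpha=0.05):
    """Closed form: success iff d exceeds a threshold, and d is marginally normal."""
    se = sigma * (2 / n_per_arm) ** 0.5
    precision = 1 / prior_sd**2 + 1 / se**2
    z = STD_NORMAL.inv_cdf(1 - alpha)
    # posterior mean >= z * posterior sd  <=>  d >= d_min
    d_min = se**2 * (z * precision**0.5 - prior_mean / prior_sd**2)
    marginal_sd = (design_sd**2 + se**2) ** 0.5
    return 1 - NormalDist(design_mean, marginal_sd).cdf(d_min)


def sample_size_for_assurance(sigma, design_mean, design_sd, target=0.80, max_n=100_000):
    """Smallest per-arm sample size whose exact assurance reaches `target`."""
    for n in range(2, max_n + 1):
        if assurance_exact(n, sigma, design_mean, design_sd) >= target:
            return n
    return None  # assurance can plateau below target: it never exceeds P(delta > 0)


n = sample_size_for_assurance(sigma=1.0, design_mean=0.5, design_sd=0.25)
print(n, round(assurance_exact(n, 1.0, 0.5, 0.25), 3), assurance_mc(n, 1.0, 0.5, 0.25))
```

One design consequence is visible in the code: unlike power, assurance cannot exceed the design-prior probability that the effect is positive (about 0.977 for the prior assumed here), however large the trial.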

The assurance calculation could be reframed in a decision framework as: what sample size do we need so that there is an x% probability we will make the right decision about whether a parameter is positive (or any equivalent decision), given the costs of making the wrong decision?
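
One way to make the decision framing concrete is standard expected-loss reasoning. Suppose, purely for illustration, that wrongly declaring the parameter positive costs `cost_fp` and wrongly declaring it not positive costs `cost_fn` (hypothetical values in any common unit). If p is the posterior probability that the parameter is positive, declaring ‘positive’ has expected loss (1 − p) × cost_fp and declaring ‘not positive’ has expected loss p × cost_fn, so we should declare positive when p exceeds cost_fp / (cost_fp + cost_fn). The sketch below encodes that rule. Under it, the conventional 95% posterior-probability rule corresponds to treating a false positive as 19 times as costly as a false negative.

```python
def expected_losses(p_positive: float, cost_fp: float, cost_fn: float) -> tuple[float, float]:
    """Expected loss of (declaring positive, declaring not positive)."""
    loss_if_declare_positive = (1 - p_positive) * cost_fp  # wrong when the parameter is not positive
    loss_if_declare_not_positive = p_positive * cost_fn    # wrong when the parameter is positive
    return loss_if_declare_positive, loss_if_declare_not_positive


def decide(p_positive: float, cost_fp: float, cost_fn: float) -> str:
    """Pick the action with the smaller expected loss."""
    declare_pos, declare_not = expected_losses(p_positive, cost_fp, cost_fn)
    return "positive" if declare_pos < declare_not else "not positive"


def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Posterior probability above which declaring 'positive' minimises expected loss."""
    return cost_fp / (cost_fp + cost_fn)


print(decision_threshold(19, 1))  # threshold implied by a 19:1 cost ratio
print(decide(0.96, 19, 1), decide(0.90, 19, 1), decide(0.60, 1, 1))
```

In a design calculation, the probability of making the right decision would then be estimated by simulating trials and applying this rule to each simulated posterior.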

Both of these are complex, but I would argue that it is the assurance calculation that gives us what we want to know most of the time when designing a trial. The assurance analysis also better represents uncertainty, since we specify distributions over all the uncertain parameters rather than choosing exact values. Despite this, the funder of the trial mentioned above, who shall remain nameless, insisted on the results of a power calculation to determine whether the trial was worth continuing with, because that’s “what they’re used to.”

The main culprit for this issue is, I believe, communication. A simpler explanation with better presentation may have been easier to understand and accept. That is not to say the funder wasn’t also substituting the heuristic ‘80% or more power = good’ for actually thinking about what we could learn from the trial. But until statisticians, economists, and other data analytic researchers start communicating better, how can we expect others to listen?

Image credit: Geralt

Rita Faria’s journal round-up for 13th May 2019

Every Monday our authors provide a round-up of some of the most recently published peer-reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Communicating uncertainty about facts, numbers and science. Royal Society Open Science. Published 8th May 2019

This remarkable paper by Anne Marthe van der Bles and colleagues, including the illustrious David Spiegelhalter, covers two of my favourite topics: communication and uncertainty. They focused on epistemic uncertainty: that is, uncertainty about facts, numbers and science due to limited knowledge (rather than due to the randomness of the world). This is uncertainty we could reduce if we spent more resources on finding things out.

The authors propose a framework for communicating uncertainty and apply it to two case studies, one in climate change and the other in economic statistics. They also review the literature on the effects of communicating uncertainty. It is so wide-ranging and exhaustive that, if I have any criticism, it is that its 42 pages are not conducive to a leisurely read.

I found the distinction between direct and indirect uncertainty fascinating and incredibly relevant to health economics. Direct uncertainty is about the precision of the evidence, whilst indirect uncertainty is about its quality. An example of high indirect uncertainty would be evidence based on a naïve comparison of patients in a Phase 2 trial with historical controls in another country (yup, this happens!).

So, how should we communicate the uncertainty in our findings? I’m afraid that this paper is not a practical guide but rather a brilliant ground-clearing exercise on how to start thinking about this. Nevertheless, Box 5 (p. 35) does give some good advice! I do hope this paper kick-starts research on how to explain uncertainty beyond an academic audience. Looking forward to more!

Was Brexit triggered by the old and unhappy? Or by financial feelings? Journal of Economic Behavior & Organization. Published 18th April 2019

Not strictly health economics – although arguably Brexit affects our health – is this impressive study about the factors that contributed to the Leave win in the Brexit referendum. Federica Liberini and colleagues used data from the Understanding Society survey to look at the predictors of people’s views about whether or not the UK should leave the EU. The main results come from a regression of whether or not a person was pro-Brexit on their life satisfaction, their feelings about their financial situation, and other characteristics.

Their conclusions are staggering. They found that people’s views were generally unrelated to their age, their life satisfaction or their income. Instead, a person’s feelings about their financial situation were the strongest predictor. For economists, it may be a bit cringe-worthy to see OLS used for a binary dependent variable. But to be fair, the authors mention that the results are similar with non-linear models, and they report extensive supplementary analyses. Remarkably, they’re making the individual-level data available on the 18th of June.
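
For readers wondering why OLS on a binary outcome (a linear probability model) often lands close to a logit, here is a small simulated illustration in plain Python. The data are entirely hypothetical (one normally distributed covariate and a logistic data-generating process with arbitrary illustrative coefficients) and have nothing to do with the Understanding Society data. The comparison is between the OLS slope and the logit’s average marginal effect (AME). With a single normal covariate the two target the same population quantity, so this is a favourable case rather than a general guarantee.

```python
import math
import random


def logistic(t: float) -> float:
    return 1 / (1 + math.exp(-t))


def simulate(n=5000, b0=-0.2, b1=0.8, seed=2019):
    """Hypothetical data: x ~ N(0, 1), y ~ Bernoulli(logistic(b0 + b1 * x))."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < logistic(b0 + b1 * xi) else 0 for xi in x]
    return x, y


def ols_slope(x, y):
    """Slope of the linear probability model y = a + b * x, fitted by OLS."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def fit_logit(x, y, iters=25):
    """Logit intercept and slope by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1 - p)
            g0 += yi - p                  # score (gradient) components
            g1 += (yi - p) * xi
            h00 += w                      # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: beta += I^{-1} * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def logit_ame(x, b0, b1):
    """Average marginal effect of x on P(y = 1) under the fitted logit."""
    return sum(b1 * logistic(b0 + b1 * xi) * (1 - logistic(b0 + b1 * xi)) for xi in x) / len(x)


x, y = simulate()
b0_hat, b1_hat = fit_logit(x, y)
print(round(ols_slope(x, y), 4), round(logit_ame(x, b0_hat, b1_hat), 4))
```

Here the two numbers should be close. With skewed covariates, or predicted probabilities near 0 or 1, they can diverge, which is why reporting non-linear models alongside the linear one is sensible.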

As the authors discuss, it is not clear whether we’re looking at predictive estimates of characteristics related to pro-Brexit feeling or at causal estimates of factors that led to the pro-Brexit feeling. That is, if we could improve someone’s perceived financial situation, would we reduce their probability of feeling pro-Brexit? In any case, the message is clear. Feelings matter!

How does treating chronic hepatitis C affect individuals in need of organ transplants in the United Kingdom? Value in Health. Published 8th March 2019

Anupam Bapu Jena and colleagues looked at the spillover benefits of curing hepatitis C, given its consequences for the supply of and demand for liver and other organs for transplant in the UK. They compare three policies: the status quo, in which there is no screening for hepatitis C and organ donation by people with hepatitis C is rare; a universal screen-and-treat policy in which cured people opt in to organ donation; and the same policy but with opt-out organ donation.

To do this, they adapted a previously developed queuing model. For the status quo, the model inputs were estimated by calibrating the model outputs to reported NHS performance. They then changed the model inputs to reflect the anticipated impact of the new policies. Importantly, they assumed that all patients with hepatitis C would be cured and would no longer require a transplanted organ, and that cured patients would donate organs at similar rates to the general population. They predict that curing hepatitis C would directly reduce the waiting list for organ transplants by reducing the number of patients needing them. There would also be an indirect benefit through greater organ availability for other patients. These consequences aren’t typically included in cost-effectiveness analyses of treatments for hepatitis C, which means that estimates of those treatments’ comparative benefits and costs may be inaccurate.
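
The two routes by which such a policy shortens the waiting list (fewer patients joining, and more organs becoming available) show up even in a very crude toy queue. This is emphatically not the authors’ calibrated model: it is a hypothetical monthly simulation in which patients join the list at a Poisson rate, organs arrive at a Poisson rate and are used if anyone is waiting, and each waiting patient leaves the list (for example, through death) with a fixed monthly probability. All rates below are invented for illustration.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw via Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def mean_waiting_list(arrival_rate, organ_rate, exit_prob=0.02,
                      months=240, burn_in=120, seed=7):
    """Average list size after a burn-in period in a toy transplant queue."""
    rng = random.Random(seed)
    waiting, history = 0, []
    for month in range(months):
        waiting += poisson(rng, arrival_rate)              # new patients needing an organ
        waiting -= min(waiting, poisson(rng, organ_rate))  # organs used; surplus organs are lost
        waiting -= sum(rng.random() < exit_prob for _ in range(waiting))  # removals, e.g. deaths
        if month >= burn_in:
            history.append(waiting)
    return sum(history) / len(history)


# Hypothetical monthly rates, not taken from the paper.
status_quo = mean_waiting_list(arrival_rate=30, organ_rate=20)
direct_only = mean_waiting_list(arrival_rate=24, organ_rate=20)          # fewer patients need organs
direct_and_indirect = mean_waiting_list(arrival_rate=24, organ_rate=26)  # ...and more donors
print(round(status_quo), round(direct_only), round(direct_and_indirect))
```

The point is only qualitative: reducing demand helps directly, and adding supply helps further. How large either effect is depends entirely on the inputs, which is why sensitivity analysis on those inputs matters.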

Keeping with the theme of uncertainty, it was disappointing that the paper includes neither confidence bounds on its results nor sensitivity analyses of its assumptions, which, in my view, were quite favourable towards a universal screen-and-treat policy. This is an interesting application of a queuing model, which is something I don’t often see in cost-effectiveness analysis. It is also timely and relevant, given the recent drive by the NHS to eliminate hepatitis C. In a few years’ time, we’ll hopefully know to what extent the predicted spillover benefits were realised.