Rita Faria’s journal round-up for 2nd September 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ [PubMed] Published 28th August 2019

RCTs are the gold standard study design for estimating treatment effects, but they are often far from perfect. The question is the extent to which their flaws make a difference to the results. Well, RoB 2 is your new best friend to help answer this question.

Developed by a star-studded team, RoB 2 is the update to the Cochrane Collaboration’s original risk of bias tool. Bias is assessed by outcome, rather than for the whole RCT, which makes sense to me. For example, the primary outcome may be well reported, yet the secondary outcome, which may be the outcome of interest for a cost-effectiveness model, much less so.

Bias is considered in terms of five domains, with the overall risk of bias usually corresponding to the worst risk of bias in any of the domains. This overall risk of bias is then reflected in the evidence synthesis, for example with a stratified meta-analysis.
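As a toy illustration of that “overall judgement equals the worst domain” rule, here is a minimal sketch. The real RoB 2 algorithm has more nuance (for instance, several “some concerns” judgements can together justify an overall “high” rating), and the domain names below are paraphrased, so treat this as illustrative only.

```python
# Simplified sketch of the "overall = worst domain" rule described above.
# The actual RoB 2 algorithm includes extra escalation rules, so this is
# only an illustration of the basic idea.

LEVELS = ["low", "some concerns", "high"]  # ordered from best to worst

def overall_risk_of_bias(domain_judgements):
    """Return the worst (highest-risk) judgement across the domains."""
    return max(domain_judgements, key=LEVELS.index)

# Hypothetical judgements for one outcome of one trial.
outcome_judgements = {
    "randomisation process": "low",
    "deviations from intended interventions": "some concerns",
    "missing outcome data": "low",
    "measurement of the outcome": "low",
    "selection of the reported result": "low",
}

print(overall_risk_of_bias(outcome_judgements.values()))  # some concerns
```

Because the judgement is made per outcome, the same trial can be low risk for its primary outcome and high risk for the outcome feeding a cost-effectiveness model.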

The paper is a great read! Jonathan Sterne and colleagues explain the reasons for the update and the process that was followed. Clearly, quite a lot of thought was given to the types of bias and to developing the questions that help reviewers assess them. The only downside is that it may require more time to apply, given that it needs to be done by outcome. Still, I think that’s a price worth paying for more reliable results. Looking forward to seeing it in use!

Characteristics and methods of incorporating randomised and nonrandomised evidence in network meta-analyses: a scoping review. Journal of Clinical Epidemiology [PubMed] Published 3rd May 2019

In keeping with the evidence synthesis theme, this paper by Kathryn Zhang and colleagues reviews how the applied literature has been combining randomised and non-randomised evidence. The headline findings are that combining these two types of study designs is rare and, when it does happen, naïve pooling is the most common method.

I imagine that the limited use of non-randomised evidence is due to its risk of bias. After all, it is difficult to ensure that the measure of association from a non-randomised study is an estimate of a causal effect. Hence, it is worrying that the majority of network meta-analyses that did combine non-randomised studies did so with naïve pooling.

This scoping review may kick start some discussions in the evidence synthesis world. When should we combine randomised and non-randomised evidence? How best to do so? And how to make sure that the right methods are used in practice? As a cost-effectiveness modeller, with limited knowledge of evidence synthesis, I’ve grappled with these questions myself. Do get in touch if you have any thoughts.

A cost-effectiveness analysis of shortened direct-acting antiviral treatment in genotype 1 noncirrhotic treatment-naive patients with chronic hepatitis C virus. Value in Health [PubMed] Published 17th May 2019

Rarely do we see a cost-effectiveness paper where the proposed intervention is less costly and less effective, that is, in the controversial southwest quadrant. This paper by Christopher Fawsitt and colleagues is a welcome exception!

Christopher and colleagues looked at the cost-effectiveness of shorter treatment durations for chronic hepatitis C. Compared with the standard duration, the shorter treatment is not as effective, hence results in fewer QALYs. But it is much cheaper to treat patients over a shorter duration and re-treat those patients who were not cured, rather than treat everyone with the standard duration. Hence, for the base-case and for most scenarios, the shorter treatment is cost-effective.
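To see why a less effective, less costly option can still be cost-effective, consider the incremental net monetary benefit rule sketched below. The numbers are purely hypothetical and are not the figures from the paper.

```python
# Southwest-quadrant decision rule via incremental net monetary benefit
# (iNMB). All numbers are hypothetical, not those of Fawsitt et al.

def incremental_nmb(delta_qalys, delta_cost, threshold):
    """iNMB of the new strategy vs standard care: value the QALY change
    at the threshold, then subtract the incremental cost."""
    return threshold * delta_qalys - delta_cost

# Shorter treatment: saves money but loses a little health per patient.
delta_qalys = -0.10    # QALYs lost relative to standard duration
delta_cost = -5000.0   # negative = cost saving

for threshold in (20_000, 60_000):
    print(threshold, incremental_nmb(delta_qalys, delta_cost, threshold))
```

Note the southwest-quadrant twist: at the lower threshold the cheaper strategy has positive incremental NMB, but at the higher threshold, where QALYs are valued more, it does not. This is the reverse of the usual direction for new, costlier treatments.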

I’m sure that labelling a less effective and less costly option as cost-effective may be controversial in some quarters. Some may argue that it is unethical to offer a worse treatment than the standard, even if it saves a lot of money. In my view, it is no different from funding better and costlier treatments, given that the extra costs are borne by other patients, who will necessarily have access to fewer resources.

The paper is beautifully written and is another example of an outstanding cost-effectiveness analysis with important implications for policy and practice. The extensive sensitivity analysis should provide reassurance to the sceptics. And the discussion is clever in arguing for the value of a shorter duration in resource-constrained settings and for hard-to-reach populations. A must read!


Sam Watson’s journal round-up for 6th May 2019


Channeling Fisher: randomisation tests and the statistical insignificance of seemingly experimental results. Quarterly Journal of Economics Published May 2019

Anyone who pays close attention to the statistics literature may feel that a paradigm shift is underway. While papers cautioning on the use of null hypothesis significance testing (NHST) have been published for decades, a number of articles in recent years have highlighted large numbers of problems in published studies. For example, only 39% of replications of 100 experiments in social psychology were considered successful. Publication in prestigious journals like Science and Nature is no guarantee of replicability either. There is a growing number of voices calling for improvements in study reporting and conduct, changes to the use of p-values, or even their abandonment altogether.

Some of the failures of studies using NHST methods are due to poor experimental design, poorly defined interventions, or “noise-mining”. But even well-designed experiments that are, in theory, correctly analysed are not immune from false inferences in the NHST paradigm. This article looks at the reliability of statistical significance claims in 53 experimental studies published in the journals of the American Economic Association.

Statistical significance is typically determined in experimental economics papers using the econometric techniques widely taught to all economics students. In particular, the t-statistic of a regression coefficient is calculated using either homoskedastic or robust standard errors, and is then compared to a t-distribution with the appropriate degrees of freedom. An alternative way to determine p-values is a permutation or randomisation test, which we have featured in a previous Method of the Month. The permutation test provides the exact distribution of the test statistic and is therefore highly reliable. This article compares the p-values from permutation tests conducted by the author with those reported in the 53 selected experimental studies. It finds between 13% and 22% fewer statistically significant results than reported in the papers, and in tests of multiple treatment effects, 33% to 49% fewer.
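As a sketch of the idea, a simple two-arm randomisation test can be written with the standard library alone. The article computes the exact permutation distribution; the Monte Carlo approximation and the data below are illustrative only.

```python
# Minimal Monte Carlo randomisation (permutation) test for a two-group
# experiment. The p-value is the share of random re-labellings whose
# effect is at least as extreme as the observed one.
import random
from statistics import mean

def permutation_test(treated, control, n_perm=10_000, seed=1):
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random re-assignment of treatment labels
        stat = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if abs(stat) >= abs(observed):
            hits += 1
    return hits / n_perm

# Illustrative data: outcomes for six treated and six control subjects.
treated = [2.1, 3.4, 2.9, 3.8, 3.1, 2.7]
control = [1.9, 2.2, 2.5, 1.8, 2.6, 2.0]
print(permutation_test(treated, control))
```

Because the test conditions only on the randomisation itself, it makes no appeal to homoskedastic or robust standard-error formulas, which is what makes it a useful benchmark for the reported t-statistics.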

This discrepancy is explained in part by the leverage of certain observations in each study. Results are often sensitive to the removal of single observations. The more of an impact an observation has, the greater its leverage; in balanced experimental designs, leverage is uniformly distributed. In regressions with multiple treatments and treatment interactions, leverage becomes concentrated and standard errors become volatile. Needless to say, this article presents yet another piece of compelling evidence that NHST is unreliable, and it strengthens the case for abandoning statistical significance as the primary inferential tool.

Effect of a resuscitation strategy targeting peripheral perfusion status vs serum lactate levels on 28-day mortality among patients with septic shock. The ANDROMEDA-SHOCK randomized clinical trial. Journal of the American Medical Association [PubMed] Published 17th February 2019

This article gets a mention in this round-up not for its health or economic content but because it is a very good example of how not to use statistical significance. In previous articles on the blog we’ve discussed the misuse and misinterpretation of p-values, but I generally don’t go as far as advocating their complete abandonment, as a recent mass-signed letter in Nature has. What is crucial is that researchers stop making the mistake that statistical insignificance means no effect. Making this error can lead to pernicious consequences when it comes to patient treatment and the lack of adoption of effective and cost-effective technologies. And it is exactly the error this article makes.

I first saw this ridiculous use of statistical significance when it was tweeted by David Spiegelhalter. The trial (in JAMA, no less) compares two different methods of managing resuscitation in patients with septic shock. The key result is:

By day 28, 74 patients (34.9%) in the peripheral perfusion group and 92 patients (43.4%) in the lactate group had died (hazard ratio, 0.75 [95% CI, 0.55 to 1.02]; P = .06; risk difference, −8.5% [95% CI, −18.2% to 1.2%]).

And the conclusion?

Among patients with septic shock, a resuscitation strategy targeting normalization of capillary refill time, compared with a strategy targeting serum lactate levels, did not reduce all-cause 28-day mortality.


Which is determined solely on the basis of statistical significance. Certainly it is possible that the result is just chance variation. But the study was conducted because it was believed that there was a difference in survival between these methods, and a 25% reduction in mortality risk is significant indeed. Rather than take an abductive or Bayesian approach, which would see this result as providing some degree of evidence in support of one treatment, the authors abandon any attempt at thinking and just mechanically follow statistical significance logic. This is a good case study for anyone wanting to discuss interpretation of p-values, but more significantly (every pun intended) the reliance on statistical significance may well be jeopardising patient lives.

Value of information: sensitivity analysis and research design in Bayesian evidence synthesis. Journal of the American Statistical Association Published 30th April 2019

Three things are necessary to make a decision in the decision theoretical sense. First, a set of possible decisions; second, a set of parameters describing the state of the world; and third, a loss (or utility) function. Given these three things the decision that is chosen is the one that minimises losses (or maximises utility) given the state of the world. Of course, the state of the world may not be known for sure. There can be some uncertainty about the parameters and hence the best course of action, which might lead to losses relative to the decision we would make if we knew everything perfectly. Thus, we can determine the benefits of collecting more information. This is the basis of value of information (VoI) analysis.

We can distinguish between different quantities of interest in VoI analyses. The expected value of perfect information (EVPI) is the difference between the expected loss under the optimal decision made with current information and the expected loss under the decision we would make if we knew all the parameters exactly. The expected value of partial perfect information (EVPPI) is similar, except that it considers knowing only one of the parameters (or a subset of them) exactly. Finally, the expected value of sample information (EVSI) compares the losses under our current decision to those under the decision we would make if we had the information on our parameters from a particular study design. If we know the costs of conducting a given study, then we can subtract them from the EVSI to get the expected net benefit of sampling.
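For intuition, EVPI can be computed by simple Monte Carlo in a toy two-option decision problem with a single uncertain parameter. All payoffs and distributions here are hypothetical, chosen only to make the mechanics visible.

```python
# Toy Monte Carlo EVPI calculation (net-benefit framing):
# EVPI = E[max over decisions of NB] - max over decisions of E[NB].
import random

rng = random.Random(42)

def net_benefit(decision, theta):
    # Hypothetical payoffs: option B beats A only when theta is large.
    return 10.0 * theta if decision == "B" else 5.0

# Current uncertainty about the parameter (hypothetical prior).
samples = [rng.gauss(0.55, 0.3) for _ in range(100_000)]

# Expected net benefit of each fixed decision under current uncertainty.
enb = {d: sum(net_benefit(d, t) for t in samples) / len(samples)
       for d in ("A", "B")}
best_current = max(enb.values())

# With perfect information we would pick the best option for each theta.
perfect = sum(max(net_benefit(d, t) for d in ("A", "B"))
              for t in samples) / len(samples)

evpi = perfect - best_current
print(round(evpi, 3))
```

The same machinery extends to EVPPI and EVSI by conditioning on partial or sampled information, which is exactly where the computational difficulty addressed by this paper comes in.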

Calculating EVPPI and EVSI is no easy feat though, particularly for more complex models. This article proposes a relatively straightforward and computationally feasible way of estimating these quantities for complex evidence synthesis models. For their example, the authors use a model commonly used to estimate overall HIV prevalence. Since not all HIV cases are known or disclosed, one has to combine different sets of data to arrive at a reliable estimate. For example, it is known how many people attend sexual health clinics and what proportion of those have HIV, so it is also known how many do not attend sexual health clinics, just not how many of those might be HIV positive. There are many epidemiological parameters in this complex model, and the aim of the paper is to demonstrate how the principal sources of uncertainty can be determined in terms of EVPPI and EVSI.


Transformative treatments: a big methodological challenge for health economics

Social scientists, especially economists, are concerned with causal inference: understanding whether and how an event causes a certain effect. Typically, we subscribe to the view that causal relations are reducible to sets of counterfactuals, and we use ever more sophisticated methods, such as instrumental variables and propensity score matching, to estimate these counterfactuals. Under the right set of assumptions, such as that unobserved differences between study subjects are time invariant, or that a treatment causes its effect through a certain mechanism, we can derive estimators for average treatment effects. All uncontroversial stuff indeed.

A recent paper from L.A. Paul and Kieran Healy introduces an argument of potential importance to how we interpret studies investigating causal relations. In particular, they argue that we don’t know whether individual preferences persist through treatment in a study. It is in general not possible to distinguish between the case where a treatment has satisfied an underlying, revealed preference and the case where it has transformed an individual’s preferences. If preferences are changed or transformed, rather than revealed, then the treated are, in effect, a different population and, in a causal inference type of study, no longer comparable to the control population.

To quote their thought experiment:

Vampires: In the 21st century, vampires begin to populate North America. Psychologists decide to study the implications this could have for the human population. They put out a call for undergraduates to participate in a randomized controlled experiment, and recruit a local vampire with scientific interests. After securing the necessary permissions, they randomize and divide their population of undergraduates into a control group and a treatment group. At t1, members of each group are given standard psychological assessments measuring their preferences about vampires in general and about becoming a vampire in particular. Then members of the experimental group are bitten by the lab vampire.

Members of both groups are left to go about their daily lives for a period of time. At t2, they are assessed. Members of the control population do not report any difference in their preferences at t2. All members of the treated population, on the other hand, report living richer lives, enjoying rewarding new sensory experiences, and having a new sense of meaning at t2. As a result, they now uniformly report very strong pro-vampire preferences. (Some members of the treatment group also expressed pro-vampire preferences before the experiment, but these were a distinct minority.) In exit interviews, all treated subjects also testify that they have no desire to return to their previous condition.

Should our psychologists conclude that being bitten by a vampire somehow satisfies people’s underlying, previously unrecognized, preferences to become vampires? No. They should conclude that being bitten by a vampire causes you to become a vampire (and thus, to prefer being one). Being bitten by a vampire and then being satisfied with the result does not satisfy or reveal your underlying preference to be a vampire. Being bitten by a vampire transforms you: it changes your preferences in a deep and fundamental way, by replacing your underlying human preferences with vampire preferences, no matter what your previous preferences were.

In our latest journal round-up, I featured a paper that used German reunification in 1990 as a natural experiment to explore the impact of novel food items in the market on consumption and weight gain. The transformative treatments argument comes into play here. Did reunification reveal the preferences of East Germans for the novel foodstuffs, or did it change their preferences for foodstuffs overall due to the significant cultural change? If the latter is true, then West Germans do not constitute an appropriate control group. The causal mechanism at play is also important to the development of policy: for example, without reunification there may not have been any impact from novel food products.

This argument is also sometimes skirted around with regards to the valuing of health states. Should it be the preferences of healthy people, or the experienced utility of sick people, that determine health state values? Do physical trauma and disease reveal our underlying preferences for different health states, or do they transform us to have different preferences entirely? Any study looking at the effect of disease on health status or quality of life could not distinguish between the two. Yet the two cases are akin to using the same or different groups of people to do the valuation of health states.

Consider also something like estimating the impact of retirement on health and quality of life. If self-reported quality of life is observed to improve in one of these studies, we don’t know if that is because retirement has satisfied a pre-existing preference for the retired lifestyle, or retirement has transformed a person’s preferences. In the latter case, the appropriate control group to evaluate the causal effect of retirement is not non-retired persons.

Paul and Healy do not make their argument to try to prevent or undermine research in the social sciences; they interpret their conclusion as a “methodological challenge”. The full implications of the above arguments have not been explored, but they could be great, and innovations in methodology to estimate average causal effects may be warranted. How this may be achieved, I’ll have to admit, I do not know.
