Sam Watson’s journal round-up for 6th May 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Channeling Fisher: randomisation tests and the statistical insignificance of seemingly experimental results. Quarterly Journal of Economics Published May 2019

Anyone who pays close attention to the statistics literature may feel that a paradigm shift is underway. While papers cautioning on the use of null hypothesis significance testing (NHST) have been published for decades, a number of articles in recent years have highlighted large numbers of problems in published studies. For example, only 39% of replications of 100 experiments in social psychology were considered successful. Publication in prestigious journals like Science and Nature is no guarantee of replicability either. There is a growing number of voices calling for improvements in study reporting and conduct, changes to use of p-values or even their abandonment altogether.

Some of the failures of studies using NHST methods are due to poor experimental design, poorly defined interventions, or “noise-mining”. But even well-designed experiments that are theoretically correctly analysed are not immune from false inferences in the NHST paradigm. This article looks at the reliability of statistical significance claims in 53 experimental studies published in the journals of the American Economic Association.

Statistical significance is typically determined in experimental economics papers using the econometric techniques widely taught to all economics students. In particular, the t-statistic of a regression coefficient is calculated using either homoskedastic or robust standard errors, which is then compared to a t-distribution with the appropriate degrees of freedom. An alternative method to determine p-values is a permutation or randomisation test, which we have featured in a previous Method of the Month. The permutation test provides the exact distribution of the test statistic under the null hypothesis of no treatment effect and is therefore highly reliable. This article compares results from permutation tests the author conducts to the reported p-values in the 53 selected experimental studies. It finds between 13% and 22% fewer statistically significant results than reported in the papers and, in tests of multiple treatment effects, 33% to 49% fewer.
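
To make the idea concrete, here is a minimal sketch of a randomisation test. It is not the procedure used in the article, which works with regression-based test statistics; this version assumes a simple two-arm experiment and uses the difference in means as the test statistic. The function name and its arguments are my own, and it samples random re-labellings rather than enumerating all of them, so it approximates the exact test.

```python
import random

def permutation_p_value(treated, control, n_perm=10000, seed=0):
    """Two-sided randomisation test for a difference in means (illustrative)."""
    rng = random.Random(seed)
    n_t = len(treated)
    pooled = list(treated) + list(control)

    def mean_diff(values):
        # First n_t values are treated as the "treatment" arm
        t, c = values[:n_t], values[n_t:]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = abs(mean_diff(pooled))
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-randomise the treatment labels
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (extreme + 1) / (n_perm + 1)

p = permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
print(f"p-value: {p:.4f}")
```

The p-value is simply the share of re-randomised data sets that produce a test statistic at least as extreme as the one observed, so no distributional assumption about the errors is needed.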

This discrepancy is explained in part by the leverage of certain observations in each study. Results are often sensitive to the removal of single observations. The more of an impact an observation has, the greater its leverage; in balanced experimental designs leverage is uniformly distributed. In regressions with multiple treatments and treatment interactions leverage becomes concentrated and standard errors become volatile. Needless to say, this article presents yet another piece of compelling evidence that NHST is unreliable and strengthens the case for abandoning statistical significance as the primary inferential tool.
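
A small sketch can show why balance matters for leverage. It assumes the simplest possible case, a regression of the outcome on an intercept and a single treatment indicator, where the leverage of observation i is 1/n + (x_i - mean(x))^2 / sum((x_j - mean(x))^2). The function name is hypothetical and the designs are made up for illustration.

```python
def leverages(x):
    """Diagonal of the hat matrix for a regression on an intercept and x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

balanced = leverages([0, 0, 0, 1, 1, 1])    # three controls, three treated
unbalanced = leverages([0, 0, 0, 0, 0, 1])  # five controls, one treated

print("balanced:  ", [round(h, 3) for h in balanced])
print("unbalanced:", [round(h, 3) for h in unbalanced])
```

In the balanced design every observation carries the same leverage. In the unbalanced design the single treated observation has a leverage of one, so the treatment estimate depends entirely on that one data point.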

Effect of a resuscitation strategy targeting peripheral perfusion status vs serum lactate levels on 28-day mortality among patients with septic shock. The ANDROMEDA-SHOCK randomized clinical trial. Journal of the American Medical Association [PubMed] Published 17th February 2019

This article gets a mention in this round-up not for its health or economic content but because it is a very good example of how not to use statistical significance. In previous articles on the blog we’ve discussed the misuse and misinterpretation of p-values, but I generally don’t go as far as advocating their complete abandonment as a recent mass-signed letter in Nature has. What is crucial is that researchers stop making the mistake of assuming that statistical insignificance means no effect. Making this error can lead to pernicious consequences when it comes to patient treatment and the lack of adoption of effective and cost-effective technologies, and it is exactly the error this article makes.

I first saw this ridiculous use of statistical significance when it was Tweeted by David Spiegelhalter. The trial (in JAMA, no less) compares two different methods of managing resuscitation in patients with septic shock. The key result is:

By day 28, 74 patients (34.9%) in the peripheral perfusion group and 92 patients (43.4%) in the lactate group had died (hazard ratio, 0.75 [95% CI, 0.55 to 1.02]; P = .06; risk difference, −8.5% [95% CI, −18.2% to 1.2%]).

And the conclusion?

Among patients with septic shock, a resuscitation strategy targeting normalization of capillary refill time, compared with a strategy targeting serum lactate levels, did not reduce all-cause 28-day mortality.


Which is determined solely on the basis of statistical significance. Certainly it is possible that the result is just chance variation. But the study was conducted because it was believed that there was a difference in survival between these methods, and a 25% reduction in mortality risk is significant indeed. Rather than take an abductive or Bayesian approach, which would see this result as providing some degree of evidence in support of one treatment, the authors abandon any attempt at thinking and just mechanically follow statistical significance logic. This is a good case study for anyone wanting to discuss interpretation of p-values, but more significantly (every pun intended) the reliance on statistical significance may well be jeopardising patient lives.
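
As a rough illustration of what a Bayesian reading might look like, the sketch below turns the reported hazard ratio and confidence interval into a posterior probability that the hazard ratio is below one. It rests on assumptions that are mine, not the trial authors': a normal approximation on the log hazard ratio scale, a standard error backed out of the reported 95% interval, and a flat prior. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_hr_below(hr, ci_low, ci_high, threshold=1.0):
    """Approximate posterior P(HR < threshold) under a flat prior."""
    log_hr = math.log(hr)
    # Back out the standard error from the width of the 95% interval
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return normal_cdf((math.log(threshold) - log_hr) / se)

# Figures as reported in the trial abstract quoted above
p_benefit = prob_hr_below(0.75, 0.55, 1.02)
print(f"P(HR < 1) is approximately {p_benefit:.3f}")
```

Under these assumptions the probability that targeting peripheral perfusion reduces the hazard of death comes out well above 0.9, which sits awkwardly with a conclusion of "did not reduce" mortality. A sceptical prior would pull this number down, but that is a judgement to be made explicitly rather than by default.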

Value of information: sensitivity analysis and research design in Bayesian evidence synthesis. Journal of the American Statistical Association Published 30th April 2019.

Three things are necessary to make a decision in the decision theoretical sense. First, a set of possible decisions; second, a set of parameters describing the state of the world; and third, a loss (or utility) function. Given these three things the decision that is chosen is the one that minimises losses (or maximises utility) given the state of the world. Of course, the state of the world may not be known for sure. There can be some uncertainty about the parameters and hence the best course of action, which might lead to losses relative to the decision we would make if we knew everything perfectly. Thus, we can determine the benefits of collecting more information. This is the basis of value of information (VoI) analysis.

We can distinguish between different quantities of interest in VoI analyses. The expected value of perfect information (EVPI) is the difference between the expected loss under the optimal decision made with current information and the expected loss under the decision we would make if we knew all the parameters exactly. The expected value of partial perfect information (EVPPI) is similar, except that it considers only the difference from the decision we would make if we knew one of the parameters exactly. Finally, the expected value of sample information (EVSI) compares the losses under our current decision to those under the decision we would make if we had the information on our parameters from a particular study design. If we know the costs of conducting a given study then we can subtract them from the benefits estimated in the EVSI to get the expected net benefit of sampling.
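
The EVPI definition translates fairly directly into a Monte Carlo calculation. The sketch below is a generic illustration, not the method proposed in the article. It works with net benefit (the negative of loss) and assumes we already have draws from the current distribution of the parameters, each converted into a net benefit for every decision option. The function name and the toy numbers are made up.

```python
def evpi(net_benefit_draws):
    """EVPI from simulated net benefits.

    net_benefit_draws: list of draws, each a list with one net benefit
    per decision option, evaluated at one sampled parameter set.
    """
    n_draws = len(net_benefit_draws)
    n_decisions = len(net_benefit_draws[0])
    # Current information: pick the decision with the best expected net benefit
    expected_nb = [
        sum(draw[j] for draw in net_benefit_draws) / n_draws
        for j in range(n_decisions)
    ]
    value_current = max(expected_nb)
    # Perfect information: pick the best decision separately for every draw
    value_perfect = sum(max(draw) for draw in net_benefit_draws) / n_draws
    return value_perfect - value_current

# Toy example: option 0 is "do nothing", option 1 is a new intervention
draws = [[0, 10], [0, -20], [0, 4]]
print(evpi(draws))
```

EVPI is zero whenever the same decision is best in every draw, because extra information could then never change what we do.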

Calculating EVPPI and EVSI is no easy feat though, particularly for more complex models. This article proposes a relatively straightforward and computationally feasible way of estimating these quantities for complex evidence synthesis models. For their example they use a model commonly used to estimate overall HIV prevalence. Since not all HIV cases are known or disclosed, one has to combine different sets of data to get to a reliable estimate. For example, it is known how many people attend sexual health clinics and what proportion of those have HIV, so it is also known how many do not attend sexual health clinics, just not how many of those might be HIV positive. There are many epidemiological parameters in this complex model and the aim of the paper is to demonstrate how the principal sources of uncertainty can be determined in terms of EVPPI and EVSI.

Credits

Rita Faria’s journal round-up for 28th January 2019

Appraising the value of evidence generation activities: an HIV modelling study. BMJ Global Health [PubMed] Published 7th December 2018

How much should we spend on implementing our health care strategy versus getting more information to devise a better strategy? Should we devolve budgets to regions or administer the budget centrally? These are difficult questions and this new paper by Beth Woods et al has a brilliant stab at answering them.

The paper looks at the HIV prevention and treatment policies in Zambia. It starts by finding the most cost-effective strategy and the corresponding budget in each region, given what is currently known about the prevalence of the infection, the effectiveness of interventions, etc. The idea is that the regions receive a cost-effective budget to implement a cost-effective strategy. The issue is that the cost-effective strategy and budget are devised according to what we currently know. In practice, regions might face a situation on the ground which is different from what was expected. Regions might not have enough budget to implement the strategy or might have some leftover.

What if we spend some of the budget to get more information to make a better decision? This paper considers the value of perfect information given the costs of research. Depending on the size of the budget and the cost of research, it may be worthwhile to divert some funds to get more information. But what if we had more flexibility in the budgetary policy? This paper tests 2 more budgetary options: a national hard budget but with the flexibility to transfer funds from under- to overspending regions, and a regional hard budget with a contingency fund.

The results are remarkable. The best budgetary policy is to have a national budget with the flexibility to reallocate funds across regions. This is a fascinating paper, with implications not only for prioritisation and budget setting in LMICs but also for high-income countries. For example, the 2012 Health and Social Care Act broke down PCTs into smaller CCGs and gave them hard budgets. Some CCGs went into deficit, and there are reports that some interventions have been cut back as a result. There are probably many reasons for the deficit, but this paper shows that hard regional budgets clearly have negative consequences.

Health economics methods for public health resource allocation: a qualitative interview study of decision makers from an English local authority. Health Economics, Policy and Law [PubMed] Published 11th January 2019

Our first paper looked at how to use cost-effectiveness to allocate resources between regions and across health care services and research. Emma Frew and Katie Breheny look at how decisions are actually made in practice, but this time in a local authority in England. Another change of the 2012 Health and Social Care Act was to move public health responsibilities from the NHS to local authorities. Local authorities are now given a ring-fenced budget to implement cost-effective interventions that best match their needs. How do they make decisions? Thanks to this paper, we’re about to find out.

This paper is an enjoyable read and quite an eye-opener. It was startling that health economics evidence was not much used in practice. But the barriers that were cited are not insurmountable. And the suggestions by the interviewees were really useful. There were suggestions about how economic evaluations should consider the local context to get a fair picture of the impact of the intervention on services and on the population, and to move beyond the trial into the real world. Equity was mentioned too, as well as broadening the outcomes beyond health. Fortunately, the health economics community is working on many of these issues.

Lastly, there was a clear message to make economic evidence accessible to lay audiences. This is a topic really close to my heart, and something I’d like to help improve. We have to make our work easy to understand and use. Otherwise, it may stay locked away in papers rather than do what we intended it for. Which is, at least in my view, to help inform decisions and to improve people’s lives.

I found this paper reassuring in that there is clearly a need for economic evidence and a desire to use it. Yes, there are some teething issues, but we’re working in the right direction. In sum, the future for health economics is bright!

Survival extrapolation in cancer immunotherapy: a validation-based case study. Value in Health Published 13th December 2018

Often, the cost-effectiveness of cancer drugs hangs on the method used to extrapolate overall survival. This is because many cancer drugs receive their marketing authorisation before most patients in the trial have died. Extrapolation is tested extensively in the sensitivity analysis, and this is the subject of many discussions in NICE appraisal committees. Ultimately, at the point of making the decision, the correct method to extrapolate is a known unknown. Only in hindsight can we know for sure what the best choice was.

Ash Bullement and colleagues take advantage of hindsight to identify the best method for extrapolating survival from a clinical trial of an immunotherapy drug. Survival after treatment with immunotherapy drugs is more difficult to predict because some patients can survive for a very long time, while others have much poorer outcomes. They fitted survival models to the 3-year data cut, which was available at the time of the NICE technology appraisal. Then they compared their predictions to the observed survival in the 5-year data cut and to long-term survival trends from registry data. They found that the piecewise model and a mixture-cure model had the best predictions at 5 years.
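
For readers less familiar with mixture-cure models, the sketch below shows the basic shape of one. It is a deliberately simplified illustration and not the model fitted in the paper: it assumes a Weibull survival function for the "uncured" patients, treats the "cured" fraction as never dying (background mortality is ignored), and uses made-up parameter values.

```python
import math

def weibull_survival(t, scale, shape):
    """Standard Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def mixture_cure_survival(t, cure_fraction, scale, shape):
    """Survival when a fraction of patients is cured and the rest follow a Weibull."""
    return cure_fraction + (1 - cure_fraction) * weibull_survival(t, scale, shape)

# Hypothetical parameters: 20% long-term survivors, time measured in years
for year in (1, 3, 5, 10, 20):
    standard = weibull_survival(year, scale=2.0, shape=1.0)
    cure = mixture_cure_survival(year, cure_fraction=0.2, scale=2.0, shape=1.0)
    print(f"year {year:>2}: standard {standard:.3f}, mixture-cure {cure:.3f}")
```

The standard curve heads towards zero while the mixture-cure curve flattens out at the cure fraction. That plateau is what lets the model capture long-term survivors, and it is also why the choice between the two can swing the extrapolated mean survival so much.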

This is a relevant paper for those of us who work in the technology appraisal world. I have to admit that I can be sceptical of piecewise and mixture-cure models, but they definitely have a role in our toolbox for survival extrapolation. Ideally, we’d have a study like this for all the technology appraisals hanging on the survival extrapolation so that we can take learnings across cancers and classes of drugs. With time, we would get to know more about what works best for which condition or drug. Ultimately, we may be able to get to a stage where we can look at the extrapolation with less inherent uncertainty.

Rita Faria’s journal round-up for 10th December 2018

Calculating the expected value of sample information using efficient nested Monte Carlo: a tutorial. Value in Health [PubMed] Published 17th July 2018

The expected value of sample information (EVSI) represents the added benefit from collecting new information on specific parameters in future studies. It can be compared to the cost of conducting these future studies to calculate the expected net benefit of sampling. The objective is to help inform which study design is best, given the information it can gather and its costs. The theory and methods to calculate EVSI have been around for some time, but we rarely see it in applied economic evaluations.
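
To show what EVSI and the expected net benefit of sampling look like in the simplest case, here is a minimal sketch. It is not the Heath and Baio method discussed in this round-up. It assumes a single uncertain parameter (a response probability with a Beta prior), a proposed study that observes n patients, and a net benefit that is linear in that probability, so the posterior mean can be computed by conjugacy instead of an inner simulation loop. Every name and number is hypothetical.

```python
import random

def evsi_beta_binomial(a, b, n, wtp, gain, cost, n_sim=20000, seed=1):
    """Per-patient EVSI for a study of n patients informing a response probability.

    The comparator has net benefit 0; the new treatment has net benefit
    wtp * gain * p - cost, where p ~ Beta(a, b) under current information.
    """
    rng = random.Random(seed)

    def nb_new(p_mean):
        return wtp * gain * p_mean - cost

    # Value of the best decision under current information
    value_now = max(0.0, nb_new(a / (a + b)))
    total = 0.0
    for _ in range(n_sim):
        p = rng.betavariate(a, b)                     # a plausible true value
        x = sum(rng.random() < p for _ in range(n))   # simulated study result
        post_mean = (a + x) / (a + b + n)             # conjugate Beta update
        total += max(0.0, nb_new(post_mean))          # best decision after the study
    return total / n_sim - value_now

def enbs(evsi_per_patient, population, study_cost):
    """Expected net benefit of sampling: population EVSI minus the study cost."""
    return evsi_per_patient * population - study_cost

evsi = evsi_beta_binomial(a=2, b=2, n=50, wtp=20000, gain=0.1, cost=1000)
print(f"EVSI per patient: {evsi:.1f}")
print(f"ENBS: {enbs(evsi, population=1000, study_cost=50000):.0f}")
```

Repeating the calculation for different values of n, each with its own study cost, is how one would compare candidate study designs. In realistic models the posterior update has no closed form, which is where the nested simulation, and the computational burden, comes from.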

In this paper, Anna Heath and Gianluca Baio present a tutorial about how to implement a method they had previously published on, which is more computationally efficient than the standard nested Monte Carlo simulations.

The authors start by explaining the method in theory, then illustrate it with a simple worked example. I’ll admit that I got a bit lost with the theory, but I found that the example made it much clearer. They demonstrate the method’s performance using a previously published cost-effectiveness model. Additionally, they have very helpfully published a suite of functions to apply this method in practice.

I really enjoyed reading this paper, as it takes the reader step-by-step through the method. However, I wasn’t sure about when this method is applicable, given that the authors note that it requires a large number of probabilistic simulations to perform well, and it is only appropriate when EVPPI is high. The issue is, how large is large and how high is high? Hopefully, these and other practical questions are on the list for this brilliant research team.

As an applied researcher, I find tutorial papers such as this one incredibly useful to learn new methods and help implement them in practice. Thanks to work such as this one and others, we’re getting close to making value of information analysis a standard element of cost-effectiveness studies.

Future costs in cost-effectiveness analyses: past, present, future. PharmacoEconomics [PubMed] Published 26th November 2018

Linda de Vries, Pieter van Baal and Werner Brouwer help illuminate the debate on future costs with this fascinating paper. Future costs are the costs of resources used by patients during the years of life added by the technology under evaluation. Future costs can be classified as related or unrelated, depending on whether the resources are used for the target disease. They can also be classified as medical or non-medical, depending on whether the costs fall on the healthcare budget.

The authors very skilfully summarise the theoretical literature on the inclusion of future costs. They conclude that future related and unrelated medical costs should be included and present compelling arguments to do so.

They also discuss empirical research, such as studies that estimate future unrelated costs. The references are a useful starting point for other researchers. For example, I noted that there is a tool to include future unrelated medical costs in the Netherlands and some studies on their estimation in the UK (see, for example, here).

There is a thought-provoking section on ethical concerns. If unrelated costs are included, technologies that increase the life expectancy of people who need a lot of resources will look less cost-effective. The authors suggest that these issues should not be concealed in the analysis, but instead dealt with in the decision-making process.
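
A back-of-the-envelope calculation shows the mechanism. The sketch below uses entirely made-up numbers and a deliberately crude assumption that future unrelated medical costs are a fixed annual amount multiplied by the life years gained, with no discounting. The function names are hypothetical.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    return delta_cost / delta_qaly

def icer_with_future_costs(delta_cost, delta_qaly, life_years_gained, annual_unrelated_cost):
    """ICER after adding unrelated medical costs incurred in the added life years."""
    future_costs = life_years_gained * annual_unrelated_cost
    return icer(delta_cost + future_costs, delta_qaly)

# Hypothetical technology: 10,000 extra cost, 0.5 QALYs and 1 life year gained
base = icer(10000, 0.5)
low_need = icer_with_future_costs(10000, 0.5, life_years_gained=1, annual_unrelated_cost=1000)
high_need = icer_with_future_costs(10000, 0.5, life_years_gained=1, annual_unrelated_cost=5000)
print(base, low_need, high_need)
```

The same health gain looks less cost-effective when it accrues to patients who use more health care in their added years, which is precisely the ethical concern the authors raise.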

This is an enjoyable paper that provides an overview of the literature on future costs. I highly recommend it to get up to speed with the arguments and the practical implications. There is clearly a case for including future costs, and the question now is whether the cost-effectiveness practice follows suit.

Cost-utility analysis using EQ-5D-5L data: does how the utilities are derived matter? Value in Health Published 4th July 2018

We’ve recently become spoilt for choice when it comes to the EQ-5D. To obtain utility values, just in the UK, there are a few options: the 3L tariff, the 5L tariff, and crosswalk tariffs by Ben van Hout and colleagues and Mónica Hernandez and colleagues [PDF]. Which one to choose? And does it make any difference?

Fan Yang and colleagues have done a good job in getting us closer to the answer. They estimated utilities obtained from EQ-5D-5L data using the 5L value set and crosswalk tariffs to EQ-5D-3L and tested the values in cost-effectiveness models of hemodialysis compared to peritoneal dialysis.

Reassuringly, hemodialysis always had greater utilities than peritoneal dialysis. However, the magnitude of the difference varied with the approach. Therefore, using either EQ-5D-5L or the crosswalk tariff to EQ-5D-3L can influence the cost-effectiveness results. These results are in line with earlier work by Mónica Hernandez and colleagues, who compared the EQ-5D-3L with the EQ-5D-5L.

The message is clear: both the type of EQ-5D questionnaire and the EQ-5D tariff make a difference to the cost-effectiveness results. This can have huge policy implications as decisions by HTA agencies, such as NICE, depend on these results.

Which EQ-5D-5L to use in a new primary research study remains an open question. In the meantime, NICE recommends the use of the EQ-5D-3L or, if EQ-5D-5L was collected, Ben van Hout and colleagues’ mapping function to the EQ-5D-3L. Hopefully, a definite answer won’t be long in coming.
