Rita Faria’s journal round-up for 4th November 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The marginal benefits of healthcare spending in the Netherlands: estimating cost-effectiveness thresholds using a translog production function. Health Economics [PubMed] Published 30th August 2019

The marginal productivity of the healthcare sector or, as commonly known, the supply-side cost-effectiveness threshold, is a hot topic right now. A few years ago, we could only guess at the magnitude of health that was displaced by reimbursing expensive and not-that-beneficial drugs. Since the seminal work by Karl Claxton and colleagues, we have started to have a pretty good idea of what we’re giving up.

This paper by Niek Stadhouders and colleagues adds to this literature by estimating the marginal productivity of hospital care in the Netherlands. Spoiler alert: they estimated that hospital care generates 1 QALY for around €74,000 at the margin, with 95% confidence intervals ranging from €53,000 to €94,000. Remarkably, it’s close to the Dutch upper reference value for the cost-effectiveness threshold at €80,000!

The approach for estimation is quite elaborate because it required constructing QALYs and costs, and accounting for the effect of mortality on costs. The diagram in Figure 1 explains it well. Their approach differs from the Claxton et al. method in that they corrected for the costs due to changes in mortality directly, rather than via an instrumental variable analysis. To estimate the marginal effect of spending on health, they use a translog production function. The confidence intervals are generated with Monte Carlo simulation, and various robustness checks are presented.
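To make the mechanics concrete, here is a minimal sketch of the general idea, using simulated data and hypothetical numbers rather than the authors’ model or results: fit a translog (log-quadratic) production function by OLS, then invert the marginal product of spending to get a cost per QALY at the margin.

```python
import numpy as np

# Illustrative sketch (not the authors' model): a translog production
# function ln(QALY) = b0 + b1*ln(spend) + b2*ln(spend)^2, fitted by OLS
# on simulated data, then used to derive a marginal cost per QALY.
rng = np.random.default_rng(0)

spend = rng.uniform(1e3, 1e4, size=500)            # spending per unit (hypothetical)
ln_s = np.log(spend)
true_qaly = np.exp(1.0 + 0.30 * ln_s - 0.01 * ln_s**2)
qaly = true_qaly * np.exp(rng.normal(0, 0.02, size=500))   # noisy observed QALYs

# OLS on the translog specification
X = np.column_stack([np.ones_like(ln_s), ln_s, ln_s**2])
b0, b1, b2 = np.linalg.lstsq(X, np.log(qaly), rcond=None)[0]

# Marginal product dQ/dS = (Q/S) * (b1 + 2*b2*ln S); invert for cost per QALY
s0 = 5e3                                           # evaluate at mid-range spending
q0 = np.exp(b0 + b1 * np.log(s0) + b2 * np.log(s0)**2)
elasticity = b1 + 2 * b2 * np.log(s0)
marginal_cost_per_qaly = s0 / (q0 * elasticity)
print(round(marginal_cost_per_qaly))
```

The key step is that a translog function lets the elasticity of health with respect to spending vary with the level of spending, so the marginal cost per QALY is not forced to be constant.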

This is a fantastic paper, which is sure to have important policy implications. Analysts conducting cost-effectiveness analysis in the Netherlands, do take note.

Mixed-effects models for health care longitudinal data with an informative visiting process: a Monte Carlo simulation study. Statistica Neerlandica Published 5th September 2019

Electronic health records are the current big thing in health economics research, but they’re not without challenges. One issue is that the data reflect clinical management rather than a trial protocol. This means that doctors may test more severe patients more often. For example, people with higher cholesterol may get more frequent cholesterol tests. The challenge is that traditional methods for longitudinal data assume independence between observation times and disease severity.

Alessandro Gasparini and colleagues set out to solve this problem. They propose using inverse intensity of visit weighting within a mixed-effects model framework. Importantly, they provide a Stata package that implements the method. It’s part of the wide-ranging and super-useful merlin package.

The directed acyclic graph is a great way to see how the method works. Essentially, after controlling for confounders, the longitudinal outcome and the observation process are associated through shared random effects. By assuming a distribution for the shared random effects, the model blocks the path between the outcome and the observation process. The authors make it sound easy!
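A toy sketch of the inverse intensity of visit weighting idea, with made-up data and not the merlin implementation: observations from patients who are visited more often are down-weighted, so the weighted sample resembles one in which visit frequency is unrelated to severity.

```python
import numpy as np

# Toy sketch of inverse intensity of visit weighting: sicker patients are
# visited more often, so unweighted averages over-represent them. Each
# observation is weighted by the inverse of the patient's visit rate.
rng = np.random.default_rng(1)

n_patients, follow_up = 200, 2.0                   # years of follow-up
severity = rng.normal(0, 1, n_patients)            # latent severity
outcome = 5.0 + 2.0 * severity                     # true mean outcome per patient
rate = np.exp(0.5 + 0.8 * severity)                # sicker -> more visits per year
n_visits = rng.poisson(rate * follow_up)

rows, y = [], []
for i in range(n_patients):
    for _ in range(n_visits[i]):
        rows.append(i)
        y.append(outcome[i] + rng.normal(0, 0.5))  # measurement noise per visit
rows, y = np.array(rows), np.array(y)

naive_mean = y.mean()                              # biased towards sicker patients
est_rate = np.maximum(n_visits / follow_up, 0.5)   # crude per-patient visit intensity
w = 1.0 / est_rate[rows]                           # inverse intensity weights
weighted_mean = np.average(y, weights=w)

true_mean = outcome.mean()                         # what we want to recover
print(naive_mean, weighted_mean, true_mean)
```

The naive mean is pulled towards the heavily-sampled severe patients, while the weighted mean sits much closer to the population average; the paper’s approach does something analogous within a full mixed-effects model rather than this crude per-patient rate.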

The paper goes through the method, compares it with other methods in the literature in a simulation study, and applies it to a real case study. It’s a brilliant paper that deserves a close look by all of those using electronic health records.

Alternative approaches for confounding adjustment in observational studies using weighting based on the propensity score: a primer for practitioners. BMJ [PubMed] Published 23rd October 2019

Would you like to use a propensity score method but don’t know where to start? Look no further! This paper by Rishi Desai and Jessica Franklin provides a practical guide to propensity score methods.

They start by explaining what a propensity score is and how it can be used, from matching to reweighting and regression adjustment. I particularly enjoyed reading about the importance of conceptualising the target of inference, that is, which treatment effect we are trying to estimate. In the medical literature, it is rare to see a paper that is clear on whether the target is the average treatment effect or the average treatment effect in the treated population.
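For illustration, here is a hedged sketch of how the target of inference shows up in the weights, using simulated data rather than the paper’s example: ATE weights reweight both arms to the full population, while ATT weights leave the treated as they are and reweight the controls to look like them.

```python
import numpy as np

# Sketch of how the estimand changes the propensity score weights
# (illustrative data, not the paper's dabigatran/warfarin analysis).
rng = np.random.default_rng(2)

n = 5000
x = rng.normal(0, 1, n)                        # confounder
e = 1 / (1 + np.exp(-(0.5 * x)))               # true propensity score
t = rng.binomial(1, e)                         # treatment assignment
y = 1.0 * t + 2.0 * x + rng.normal(0, 1, n)    # constant treatment effect of 1

naive = y[t == 1].mean() - y[t == 0].mean()    # confounded comparison

# ATE weights: 1/e for treated, 1/(1-e) for controls
w_ate = np.where(t == 1, 1 / e, 1 / (1 - e))
# ATT weights: 1 for treated, e/(1-e) for controls
w_att = np.where(t == 1, 1.0, e / (1 - e))

def weighted_diff(w):
    mu1 = np.average(y[t == 1], weights=w[t == 1])
    mu0 = np.average(y[t == 0], weights=w[t == 0])
    return mu1 - mu0

print(naive, weighted_diff(w_ate), weighted_diff(w_att))
```

With a constant treatment effect the ATE and ATT coincide, so both weighted estimates land near the true effect of 1, while the unweighted comparison is biased by the confounder; with heterogeneous effects the two weighted estimates would differ, which is exactly why the target of inference matters.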

I found the algorithm for method selection really useful. Here, Rishi and Jessica describe the steps in the choice of the propensity score method and recommend their preferred method for each situation. The paper also includes the application of each method to the example of dabigatran versus warfarin for atrial fibrillation. Thanks to the graphs, we can visualise how the distribution of the propensity score changes for each method and depending on the target of inference.

This is an excellent paper for those starting their propensity score analyses, or for those who would like a refresher. It’s a keeper!


Rita Faria’s journal round-up for 21st October 2019


Quantifying how diagnostic test accuracy depends on threshold in a meta-analysis. Statistics in Medicine [PubMed] Published 30th September 2019

A diagnostic test is often based on a continuous measure, e.g. cholesterol, which is dichotomised at a certain threshold to classify people as ‘test positive’, who should be treated, or ‘test negative’, who should not. In an economic evaluation, we may wish to compare the costs and benefits of using the test at different thresholds: for example, the cost-effectiveness of offering lipid-lowering therapy to people with cholesterol over 7 mmol/L versus over 5 mmol/L. This is straightforward if we have access to a large dataset comparing the test to its gold standard, so that we can estimate its sensitivity and specificity at various thresholds. It is quite the challenge if we only have aggregate data from multiple publications.

In this brilliant paper, Hayley Jones and colleagues report on a new method to synthesise diagnostic accuracy data from multiple studies. It consists of a multinomial meta-analysis model that can estimate how accuracy depends on the diagnostic threshold. This method produces estimates that can be used to parameterise an economic model.
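To see why accuracy depends on the threshold, here is a small simulated example with hypothetical cholesterol distributions, not data from the paper:

```python
import numpy as np

# Why accuracy depends on the threshold: sensitivity and specificity of a
# continuous test (hypothetical cholesterol values, mmol/L) at two cut-offs.
rng = np.random.default_rng(3)

diseased = rng.normal(7.0, 1.0, 10000)     # values in people with disease
healthy = rng.normal(5.0, 1.0, 10000)      # values in people without

def accuracy(threshold):
    sens = np.mean(diseased >= threshold)  # true positives / diseased
    spec = np.mean(healthy < threshold)    # true negatives / healthy
    return sens, spec

for thr in (5.0, 7.0):
    sens, spec = accuracy(thr)
    print(f"threshold {thr}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is why a synthesis model that pools accuracy across studies reporting different thresholds, as this paper does, is so useful for economic modelling.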

These new developments in evidence synthesis are very exciting and really important to improve the data going into economic models. My only concern is that the model is implemented in WinBUGS, which is not software that many applied analysts use. Would it be possible to have a tutorial or, even better, to include this method in the online tools available on the Complex Reviews Support Unit website?

Early economic evaluation of diagnostic technologies: experiences of the NIHR Diagnostic Evidence Co-operatives. Medical Decision Making [PubMed] Published 26th September 2019

Keeping with the diagnostic theme, this paper by Lucy Abel and colleagues reports on the experience of the Diagnostic Evidence Co-operatives in conducting early modelling of diagnostic tests. These were established in 2013 to help developers of diagnostic tests link up with clinical and academic experts.

The paper discusses eight projects where economic modelling was conducted at an early stage of project development. It was fascinating to read about the collaboration between academics and test developers. One of the positive aspects was the buy-in of the developers, while a less positive one was the pressure to produce evidence quickly, and evidence that supported the product.

The paper is excellent in discussing the strengths and challenges of these projects. Of note, there were challenges in mapping out a clinical pathway, selecting the appropriate comparators, and establishing the consequences of testing. Furthermore, they found that the parameters around treatment effectiveness were the key driver of cost-effectiveness in many of the evaluations. This is not surprising given that the benefits of a test are usually in better informing the management decisions, rather than via its direct costs and benefits. It definitely resonates with my own experience in conducting economic evaluations of diagnostic tests (see, for example, here).

Following on from the challenges, the authors suggest areas for methodological research: mapping the clinical pathway, ensuring model transparency, and modelling sequential tests. They finish with advice for researchers doing early modelling of tests, although I’d say that it would be applicable to any economic evaluation. I completely agree that we need better methods for economic evaluation of diagnostic tests. This paper is a useful first step in setting up a research agenda.

A second chance to get causal inference right: a classification of data science tasks. Chance [arXiv] Published 14th March 2019

This impressive paper by Miguel Hernan, John Hsu and Brian Healy is an essential read for all researchers, analysts and scientists. Miguel and colleagues classify data science tasks into description, prediction and counterfactual prediction. Description is using data to quantitatively summarise some features of the world. Prediction is using the data to know some features of the world given our knowledge about other features. Counterfactual prediction is using the data to know what some features of the world would have been if something hadn’t happened; that is, causal inference.

I found the explanation of the difference between prediction and causal inference quite enlightening. It is not about the amount of data or the statistical/econometric techniques. The key difference is in the role of expert knowledge. Predicting requires expert knowledge to specify the research question, the inputs, the outputs and the data sources. Additionally, causal inference requires expert knowledge “also to describe the causal structure of the system under study”. This causal knowledge is reflected in the assumptions, the ideas for the data analysis, and for the interpretation of the results.

The section on implications for decision-making makes some important points. First, that the goal of data science is to help people make better decisions. Second, that predictive algorithms can tell us that decisions need to be made but not which decision is most beneficial – for that, we need causal inference. Third, many of us work on complex systems for which we don’t know everything (the human body is a great example). Because we don’t know everything, it is impossible to predict with certainty what the consequences of an intervention would be for a specific individual from routine health records. At most, we can estimate the average causal effect, but even for that we need assumptions. The relevance to the latest developments in data science is obvious, given all the hype around real world data, artificial intelligence and machine learning.

I absolutely loved reading this paper and wholeheartedly recommend it for any health economist. It’s a must read!


Rita Faria’s journal round-up for 2nd September 2019


RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ [PubMed] Published 28th August 2019

RCTs are the gold standard primary study to estimate the effect of treatments but are often far from perfect. The question is the extent to which their flaws make a difference to the results. Well, RoB 2 is your new best friend to help answer this question.

Developed by a star-studded team, RoB 2 is the update to the original risk of bias tool by the Cochrane Collaboration. Bias is assessed by outcome, rather than for the whole RCT. For me, this makes sense. For example, the primary outcome may be well reported, yet the secondary outcome, which may be the outcome of interest for a cost-effectiveness model, may be much less so.

Bias is considered in terms of five domains, with the overall risk of bias usually corresponding to the worst risk of bias in any of the domains. This overall risk of bias is then reflected in the evidence synthesis, for example, via a stratified meta-analysis.
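The worst-domain rule can be sketched in a few lines (a simplification; the full tool allows more nuanced judgements):

```python
# Simplified sketch of the RoB 2 overall judgement described above: the
# overall risk of bias usually corresponds to the worst judgement across
# the five domains (the full tool allows more nuance than this).
ORDER = {"low": 0, "some concerns": 1, "high": 2}

def overall_risk(domain_judgements):
    """Return the worst (highest-risk) judgement across domains."""
    return max(domain_judgements, key=lambda j: ORDER[j])

judgements = ["low", "some concerns", "low", "low", "low"]
print(overall_risk(judgements))  # "some concerns"
```

Because the overall judgement is driven by the weakest domain, a single poorly-handled domain is enough to downgrade an outcome, which is what makes the tool conservative.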

The paper is a great read! Jonathan Sterne and colleagues explain the reasons for the update and the process that was followed. Clearly, quite a lot of thought was given to the types of bias and to developing questions to help reviewers assess them. The only downside is that it may require more time to apply, given that it needs to be done by outcome. Still, I think that’s a price worth paying for more reliable results. Looking forward to seeing it in use!

Characteristics and methods of incorporating randomised and nonrandomised evidence in network meta-analyses: a scoping review. Journal of Clinical Epidemiology [PubMed] Published 3rd May 2019

In keeping with the evidence synthesis theme, this paper by Kathryn Zhang and colleagues reviews how the applied literature has been combining randomised and non-randomised evidence. The headline findings are that combining these two types of study designs is rare and, when it does happen, naïve pooling is the most common method.

I imagine that the limited use of non-randomised evidence is due to its risk of bias. After all, it is difficult to ensure that the measure of association from a non-randomised study is an estimate of a causal effect. Hence, it is worrying that the majority of network meta-analyses that did combine non-randomised studies did so with naïve pooling.
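For readers unfamiliar with the term, naïve pooling amounts to a standard fixed-effect, inverse-variance meta-analysis that treats randomised and non-randomised estimates as exchangeable. A sketch with made-up numbers:

```python
import numpy as np

# Sketch of naive (fixed-effect, inverse-variance) pooling, in which
# randomised and non-randomised estimates are treated as exchangeable.
# The log odds ratios and standard errors are made up for illustration.
log_or = np.array([-0.20, -0.25, -0.60])   # two RCTs + one observational study
se = np.array([0.10, 0.12, 0.08])
is_rct = np.array([True, True, False])

w = 1 / se**2                              # inverse-variance weights
pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))

rct_only = np.sum(w[is_rct] * log_or[is_rct]) / np.sum(w[is_rct])
print(pooled, rct_only)
```

In this toy example the precise observational study drags the pooled estimate well away from what the RCTs alone suggest, with no allowance for its risk of bias, which illustrates why naïve pooling is worrying as a default.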

This scoping review may kick start some discussions in the evidence synthesis world. When should we combine randomised and non-randomised evidence? How best to do so? And how to make sure that the right methods are used in practice? As a cost-effectiveness modeller, with limited knowledge of evidence synthesis, I’ve grappled with these questions myself. Do get in touch if you have any thoughts.

A cost-effectiveness analysis of shortened direct-acting antiviral treatment in genotype 1 noncirrhotic treatment-naive patients with chronic hepatitis C virus. Value in Health [PubMed] Published 17th May 2019

Rarely do we see a cost-effectiveness paper where the proposed intervention is less costly and less effective, that is, in the controversial southwest quadrant. This exceptional paper by Christopher Fawsitt and colleagues is a welcome exception!

Christopher and colleagues looked at the cost-effectiveness of shorter treatment durations for chronic hepatitis C. Compared with the standard duration, the shorter treatment is not as effective, hence results in fewer QALYs. But it is much cheaper to treat patients over a shorter duration and re-treat those patients who were not cured, rather than treat everyone with the standard duration. Hence, for the base-case and for most scenarios, the shorter treatment is cost-effective.
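The south-west quadrant logic can be written down in a few lines, with made-up numbers rather than those from the paper: the standard treatment is worth keeping only if the extra QALYs it delivers justify its extra cost at the threshold.

```python
# Sketch of the south-west quadrant decision rule with hypothetical numbers:
# a shorter treatment that is cheaper but slightly less effective than the
# standard duration.
threshold = 20_000           # willingness to pay per QALY (hypothetical)

cost_standard, qaly_standard = 30_000, 10.00
cost_shorter, qaly_shorter = 18_000, 9.80     # cheaper, fewer QALYs

# ICER of standard vs shorter: the cost per extra QALY from the standard.
icer = (cost_standard - cost_shorter) / (qaly_standard - qaly_shorter)

# If that ICER exceeds the threshold, the savings from the shorter course
# are worth more than the QALYs forgone, so the shorter course wins.
prefer_shorter = icer > threshold
print(icer, prefer_shorter)
```

The asymmetry that makes the south-west quadrant feel controversial is purely presentational: the same comparison read the other way is just a new treatment (the standard) whose ICER fails the threshold.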

I’m sure that labelling a less effective and less costly option as cost-effective may have been controversial in some quarters. Some may argue that it is unethical to offer a worse treatment than the standard even if it saves a lot of money. In my view, it is no different from funding better and more costly treatments, given that their additional costs are borne by other patients, who will necessarily have access to fewer resources.

The paper is beautifully written and is another example of an outstanding cost-effectiveness analysis with important implications for policy and practice. The extensive sensitivity analysis should provide reassurance to the sceptics. And the discussion is clever in arguing for the value of a shorter duration in resource-constrained settings and for hard to reach populations. A must read!
