# Sam Watson’s journal round-up for 10th September 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Probabilistic sensitivity analysis in cost-effectiveness models: determining model convergence in cohort models. PharmacoEconomics [PubMed] Published 27th July 2018

Probabilistic sensitivity analysis (PSA) is rightly a required component of economic evaluations. Deterministic sensitivity analyses are generally biased: because model outputs are usually non-linear functions of the parameters, evaluating a model at a single choice of values from a complex joint distribution is unlikely to be a good reflection of the true model mean. PSA involves repeatedly sampling parameters from their respective distributions and analysing the resulting model outputs. But how many times should you do this? Usually, an arbitrary number that seems “big enough” is chosen, say 1,000 or 10,000. But these simulations themselves exhibit variance, so-called Monte Carlo error. This paper discusses making the choice of the number of simulations more formal by assessing the “convergence” of the simulation output.
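A minimal sketch of both points, assuming a hypothetical one-parameter model in which life expectancy is $1/h$ for a constant annual hazard $h$ drawn from a Gamma distribution with shape 5 and scale 0.02 (illustrative values, not from the paper). Running the model once at the mean hazard gives 10 years, whereas the theoretical mean of the simulated outputs is 12.5 years; the Monte Carlo standard error of the PSA mean shrinks roughly in proportion to $1/\sqrt{n}$.

```python
import math
import random
import statistics

def life_expectancy(hazard):
    """Hypothetical toy model: mean survival in years under a constant annual hazard."""
    return 1.0 / hazard

def run_psa(n_sims, shape=5.0, scale=0.02, seed=1):
    """Sample the hazard from Gamma(shape, scale) n_sims times; return the mean
    model output and its Monte Carlo standard error."""
    rng = random.Random(seed)
    outputs = [life_expectancy(rng.gammavariate(shape, scale)) for _ in range(n_sims)]
    mean = statistics.fmean(outputs)
    mcse = statistics.stdev(outputs) / math.sqrt(n_sims)
    return mean, mcse

# Deterministic analysis: run the model once at the mean hazard (5 x 0.02 = 0.1).
deterministic = life_expectancy(5.0 * 0.02)
print(f"Deterministic (mean hazard): {deterministic:.2f} years")

for n in (100, 1_000, 10_000):
    mean, mcse = run_psa(n)
    print(f"n = {n:>6}: PSA mean = {mean:.2f} years, Monte Carlo SE = {mcse:.3f}")
```

Comparing the printed standard errors across `n` gives a direct sense of how much precision each extra block of simulations buys.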

In the same way that sample sizes are chosen for trials, the number of simulations should provide an adequate level of precision; anything more wastes resources without improving inferences. For example, if the statistic of interest is the net monetary benefit, then we would want the confidence interval (CI) to exclude zero, as this should be a sufficient level of certainty for an investment decision. The paper therefore proposes conducting a number of simulations, checking whether the CI is ‘narrow enough’, and conducting further simulations if it is not. However, I see a problem with this proposal: the estimated variance of a statistic from a sequence of simulations itself has variance. The stopping points at which we might check the CI are arbitrary, and additional simulations can widen the CI as well as narrow it. Consider a set of simulations from a simple ratio of random variables, $\mathrm{ICER} = \mathrm{Gamma}(1, 0.01)/\mathrm{Normal}(0.01, 0.01)$: because the normal denominator can take values arbitrarily close to zero, a single extreme draw can abruptly widen the CI. The proposed “stopping rule” therefore doesn’t necessarily indicate “convergence”, as a few more simulations could lead to a wider, as well as a narrower, CI. The heuristic approach is undoubtedly an improvement on the way things are currently done, but I think there is scope for a more rigorous method of assessing convergence in PSA.
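The sketch below illustrates the concern with checking a CI at arbitrary stopping points. Since the parameterisation is not stated, it assumes that Gamma(1, 0.01) has shape 1 and scale 0.01 and that Normal(0.01, 0.01) has mean 0.01 and standard deviation 0.01. It also uses a simple normal-theory 95% CI for the mean ICER, which is not necessarily the interval the paper uses. Because the denominator has positive density at zero, this ratio has no finite mean, so it is an extreme case, but it shows how the CI width recorded at successive checkpoints can rise as well as fall.

```python
import math
import random

def simulate_icer(rng):
    # Assumed parameterisations: Gamma(shape=1, scale=0.01) numerator and
    # Normal(mean=0.01, sd=0.01) denominator.
    return rng.gammavariate(1.0, 0.01) / rng.gauss(0.01, 0.01)

def checkpoint_ci_widths(n_max=20_000, check_every=100, seed=2018):
    """Add simulations one at a time and, every check_every draws, record the
    width of a normal-theory 95% CI for the mean ICER. Welford's running
    updates keep each checkpoint O(1) instead of re-scanning all draws."""
    rng = random.Random(seed)
    n, mean, m2 = 0, 0.0, 0.0
    widths = []
    for _ in range(n_max):
        x = simulate_icer(rng)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % check_every == 0:
            se = math.sqrt(m2 / (n - 1)) / math.sqrt(n)
            widths.append((n, 2 * 1.96 * se))
    return widths

widths = checkpoint_ci_widths()
for (n_prev, w_prev), (n, w) in zip(widths, widths[1:]):
    if w > w_prev:
        print(f"CI widened from {w_prev:.3g} to {w:.3g} between n = {n_prev} and n = {n}")
```

Every printed line is a checkpoint at which a rule of the form "stop once the CI is narrow enough" could have given a different answer had it been applied a little later.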

Mortality due to low-quality health systems in the universal health coverage era: a systematic analysis of amenable deaths in 137 countries. The Lancet [PubMed] Published 5th September 2018

Richard Horton, the oracular editor-in-chief of the Lancet, tweeted last week:

There is certainly an argument that academic journals are good forums for advocacy. Who better to interpret the analyses presented in these journals than the authors and audiences themselves? But without a strict editorial bulkhead between analysis and opinion, we run the risk that articles and their content are influenced or dictated by the political whims of editors rather than by scientific merit. Unfortunately, I think this article is evidence of that.

No one disputes that improving health care quality will improve patient outcomes and experience; it is in the very definition of ‘quality’. This paper aims to estimate the number of deaths each year due to ‘poor quality’ in low- and middle-income countries (LMICs). The trouble with this is two-fold. First, given the number of unknown quantities required to get a handle on this figure, the definition of quality notwithstanding, the uncertainty around it should be very high (see below). Second, attributing these deaths in a causal way to a nebulous definition of ‘quality’ is tenuous at best. The approach of the article is, in essence, to assume that the differences in fatality rates for treatable conditions between LMICs and the best-performing health systems on Earth, among people who attend health services, are entirely caused by ‘poor quality’. This definition of quality would therefore seem to encompass low resourcing, a poor supply of human resources and a lack of access to medicines, as well as everything else that differs between health systems. In arriving at this figure, the authors also face multiple sources of uncertainty, including:

• Using a range of proxies for health care utilisation;
• Using Global Burden of Disease epidemiological estimates, which have their own associated uncertainty;
• A number of data slicing decisions, such as truncating case fatality rates;
• Estimating utilisation rates based on a predictive model;
• Estimating the case-fatality rate for non-users of health services based on other estimated statistics.

Despite this, the authors report a mean estimate of 5.0 million deaths due to ‘poor quality’, with a 95% uncertainty interval only 300,000 deaths wide. This seems highly implausible, and yet it is presented as a causal effect of an undefined ‘poor quality’. The timing of this article coincides with the Lancet Commission on care quality in LMICs and, one suspects, had it not been for the advocacy angle on care quality, it would not have been published in this journal.
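To see why such a narrow interval is surprising, a rough sketch can propagate uncertainty through a chain of estimation steps. It assumes, purely for illustration, that the 5.0 million estimate is the product of a few independent multiplicative factors, each log-normally distributed around its point value; the actual analysis is not structured this way, and its sources of uncertainty need not be independent. The per-factor standard deviations below are hypothetical and are on the log scale (roughly a relative standard deviation for small values).

```python
import math
import random

def interval_width(point_estimate, rel_sds, n_sims=50_000, seed=7):
    """Monte Carlo width of a 95% interval for point_estimate multiplied by
    independent log-normal factors with median 1 and log-scale sds rel_sds."""
    rng = random.Random(seed)
    draws = sorted(
        point_estimate * math.prod(rng.lognormvariate(0.0, s) for s in rel_sds)
        for _ in range(n_sims)
    )
    lower = draws[int(0.025 * n_sims)]
    upper = draws[int(0.975 * n_sims) - 1]
    return upper - lower

def implied_per_factor_sd(point_estimate, width, n_factors):
    """Log-scale sd that each of n_factors equal factors would need for the
    median-centred log-normal 95% interval to have the given width."""
    combined_sd = math.asinh(width / (2 * point_estimate)) / 1.96
    return combined_sd / math.sqrt(n_factors)

# Hypothetical: five sources of uncertainty, each with a 5% log-scale sd (units: millions).
print(f"Width with 5% per factor: {interval_width(5.0, [0.05] * 5):.2f} million")
print(f"Per-factor sd implied by a 0.3 million width: {implied_per_factor_sd(5.0, 0.3, 5):.4f}")
```

Under these assumptions, five factors each uncertain by about 5% already give an interval roughly 2.2 million wide, while a 300,000-wide interval would need each factor to be pinned down to within roughly 0.7%.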

Embedding as a pitfall for survey‐based welfare indicators: evidence from an experiment. Journal of the Royal Statistical Society: Series A Published 4th September 2018

Health economists will be well aware of the various measures used to evaluate welfare and well-being. These typically rely on surveys comprising questions on a number of different dimensions, such as emotional and social well-being or physical functioning. Similar surveys are also used to elicit population preferences over states of the world or policy options; for example, Kahneman and Knetsch conducted a survey of willingness to pay (WTP) for different environmental policies. These surveys can exhibit what is called an ‘embedding effect’, which Kahneman and Knetsch described as when the value of a good varies “depending on whether the good is assessed on its own or embedded as part of a more inclusive package.” That is to say, the way people value single-dimensional attributes or qualities can be distorted when those attributes are embedded in a multi-dimensional choice.

This article reports the results of an experiment in which students were asked to weight the relative importance of different dimensions of the Better Life Index, including jobs, housing, and income. The randomised treatment was whether they rated ‘jobs’ as a single category or were presented with its individual dimensions, such as the unemployment rate and job security. The experiment shows strong evidence of embedding: the overall weighting differed substantially by treatment. This, the authors conclude, means that the Better Life Index fails to accurately capture preferences and is open to manipulation should a researcher be so inclined. If you want evidence that your policy is the most important, just change the way the dimensions are presented.
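One way that presentation alone can shift a composite index is sketched below. It assumes, hypothetically, that weights are formed by normalising respondents' importance ratings to sum to one and that respondents give each presented item a similar rating. This is a simplified illustration of how splitting a category can inflate its share, not the method used in the experiment or by the Better Life Index.

```python
def normalised_weights(ratings):
    """Turn importance ratings into weights that sum to one."""
    total = sum(ratings.values())
    return {item: rating / total for item, rating in ratings.items()}

def domain_share(weights, domain):
    """Total weight on all items whose name starts with the domain label."""
    return sum(w for item, w in weights.items() if item.startswith(domain))

# Hypothetical ratings on a 1-10 scale: every presented item is rated 8.
single = {"jobs": 8, "housing": 8, "income": 8}
split = {"jobs: unemployment rate": 8, "jobs: job security": 8, "housing": 8, "income": 8}

print(domain_share(normalised_weights(single), "jobs"))  # one third of the weight
print(domain_share(normalised_weights(split), "jobs"))   # half of the weight
```

Nothing about the respondents' underlying views changes between the two presentations; only the number of items on which 'jobs' can collect weight does.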

# James Altunkaya’s journal round-up for 3rd September 2018

Sensitivity analysis for not-at-random missing data in trial-based cost-effectiveness analysis: a tutorial. PharmacoEconomics [PubMed] [RePEc] Published 20th April 2018

Last month, we highlighted a Bayesian framework for imputing missing data in economic evaluation. The paper dealt with the issue of departure from the ‘Missing at Random’ (MAR) assumption by using a Bayesian approach to specify a plausible missingness model from the results of expert elicitation. This was used to estimate a prior distribution for the unobserved terms in the outcomes model.

For those less comfortable with Bayesian estimation, this month we highlight a tutorial paper from the same authors, outlining an approach to assess the impact of plausible departures from the MAR assumption on cost-effectiveness results. Given poor adherence to current recommendations on best practice in handling and reporting missing data, an incremental approach to improving missing data methods in health research may be more realistic. The authors supply accompanying Stata code.

The paper investigates the importance of allowing for a degree of ‘informative’ missingness (i.e. ‘Missing Not at Random’, MNAR) in sensitivity analyses. In a case study, the authors present a range of scenarios that assume a decrement of 5–10% in the quality of life of patients with missing health outcomes, relative to multiple imputation estimates based on observed characteristics under the standard MAR assumption. This represents an assumption that, controlling for all observed characteristics used in multiple imputation, those with complete quality of life profiles may have higher quality of life than those with incomplete surveys.

Quality of life decrements were applied to the control and treatment arms separately, and then jointly, in six scenarios. This aimed to demonstrate the sensitivity of cost-effectiveness judgements to the possibility of a different missingness mechanism in each arm. The authors similarly investigate sensitivity to health costs among those with missing data being higher than predicted from observed characteristics by imputation under MAR. Finally, sensitivity to a simultaneous departure from MAR in both health outcomes and health costs is investigated.
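A minimal sketch of this kind of scenario analysis, written in Python rather than the authors' Stata code. It assumes a single dataset that has already been imputed under MAR (the tutorial works with multiply imputed data, which this omits). Imputed QALYs are scaled by $(1 - \delta)$ and imputed costs by $(1 + \gamma)$ within each arm, and the incremental net monetary benefit (INMB) is recomputed. The patient records, threshold and scenario grid are all hypothetical.

```python
from dataclasses import dataclass
import statistics

@dataclass
class Patient:
    arm: str        # "control" or "treatment"
    qaly: float     # observed, or imputed under MAR if missing
    cost: float     # observed, or imputed under MAR if missing
    missing: bool   # True if this patient's outcomes were imputed

def adjusted_inmb(patients, delta, gamma, threshold):
    """INMB (treatment minus control) after applying a QALY decrement delta[arm]
    and a cost uplift gamma[arm] to imputed patients in each arm."""
    means = {}
    for arm in ("control", "treatment"):
        rows = [p for p in patients if p.arm == arm]
        d, g = delta.get(arm, 0.0), gamma.get(arm, 0.0)
        q = statistics.fmean(p.qaly * (1 - d) if p.missing else p.qaly for p in rows)
        c = statistics.fmean(p.cost * (1 + g) if p.missing else p.cost for p in rows)
        means[arm] = (q, c)
    dq = means["treatment"][0] - means["control"][0]
    dc = means["treatment"][1] - means["control"][1]
    return threshold * dq - dc

# Hypothetical data: one imputed patient per arm.
patients = [
    Patient("control", 0.70, 1000, False),
    Patient("control", 0.60, 1200, False),
    Patient("control", 0.65, 1100, True),
    Patient("treatment", 0.80, 3000, False),
    Patient("treatment", 0.75, 3200, False),
    Patient("treatment", 0.72, 3100, True),
]

scenarios = {
    "MAR": ({}, {}),
    "10% QALY decrement, control only": ({"control": 0.10}, {}),
    "10% QALY decrement, treatment only": ({"treatment": 0.10}, {}),
    "10% QALY decrement, both arms": ({"control": 0.10, "treatment": 0.10}, {}),
    "10% cost uplift, both arms": ({}, {"control": 0.10, "treatment": 0.10}),
}
for name, (delta, gamma) in scenarios.items():
    print(f"{name}: INMB = {adjusted_inmb(patients, delta, gamma, threshold=20_000):.0f}")
```

In this toy example the decision flips when the decrement is applied only to the treatment arm, which is the kind of result such scenarios are designed to surface.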

The proposed sensitivity analyses provide a useful heuristic to assess what degree of difference between missing and non-missing subjects on unobserved characteristics would be necessary to change cost-effectiveness decisions. The authors admit that this framework could appear relatively crude to those comfortable with more advanced missing data approaches, such as those outlined in last month’s round-up. However, this approach should appeal to those interested in presenting the magnitude of uncertainty introduced by missing data assumptions in a way that is easily interpretable by decision makers.
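The same idea can be turned round to find a tipping point: the smallest decrement that would reverse the decision. A minimal closed-form sketch, assuming the decrement applies only to imputed patients in the treatment arm, so that the INMB falls linearly as $\lambda(\Delta Q - \delta\,\pi\,\bar{q}) - \Delta C$, where $\pi$ is the proportion of the treatment arm imputed and $\bar{q}$ their mean imputed QALYs. The inputs in the example are hypothetical.

```python
def tipping_point_delta(incremental_qaly, incremental_cost, threshold,
                        prop_missing, mean_imputed_qaly):
    """Proportional QALY decrement, applied to MAR-imputed patients in the
    treatment arm only, at which the INMB falls to zero."""
    base_inmb = threshold * incremental_qaly - incremental_cost
    if base_inmb <= 0:
        return 0.0  # not cost-effective even under MAR
    # Each unit of decrement removes this much INMB.
    return base_inmb / (threshold * prop_missing * mean_imputed_qaly)

# Hypothetical inputs: incremental QALYs 0.32/3, incremental cost 2,000,
# threshold 20,000, one third of the treatment arm imputed with mean QALY 0.72.
print(round(tipping_point_delta(0.32 / 3, 2000, 20_000, 1 / 3, 0.72), 4))
```

A result above 1 would mean that no decrement between 0% and 100% reverses the decision, which is itself a reassuring message for decision makers.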

The impact of waiting for intervention on costs and effectiveness: the case of transcatheter aortic valve replacement. The European Journal of Health Economics [PubMed] [RePEc] Published September 2018

This paper appears in print this month and sparked interest as one of comparatively few studies on the cost-effectiveness of waiting lists. Given the interest in using constrained optimisation methods in health outcomes research, highlighted in this month’s editorial in Value in Health, there is rightly a push to extend the traditional sphere of economic evaluation beyond drugs and devices, towards understanding the trade-offs of investing in a wider range of policy interventions using a common metric of costs and QALYs. Rachel Meacock’s paper earlier this year did a great job of outlining some of the challenges involved in broadening the scope of economic evaluation to more general decisions in health service delivery.

The authors set out to understand the cost-effectiveness of delaying a cardiac treatment (TAVR) using a waiting list of up to 12 months, compared to a policy of immediate treatment. The effectiveness of treatment at 3, 6, 9 and 12 months after initial diagnosis, health decrements during waiting, and the corresponding health costs during the wait and after treatment were derived from a small observational study. As treatment is studied in an elderly population, a non-negligible proportion of patients die whilst waiting for surgery. This translates to lower modelled costs, but also fewer quality-adjusted life years, in modelled cohorts with any delay relative to immediate treatment. The authors conclude that eliminating all waiting time for TAVR would yield health gains at a cost of approximately €12,500 per QALY.

However, based on the modelling presented, the authors are not in a position to make cost-effectiveness judgements of this sort. Waiting lists exist for a reason, chiefly a lack of clinical capacity to treat patients immediately. In deciding to treat patients immediately in one disease area, we therefore need some judgement as to whether the health displaced among patients now left untreated in another disease area is greater than, less than or equal to that gained by treating TAVR patients immediately. Alternatively, the modelling should include the cost of acquiring additional clinical capacity (such as theatre space) to treat TAVR patients immediately, so as not to displace other treatments. In that case, the ICER is likely to be much higher, due to the large cost of the new resources needed to reduce waiting times to zero.
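The two framings in the paragraph above can be stated more formally. The notation is introduced here for illustration and is not the paper's: $\Delta Q_{\text{imm}}$ and $\Delta C_{\text{care}}$ are the QALY gain and change in care costs from treating patients immediately, $Q_{\text{displaced}}$ is the health lost by patients elsewhere whose treatment is displaced, and $C_{\text{capacity}}$ is the cost of any new capacity bought to avoid displacement. If capacity is fixed, immediate treatment improves population health only if

$$\Delta H = \Delta Q_{\text{imm}} - Q_{\text{displaced}} > 0,$$

whereas if capacity is bought in, its cost belongs in the numerator of the ICER:

$$\text{ICER} = \frac{\Delta C_{\text{care}} + C_{\text{capacity}}}{\Delta Q_{\text{imm}}}.$$

The paper's reported ICER corresponds, in effect, to setting both $Q_{\text{displaced}}$ and $C_{\text{capacity}}$ to zero.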

Given the data available, a simple improvement to the paper would be to use current waiting times (already gathered in the observational study) as the ‘standard of care’ arm. The estimated change in quality of life and healthcare resource costs from reducing waiting times from current levels to zero could then be calculated, and used to derive the maximum acceptable cost of acquiring the additional treatment resources needed to treat patients with no waiting time, given current national willingness-to-pay thresholds.
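The calculation proposed here is a rearrangement of the threshold rule: if eliminating waits changes QALYs by $\Delta Q$ and care costs by $\Delta C$ per patient relative to current waiting times, the most that could be spent per patient on extra capacity while still meeting a threshold $\lambda$ is $\lambda \Delta Q - \Delta C$. A minimal sketch with hypothetical per-patient inputs (none of these numbers come from the paper):

```python
def max_capacity_cost(incremental_qaly, incremental_care_cost, threshold):
    """Largest per-patient spend on extra capacity at which eliminating waits
    remains cost-effective at the given willingness-to-pay threshold."""
    return threshold * incremental_qaly - incremental_care_cost

def icer(incremental_cost, incremental_qaly):
    """Incremental cost-effectiveness ratio."""
    return incremental_cost / incremental_qaly

# Hypothetical inputs: 0.2 QALYs and 2,500 in care costs per patient, threshold 30,000.
headroom = max_capacity_cost(0.2, 2500, 30_000)
print(headroom)
# Spending exactly the headroom on capacity puts the ICER at the threshold.
print(icer(2500 + headroom, 0.2))
```

Framing the result as headroom lets a decision maker compare it directly with quotes for theatre time or staff, rather than with an ICER that omits those costs.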

Admittedly, there remain problems in using the authors’ chosen observational dataset to calculate quality of life and cost outcomes for patients treated at different time points. Waiting times in this ‘real world’ observational study were prioritised based on clinical assessment of patients’ treatment need. It is therefore expected that the quality of life lost while waiting would be lower for patients treated at 12 months in the observational study than it would be for the group of patients judged to need immediate treatment, were they made to wait. A previous study in cardiac care took on the more manageable task of evaluating the cost-effectiveness of different prioritisation strategies for the waiting list, examining the sensitivity of its conclusions to varying a fixed maximum waiting time for the last patient treated.
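A toy simulation of this selection problem, under hypothetical assumptions: severity is uniform between 0 and 1, sicker patients are treated sooner (a wait of 12 × (1 − severity) months), and quality of life is lost while waiting at a monthly rate proportional to severity. Estimating the monthly loss from patients who actually waited nine months or more then understates what the most urgent patients would lose if made to wait.

```python
import random
import statistics

LOSS_RATE = 0.01  # hypothetical monthly QoL loss per unit of severity

def simulate_waiting_list(n=10_000, loss_rate=LOSS_RATE, seed=3):
    """Return (severity, wait_months, qol_loss) for a prioritised waiting list."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        severity = rng.random()
        wait = 12 * (1 - severity)                # sicker patients are treated sooner
        qol_loss = loss_rate * severity * wait    # monthly loss scales with severity
        patients.append((severity, wait, qol_loss))
    return patients

patients = simulate_waiting_list()

# Naive estimate: monthly loss observed among patients who waited 9+ months.
naive_rate = statistics.fmean(loss / wait for _, wait, loss in patients if wait >= 9)
# Monthly loss the most urgent patients (severity >= 0.75) would incur if they waited.
urgent_rate = statistics.fmean(LOSS_RATE * s for s, _, _ in patients if s >= 0.75)

print(f"Naive monthly loss: {naive_rate:.5f}; urgent patients: {urgent_rate:.5f}")
```

Under these assumptions the naive rate is roughly one seventh of the urgent patients' rate, because prioritisation ensures that only the least severe patients are observed waiting a long time.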

This study therefore demonstrates some of the difficulties in attempting to make cost-effectiveness judgements about waiting time policy. Given that the cost-effectiveness of reducing waiting times is expected to vary across disease areas, depending on the relative importance of waiting for treatment to short- and long-term health outcomes and costs, this remains an interesting area for economic evaluation to explore. In the context of the current focus on constrained optimisation techniques across different areas of healthcare (see the ISPOR task force), extending economic evaluation to a broader range of decision problems on a common scale is likely to become increasingly important in the future.

Understanding and identifying key issues with the involvement of clinicians in the development of decision-analytic model structures: a qualitative study. PharmacoEconomics [PubMed] Published 17th August 2018

This paper gathers evidence from interviews with clinicians and modellers, with the aim of improving the working relationship between the two groups during model development.

Researchers gathered opinions from a variety of settings, including industry. The main report focuses on evidence from two case studies: the first tracked the working relationship between modellers and a single clinical advisor at a UK university; the second gathered evidence from a UK policy institute, where modellers worked with up to 11 clinical experts per meeting.

Some of the authors’ conclusions are not particularly surprising. Modellers reported difficulty in recruiting clinicians to advise on model structures, and further difficulty in engaging recruited clinicians to provide relevant advice for the model-building process. Specific comments suggested that some clinical advisors had difficulty identifying representative patient experiences, instead diverting modellers’ attention towards rare outlier events.

Study responses suggested that only one or two clinicians are currently consulted during model development. The authors recommend involving a larger group of clinicians at this stage of the modelling process, with a more varied range of clinical experience (junior as well as senior clinicians, with some geographical variation). This is intended to help ensure that the clinical pathways modelled are generalisable. The contrast between the single clinical collaborator in the university case study and the up to 11 clinicians at the policy institute may also illustrate a general problem of inadequate compensation for clinical time within the university system. The authors also advocate making relevant training in decision modelling available to clinicians, to make more efficient use of participants’ time during model building. Clinicians sampled were supportive of this view, citing the need for further guidance from modellers on the nature of their expected contribution.

This study ties into the general literature on structural uncertainty in decision-analytic models. In advocating the early contribution of a larger, more diverse group of clinicians to model development, the authors suggest aligning clinical involvement during model structuring with guidelines for eliciting parameter estimates from clinical experts. Both, however, face a similar problem: recruiting clinical experts from sufficiently diverse backgrounds to provide a valid sample.
