Chris Sampson’s journal round-up for 23rd April 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

What should we know about the person behind a TTO? The European Journal of Health Economics [PubMed] Published 18th April 2018

The time trade-off (TTO) is a staple of health state valuation. Ask someone to value a health state with respect to time and – hey presto! – you have QALYs. This editorial suggests that completing a TTO can be a difficult task for respondents and that, more importantly, individuals’ characteristics may determine the way that they respond and therefore the nature of the results. One of the most commonly demonstrated differences, in this respect, is that people’s valuations of their own health states tend to be higher than valuations of the same states presented hypothetically. But this paper focuses on indirect (hypothetical) valuations. The authors highlight mixed evidence for the influence of age, gender, marital status, having children, education, income, expectations about the future, and one’s own health state. But why should we try to find out more about respondents when conducting TTOs? The authors offer 3 reasons: i) to inform sampling, ii) to inform the design and standardisation of TTO exercises, and iii) to inform the analysis. I agree – we need to better understand these sources of heterogeneity. Not to over-engineer responses, but to aid our interpretation, even if we want societally-representative valuations that include all of these variations in response behaviour. TTO valuation studies should collect data relating to the individual respondents. Unfortunately, the study doesn’t list which data those should be, so the research question in the title isn’t really answered. But maybe that’s something the authors have in hand.
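
To make the “hey presto” step concrete, here is a minimal sketch of how a conventional TTO response becomes a QALY weight. It assumes a 10-year time frame and a better-than-dead state only (worse-than-dead states need a different procedure, such as lead-time TTO, which is not shown). The function names are illustrative, not taken from the editorial.

```python
def tto_value(years_full_health: float, years_in_state: float = 10.0) -> float:
    """Conventional TTO value for a better-than-dead state (hypothetical helper).

    The respondent is indifferent between `years_in_state` years in the state
    and `years_full_health` years in full health, followed by death.
    """
    if not 0.0 <= years_full_health <= years_in_state:
        raise ValueError("conventional TTO assumes 0 <= x <= t")
    return years_full_health / years_in_state


def qalys(value: float, duration_years: float) -> float:
    """QALYs: health-state value multiplied by the time spent in that state."""
    return value * duration_years


# Indifferent between 10 years in the state and 7 years in full health.
v = tto_value(7.0)
print(v, qalys(v, 4.0))  # value of the state, and QALYs from 4 years in it
```

Any respondent characteristic that shifts the point of indifference (the 7 years here) feeds straight through to the value and hence to the QALYs, which is why the editorial’s question about who is responding matters.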

Computer modeling of diabetes and its transparency: a report on the eighth Mount Hood Challenge. Value in Health Published 9th April 2018

The Mount Hood Challenge is a get-together for people working on the (economic) modelling of diabetes. The subject of the 2016 meeting was transparency, with two specific goals: i) to evaluate the transparency of two published studies, and ii) to develop a diabetes-specific checklist for transparent reporting of modelling studies. Participants were tasked (in advance of the meeting) with replicating the two published studies and using the replicated models to evaluate some pre-specified scenarios. Both of the studies had some serious shortcomings in the reporting of the necessary data for replication, including the baseline characteristics of the population. Five modelling groups replicated the first model and seven groups replicated the second model. Naturally, the different groups made different assumptions about what should be used in place of missing data. For the first paper, none of the models provided results that matched the original. Not even close. And the differences between the results of the replications – in terms of costs incurred and complications avoided – were huge. The performance was a bit better on the second paper, but hardly worth celebrating. In general, the findings were fear-confirming. Informed by these findings, the Diabetes Modeling Input Checklist was created, designed to complement existing checklists with more general applications. It includes specific data requirements for the reporting of modelling studies, relating to the simulation cohort, treatments, costs, utilities, and model characteristics. If you’re doing some modelling in diabetes, you should have this paper to hand.

Setting dead at zero: applying scale properties to the QALY model. Medical Decision Making [PubMed] Published 9th April 2018

In health state valuation, whether or not a state is considered ‘worse than dead’ is heavily dependent on methodological choices. This paper reviews the literature to answer two questions: i) what are the reasons for anchoring at dead=0, and ii) how does the position of ‘dead’ on the utility scale affect decision making? The authors took a standard systematic approach to identifying literature from databases, with 7 papers included. The authors then discuss scale properties and the idea that there are interval scales (such as temperature) and ratio scales (such as distance). The difference between these is whether the reference point (or origin) is meaningful. This means that you can talk about distance doubling, but you can’t talk about temperature doubling, because 0 metres is not arbitrary (it means no distance at all), whereas 0 degrees Celsius is. The paper summarises some of the arguments put forward for using dead=0. They aren’t compelling. The authors argue that the duration part of the QALY (i.e. time) needs to have ratio properties for the QALY model to function. Time obviously holds this property, and it’s clear that duration can be anchored at zero. The authors then demonstrate that, for the QALY model to work, the health-utility scale must also exhibit ratio scale properties. The basis for this is the assumption that zero duration nullifies health states (zero time yields zero QALYs, whatever the state) and that ‘dead’ nullifies duration (a value of zero yields zero QALYs, however long the duration). But the paper doesn’t challenge the conceptual basis for using dead in health state valuation exercises. Rather, it considers the mathematical properties that must hold to allow for dead=0, and asserts them. The authors’ conclusion that dead “needs to have the value of 0 in a QALY model” is correct, but only within the existing restrictions and assumptions underlying current practice. Nevertheless, this is a very useful study for understanding the challenge of anchoring and for explicating the assumptions underlying the QALY model.
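
The ratio-scale argument can be illustrated numerically. The sketch below assumes the standard multiplicative QALY model (value times duration) and two made-up treatment options; it is not taken from the paper. Rescaling every value by a positive constant (a ratio-scale transformation) leaves the choice unchanged, but moving the zero point (an interval-scale transformation) can reverse it. The temperature helper shows the same issue for Celsius.

```python
def qalys(utility: float, years: float) -> float:
    """Multiplicative QALY model: health-state value times duration."""
    return utility * years


# Hypothetical options, valued on a scale anchored at dead = 0 and full health = 1.
OPTION_A = (0.5, 2.0)  # 2 years in a state valued at 0.5
OPTION_B = (0.3, 4.0)  # 4 years in a state valued at 0.3


def prefers_b(shift: float = 0.0, scale: float = 1.0) -> bool:
    """Does B give more QALYs once every value is transformed to scale * u + shift?"""
    (u_a, t_a), (u_b, t_b) = OPTION_A, OPTION_B
    return qalys(scale * u_b + shift, t_b) > qalys(scale * u_a + shift, t_a)


def celsius_ratio_in_kelvin(high_c: float, low_c: float) -> float:
    """Ratio of two temperatures after converting Celsius to Kelvin (a true zero)."""
    return (high_c + 273.15) / (low_c + 273.15)


print(prefers_b())                          # True: B wins (1.2 vs 1.0 QALYs)
print(prefers_b(scale=2.0))                 # True: ratio-scale change keeps the ranking
print(prefers_b(shift=-0.4))                # False: moving the zero point flips it
print(qalys(0.0, 10.0), qalys(0.7, 0.0))    # 0.0 0.0: each factor nullifies the other
print(celsius_ratio_in_kelvin(20.0, 10.0))  # about 1.035, not 2
```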


Brent Gibbons’s journal round-up for 30th January 2017


For this week’s round-up, I selected three papers from December’s issue of Health Services Research. I didn’t intend to limit my selections to one issue of one journal, but as I narrowed down my selections from several journals, these three papers stood out.

Treatment effect estimation using nonlinear two-stage instrumental variable estimators: another cautionary note. Health Services Research [PubMed] Published December 2016

This paper by Chapman and Brooks evaluates the properties of a non-linear instrumental variables (IV) estimator called two-stage residual inclusion, or 2SRI. 2SRI has more recently been suggested as a consistent estimator of treatment effects under conditions of selection bias and where the dependent variable of the 2nd-stage equation is either binary or otherwise non-linear in its distribution. Terza, Bradford, and Dismuke (2007) and Terza, Basu, and Rathouz (2008) furthermore claimed that 2SRI can produce unbiased estimates not just of local average treatment effects (LATE) but of average treatment effects (ATE). However, Chapman and Brooks question why 2SRI, which is analogous to two-stage least squares (2SLS) when both the first- and second-stage equations are linear, should not require assumptions similar to those of 2SLS when generalizing beyond LATE to ATE. Backing up a step, when estimating treatment effects using observational data, one worry when trying to establish a causal effect is bias due to treatment choice. Where patient characteristics related to treatment choice are unobservable and one or more instruments are available, linear IV estimation (i.e. 2SLS) produces consistent estimates of treatment effects for “marginal patients”, or compliers. These are the patients whose treatment choice is influenced by the instrument, and the average treatment effect among them is termed the LATE. But if there is heterogeneity in treatment effects, a case needs to be made that this heterogeneity is not related to treatment choice in order to generalize to the ATE. Moving to non-linear IV estimation, Chapman and Brooks are skeptical of the claim that this case no longer needs to be made when using 2SRI. 2SRI, for those not familiar, uses the residual from stage 1 of a two-stage estimator as an additional variable in the 2nd-stage equation, which uses a non-linear estimator for a binary outcome (e.g. probit) or for another non-linear outcome (e.g. Poisson).
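
For readers who want to see the mechanics, here is a minimal sketch of 2SRI next to 2SLS on simulated data. Everything here is an assumption for illustration: the data-generating process is invented, the treatment is continuous for simplicity, the effect is homogeneous (so it says nothing about LATE versus ATE), and a hand-rolled logit stands in for the probit second stage. With a linear second stage, the coefficient on treatment from 2SRI matches 2SLS exactly; with the logit second stage, the coefficient is on the log-odds scale and is not directly comparable.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on X (X already contains a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def logit_fit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; a stand-in for a probit second stage."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta


def two_stage_residual_inclusion(y, d, z, second_stage="linear"):
    """2SRI: add the first-stage residual as an extra regressor in stage 2."""
    Z = np.column_stack([np.ones(len(d)), z])
    v_hat = d - Z @ ols(Z, d)  # first-stage residual
    X2 = np.column_stack([np.ones(len(d)), d, v_hat])
    beta = ols(X2, y) if second_stage == "linear" else logit_fit(X2, y)
    return beta[1]  # coefficient on treatment


def two_stage_least_squares(y, d, z):
    """2SLS: replace the treatment with its first-stage fitted values."""
    Z = np.column_stack([np.ones(len(d)), z])
    d_hat = Z @ ols(Z, d)
    return ols(np.column_stack([np.ones(len(d)), d_hat]), y)[1]


# Hypothetical data: an unobserved confounder u drives both treatment and outcome.
rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                # instrument
u = rng.normal(size=n)                # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)  # treatment intensity
y = 1.5 * d + u + rng.normal(size=n)  # continuous outcome, true effect 1.5
y_binary = (y > 0).astype(float)      # binary outcome for the non-linear case

b_2sls = two_stage_least_squares(y, d, z)
b_2sri_linear = two_stage_residual_inclusion(y, d, z)
b_2sri_logit = two_stage_residual_inclusion(y_binary, d, z, second_stage="logit")
print(b_2sls, b_2sri_linear, b_2sri_logit)
```

The linear equivalence holds because the first-stage residual is orthogonal to the fitted treatment, so including it leaves the treatment coefficient unchanged; it is the non-linear second stage where the two estimators part ways.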
The authors run a simulation that tests the properties of 2SRI under varying conditions of uniqueness of the marginal patient population and strength of the instrument. The uniqueness of the marginal population is defined as the extent of the difference in treatment effects for the marginal population as compared to the general population. For each scenario tested, the bias of the estimated LATE relative to both the true LATE and the true ATE is calculated. The findings support the authors’ suspicions that 2SRI produces biased results when uniqueness is high. In fact, the 2SRI results were only practically unbiased when uniqueness was low, and were biased for both ATE and LATE when uniqueness was high. Having very strong instruments did help reduce bias. In contrast, 2SLS was always practically unbiased for LATE across scenarios, and the authors use these results to caution researchers against using “new” estimation methods without thoroughly understanding their properties. In this case, the old 2SLS still outperformed 2SRI even when dependent variables were non-linear in nature.
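
The LATE versus ATE distinction behind “uniqueness” can be shown with a small simulation. This is a sketch under invented assumptions, not the authors’ design: a randomly assigned binary instrument, fixed shares of compliers, always-takers and never-takers, and a `uniqueness` parameter (a loose, hypothetical analogue of theirs) that raises only the compliers’ treatment effect. With one binary instrument and no covariates, 2SLS reduces to the Wald ratio, which tracks the LATE rather than the ATE once compliers are distinctive.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400_000

# Hypothetical compliance types: compliers take treatment only when z = 1.
types = rng.choice(["complier", "always", "never"], size=n, p=[0.4, 0.3, 0.3])
z = rng.integers(0, 2, size=n)  # randomly assigned binary instrument
d = np.where(types == "always", 1, np.where(types == "never", 0, z))
is_complier = types == "complier"


def wald_late_vs_ate(uniqueness: float):
    """Return (IV estimate, true LATE, true ATE) for one simulated outcome draw.

    `uniqueness` is how much larger the compliers' effect is than everyone else's.
    """
    effect = np.where(is_complier, 1.0 + uniqueness, 1.0)
    # Always-takers have higher untreated outcomes, so treatment choice is confounded.
    y0 = rng.normal(size=n) + 0.5 * (types == "always")
    y = y0 + effect * d
    # With one binary instrument and no covariates, 2SLS is the Wald ratio.
    wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
    return wald, effect[is_complier].mean(), effect.mean()


for u in (0.0, 2.0):
    est, late, ate = wald_late_vs_ate(u)
    print(f"uniqueness={u}: IV={est:.2f}  LATE={late:.2f}  ATE={ate:.2f}")
```

When uniqueness is zero, LATE and ATE coincide and the question of generalising does not arise; when it is large, the same consistent IV estimate is a good answer for compliers and a poor one for the whole population.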

Testing the replicability of a successful care management program: results from a randomized trial and likely explanations for why impacts did not replicate. Health Services Research [PubMed] Published December 2016

As is widely known, how to rein in U.S. healthcare costs has been a source of much hand-wringing. One promising strategy has been to promote better management of care, in particular for persons with chronic illnesses. This includes coordinating care between multiple providers, encouraging patient adherence to care recommendations, and promoting preventative care. The hope was that by managing care for patients with more complex needs, higher-cost services such as emergency visits and hospitalizations could be avoided. CMS, the Centers for Medicare and Medicaid Services, funded a demonstration of a number of care management programs to study which models might be successful in improving quality and reducing costs. One program implemented by Health Quality Partners (HQP) for Medicare Fee-For-Service patients was successful in reducing hospitalizations (by 34 percent) and expenditures (by 22 percent) for a select group of patients who were identified as high-risk. The demonstration ran from 2002 to 2010, and this paper reports results for a second phase of the demonstration, in which HQP was given additional funding to continue treating only high-risk patients from 2010 to 2014. High-risk patients were defined as those with a diagnosis of congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), or diabetes, and a hospitalization in the year prior to enrollment. In essence, phase II of the demonstration for HQP served as a replication of the original demonstration. The HQP care management program was delivered by nurse coordinators who regularly talked with patients and provided coordinated care between primary care physicians and specialists, as well as other services such as medication guidance. All positive results from phase I vanished in phase II, and the authors test several hypotheses for why results did not replicate.
They find that treatment group patients had similar hospitalization rates in phase I and phase II, but that control group patients had substantially lower hospitalization rates in phase II. Outcome differences between phase I and phase II were risk-adjusted, as phase II had an older population with higher severity of illness. The authors also used propensity score re-weighting to further control for differences between the phase I and phase II populations. The Affordable Care Act promoted similar care management services through patient-centered medical homes and accountable care organizations, which likely improved the usual care received by control group patients. The authors also note that the effectiveness of care management may be sensitive to the complexity of the target population’s needs. For example, the phase II population was more often homebound and therefore unable to participate in group classes. The big lesson of this paper, though, is that demonstration results may not replicate for different populations or even different time periods.
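
Propensity score re-weighting of this kind can be sketched as follows. The covariates, sample sizes and distributions are invented (phase II is simply made older and sicker), and the paper does not say this is its exact specification. A logistic model estimates the probability that a patient belongs to phase I given their covariates, and odds weights p / (1 − p) then tilt the phase II sample towards the phase I covariate mix.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical covariates per patient: [age, severity score]; phase II is older and sicker.
phase1 = rng.normal(loc=[70.0, 0.0], scale=[5.0, 1.0], size=(5_000, 2))
phase2 = rng.normal(loc=[74.0, 0.5], scale=[5.0, 1.0], size=(5_000, 2))

X = np.vstack([phase1, phase2])
is_phase1 = np.r_[np.ones(len(phase1)), np.zeros(len(phase2))]
Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise for stable fitting
design = np.column_stack([np.ones(len(X)), Xs])

# Propensity model: logistic regression of P(phase I | covariates) by Newton-Raphson.
beta = np.zeros(design.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    hess = design.T @ (design * (p * (1.0 - p))[:, None])
    beta += np.linalg.solve(hess, design.T @ (is_phase1 - p))
p = 1.0 / (1.0 + np.exp(-design @ beta))  # propensities at the final estimate

# Odds weights p / (1 - p) make phase II patients resemble the phase I covariate mix.
p2 = p[is_phase1 == 0]
weights = p2 / (1.0 - p2)

print("phase I means:       ", phase1.mean(axis=0))
print("phase II, unweighted:", phase2.mean(axis=0))
print("phase II, reweighted:", np.average(phase2, axis=0, weights=weights))
```

Re-weighting can only balance the covariates that are measured; it cannot account for changes in usual care over time, which is why the authors’ explanation about the control group still matters.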

A machine learning framework for plan payment risk adjustment. Health Services Research [PubMed] Published December 2016

Since my company has been subsumed under IBM Watson Health, I have been trying to wrap my head around this big data revolution and the potential of technological advances such as artificial intelligence or machine learning. While machine learning has infiltrated other disciplines, it is really just starting to influence health economics, so watch out! This paper by Sherri Rose is a nice introduction to a range of machine learning techniques, which she applies to the formulation of plan payment risk adjustment. In insurance systems where patients can choose from a range of insurance plans, there is the problem of adverse selection, where some plans may attract an abundance of high-risk patients. To control for this, plans (e.g. in the Affordable Care Act marketplaces) with high percentages of high-risk consumers get compensated based on a formula that predicts spending from population characteristics, including diagnoses. Rose says that these formulas are still based on a 1970s framework of linear regression and may benefit from machine learning algorithms. Given that plan payment risk adjustments are essentially predictions, this does seem like a good application. In addition to testing the goodness of fit of machine learning algorithms, Rose is interested in whether such techniques can reduce the number of variable inputs. Without going into any detail, insurers have found ways to “game” the system, and fewer variable inputs would restrict this activity. Rose introduces a number of concepts in the paper (at least they were new to me), such as ensemble machine learning, discrete learning frameworks, and super learning frameworks. She uses a large private insurance claims dataset and breaks the dataset into what she calls 10 “folds”, which allows her to run 5 prediction models, each with its own cross-validation dataset. Aside from one parametric regression model, she uses several penalized regression models, a neural network, a single regression tree, and random forest models.
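
The cross-validation idea can be sketched in a few lines. This is a simplified illustration on simulated data, not Rose’s analysis: only two candidate learners (OLS and ridge, a simple penalised regression) are compared, the penalty value is arbitrary, and the selection step is the “discrete” version that keeps the single best learner by cross-validated R². A full super learner would instead estimate a weighted combination of the candidates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 2_000, 30
X = rng.normal(size=(n, k))                               # hypothetical enrollee features
beta_true = np.r_[rng.normal(size=5), np.zeros(k - 5)]    # only 5 features matter
y = X @ beta_true + rng.normal(scale=2.0, size=n)         # hypothetical annual spending


def fit_ols(X, y):
    Xc = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xc, y, rcond=None)[0]


def fit_ridge(X, y, lam=50.0):
    """Ridge regression (a simple penalised model); the intercept is not penalised."""
    Xc = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xc.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ y)


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def cv_r2(fit, X, y, folds=10, seed=0):
    """Cross-validated R^2: train on 9 folds, predict the held-out fold, repeat."""
    idx = np.random.default_rng(seed).permutation(len(y))
    preds = np.empty(len(y))
    for hold in np.array_split(idx, folds):
        train = np.setdiff1d(idx, hold)
        preds[hold] = predict(fit(X[train], y[train]), X[hold])
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)


candidates = {"ols": fit_ols, "ridge": fit_ridge}
scores = {name: cv_r2(fit, X, y) for name, fit in candidates.items()}
best = max(scores, key=scores.get)  # "discrete" selection: keep the best single learner
print(scores, best)
```

Every observation is predicted exactly once by a model that never saw it, so the R² reflects out-of-sample performance, which is what matters for a payment formula.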
She describes machine learning as aiming to smooth over data in a similar manner to parametric regression, but with fewer assumptions and more flexibility. To reduce the number of variables in models, she applies techniques that limit variables to, for example, just the 10 most influential. She concludes that applying machine learning to plan payment risk adjustment models can increase efficiencies, and her results suggest that similar performance is possible even with a limited number of variables. It is curious that the parametric model performed as well as or better than many of the machine learning algorithms. I’ll take that to mean we can continue using our trusted regression methods for at least a few more years.
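
Restricting a model to its most influential variables can be sketched like this. The review does not say exactly which technique Rose uses, so the ranking rule here (absolute OLS coefficient times predictor standard deviation) is a hypothetical stand-in, and the data are simulated with only 8 of 40 predictors mattering. The restricted model is refitted and compared with the full model on held-out rows.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 2_000, 40
X = rng.normal(size=(n, k))                                    # hypothetical predictors
beta_true = np.r_[np.linspace(2.0, 0.5, 8), np.zeros(k - 8)]   # only 8 predictors matter
y = X @ beta_true + rng.normal(scale=2.0, size=n)              # hypothetical spending

train, held_out = slice(0, 1_500), slice(1_500, n)


def fit(X, y):
    """OLS coefficients with the intercept in position 0."""
    Xc = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xc, y, rcond=None)[0]


def holdout_r2(cols):
    """Fit on the training rows using only `cols`, then score on the held-out rows."""
    coef = fit(X[train][:, cols], y[train])
    resid = y[held_out] - (coef[0] + X[held_out][:, cols] @ coef[1:])
    return 1.0 - resid @ resid / np.sum((y[held_out] - y[held_out].mean()) ** 2)


# Rank predictors by standardised coefficient size on the training rows, keep 10.
coef_all = fit(X[train], y[train])[1:]
importance = np.abs(coef_all) * X[train].std(axis=0)
top10 = np.sort(np.argsort(importance)[::-1][:10])

print("all 40 predictors:", round(holdout_r2(np.arange(k)), 3))
print("top 10 predictors:", round(holdout_r2(top10), 3))
```

In this toy setting the shorter model loses almost nothing, which mirrors the review’s point that fewer inputs need not mean worse predictions (and leaves less room for gaming).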
