Brent Gibbons’s journal round-up for 30th January 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

For this week’s round-up, I selected three papers from December’s issue of Health Services Research. I didn’t intend to limit my selections to one issue of one journal, but as I narrowed down my choices from several journals, these three papers stood out.

Treatment effect estimation using nonlinear two-stage instrumental variable estimators: another cautionary note. Health Services Research [PubMed] Published December 2016

This paper by Chapman and Brooks evaluates the properties of a non-linear instrumental variables (IV) estimator called two-stage residual inclusion, or 2SRI. 2SRI has been suggested as a consistent estimator of treatment effects under conditions of selection bias and where the dependent variable of the second-stage equation is binary or otherwise non-linear in its distribution. Terza, Bradford, and Dismuke (2007) and Terza, Basu, and Rathouz (2008) further claimed that 2SRI can produce unbiased estimates not just of local average treatment effects (LATE) but of average treatment effects (ATE). However, Chapman and Brooks question why 2SRI, which is analogous to two-stage least squares (2SLS) when both the first- and second-stage equations are linear, should not require assumptions similar to those of 2SLS when generalizing beyond LATE to ATE. Backing up a step: when estimating treatment effects using observational data, one worry when trying to establish a causal effect is bias due to treatment choice. Where patient characteristics related to treatment choice are unobservable and one or more instruments are available, linear IV estimation (i.e. 2SLS) produces unbiased and consistent estimates of treatment effects for “marginal patients”, or compliers. These are the patients whose treatment choice was influenced by the instrument, and their treatment effects are termed LATE. But if there is heterogeneity in treatment effects, a case needs to be made that treatment effect heterogeneity is not related to treatment choice in order to generalize to ATE. Moving to non-linear IV estimation, Chapman and Brooks are skeptical that this case for generalizing LATE to ATE no longer needs to be made with 2SRI. 2SRI, for those not familiar, takes the residual from the first stage of a two-stage estimator and includes it as a variable in the second-stage equation, which uses a non-linear estimator for a binary outcome (e.g. probit) or another non-linear outcome (e.g. Poisson). The authors produce a simulation that tests 2SRI’s properties while varying the uniqueness of the marginal patient population and the strength of the instrument. The uniqueness of the marginal population is defined as the extent to which treatment effects for the marginal population differ from those of the general population. For each scenario tested, the bias of the estimates relative to the true LATE and ATE is calculated. The findings support the authors’ suspicions: the 2SRI results were practically unbiased only when uniqueness was low, and were biased for both ATE and LATE when uniqueness was high. Having very strong instruments did help reduce bias. In contrast, 2SLS was practically unbiased for LATE across scenarios, and the authors use these results to caution researchers against adopting “new” estimation methods without thoroughly understanding their properties. In this case, old 2SLS still outperformed 2SRI even when the dependent variable was non-linear in nature.
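
To make the contrast concrete, here is a minimal simulation sketch of the two estimators (this is not the authors’ code; the data-generating process and all parameter values are invented for illustration):

```python
# Hypothetical data: binary treatment d chosen partly on an unobserved
# confounder u, a binary instrument z, and a binary outcome y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000
z = rng.binomial(1, 0.5, n)                                   # instrument
u = rng.normal(size=n)                                        # unobserved confounder
d = (0.8 * z + u + rng.normal(size=n) > 0.5).astype(float)    # treatment choice
y = ((1.0 * d + u + rng.normal(size=n)) > 0).astype(float)    # binary outcome

# 2SLS: two linear stages (point estimate only; naive second-stage SEs are wrong)
stage1 = sm.OLS(d, sm.add_constant(z)).fit()
d_hat = stage1.fittedvalues
print("2SLS:", sm.OLS(y, sm.add_constant(d_hat)).fit().params[1])

# 2SRI: include the first-stage residual as a regressor in a probit second stage
resid = d - d_hat
X = sm.add_constant(np.column_stack([d, resid]))
print("2SRI probit coefficient on d:", sm.Probit(y, X).fit(disp=0).params[1])
```

In this toy setup the treatment effect is homogeneous on the latent index; Chapman and Brooks’ simulations are precisely about what happens when marginal patients’ effects diverge from everyone else’s.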

Testing the replicability of a successful care management program: results from a randomized trial and likely explanations for why impacts did not replicate. Health Services Research [PubMed] Published December 2016

As is widely known, how to rein in U.S. healthcare costs has been a source of much hand-wringing. One promising strategy has been to promote better management of care, particularly for persons with chronic illnesses. This includes coordinating care between multiple providers, encouraging patient adherence to care recommendations, and promoting preventative care. The hope was that by managing care for patients with more complex needs, higher-cost services such as emergency visits and hospitalizations could be avoided. The Centers for Medicare and Medicaid Services (CMS) funded a demonstration of a number of care management programs to study which models might be successful in improving quality and reducing costs. One program, implemented by Health Quality Partners (HQP) for Medicare Fee-For-Service patients, was successful in reducing hospitalizations (by 34 percent) and expenditures (by 22 percent) for a select group of patients identified as high-risk. The demonstration ran from 2002 to 2010, and this paper reports results for a second phase in which HQP was given additional funding to continue treating only high-risk patients from 2010 to 2014. High-risk patients were those with a diagnosis of congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), or diabetes, and a hospitalization in the year prior to enrollment. In essence, phase II of the demonstration served as a replication of the original demonstration for HQP. The HQP care management program was delivered by nurse coordinators who talked regularly with patients and coordinated care between primary care physicians and specialists, as well as providing other services such as medication guidance. All positive results from phase I vanished in phase II, and the authors test several hypotheses for why results did not replicate. They find that treatment group patients had similar hospitalization rates in phases I and II, but that control group patients had substantially lower hospitalization rates in phase II. Outcome differences between phases were risk-adjusted, as the phase II population was older with higher severity of illness. The authors also used propensity score re-weighting to further control for differences between the phase I and phase II populations. The Affordable Care Act promoted similar care management services through patient-centered medical homes and accountable care organizations, which likely contributed to improvements in the usual care received by control group patients. The authors also note that the effectiveness of care management may be sensitive to the complexity of the target population’s needs. For example, the phase II population was more homebound and therefore unable to participate in group classes. The big lesson of this paper, though, is that demonstration results may not replicate for different populations or even different time periods.
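
For readers unfamiliar with the technique, here is a hedged sketch of propensity score re-weighting across study phases (entirely synthetic data; the covariates and coefficients are assumptions for illustration, not the authors’ model):

```python
# Reweight phase I patients so their covariate mix resembles phase II.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2_000
df = pd.DataFrame({
    "age": rng.normal(76, 6, n),
    "severity": rng.normal(0, 1, n),
})
# suppose phase II enrolment is more likely for older, sicker patients
p_phase2 = 1 / (1 + np.exp(-(-6 + 0.07 * df["age"] + 0.5 * df["severity"])))
df["phase2"] = rng.binomial(1, p_phase2)

# estimate the propensity of being a phase II patient given covariates
X = sm.add_constant(df[["age", "severity"]])
ps = sm.Logit(df["phase2"], X).fit(disp=0).predict(X)

# inverse-odds weights make phase I patients resemble the phase II population
df["w"] = np.where(df["phase2"] == 1, 1.0, ps / (1 - ps))

print(df.groupby("phase2")["age"].mean())              # raw age gap between phases
phase1 = df[df["phase2"] == 0]
print(np.average(phase1["age"], weights=phase1["w"]))  # reweighted phase I mean age
```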

A machine learning framework for plan payment risk adjustment. Health Services Research [PubMed] Published December 2016

Since my company has been subsumed under IBM Watson Health, I have been trying to wrap my head around this big data revolution and the potential of technological advances such as artificial intelligence and machine learning. While machine learning has infiltrated other disciplines, it is really just starting to influence health economics, so watch out! This paper by Sherri Rose is a nice introduction to a range of machine learning techniques, which she applies to the formulation of plan payment risk adjustments. In insurance systems where patients can choose from a range of insurance plans, there is a problem of adverse selection whereby some plans may attract an abundance of high-risk patients. To control for this, plans (e.g. in the Affordable Care Act marketplaces) with high percentages of high-risk consumers are compensated based on a formula that predicts spending from population characteristics, including diagnoses. Rose says that these formulas are still based on a 1970s framework of linear regression and may benefit from machine learning algorithms. Given that plan payment risk adjustments are essentially predictions, this does seem like a good application. In addition to testing the goodness of fit of machine learning algorithms, Rose is interested in whether such techniques can reduce the number of variable inputs. Without going into any detail, insurers have found ways to “game” the system, and fewer variable inputs would restrict this activity. Rose introduces a number of concepts in the paper (at least they were new to me) such as ensemble machine learning, discrete learning frameworks, and super learning frameworks. She uses a large private insurance claims dataset and breaks it into 10 “folds”, which allows her to run 5 prediction models, each with its own cross-validation dataset. Aside from one parametric regression model, she uses several penalized regression models, as well as neural net, single-tree, and random forest models. She describes machine learning as aiming to smooth over data in a similar manner to parametric regression, but with fewer assumptions and more flexibility. To reduce the number of variables in the models, she applies techniques that limit inputs to, for example, just the 10 most influential. She concludes that applying machine learning to plan payment risk adjustment models can increase efficiency, and her results suggest that it is possible to get similar results even with a limited number of variables. It is curious that the parametric model performed as well as or better than many of the machine learning algorithms. I’ll take that to mean we can continue using our trusted regression methods for at least a few more years.
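
As a flavour of this kind of exercise, here is a minimal sketch comparing a parametric regression with a penalized regression and a random forest under 10-fold cross-validation, including a top-10-variable restriction (synthetic data; this is not Rose’s code, dataset, or full set of learners):

```python
# Compare prediction performance across model families with 10-fold CV.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import cross_val_score

# synthetic "claims" features standing in for the real dataset
X, y = make_regression(n_samples=5_000, n_features=100,
                       n_informative=10, noise=10.0, random_state=0)

models = {
    "OLS": LinearRegression(),
    "Lasso": LassoCV(cv=5),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=10, scoring="r2")
    print(f"{name}: mean 10-fold R^2 = {r2.mean():.3f}")

# restricting inputs to, say, the 10 most influential by forest importance
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top10 = np.argsort(rf.feature_importances_)[-10:]
r2_small = cross_val_score(LinearRegression(), X[:, top10], y, cv=10)
print(f"OLS on top-10 features: mean R^2 = {r2_small.mean():.3f}")
```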

Chris Sampson’s journal round-up for 3rd October 2016

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Using discrete choice experiments with duration to model EQ-5D-5L health state preferences: testing experimental design strategies. Medical Decision Making [PubMed] Published 28th September 2016

DCEs are a bit in vogue for the purpose of health state valuation, so it was natural that EuroQol turned to the approach for valuation of the EQ-5D-5L. But previous valuation studies have highlighted challenges associated with this approach, some of which this paper now investigates. Central to the use of DCE in this way is the inclusion of a duration attribute to facilitate anchoring on the scale from 1 (full health) to 0 (dead). This study looks at the effect of increasing the number of duration options, as previous studies were limited in this regard. Here, possible durations were 6 months or 1, 2, 4, 7 or 10 years. 802 online survey respondents were presented with 10 DCE choice sets, and the resulting model had generally logically ordered coefficients. So the approach looks feasible, but it isn’t clear whether there are any real advantages to including more durations. Another issue is that the efficiency of the DCE design might be improved by introducing prior information from previous studies to inform the selection of health profiles – that is, by introducing non-zero prior values. With 800 respondents, this design resulted in more disordering, with – for example – a positive coefficient on level 2 of the pain/discomfort dimension, which was not the expected result. However, the design included a far greater proportion of more difficult choices, which the authors suggest may have resulted in inconsistencies. An alternative way of increasing efficiency might be to use a 2-stage approach, whereby health profiles are selected first and durations are then selected based on information from previous studies. Using the same number of pairs but a sample half the size (400), the 2-stage design seemed to work a treat. It’s a promising design that will no doubt see further research in this context.
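
For readers curious about the mechanics, here is a minimal sketch of how a DCE with a duration attribute can be modelled: with two alternatives per choice set, the conditional logit reduces to a binary logit on attribute differences with no intercept (the attributes and coefficients below are invented for illustration and are far simpler than the study’s design):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 8_020  # e.g. ~800 respondents x 10 choice sets

durations = np.array([0.5, 1, 2, 4, 7, 10])    # 6 months to 10 years

def draw_profile(size):
    t = rng.choice(durations, size)
    pain2 = rng.binomial(1, 0.5, size)          # pain/discomfort at level 2
    return np.column_stack([t, t * pain2])      # disutility scales with duration

xa, xb = draw_profile(n), draw_profile(n)
beta_true = np.array([0.25, -0.10])             # value of a year; level-2 decrement
u_diff = (xa - xb) @ beta_true + rng.logistic(size=n)
choice_a = (u_diff > 0).astype(float)

# conditional logit via a no-intercept binary logit on attribute differences
fit = sm.Logit(choice_a, xa - xb).fit(disp=0)
print(fit.params)   # disordering would appear as a wrong-signed pain coefficient
```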

Is the distribution of care quality provided under pay-for-performance equitable? Evidence from the Advancing Quality programme in England. International Journal for Equity in Health [PubMed] Published 23rd September 2016

Suppose a regional health care quality improvement initiative worked, but only for the well-off. Would we still support it? Maybe not, so it’s important to uncover for whom the policy is working. The QOF (Quality and Outcomes Framework) is the most-studied pay-for-performance programme in England, and it does not seem to have reduced health inequalities in the context of primary care. There is less evidence regarding P4P in hospital care, which is where this study comes in, looking at the Advancing Quality initiative across five different health conditions. Using individual-level data for 73,002 people, the authors model the probability of receiving a quality indicator according to income deprivation in patients’ local areas. There were 23 indicators altogether, and results were not consistent across them. Poorer patients were more likely to receive pre-surgical interventions for hip and knee replacements and for coronary artery bypass grafting (CABG). And poorer people were less likely to receive advice at discharge. On the other hand, for hip and knee replacement and CABG, richer people were more likely to receive diagnostic tests. The main finding is that there is no obvious systematic pro-poor or pro-rich bias in the effects of this pay-for-performance initiative in secondary care. This may not be a big surprise, given the limited scope for self-selection and self-direction by patients in secondary care compared with primary care.
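
Schematically, the modelling looks something like the following (synthetic data; the variable names, covariates, and effect sizes are assumptions, not the study’s):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 73_002
deprivation = rng.uniform(0, 1, n)   # area income-deprivation score
age = rng.normal(70, 10, n)

# suppose receipt of a discharge-advice indicator falls with deprivation
p = 1 / (1 + np.exp(-(1.0 - 0.6 * deprivation + 0.01 * (age - 70))))
received = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([deprivation, age]))
fit = sm.Logit(received, X).fit(disp=0)
print(fit.params)   # a negative deprivation coefficient = a pro-rich gradient
```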

The impact of social security income on cognitive function at older ages. American Journal of Health Economics [RePEc] Published 19th September 2016

Income correlates with health, as we know. But it’s useful to be more specific – as this article is – in order to inform policy. So does more Social Security income improve cognitive function at older ages? The short answer is yes. And that wasn’t a foregone conclusion, as there is some evidence that higher income leads to earlier retirement, which in turn can be detrimental to cognitive function. The authors exploit changes to the Social Security Act in the US in the 1970s. Between 1972 and 1977, Congress messed up a bit and temporarily introduced a policy that made payments increase at a rate faster than inflation, which was therefore enjoyed by people born between 1910 and 1916, with a 5-year gradual transition until 1922. Unsurprisingly, this study follows many others that have made the most of this policy quirk. Data are taken from a longitudinal survey of older people, which includes a set of scores relating to cognition, with a sample of 4,139 people. Using an OLS model, the authors estimate the association between Social Security income and cognition. Cognition is measured using a previously developed composite score with 3 levels: ‘normal’, ‘cognitively impaired’ and ‘demented’. To handle the endogeneity of income, an instrumental variable is constructed on the basis of year of birth, to tie in with the peak in benefit from the policy (n=673). In today’s money the beneficiary cohort received around $2,000 extra. It’s also good to see the analysis extended to a quantile regression to see whereabouts in the cognition score distribution the effects accrue. The additional income resulted in improvements in working memory, knowledge, languages and orientation, and overall cognition. The effects are strong and clinically meaningful: a $1,000 (in 1993 prices) increase in annual income led to a 1.9 percentage point reduction in the likelihood of being classified as cognitively impaired. The effect is strongest for those with higher levels of cognition. The key take-home message here is that even in older populations, policy changes can be beneficial to health. It’s never too late.
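
A stylized sketch of the identification strategy might look like this (entirely synthetic data; the cohort share, income effect, and the use of a simple quantile regression on observed income are illustrative assumptions, not the paper’s exact specification):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(5)
n = 4_139
notch = rng.binomial(1, 0.16, n)       # born in the higher-benefit window
ability = rng.normal(size=n)           # unobserved confounder

# Social Security income (in $1,000s) is higher for the notch cohort
income = 8 + 2.0 * notch + 0.5 * ability + rng.normal(size=n)
# cognition score; the true causal effect of +$1,000 is 0.3 points here
cognition = 20 + 0.3 * income + 1.5 * ability + rng.normal(size=n)

# 2SLS by hand: instrument income with cohort membership (point estimate only)
inc_hat = sm.OLS(income, sm.add_constant(notch)).fit().fittedvalues
print("IV:", sm.OLS(cognition, sm.add_constant(inc_hat)).fit().params[1])
print("naive OLS:", sm.OLS(cognition, sm.add_constant(income)).fit().params[1])

# quantile regressions show where in the cognition distribution effects sit
for q in (0.25, 0.5, 0.75):
    qfit = QuantReg(cognition, sm.add_constant(income)).fit(q=q)
    print(q, qfit.params[1])
```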

Chris Sampson’s journal round-up for 22nd August 2016

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Simulation as an ethical imperative and epistemic responsibility for the implementation of medical guidelines in health care. Medicine, Health Care and Philosophy [PubMed] Published 6th August 2016

Some people describe RCTs as a ‘gold standard’ for evidence. But if more than one RCT exists, or we have useful data from outside the RCT, that probably isn’t true. Decision modelling has value over and above RCT data, as well as in lieu of it. One crucial thing that cannot – or at least cannot usually – be captured in an RCT is how well the evidence might be implemented. Medical guidelines will be developed, but there will be a process of adjustments and no doubt errors, all of which might affect patients’ quality of life. Here we stray into the realms of implementation science. This paper argues that health care providers have a responsibility to acquire knowledge about the implementation and learning curve of medical guidelines. To this end, there is an epistemic and ethical imperative to simulate the possible impacts of the implementation learning curve on patients’ health. The authors provide some examples of guideline implementation that might have benefited from simulation. However, it’s very easy in hindsight to identify what went wrong, and none of the examples set out realistic scenarios for simulation analyses that could have been carried out in advance. It isn’t clear to me how or why we should differentiate – in ethical or epistemic terms – implementation from effectiveness evaluation. It is clear, however, that health economists could engage more with implementation science, and that there is an ethical imperative to do so.
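
As a toy example of what such a simulation could look like, consider providers whose error rate declines as they gain experience with a new guideline (the functional form and rates below are pure assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
months = np.arange(24)
patients_per_month = 100

# error rate decays from ~15% toward 2% as providers gain experience
error_rate = 0.02 + 0.13 * np.exp(-months / 6)
harms = rng.binomial(patients_per_month, error_rate)   # simulated harms per month

print("harms in the first 6 months:", harms[:6].sum())
print("harms in the final 6 months:", harms[18:].sum())
# a decision model could weigh these transition-period harms against the
# guideline's steady-state benefits before recommending roll-out
```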

Estimating marginal healthcare costs using genetic variants as instrumental variables: Mendelian randomization in economic evaluation. PharmacoEconomics [PubMed] Published 2nd August 2016

To assert that obesity is associated with greater use of health care resources is uncontroversial. However, to assert that all of the additional cost associated with obesity is caused by obesity is a step too far. There are many other determinants of health care costs (and outcomes) that might be independently associated with obesity. One way of dealing with this problem of identifying causality is to use instrumental variables in econometric analysis, but appropriate IVs can be tricky to find. Enter Mendelian randomisation: a method that adopts genetic variants as IVs. This paper describes the basis for Mendelian randomisation and outlines the suitability of genetic traits as IVs. En route, the authors provide a nice, accessible summary of the IV approach more generally. The focus throughout the paper is on estimating costs, with obesity used as an example. The article outlines many of the potential challenges and pitfalls associated with the approach, such as the use of weak instruments and non-linear exposure-outcome relationships. On the whole, the approach is intuitive and fits easily within existing methodologies. Its main value may lie in the estimation of more accurate parameters for model-based economic evaluation. Of course, we need data: ideally, longitudinal medical records linked to genotypic information for a large number of people. That may seem like wishful thinking, but the UK Biobank project (and others) can fit the bill.
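
In its simplest single-instrument form, Mendelian randomisation boils down to a Wald ratio. Here is a minimal sketch (synthetic data; the variant, effect sizes, and cost model are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000   # MR needs very large samples; think UK Biobank scale

g = rng.binomial(2, 0.3, n)        # allele count at a BMI-raising variant
u = rng.normal(size=n)             # confounders of both BMI and spending
bmi = 25 + 0.4 * g + 2.0 * u + rng.normal(size=n)
cost = 1_000 + 80 * bmi + 500 * u + rng.normal(scale=300, size=n)

# Wald ratio: effect of the gene on cost divided by its effect on BMI
beta_iv = np.cov(cost, g)[0, 1] / np.cov(bmi, g)[0, 1]
c = np.cov(cost, bmi)
beta_ols = c[0, 1] / c[1, 1]       # confounded OLS slope for comparison
print(f"MR (Wald ratio) estimate per BMI unit: {beta_iv:.1f}")   # ~80
print(f"naive OLS estimate:                    {beta_ols:.1f}")  # biased upward
```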

Patient and general public preferences for health states: A call to reconsider current guidelines. Social Science & Medicine [PubMed] Published 31st July 2016

One major ongoing debate in health economics is the question of whether public or patient preferences should be used to value health states and thus to estimate QALYs. Here in the UK, NICE recommends public preferences, and I’d hazard a guess that most people agree. But why? After providing some useful theoretical background, this article reviews the arguments made in favour of the use of public preferences, focusing on three that have been identified in Dutch guidelines. First, that cost-effectiveness analysis should adopt a societal perspective. The Gold Panel invoked a Rawlsian veil-of-ignorance argument to support the use of decision (ex ante) utility rather than experienced (ex post) utility. The authors highlight that this argument is limited, as the public are not in fact behind a veil of ignorance. Second, that the use of patient preferences might (wrongfully) ignore adaptation. This is not a complete argument, as there may be elements of adaptation that decision makers wish not to take into account, and public preferences may still underestimate the benefits of treatment due to adaptation. Third, the insurance principle: the obligation to be insured is taken on ex ante, and therefore the benefits of insurance (i.e. health care) should also be valued ex ante. The authors set out a useful taxonomy of the arguments, their reasoning, and the counterarguments. The key message is that current arguments in favour of public preferences are incomplete. As a way forward, the authors suggest that patient and public preferences should be used alongside each other, and propose that HTA guidelines require this. The paper got my cogs whirring, so expect a follow-up blog post tomorrow.

What, who and when? Incorporating a discrete choice experiment into an economic evaluation. Health Economics Review [PubMed] Published 29th July 2016

This study claims to be the first to carry out a discrete choice experiment with clinical trial participants and to compare willingness-to-pay results with standard QALY-based net benefit estimates – thus comparing a CBA with a CUA. The trial in question evaluates extending the role of community pharmacists in the management of coronary heart disease. The study focusses on the questions of what, who and when: what factors should be evaluated (i.e. beyond QALYs)? whose preferences should count (i.e. patients with experience of the service or all participants)? and when should preferences be evaluated (i.e. during or after the intervention)? Comparisons are made along these lines. The DCE asked participants to choose between their current situation and two alternative scenarios involving either the new service or the control. The trial found no significant difference in EQ-5D scores, SF-6D scores or costs between the groups, but it did identify a higher level of satisfaction with the intervention. The intervention group (through the DCE) reported a greater willingness to pay for the intervention than the control group, and this appeared to increase with prolonged use of the service. I’m not sure what the take-home message is from this study. The paper doesn’t answer the questions in the title – at least, not in any general sense. Nevertheless, it’s an interesting discussion of how we might carry out cost-benefit analysis using DCEs.
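
To illustrate how a DCE yields willingness-to-pay estimates: with a cost attribute in the design, WTP for another attribute is the (negative of the) ratio of their coefficients. Here is a minimal sketch (synthetic data; the attributes and values are assumptions, not the trial’s):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 10_000

def draw_alternative(size):
    service = rng.binomial(1, 0.5, size)                  # pharmacist service
    cost = rng.choice([0.0, 50.0, 100.0, 200.0], size)    # annual cost, £
    return np.column_stack([service, cost])

xa, xb = draw_alternative(n), draw_alternative(n)
beta_true = np.array([1.2, -0.01])     # utility of service; disutility of cost
u_diff = (xa - xb) @ beta_true + rng.logistic(size=n)
fit = sm.Logit((u_diff > 0).astype(float), xa - xb).fit(disp=0)

wtp = -fit.params[0] / fit.params[1]   # marginal rate of substitution
print(f"estimated WTP for the service: ~£{wtp:.0f} per year")  # should be ~120
```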

Photo credit: Antony Theobald (CC BY-NC-ND 2.0)