Sam Watson’s journal round-up for 16th October 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Effect of forced displacement on health. Journal of the Royal Statistical Society: Series A [RePEc] Published October 2017

History, as they say, is doomed to repeat itself. Despite repeated cries of ‘never again’, war and conflict continue to harm and displace people around the world. The mass displacement of the Rohingya in Myanmar is leading to the formation of the world’s largest refugee camp in Bangladesh. Aside from the obvious harm from conflict itself, displacement is likely to have pernicious effects on health. Livelihoods and well-being are lost, as well as access to basic amenities and facilities. The conflict in Croatia and wider Eastern Europe created a mass displacement of people; however, many went to relatively wealthy neighbouring countries across Europe. Thus, the health effects of displacement in this conflict should provide an understanding of the lower bound of what happens. This paper looks into this question using a health survey of Croatians from 2003. An empirical issue the authors spend a substantial amount of time addressing is that displacement status is likely to be endogenous: beyond choices about protecting their household and possessions, health and the ability to travel may play a large role in decisions to move. The mortality rate from conflict is used as an instrument for displacement, being a proxy for the intensity of war. However, conflict intensity is obviously likely to have an effect itself on health status. A method of relaxing the exclusion restriction is used, which tempers the estimates somewhat. Nevertheless, there is evidence that displacement impacts upon hypertension, self-assessed health, and emotional and physical dimensions of the SF-36. However, it seems to me that there may be another empirical issue not dealt with: the sample selection problem. While the number of casualties was low relative to the size of the population and the number of displaced people, those who died obviously don’t feature in the sample. And those who died may also have been more (or less) likely to be displaced and to be in worse health. Perhaps only a second-order bias, but a point that seems to have been left unconsidered.
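For those who like to see the moving parts, here is a minimal sketch of the kind of instrumental variable estimation the paper relies on, with conflict mortality instrumenting displacement. Everything below is simulated purely for illustration: the variable names, data generating process, and effect sizes are invented, and the exclusion-restriction relaxation the authors actually apply is not implemented.

```python
# Toy 2SLS: conflict mortality as an instrument for an endogenous displacement
# indicator. Simulated data only; not the paper's data or specification.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

u = rng.normal(size=n)                    # unobserved health-related confounder
mortality = rng.exponential(1.0, size=n)  # instrument: conflict mortality rate
# Displacement depends on the instrument and on the confounder (endogeneity).
displaced = (0.8 * mortality - 0.5 * u + rng.normal(size=n) > 0.5).astype(float)
# Health depends on displacement and the confounder, but not directly on the
# instrument (the exclusion restriction the paper has to relax).
health = -0.3 * displaced + 0.7 * u + rng.normal(size=n)

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: predict displacement from the instrument.
X1 = np.column_stack([np.ones(n), mortality])
d_hat = X1 @ ols(displaced, X1)
# Stage 2: regress health on predicted displacement.
beta_2sls = ols(health, np.column_stack([np.ones(n), d_hat]))[1]
beta_ols = ols(health, np.column_stack([np.ones(n), displaced]))[1]

print(f"naive OLS estimate: {beta_ols:.3f}")   # biased (here, overstates the harm)
print(f"2SLS estimate:      {beta_2sls:.3f}")  # close to the true -0.3
```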

Can variation in subgroups’ average treatment effects explain treatment effect heterogeneity? Evidence from a social experiment. Review of Economics and Statistics [RePEc] Published October 2017

A common approach to exploring treatment effect heterogeneity is to estimate mean impacts by subgroup. In applied health economics studies I have most often seen this done by pooling data and adding interactions of the treatment with subgroups of interest to a regression model. For example, there is a large interest in differences in access to care across socioeconomic groups – in the UK we often use quintiles, or other divisions, of the Index of Multiple Deprivation, which is estimated at small area level, to look at this. However, this paper looks at the question of whether this approach to estimating heterogeneity is any good. Using data from a large social experiment on a jobs program, they compare estimates of quantile treatment effects, which are considered to fully capture treatment effect heterogeneity, to results from various specifications of models that assume constant treatment effects within subgroups. If they had found little difference between the two methods, I doubt the paper would have been published in such a good journal, so it’s no surprise that their conclusion is that the subgroup models perform poorly. Even allowing for more flexibility, such as letting effects vary over time and adding submodels for a point mass at zero, they still don’t do that well. Interestingly, subgroups defined according to different variables, e.g. education or pre-treatment earnings, fare differently, so comparison across types of subgroups is important when the analyst is looking at heterogeneity. The takeaway message, though, is that constant-effects subgroup models aren’t that good: more flexible semi- or nonparametric methods may be preferred.
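As an illustration of why the two approaches can disagree, here is a small simulated sketch (not the paper’s data or models) comparing a subgroup-interaction regression with quantile treatment effects when the true effect varies within subgroups rather than between them:

```python
# Subgroup-interaction OLS vs. quantile treatment effects on simulated data.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(1)
n = 10_000
treat = rng.integers(0, 2, n)
group = rng.integers(0, 2, n)       # e.g. a deprivation-quintile style subgroup
# The treatment effect varies with an unobserved gain, not with the subgroup.
gain = rng.normal(1.0, 2.0, n)
y = 2.0 + 0.5 * group + treat * gain + rng.normal(size=n)

# Constant-effects-within-subgroup model: treatment interacted with group.
X = sm.add_constant(np.column_stack([treat, group, treat * group]))
ols_fit = sm.OLS(y, X).fit()
print("treatment effect and interaction:", ols_fit.params[[1, 3]].round(2))

# Quantile treatment effects: the same mean effect hides wide variation.
Xq = sm.add_constant(treat)
for q in (0.1, 0.5, 0.9):
    qfit = QuantReg(y, Xq).fit(q=q)
    print(f"QTE at q={q}: {qfit.params[1]:+.2f}")
```

The interaction term is near zero, so the subgroup model reports a roughly constant effect of about one, while the QTEs range from negative at the bottom of the distribution to well above two at the top.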

The hidden costs of terrorism: The effects on health at birth. Journal of Health Economics [PubMed] Published October 2017

We here at the blog have covered a long series of papers on the effects of in utero stressors on birth and later life health and economic outcomes. The so-called fetal-origins hypothesis posits that the nine months in the womb are some of the most important in predicting later life health outcomes. This may be one of the main mechanisms explaining intergenerational transmission of health. Some of these previous studies have covered reduced maternal nutrition, exposure to conditions of famine, or unemployment shocks in the household. This study examines the effect on health at birth of the mother being pregnant in a Spanish province at the time of a terrorist attack by ETA. At first glance, one might be forgiven for being sceptical, given that (i) terrorist attacks were rare, (ii) the chances of actually being affected by an attack in a province, if an attack occurred, are low, so (iii) the chances are that the effect of feeling stressed on birth weight is small and likely to be swamped by a multitude of other factors (see all the literature we’ve covered on the topic!). All credit to the authors for doing a thorough job of trying to address all these concerns, but I’ll admit I remain sceptical. The effect sizes are very small indeed, as we suspected, and unfortunately there is not enough evidence to examine whether those women who had low birth weight live births were stressed or demonstrating adverse health behaviours.


Sam Watson’s journal round-up for 26th June 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Future and potential spending on health 2015–40: development assistance for health, and government, prepaid private, and out-of-pocket health spending in 184 countries. The Lancet [PubMed] Published 20th May 2017

The colossal research collaboration that is the Global Burden of Disease Study is well known for producing estimates of deaths and DALYs lost across the world due to a huge range of diseases. These figures have proven invaluable as a source of information to inform disease modelling studies and to help guide the development of public health programs. In this study, the collaboration turn their hands to modelling future health care expenditure. Predicting the future of any macroeconomic variable is tricky, to say the least. The approach taken here is to (1) model GDP to 2040 using an ensemble method, taking the ‘best performing’ models from the over 1,000 tried (134 were included); (2) model all-sector government spending, out-of-pocket spending, and private health spending as a proportion of GDP in the same way, but with GDP as an input; and then (3) use a stochastic frontier approach to model maximum ‘potential’ spending. This latter step is an attempt to make the results more useful by considering different frontiers as scenarios that might change overall health care expenditure. All of these steps conceptually add a lot of uncertainty: the weight given to each model in the ensemble, and the prediction uncertainty from each model, including uncertainty in inputs such as population size and demographic structure, all of which is propagated through the three-step process. And this is without taking into account that health care spending at a national level is the result of a complex political decision-making process, which can affect national income and the prioritisation of health care in unforeseen ways (Brexit, anyone?). Despite this, the predictions seem quite certain: health spending per capita is predicted to rise from $1,279 in 2014 to $2,872, with a 95% confidence interval (or do they mean prediction interval?) of $2,426 to $3,522. It may well be a good model for average spending, but I suspect uncertainty (at least of a Bayesian kind) should be higher for a predictive model looking 25 years into the future based on 20 years of data. The non-standard use of stochastic frontier analysis, which is typically a way of estimating technical efficiency, is also tricky to follow. The frontier is argued in this paper to be the maximum amount that countries at similar levels of development spend on health care. This would also suggest that spending is assumed unable to go higher than that of a country’s highest spending peer; a potentially strong assumption. Needless to say, these are the best predictions we currently have for future health care expenditure.
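To make the ensemble idea in step (1) concrete, here is a toy sketch: fit a few candidate growth models, score them on a validation window, and pool their forecasts with performance-based weights. The data and the three candidate trend models are invented and far simpler than anything in the paper:

```python
# Toy forecast ensemble: weight candidate trend models by validation error.
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1995, 2015)
t = (years - 1995).astype(float)            # centred time index for stability
gdp = 100 * np.exp(0.02 * t) * np.exp(rng.normal(0, 0.01, t.size))

train, valid = years < 2010, years >= 2010
t_future = np.arange(2015, 2041) - 1995.0

def poly_forecast(y_tr, t_tr, t_out, degree, log=True):
    """Fit a polynomial trend (optionally on the log scale) and extrapolate."""
    target = np.log(y_tr) if log else y_tr
    pred = np.polyval(np.polyfit(t_tr, target, degree), t_out)
    return np.exp(pred) if log else pred

candidates = [(1, True), (2, True), (1, False)]  # three toy trend models
errors, forecasts = [], []
for degree, log in candidates:
    pred = poly_forecast(gdp[train], t[train], t[valid], degree, log)
    errors.append(np.mean((pred - gdp[valid]) ** 2))
    forecasts.append(poly_forecast(gdp, t, t_future, degree, log))

# Weight the models inversely to their validation error and pool the forecasts.
w = 1 / np.array(errors)
w /= w.sum()
ensemble = np.average(np.array(forecasts), axis=0, weights=w)
print(f"pooled GDP index forecast for 2040: {ensemble[-1]:.1f}")
```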

Discovering effect modification in an observational study of surgical mortality at hospitals with superior nursing. Journal of the Royal Statistical Society: Series A [ArXiv] Published June 2017

An applied econometrician can find endogeneity everywhere. Such is the complexity of the social, political, and economic world; everything is connected in some way. It’s one of the reasons I’ve argued before against null hypothesis significance testing: no effect is going to be exactly zero. Our job is one of measuring the size of an effect and, crucially for this paper, what might affect the magnitude of that effect. This might start with a graphical or statistical exploratory analysis before proceeding to a confirmatory analysis. This paper proposes a method of exploratory analysis for treatment effect modifiers and uses it to examine the effect of superior nursing on treatment outcomes, which I think is a sensible scientific approach. But how does it propose to do it? Null hypothesis significance testing! Oh no! Essentially, the method involves a novel procedure for testing whether treatment effects differ by group, allowing for potential unobserved confounding, where the groups are themselves formed in a novel way. For example, the authors ask how much bias would need to be present for their conclusions to change. In terms of the effects of superior nurse staffing, the authors estimate that its beneficial treatment effect is least sensitive to bias in the group of patients with the most serious conditions.
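The paper’s own procedure is far more sophisticated, but the underlying question (how strong would unobserved confounding have to be before a conclusion flips?) can be illustrated with a crude simulation. Everything below, from the effect size to the selection mechanism, is invented, and this is emphatically not the authors’ method:

```python
# Crude sensitivity-to-confounding simulation: how strong does selection of
# sicker patients into superior-nursing hospitals need to be before a genuine
# mortality benefit disappears from the naive comparison? Invented numbers.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
true_effect = -0.05  # assumed genuine mortality benefit of superior nursing

for gamma in (0.0, 0.05, 0.10, 0.20):
    u = rng.normal(size=n)                  # unobserved severity
    # Sicker patients are more often referred to superior-nursing hospitals.
    nursing = (gamma * u + rng.normal(0, 0.5, n) > 0).astype(float)
    p_death = np.clip(0.20 + true_effect * nursing + 0.10 * u, 0.01, 0.99)
    death = rng.binomial(1, p_death)
    naive = death[nursing == 1].mean() - death[nursing == 0].mean()
    print(f"selection strength {gamma:.2f}: naive risk difference {naive:+.3f}")
```

As the selection strength grows, the naive risk difference drifts towards zero and eventually changes sign, even though the true effect is held fixed.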

Incorporation of a health economic modelling tool into public health commissioning: Evidence use in a politicised context. Social Science & Medicine [PubMed] Published June 2017

Last up, a qualitative research paper (on an economics blog! I know…). Many health economists are involved in trying to encourage the incorporation of research findings into health care decision making and commissioning. The political decision-making process often ends in inefficient or inequitable outcomes despite good evidence on what makes good policy. This paper explored how commissioners in an English local authority viewed a health economics decision tool for planning diabetes services. This is a key bit of research if we are to make headway in designing tools that actually improve commissioning decisions. Two key groups of stakeholders were involved: public health managers and politicians. The latter prioritised intelligence, local opinion, and social care agendas over the scientific evidence from research preferred by the former group. The push and pull between the different approaches meant the health economics tool was used as a way of supporting the agendas of different stakeholders rather than as a means of addressing complex decisions. For a tool to be successful, it would seem to need to speak to or about the local population to which it is going to be applied. Well, that’s my interpretation. I’ll leave you with this quote from an interview with a manager in the study:

Public health, what they bring is a, I call it a kind of education scholarly kind of approach to things … whereas ‘social care’ sometimes are not so evidence-based-led. It’s a bit ‘well I thought that’ or, it’s a bit more fly by the seat of pants in social care.


Brent Gibbons’s journal round-up for 30th January 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

For this week’s round-up, I selected three papers from December’s issue of Health Services Research. I didn’t intend to limit my selections to one issue of one journal, but as I narrowed down my choices from several journals, these three papers stood out.

Treatment effect estimation using nonlinear two-stage instrumental variable estimators: another cautionary note. Health Services Research [PubMed] Published December 2016

This paper by Chapman and Brooks evaluates the properties of a non-linear instrumental variables (IV) estimator called two-stage residual inclusion, or 2SRI. 2SRI has more recently been suggested as a consistent estimator of treatment effects under conditions of selection bias and where the dependent variable of the 2nd-stage equation is either binary or otherwise non-linear in its distribution. Terza, Bradford, and Dismuke (2007) and Terza, Basu, and Rathouz (2008) furthermore claimed that 2SRI can produce unbiased estimates not just of local average treatment effects (LATE) but of average treatment effects (ATE). However, Chapman and Brooks question why 2SRI, which is analogous to two-stage least squares (2SLS) when both the first and second stage equations are linear, should not require similar assumptions to 2SLS when generalizing beyond LATE to ATE. Backing up a step: when estimating treatment effects using observational data, one worry in trying to establish a causal effect is bias due to treatment choice. Where patient characteristics related to treatment choice are unobservable and one or more instruments are available, linear IV estimation (i.e. 2SLS) produces unbiased and consistent estimates of treatment effects for “marginal patients”, or compliers. These are the patients whose treatment was influenced by the instrument, and their treatment effect is termed the LATE. But if there is heterogeneity in treatment effects, a case needs to be made that treatment effect heterogeneity is not related to treatment choice in order to generalize to the ATE. Chapman and Brooks are skeptical that this case no longer needs to be made when moving to non-linear IV estimation with 2SRI. 2SRI, for those not familiar, uses the residual from the first stage of a two-stage estimator as a variable in the 2nd-stage equation, where the second stage uses a non-linear estimator suited to a binary or otherwise non-linear outcome (e.g. probit or Poisson). The authors produce a simulation that tests the properties of 2SRI over varying conditions of uniqueness of the marginal patient population and strength of the instrument. The uniqueness of the marginal population is defined as the extent of the difference in treatment effects for the marginal population compared to the general population. For each scenario tested, the bias of the estimates relative to the true LATE and ATE is calculated. The findings support the authors’ suspicions: the 2SRI results were only practically unbiased when uniqueness was low, and were biased for both the ATE and the LATE when uniqueness was high. Having very strong instruments did help reduce bias. In contrast, 2SLS was always practically unbiased for the LATE across scenarios, and the authors use these results to caution researchers against using “new” estimation methods without thoroughly understanding their properties. In this case, old 2SLS still outperformed 2SRI even when dependent variables were non-linear in nature.
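For anyone unfamiliar with the mechanics, here is a minimal simulated sketch of 2SRI next to a naive probit. It is not a reproduction of Chapman and Brooks’s simulation design: the data generating process, sample size, and coefficient values are all invented.

```python
# Minimal 2SRI on simulated data: stage-1 linear regression of treatment on the
# instrument, with the stage-1 residual included in a stage-2 probit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 20_000
z = rng.normal(size=n)                      # instrument
u = rng.normal(size=n)                      # unobserved confounder
treat = (0.7 * z + u + rng.normal(size=n) > 0).astype(float)
# Binary outcome with a latent-index structure and confounding through u.
y = (0.5 * treat + u + rng.normal(size=n) > 0).astype(float)

# Stage 1: regress treatment on the instrument; keep the residual.
stage1 = sm.OLS(treat, sm.add_constant(z)).fit()

# Stage 2 (2SRI): probit of the outcome on treatment plus the stage-1 residual,
# which absorbs the confounded variation (up to the usual probit scale factor).
X2 = sm.add_constant(np.column_stack([treat, stage1.resid]))
probit_2sri = sm.Probit(y, X2).fit(disp=0)
print("2SRI treatment coefficient:", round(probit_2sri.params[1], 3))

# Naive probit, ignoring endogeneity, for comparison (biased upwards here).
naive = sm.Probit(y, sm.add_constant(treat)).fit(disp=0)
print("naive probit coefficient:  ", round(naive.params[1], 3))
```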

Testing the replicability of a successful care management program: results from a randomized trial and likely explanations for why impacts did not replicate. Health Services Research [PubMed] Published December 2016

As is widely known, how to rein in U.S. healthcare costs has been a source of much hand-wringing. One promising strategy has been to promote better management of care, particularly for persons with chronic illnesses. This includes coordinating care between multiple providers, encouraging patient adherence to care recommendations, and promoting preventative care. The hope was that by managing care for patients with more complex needs, higher-cost services such as emergency visits and hospitalizations could be avoided. CMS, the Centers for Medicare and Medicaid Services, funded a demonstration of a number of care management programs to study which models might be successful in improving quality and reducing costs. One program, implemented by Health Quality Partners (HQP) for Medicare fee-for-service patients, was successful in reducing hospitalizations (by 34 percent) and expenditures (by 22 percent) for a select group of patients identified as high-risk. The demonstration ran from 2002 to 2010, and this paper reports results for a second phase in which HQP was given additional funding to continue treating only high-risk patients from 2010 to 2014. High-risk patients were those with a diagnosis of congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), or diabetes, and a hospitalization in the year prior to enrollment. In essence, phase II of the demonstration served as a replication of the original demonstration for HQP. The HQP care management program was delivered by nurse coordinators who regularly talked with patients and coordinated care between primary care physicians and specialists, as well as providing other services such as medication guidance. All positive results from phase I vanished in phase II, and the authors test several hypotheses for why the results did not replicate. They find that treatment group patients had similar hospitalization rates between phases I and II, but that control group patients had substantially lower hospitalization rates in phase II. Outcome differences between the phases were risk-adjusted, as the phase II population was older with higher severity of illness. The authors also used propensity score re-weighting to further control for differences between the phase I and phase II populations. The Affordable Care Act promoted similar care management services through patient-centered medical homes and accountable care organizations, which likely contributed to improvements in the usual care received by control group patients. The authors also note that the effectiveness of care management may be sensitive to the complexity of the target population’s needs. For example, the phase II population was more homebound and was therefore unable to participate in group classes. The big lesson in this paper, though, is that demonstration results may not replicate for different populations or even different time periods.
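As a rough illustration of the re-weighting idea (not the authors’ actual implementation), the sketch below fits a propensity model for phase membership on simulated data and re-weights phase I patients to resemble the phase II case mix. Variable names and coefficients are invented:

```python
# Propensity-score re-weighting of phase I patients towards the phase II case mix.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 10_000
phase2 = rng.integers(0, 2, n)              # 1 = phase II enrollee
# Phase II patients are older and sicker, as in the paper.
age = 70 + 5 * phase2 + rng.normal(0, 5, n)
severity = rng.normal(0.5 * phase2, 1, n)
# A toy continuous hospitalisation index driven by age and severity.
hosp = 0.3 + 0.01 * (age - 70) + 0.05 * severity + rng.normal(0, 0.1, n)

# Model the probability of being a phase II patient given observables.
X = sm.add_constant(np.column_stack([age, severity]))
ps = sm.Logit(phase2, X).fit(disp=0).predict(X)

# Re-weight phase I patients with odds weights so they match phase II.
w1 = ps[phase2 == 0] / (1 - ps[phase2 == 0])
print("raw phase I mean:        ", round(hosp[phase2 == 0].mean(), 3))
print("re-weighted phase I mean:", round(np.average(hosp[phase2 == 0], weights=w1), 3))
print("phase II mean:           ", round(hosp[phase2 == 1].mean(), 3))
```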

A machine learning framework for plan payment risk adjustment. Health Services Research [PubMed] Published December 2016

Since my company was subsumed under IBM Watson Health, I have been trying to wrap my head around the big data revolution and the potential of technological advances such as artificial intelligence and machine learning. While machine learning has infiltrated other disciplines, it is really just starting to influence health economics, so watch out! This paper by Sherri Rose is a nice introduction to a range of machine learning techniques, which she applies to the formulation of plan payment risk adjustment. In insurance systems where patients can choose from a range of insurance plans, there is a problem of adverse selection, whereby some plans may attract an abundance of high-risk patients. To control for this, plans (e.g. in the Affordable Care Act marketplaces) with high percentages of high-risk consumers are compensated based on a formula that predicts spending from population characteristics, including diagnoses. Rose notes that these formulas are still based on a 1970s framework of linear regression and may benefit from machine learning algorithms. Given that plan payment risk adjustments are essentially predictions, this does seem like a good application. In addition to testing the goodness of fit of machine learning algorithms, Rose is interested in whether such techniques can reduce the number of variable inputs; without going into any detail, insurers have found ways to “game” the system, and fewer variable inputs would restrict this activity. Rose introduces a number of concepts in the paper (at least they were new to me) such as ensemble machine learning, discrete learning frameworks, and super learning frameworks. She uses a large private insurance claims dataset and breaks it into 10 “folds”, which allows her to run five prediction models, each with its own cross-validation dataset. Aside from one parametric regression model, she uses several penalized regression models, neural net, single-tree, and random forest models. She describes machine learning as aiming to smooth over data in a similar manner to parametric regression, but with fewer assumptions and more flexibility. To reduce the number of variables in the models, she applies techniques that limit variables to, for example, just the 10 most influential. She concludes that applying machine learning to plan payment risk adjustment models can increase efficiency, and her results suggest that it is possible to get similar results even with a limited number of variables. It is curious that the parametric model performed as well as or better than many of the different machine learning algorithms. I’ll take that to mean we can continue using our trusted regression methods for at least a few more years.
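To give a flavour of the exercise, here is a small sketch comparing a parametric regression with penalized and tree-based learners under cross-validation. The data are simulated and the candidate set is far smaller than Rose’s super learner; it is meant only to show the mechanics:

```python
# Compare OLS, LASSO, and a random forest for predicting spending under
# cross-validation, on simulated claims-style data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(6)
n, p = 5_000, 30
X = rng.normal(size=(n, p))                 # stand-ins for age, sex, diagnoses
# Spending depends on a handful of variables, mostly linearly, so the
# parametric model has a fighting chance (as Rose found).
y = 1_000 + X[:, :5] @ np.array([500, 300, 200, 150, 100]) + rng.normal(0, 400, n)

models = {
    "OLS": LinearRegression(),
    "LASSO": LassoCV(cv=5),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # ten "folds", as in the paper
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    print(f"{name:14s} cross-validated R^2: {r2:.3f}")
```

With a mostly linear data generating process like this one, the penalized and plain linear models match or beat the forest, echoing the paper’s finding that the parametric benchmark held its own.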
