Are we estimating the effects of health care expenditure correctly?

It is a contentious issue in philosophy whether an omission can be the cause of an event. At the very least it seems we should consider causation by omission differently from ‘ordinary’ causation. Consider Sarah McGrath’s example. Billy promised Alice to water the plant while she was away, but he did not water it. Billy not watering the plant caused its death. But there are good reasons to suppose that Billy did not cause its death. If Billy’s lack of watering caused the death of the plant, it may well be reasonable to assume that Vladimir Putin and indeed anyone else who did not water the plant were also a cause. McGrath argues that there is a normative consideration here: Billy ought to have watered the plant and that’s why we judge his omission as a cause and not anyone else’s. Similarly, the example from L.A. Paul and Ned Hall’s excellent book Causation: A User’s GuideBilly and Suzy are playing soccer on rival teams. One of Suzy’s teammates scores a goal. Both Billy and Suzy were nearby and could have easily prevented the goal. But our judgement is that the goal should only be credited to Billy’s failure to block the goal as Suzy had no responsibility to.

These arguments may appear far removed from the world of health economics. But, they have practical implications. Consider the estimation of the effect that increasing health care expenditure has on public health outcomes. The government, or relevant health authority, makes a decision about how the budget is allocated. It is often the case that there are allocative inefficiencies: greater gains could be had by reallocating the budget to more effective programs of care. In this case there would seem to be a relevant omission; the budget has not been spent where it could have provided benefits. These omissions are often seen as causes of a loss of health. Karl Claxton wrote of the Cancer Drugs Fund, a pool of money diverted from the National Health Service to provide cancer drugs otherwise considered cost-ineffective, that it was associated with

a net loss of at least 14,400 quality adjusted life years in 2013/14.

Similarly, an analysis of the lack of spending on effective HIV treatment and prevention by the Mbeki administration in South Africa wrote that

More than 330,000 lives or approximately 2.2 million person-years were lost because a feasible and timely ARV treatment program was not implemented in South Africa.

But our analyses of the effects of health care expenditure typically do not take these omissions into account.

Causal inference methods are founded on a counterfactual theory of causation. The aim of a causal inference method is to estimate the potential outcomes that would have been observed under different treatment regimes. In our case this would be what would have happened under different levels of expenditure. This is typically estimated by examining the relationship between population health and levels of expenditure, perhaps using some exogenous determinant of expenditure to identify the causal effects of interest. But this only identifies those changes caused by expenditure and not those changes caused by not spending.

Consider the following toy example. There are two causes of death in the population a and b with associated programs of care and prevention A and B. The total health care expenditure is x of which a proportion p: p\in P \subseteq [0,1] is spent on A and 1-p on B. The deaths due to each cause are y_a and y_b and so the total deaths are y = y_a + y_b. Finally, the effect of a unit increase in expenditure in each program are \beta_a and \beta_b. The question is to determine what the causal effect of expenditure is. If Y_x is the potential outcome for level of expenditure x then the average treatment effect is given by E(\frac{\partial Y_x}{\partial x}).

The country has chosen an allocation between the programmes of care of p_0. If causation by omission is not a concern then, given linear, additive models (and that all the model assumptions are met), y_a = \alpha_a + \beta_a p x + f_a(t) + u_a and y_b = \alpha_b + \beta_b (1-p) x + f_b(t) + u_b, the causal effect is E(\frac{\partial Y_x}{\partial x}) = \beta = \beta_a p_0 + \beta_b (1-p_0). But if causation by omission is relevant, then the net effect of expenditure is the lives gained \beta_a p_0 + \beta_b (1-p_0) less the lives lost. The lives lost are those under all possible things we did not do, so the estimator of the causal effect is \beta' = \beta_a p_0 + \beta_b (1-p_0) -  \int_{P/p_0} [ \beta_ap + \beta_b(1-p) ] dG(p). Now, clearly \beta \neq \beta' unless P/p_0 is the empty set, i.e. there was no other option. Indeed, the choice of possible alternatives involves a normative judgement as we’ve suggested. For an omission to count as a cause, there needs to be a judgement about what ought to have been done. For health care expenditure this may mean that the only viable alternative is the allocatively efficient distribution, in which case all allocations will result in a net loss of life unless they are allocatively efficient, which some may argue is reasonable. An alternative view is simply that the government simply has to not do worse than in the past and perhaps it is also reasonable for the government not to make significant changes to the allocation, for whatever reason. In that case we might say that P \in [p_0,1] and g(p) might be a distribution truncated below p_0 with most mass around p_0 and small variance.

The problem is that we generally do not observe the effect of expenditure in each program of care nor do we know the distribution of possible budget allocations. The normative judgements are also a contentious issue. Claxton clearly believes the government ought not to have initiated the Cancer Drugs Fund, but he does not go so far as to say any allocative inefficiency results in a net loss of life. Some working out of the underlying normative principles is warranted. But if it’s not possible to estimate these net causal effects, why discuss it? Perhaps it’s due to the lack of consistency. We estimate the ‘ordinary’ causal effect in our empirical work, but we often discuss opportunity costs and losses due to inefficiencies as being due to or caused by the spending decisions that are made. As the examples at the beginning illustrate, the normative question of responsibility seeps into our judgments about whether an omission is the cause of an outcome. For health care expenditure the government or other health care body does have a relevant responsibility. I would argue then that causation by omission is important and perhaps we need to reconsider the inferences that we make.



Brent Gibbons’s journal round-up for 30th January 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

For this week’s round-up, I selected three papers from December’s issue of Health Services Research. I didn’t intend to to limit my selections to one issue of one journal but as I narrowed down my selections from several journals, these three papers stood out.

Treatment effect estimation using nonlinear two-stage instrumental variable estimators: another cautionary note. Health Services Research [PubMed] Published December 2016

This paper by Chapman and Brooks evaluates the properties of a non-linear instrumental variables (IV) estimator called two-stage residual inclusion or 2SRI. 2SRI has been more recently suggested as a consistent estimator of treatment effects under conditions of selection bias and where the dependent variable of the 2nd-stage equation is either binary or otherwise non-linear in its distribution. Terza, Bradford, and Dismuke (2007) and Terza, Basu, and Rathouz (2008) furthermore claimed that 2SRI estimates can produce unbiased estimates not just of local average treatment effects (LATE) but of average treatment effects (ATE). However, Chapman and Brooks question why 2SRI, which is analogous to two-stage least squares (2SLS) when both the first and second stage equations are linear, should not require similar assumptions as in 2SLS when generalizing beyond LATE to ATE. Backing up a step, when estimating treatment effects using observational data, one worry when trying to establish a causal effect is bias due to treatment choice. Where patient characteristics related to treatment choice are unobservable and one or more instruments is available, linear IV estimation (i.e. 2SLS) produces unbiased and consistent estimates of treatment effects for “marginal patients” or compliers. These are the patients whose treatment effects were influenced by the instrument and their treatment effects are termed LATE. But if there is heterogeneity in treatment effects, a case needs to be made that treatment effect heterogeneity is not related to treatment choice in order to generalize to ATE.  Moving to non-linear IV estimation, Chapman and Brooks are skeptical that this case for generalizing LATE to ATE no longer needs to be made with 2SRI. 2SRI, for those not familiar, uses the residual from stage 1 of a two-stage estimator as a variable in the 2nd-stage equation that uses a non-linear estimator for a binary outcome (e.g. probit) or another non-linear estimator (e.g. poisson). The authors produce a simulation that tests the 2SRI properties over varying conditions of uniqueness of the marginal patient population and the strength of the instrument. The uniqueness of the marginal population is defined as the extent of the difference in treatment effects for the marginal population as compared to the general population. For each scenario tested, the bias between the estimated LATE and the true LATE and ATE is calculated. The findings support the authors’ suspicions that 2SRI is subject to biased results when uniqueness is high. In fact, the 2SRI results were only practically unbiased when uniqueness was low, but were biased for both ATE and LATE when uniqueness was high. Having very strong instruments did help reduce bias. In contrast, 2SLS was always practically unbiased for LATE for different scenarios and the authors use these results to caution researchers on using “new” estimation methods without thoroughly understanding their properties. In this case, old 2SLS still outperformed 2SRI even when dependent variables were non-linear in nature.

Testing the replicability of a successful care management program: results from a randomized trial and likely explanations for why impacts did not replicate. Health Services Research [PubMed] Published December 2016

As is widely known, how to rein in U.S. healthcare costs has been a source of much hand-wringing. One promising strategy has been to promote better management of care in particular for persons with chronic illnesses. This includes coordinating care between multiple providers, encouraging patient adherence to care recommendations, and promoting preventative care. The hope was that by managing care for patients with more complex needs, higher cost services such as emergency visits and hospitalizations could be avoided. CMS, the Centers for Medicare and Medicaid Services, funded a demonstration of a number of care management programs to study what models might be successful in improving quality and reducing costs. One program implemented by Health Quality Partners (HQP) for Medicare Fee-For-Service patients was successful in reducing hospitalizations (by 34 percent) and expenditures (by 22 percent) for a select group of patients who were identified as high-risk. The demonstration occurred from 2002 – 2010 and this paper reports results for a second phase of the demonstration where HQP was given additional funding to continue treating only high-risk patients in the years 2010 – 2014. High-risk patients were identified as having a diagnosis of congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), or diabetes and had a hospitalization in the year prior to enrollment. In essence, phase II of the demonstration for HQP served as a replication of the original demonstration. The HQP care management program was delivered by nurse coordinators who regularly talked with patients and provided coordinated care between primary care physicians and specialists, as well as other services such as medication guidance. All positive results from phase I vanished in phase II and the authors test several hypotheses for why results did not replicate. They find that treatment group patients had similar hospitalization rates between phase I and II, but that control group patients had substantially lower phase II hospitalization rates. Outcome differences between phase I and phase II were risk-adjusted as phase II had an older population with higher severity of illness. The authors also used propensity score re-weighting to further control for differences in phase I and phase II populations. The affordable care act did promote similar care management services through patient-centered medical homes and accountable care organizations that likely contributed to the usual care of control group patients improving. The authors also note that the effectiveness of care management may be sensitive to the complexity of the target population needs. For example, the phase II population was more homebound and was therefore unable to participate in group classes. The big lesson in this paper though is that demonstration results may not replicate for different populations or even different time periods.

A machine learning framework for plan payment risk adjustment. Health Services Research [PubMed] Published December 2016

Since my company has been subsumed under IBM Watson Health, I have been trying to wrap my head around this big data revolution and the potential of technological advances such as artificial intelligence or machine learning. While machine learning has infiltrated other disciplines, it is really just starting to influence health economics, so watch out! This paper by Sherri Rose is a nice introduction into a range of machine learning techniques that she applies to the formulation of plan payment risk adjustments. In insurance systems where patients can choose from a range of insurance plans, there is the problem of adverse selection where some plans may attract an abundance of high risk patients. To control for this, plans (e.g. in the affordable care act marketplaces) with high percentages of high risk consumers get compensated based on a formula that predicts spending based on population characteristics, including diagnoses. Rose says that these formulas are still based on a 1970s framework of linear regression and may benefit from machine learning algorithms. Given that plan payment risk adjustments are essentially predictions, this does seem like a good application. In addition to testing goodness of fit of machine learning algorithms, Rose is interested in whether such techniques can reduce the number of variable inputs. Without going into any detail, insurers have found ways to “game” the system and fewer variable inputs would restrict this activity. Rose introduces a number of concepts in the paper (at least they were new to me) such as ensemble machine learningdiscrete learning frameworks and super learning frameworks. She uses a large private insurance claims dataset and breaks the dataset into what she calls 10 “folds” which allows her to run 5 prediction models, each with its own cross-validation dataset. Aside from one parametric regression model, she uses several penalized regression models, neural net, single-tree, and random forest models. She describes machine learning as aiming to smooth over data in a similar manner to parametric regression but with fewer assumptions and allowing for more flexibility. To reduce the number of variables in models, she applies techniques that limit variables to, for example, just the 10 most influential. She concludes that applying machine learning to plan payment risk adjustment models can increase efficiencies and her results suggest that it is possible to get similar results even with a limited number of variables. It is curious that the parametric model performed as well as or better than many of the different machine learning algorithms. I’ll take that to mean we can continue using our trusted regression methods for at least a few more years.


Variations in NHS admissions at a glance

Variations in admissions to NHS hospitals are the source of a great deal of consternation. Over the long-run, admissions and the volume of activity required of the NHS have increased, without equivalent increases in funding or productivity. Over the course of the year, there are repeated claims of crises as hospitals are ill-equipped for the increase in demand in the winter. While different patterns of admissions at weekends relative to weekdays may be the foundation of the ‘weekend effect’ as we recently demonstrated. And yet all these different sources of variation produce a singular time series of numbers of daily admissions. But, each of the different sources of variation are important for different planning and research aims. So let’s decompose the daily number of admissions into its various components.


Daily number of emergency admissions to NHS hospitals between April 2007 and March 2015 from Hospital Episode Statistics.


A similar analysis was first conducted on variations in the number of births by day of the year. A full description of the model can be found in Chapter 21 of the textbook Bayesian Data Analysis (indeed the model is shown on the front cover!). The model is a sum of Gaussian processes, each one modelling a different aspect of the data, such as the long-run trend or weekly periodic variation. We have previously used Gaussian processes in a geostatistical model on this blog. Gaussian processes are a flexible class of models for which any finite dimensional marginal distribution is Gaussian. Different covariance functions can be specified for different models, such as the aforementioned periodic or long-run trends. The model was run using the software GPstuff in Octave (basically an open-source version of Matlab) and we have modified code from the GPstuff website.



The four panels of the figure reveal to us things we may claim to already know. Emergency admissions have been increasing over time and were about 15% higher in 2015 than in 2007 (top panel). The second panel shows us the day of the week effects: there are about 20% fewer admissions on a Saturday or Sunday than on a weekday. The third panel shows a decrease in summer and increase in winter as we often see reported, although perhaps not quite as large as we might have expected. And finally the bottom panel shows the effects of different days of the year. We should note that the large dip at the end of March/beginning of April is an artifact of coding at the end of the financial year in HES and not an actual drop in admissions. But, we do see expected drops for public holidays such as Christmas and the August bank holiday.

While none of this is unexpected it does show that there’s a lot going on underneath the aggregate data. Perhaps the most alarming aspect of the data is the long run increase in emergency admissions when we compare it to the (lack of) change in funding or productivity. It suggests that hospitals will often be running at capacity so other variation, such as over winter, may lead to an excess capacity problem. We might also speculate on other possible ‘weekend effects’, such as admission on a bank holiday.

As a final thought, the method used to model the data is an excellent way of modelling data with an unknown structure without posing assumptions such as linearity that might be too strong. Hence their use in geostatistics. They are widely used in machine learning and artificial intelligence as well. We often encounter data with unknown and potentially complicated structures in health care and public health research so hopefully this will serve as a good advert for some new methods. See this book, or the one referenced in the methods section, for an in depth look.