Method of the month: Synthetic control

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is synthetic control.

Principles

Health researchers are often interested in estimating the effect of a policy of change at the aggregate level. This might include a change in admissions policies at a particular hospital, or a new public health policy applied to a state or city. A common approach to inference in these settings is difference in differences (DiD) methods. Pre- and post-intervention outcomes in a treated unit are compared with outcomes in the same periods for a control unit. The aim is to estimate a counterfactual outcome for the treated unit in the post-intervention period. To do this, DiD assumes that the trend over time in the outcome is the same for both treated and control units.

It is often the case in practice that we have multiple possible control units and multiple time periods of data. To predict the post-intervention counterfactual outcomes, we can note that there are three sources of information: i) the outcomes in the treated unit prior to the intervention, ii) the behaviour of other time series predictive of that in the treated unit, including outcomes in similar but untreated units and exogenous predictors, and iii) prior knowledge of the effect of the intervention. The latter of these only really comes into play in Bayesian set-ups of this method. With longitudinal data we could just throw all this into a regression model and estimate the parameters. However, generally, this doesn’t allow for unobserved confounders to vary over time. The synthetic control method does.

Implementation

Abadie, Diamond, and Haimueller motivate the synthetic control method using the following model:

y_{it} = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \epsilon_{it}

where y_{it} is the outcome for unit i at time t, \delta_t are common time effects, Z_i are observed covariates with time-varying parameters \theta_t, \lambda_t are unobserved common factors with \mu_i as unobserved factor loadings, and \epsilon_{it} is an error term. Abadie et al show in this paper that one can derive a set of weights for the outcomes of control units that can be used to estimate the post-intervention counterfactual outcomes in the treated unit. The weights are estimated as those that would minimise the distance between the outcome and covariates in the treated unit and the weighted outcomes and covariates in the control units. Kreif et al (2016) extended this idea to multiple treated units.

Inference is difficult in this framework. So to produce confidence intervals, ‘placebo’ methods are proposed. The essence of this is to re-estimate the models, but using a non-intervention point in time as the intervention date to determine the frequency with which differences of a given order of magnitude are observed.

Brodersen et al take a different approach to motivating these models. They begin with a structural time-series model, which is a form of state-space model:

y_t = Z'_t \alpha_t + \epsilon_t

\alpha_{t+1} = T_t \alpha_t + R_t \eta_t

where in this case, y_t is the outcome at time t, \alpha_t is the state vector and Z_t is an output vector with \epsilon_t as an error term. The second equation is the state equation that governs the evolution of the state vector over time where T_t is a transition matrix, R_t is a diffusion matrix, and \eta_t is the system error.

From this setup, Brodersen et al expand the model to allow for control time series (e.g. Z_t = X'_t \beta), local linear time trends, seasonal components, and allowing for dynamic effects of covariates. In this sense the model is perhaps more flexible than that of Abadie et al. Not all of the large number of covariates may be necessary, so they propose a ‘slab and spike’ prior, which combines a point mass at zero with a weakly informative distribution over the non-zero values. This lets the data select the coefficients, as it were.

Inference in this framework is simpler than above. The posterior predictive distribution can be ‘simply’ estimated for the counterfactual time series to give posterior probabilities of differences of various magnitudes.

Software

Stata

  • Synth Implements the method of Abadie et al.

R

  • Synth Implements the method of Abadie et al.
  • CausalImpact Implements the method of Brodersen et al.

Applications

Kreif et al (2016) estimate the effect of pay for performance schemes in hospitals in England and compare the synthetic control method to DiD. Pieters et al (2016) estimate the effects of democratic reform on under-five mortality. We previously covered this paper in a journal round-up and a subsequent post, for which we also used the Brodersen et al method described above. We recently featured a paper by Lépine et al (2017) in a discussion of user fees. The synthetic control method was used to estimate the impact that the removal of user fees had in various districts of Zambia on use of health care.

Credit 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s