Sam Watson’s journal round-up for 30th April 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The Millennium Villages Project: a retrospective, observational, endline evaluation. The Lancet Global Health [PubMed] Published May 2018

There are some clinical researchers who would have you believe observational studies are completely useless. The clinical trial is king, they might say; observational studies are just too biased. And while it’s true that observational studies are difficult to do well and convincingly, they can be a reliable and powerful source of evidence. Similarly, randomised trials are frequently flawed: there is often missing data that hasn’t been dealt with, or a lack of allocation concealment, and many researchers forget that randomisation does not guarantee a balance of covariates, it merely increases the probability of it. I bring this up because this study is a particularly carefully designed observational study that I think serves as a good example to other researchers. The paper is an evaluation of the Millennium Villages Project, an integrated intervention program designed to help rural villages across sub-Saharan Africa meet the Millennium Development Goals over the ten years between 2005 and 2015. Initial before-after evaluations of the project were criticised for inferring causal “impacts” from before and after data (for example, this Lancet paper had to be corrected after some criticism). To address these concerns, this new paper is incredibly careful about choosing appropriate control villages against which to evaluate the intervention. The method is too long to summarise here, but in essence the authors match intervention villages to other villages on the basis of district, agroecological zone, and a range of variables from the DHS; the matches were then reviewed for face validity and revised until the matching was satisfactory. The wide range of outcomes are all scaled to a standard normal and made to “point” in the same direction, i.e. so that an increase indicates economic development. Then, to avoid multiple comparisons problems, a Bayesian hierarchical model is used to pool data across countries and outcomes. Cost data were also reported.
Even better, “statistical significance” is barely mentioned at all! All in all, a neat and convincing evaluation.

Reconsidering the income‐health relationship using distributional regression. Health Economics [PubMed] [RePEc] Published 19th April 2018

The relationship between health and income has long been of interest to health economists. But it is a complex relationship: increases in income may change consumption behaviours and the use of time, promoting health, while improvements to health may lead to increases in income. Similarly, people who are more likely to earn higher incomes may also be those who look after themselves, or maybe not. Disentangling these various factors has generated a pretty sizeable literature, but almost all of the empirical papers in this area (and indeed the vast majority of empirical papers in general) use modelling techniques to estimate the effect of something on the expected value, i.e. mean, of some outcome. But the rest of the distribution is of interest too – the mean effect of income may not be very large, but a small increase in income for poorer individuals may have a relatively large effect on the risk of very poor health. This article looks at the relationship between income and the conditional distribution of health using something called “structured additive distribution regression” (SADR). My interpretation of SADR is that one would model the outcome y ~ g(a,b) as being distributed according to some distribution g(.) indexed by parameters a and b – for example, a normal or Gamma distribution has two parameters – and then specify a generalised linear model for each parameter, e.g. a = f(X’B). I’m not sure this is a completely novel method, as people use this kind of approach to, for example, model heteroscedasticity. But that’s not to detract from the paper itself. The findings are very interesting: increases to income have a much greater effect on health at the lower end of the spectrum.
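To make the idea concrete, here is a minimal toy sketch of distributional regression in Python: both the mean and the (log) standard deviation of a normally distributed outcome are modelled as linear functions of income. All data are simulated for illustration; this is not the SADR machinery of the paper, just the core idea of modelling more than the mean.

```python
# Toy distributional regression: location AND scale are both linear in income.
# Simulated data, hypothetical parameter values throughout.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n = 2000
income = rng.uniform(0, 10, n)
# simulated truth: health rises with income, and is less variable at high income
health = 2.0 + 0.5 * income + np.exp(1.0 - 0.2 * income) * rng.standard_normal(n)

def neg_log_lik(theta):
    a0, a1, b0, b1 = theta
    mu = a0 + a1 * income                          # location model, a = f(X'B)
    log_sd = np.clip(b0 + b1 * income, -10, 10)    # scale model, clipped for stability
    return 0.5 * np.sum(((health - mu) / np.exp(log_sd)) ** 2) + np.sum(log_sd)

start = np.array([health.mean(), 0.0, np.log(health.std()), 0.0])
fit = minimize(neg_log_lik, start, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
a0, a1, b0, b1 = fit.x
```

A mean-only regression here would recover the slope but miss entirely that the outcome distribution tightens as income grows, which is exactly the kind of feature the paper is after.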

Ask your doctor whether this product is right for you: a Bayesian joint model for patient drug requests and physician prescriptions. Journal of the Royal Statistical Society: Series C. Published April 2018.

When I used to take econometrics tutorials for undergraduates, one of the sessions involved going through coursework about the role of advertising. To set the scene, I would talk about the work of Alfred Marshall, the influential economist from the late 1800s/early 1900s. He described two roles for advertising: constructive and combative. The former is when advertising grows the market as a whole, increasing everyone’s revenues, and the latter is when ads just steal market share from rivals without changing the size of the market. Later economists would go on to thoroughly develop theories around advertising, exploring such things as the power of ads to distort preferences, the supply of ads and their complementarity with the product they’re selling, or seeing ads as a source of consumer information. Nevertheless, Marshall’s distinction is still a key consideration, although often phrased in different terms. This study examines a lot of things, but one of its key objectives is to explore the role of direct-to-consumer advertising in the prescription of branded drugs. The system is clearly complex: drug companies advertise both to consumers and physicians, consumers may request the drug from the physician, and the physician may or may not prescribe it. Further, there may be correlated unobservable differences between physicians and patients, and the choice to advertise to particular patients may not be exogenous. The paper does a pretty good job of dealing with each of these issues, but it is dense and took me a couple of reads to work out what was going on, especially with the mix of Bayesian and frequentist terms. Examining the erectile dysfunction drug market, the authors reckon that direct-to-consumer advertising reduces drug requests across the category, while increasing the proportion of requests for the advertised drug – potentially suggesting a “combative” role.
However, it’s more complex than that: patient requests and doctors’ prescriptions seem to be influenced by a multitude of factors.


Method of the month: Semiparametric models with penalised splines

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is semiparametric models with penalised splines.


A common assumption of regression models is that effects are linear and additive. However, nothing is ever really that simple. One might respond that all models are wrong, but some are useful, as George Box once said. And the linear, additive regression model has coefficients that can be interpreted as average treatment effects under the right assumptions. Sometimes, though, we are interested in conditional average treatment effects: how the impact of an intervention varies according to the value of some variable of interest. Often this relationship is not linear and we don’t know its functional form. Splines provide a way of estimating curves (or surfaces) of unknown functional form and are a widely used tool for semiparametric regression models. The term ‘spline’ derives from the tool shipbuilders and drafters used to construct smooth edges: a bendable piece of material that, when fixed at a number of points, would relax into the desired shape.


Our interest lies in estimating the unknown function m:

y_i = m(x_i) + e_i

A ‘spline’ in the mathematical sense is a function constructed piece-wise from polynomial functions. The places where the pieces meet are known as knots, and the spline has order equal to one more than the degree of the underlying polynomial terms. Basis splines, or B-splines, are the typical starting point for spline functions. These are curves defined recursively as a sum of ‘basis functions’, which depend only on the polynomial degree and the knots. A spline function can be represented as a linear combination of B-splines, and the parameters dictating this combination can be estimated using standard regression techniques. If we have N B-splines then our regression function can be estimated as:

y_i = \sum_{j=1}^N ( \alpha_j B_j(x_i) ) + e_i

by minimising \sum_{i=1}^N \{ y_i - \sum_{j=1}^N ( \alpha_j B_j(x_i) ) \} ^2, where the B_j are the B-splines and the \alpha_j are coefficients to be estimated.
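As a rough illustration of the above (not from the paper; the data, knot vector, and basis size are invented), the basis matrix can be built with scipy and the coefficients estimated by ordinary least squares:

```python
# B-spline regression sketch: build the basis matrix B, then estimate alpha by OLS.
# Simulated data; knot choices are arbitrary.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)

degree = 3
# clamped knot vector: repeat the boundary knots 'degree' extra times
t = np.r_[[0.0] * degree, np.linspace(0, 1, 8), [1.0] * degree]
n_basis = len(t) - degree - 1

# column j of B is the j-th B-spline basis function evaluated at the data
B = np.column_stack([BSpline(t, np.eye(n_basis)[j], degree)(x)
                     for j in range(n_basis)])
alpha, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ alpha
```

Note that each row of B sums to one: the B-splines form a partition of unity over the domain, which is part of why they are numerically well behaved.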

Useful technical explainers of splines and B-splines can be found here [PDF] and here [PDF].

One issue with fitting splines to data is the risk of ‘overfitting’: outliers might distort the curve we fit, damaging the external validity of any conclusions we might draw. To deal with this, we can enforce a certain level of smoothness using so-called penalty functions. The smoothness (or conversely the ‘roughness’) of a curve is often defined by the integral of the square of its second derivative. Penalised splines, or P-splines, were therefore proposed, which add this roughness term multiplied by a smoothing parameter \lambda. In this case, we look to minimise:

\sum_{i=1}^N \{ y_i - \sum_{j=1}^N ( \alpha_j B_j(x_i) ) \}^2 + \lambda\int m''(x)^2 dx

to estimate our parameters. Many other variations on this penalty have been proposed. This article provides a good explanation of P-splines.
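As a hedged sketch of this idea: in practice the integral penalty is often approximated, following Eilers and Marx, by a discrete penalty on second differences of adjacent coefficients, giving a ridge-type solution \alpha = (B'B + \lambda D'D)^{-1}B'y. A toy version with simulated data:

```python
# P-spline sketch: a deliberately rich B-spline basis plus a discrete
# second-difference penalty on the coefficients (the Eilers-Marx
# approximation to the integrated squared second derivative). Simulated data.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)

degree = 3
t = np.r_[[0.0] * degree, np.linspace(0, 1, 20), [1.0] * degree]
n_basis = len(t) - degree - 1
B = np.column_stack([BSpline(t, np.eye(n_basis)[j], degree)(x)
                     for j in range(n_basis)])

D = np.diff(np.eye(n_basis), n=2, axis=0)  # second-difference operator
lam = 1.0                                  # smoothing parameter (arbitrary here)
alpha = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fitted = B @ alpha
```

As \lambda grows the fit shrinks towards a straight line; \lambda = 0 recovers the unpenalised B-spline regression, with all its wiggliness.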

An attractive type of spline is the ‘low rank thin plate spline’. This type of spline is defined by its penalty, which has a physical analogy with the resistance that a thin sheet of metal puts up when it is bent. Taking a ‘low rank’ approximation removes the problem thin plate splines have of too many parameters to estimate, and the result is generally insensitive to the choice of knots, unlike other penalised spline regression models.

Crainiceanu and colleagues show how low rank thin plate splines can be represented as a generalised linear mixed model. In particular, our model can be represented as:

m(x_i) = \beta_0 + \beta_1x_i  + \sum_{k=1}^K u_k |x_i - \kappa_k|^3

where \kappa_k, k=1,...,K, are the knots. The parameters, \theta = (\beta_0,\beta_1,u_1,\ldots,u_K)', can be estimated by minimising

\sum_{i=1}^N \{ y_i - m(x_i) \} ^2 + \frac{1}{\lambda} \theta ^T D \theta .

This is shown to give the mixed model

y_i = \beta_0 + \beta_1 x_i + Z_i'b + e_i

where each random coefficient in the vector b is distributed as N(0,\sigma^2_b) and Z and D are given in the paper cited above.
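A toy sketch of the penalised criterion above (simulated data; the knot positions and \lambda are arbitrary, and the full mixed-model machinery of the paper is omitted): the design matrix contains an intercept, a linear term, and the radial basis terms |x_i - \kappa_k|^3, with the penalty applied to the u_k only.

```python
# Low-rank radial-basis fit: design [1, x, |x - kappa_k|^3] with a penalty
# block on the u_k only. Simulated data; knots and lambda chosen arbitrarily.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)

knots = np.linspace(0.05, 0.95, 10)
C = np.column_stack([np.ones_like(x), x,
                     np.abs(x[:, None] - knots[None, :]) ** 3])

D = np.zeros((2 + knots.size, 2 + knots.size))  # no penalty on beta_0, beta_1
D[2:, 2:] = np.abs(knots[:, None] - knots[None, :]) ** 3
lam = 10.0
theta = np.linalg.solve(C.T @ C + (1.0 / lam) * D, C.T @ y)
fitted = C @ theta
```

In the mixed-model formulation the same fit emerges with the smoothing parameter estimated from the data as a ratio of variance components, rather than chosen by hand as here.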

As a final note, we have discussed splines in one dimension, but they can be extended to more dimensions. A two-dimensional spline can be generated by taking the tensor product of two one-dimensional spline bases. I leave this as an exercise for the reader.
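For the curious, a rough numpy sketch of that exercise: the two-dimensional basis is the row-wise Kronecker (outer) product of two one-dimensional B-spline basis matrices. Data and basis sizes are invented for illustration.

```python
# Tensor-product spline basis: each row of B12 is the outer product of the
# corresponding rows of the two marginal bases. Simulated data.
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_knots=5, degree=3):
    t = np.r_[[x.min()] * degree, np.linspace(x.min(), x.max(), n_knots),
              [x.max()] * degree]
    n_basis = len(t) - degree - 1
    return np.column_stack([BSpline(t, np.eye(n_basis)[j], degree)(x)
                            for j in range(n_basis)])

rng = np.random.default_rng(3)
x1, x2 = rng.uniform(0, 1, (2, 300))
B1, B2 = bspline_basis(x1), bspline_basis(x2)

# row-wise Kronecker product of the two marginal bases
B12 = np.einsum('ij,ik->ijk', B1, B2).reshape(B1.shape[0], -1)
```

The resulting matrix can be used as a regression design just like the one-dimensional case, though the number of columns grows multiplicatively, which is one reason low-rank approaches are popular in higher dimensions.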



  • The package gamm4 provides the tools necessary for a frequentist analysis along the lines described in this post. It uses restricted maximum likelihood estimation with the package lme4 to estimate the parameters of the thin plate spline model.
  • A Bayesian version of this functionality is implemented in the package rstanarm, which uses gamm4 to produce the matrices for thin plate spline models and Stan for the estimation through the stan_gamm4 function.

If you want to implement these models yourself from scratch, Crainiceanu and colleagues provide the R code to generate the matrices necessary to estimate the spline function:

# matrix square root of the penalty matrix OMEGA_all via its SVD
sqrt.OMEGA_all <- t(svd.OMEGA_all$v %*% (t(svd.OMEGA_all$u) * sqrt(svd.OMEGA_all$d)))


I will temper this advice by cautioning that I have never estimated a spline-based semi-parametric model in Stata, so what follows may be hopelessly incorrect. The only implementation of penalised splines in Stata is the package and associated function pspline. However, I cannot find any information about the penalty function it uses, so I would advise some caution when implementing. An alternative is to program the model yourself by converting the above R code into Mata to generate the matrix Z; the parameters could then be estimated with xtmixed.


Applications of these semi-parametric models in the world of health economics have tended to appear in technical or statistical journals rather than in health economics journals or economics more generally. Recent examples include Li et al, who use penalised splines to estimate the relationship between disease duration and health care costs; Wunder and co, who look at how reported well-being varies over the course of the lifespan; and Stollenwerk and colleagues, who use splines to estimate flexible predictive models for cost-of-illness studies with ‘big data’.



Brent Gibbons’s journal round-up for 30th January 2017


For this week’s round-up, I selected three papers from December’s issue of Health Services Research. I didn’t intend to limit my selections to one issue of one journal, but as I narrowed down my selections from several journals, these three papers stood out.

Treatment effect estimation using nonlinear two-stage instrumental variable estimators: another cautionary note. Health Services Research [PubMed] Published December 2016

This paper by Chapman and Brooks evaluates the properties of a non-linear instrumental variables (IV) estimator called two-stage residual inclusion, or 2SRI. 2SRI has been suggested as a consistent estimator of treatment effects under conditions of selection bias and where the dependent variable of the second-stage equation is binary or otherwise non-linear in its distribution. Terza, Bradford, and Dismuke (2007) and Terza, Basu, and Rathouz (2008) furthermore claimed that 2SRI can produce unbiased estimates not just of local average treatment effects (LATE) but of average treatment effects (ATE). However, Chapman and Brooks question why 2SRI, which is analogous to two-stage least squares (2SLS) when both the first- and second-stage equations are linear, should not require assumptions similar to those of 2SLS when generalizing beyond LATE to ATE. Backing up a step: when estimating treatment effects using observational data, one worry in trying to establish a causal effect is bias due to treatment choice. Where patient characteristics related to treatment choice are unobservable and one or more instruments are available, linear IV estimation (i.e. 2SLS) produces unbiased and consistent estimates of treatment effects for “marginal patients”, or compliers. These are the patients whose treatment was influenced by the instrument, and their treatment effects are termed LATE. But if there is heterogeneity in treatment effects, a case needs to be made that treatment effect heterogeneity is unrelated to treatment choice in order to generalize to ATE. Moving to non-linear IV estimation, Chapman and Brooks are skeptical that this case no longer needs to be made with 2SRI. 2SRI, for those not familiar, includes the residual from the first stage of a two-stage estimator as a covariate in a non-linear second-stage equation, for example a probit model for a binary outcome or a Poisson model for a count.
The authors produce a simulation that tests the properties of 2SRI under varying degrees of uniqueness of the marginal patient population and varying strength of the instrument. The uniqueness of the marginal population is defined as the extent to which treatment effects for the marginal population differ from those of the general population. For each scenario tested, the bias between the estimated LATE and the true LATE and ATE is calculated. The findings support the authors’ suspicions: 2SRI results were only practically unbiased when uniqueness was low, and were biased for both ATE and LATE when uniqueness was high. Having very strong instruments did help reduce bias. In contrast, 2SLS was always practically unbiased for LATE across the different scenarios, and the authors use these results to caution researchers against using “new” estimation methods without thoroughly understanding their properties. In this case, old 2SLS still outperformed 2SRI even when dependent variables were non-linear in nature.
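To illustrate the mechanics of 2SRI (this is a toy simulation of my own, not the authors’ design, and probit coefficients are only identified up to scale), here is a minimal sketch: a continuous treatment is confounded with a binary outcome, and the stage-1 residual is included as a control function in a probit stage 2.

```python
# Toy 2SRI: stage 1 regresses treatment on the instrument; stage 2 is a
# probit including the stage-1 residual. All data simulated; hypothetical model.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_fit(X, y):
    """Probit MLE via BFGS (coefficients identified only up to scale)."""
    def nll(b):
        p = norm.cdf(X @ b).clip(1e-10, 1 - 1e-10)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

rng = np.random.default_rng(7)
n = 5000
z = rng.standard_normal(n)                  # instrument
c = rng.standard_normal(n)                  # unobserved confounder
d = 0.5 * z + c + rng.standard_normal(n)    # endogenous treatment
y = (d + c + rng.standard_normal(n) > 0).astype(float)

# stage 1: regress treatment on the instrument, keep the residual
X1 = np.column_stack([np.ones(n), z])
resid = d - X1 @ np.linalg.lstsq(X1, d, rcond=None)[0]

# stage 2: probit of the outcome on treatment plus the stage-1 residual
beta_2sri = probit_fit(np.column_stack([np.ones(n), d, resid]), y)
beta_naive = probit_fit(np.column_stack([np.ones(n), d]), y)
```

In this set-up the naive probit coefficient on treatment is inflated by the confounder, while residual inclusion pulls it back towards the (rescaled) truth; what the paper shows is that this tidy picture breaks down once marginal patients differ markedly from the rest.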

Testing the replicability of a successful care management program: results from a randomized trial and likely explanations for why impacts did not replicate. Health Services Research [PubMed] Published December 2016

As is widely known, how to rein in U.S. healthcare costs has been a source of much hand-wringing. One promising strategy has been to promote better management of care in particular for persons with chronic illnesses. This includes coordinating care between multiple providers, encouraging patient adherence to care recommendations, and promoting preventative care. The hope was that by managing care for patients with more complex needs, higher cost services such as emergency visits and hospitalizations could be avoided. CMS, the Centers for Medicare and Medicaid Services, funded a demonstration of a number of care management programs to study what models might be successful in improving quality and reducing costs. One program implemented by Health Quality Partners (HQP) for Medicare Fee-For-Service patients was successful in reducing hospitalizations (by 34 percent) and expenditures (by 22 percent) for a select group of patients who were identified as high-risk. The demonstration occurred from 2002 – 2010 and this paper reports results for a second phase of the demonstration where HQP was given additional funding to continue treating only high-risk patients in the years 2010 – 2014. High-risk patients were identified as having a diagnosis of congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), or diabetes and had a hospitalization in the year prior to enrollment. In essence, phase II of the demonstration for HQP served as a replication of the original demonstration. The HQP care management program was delivered by nurse coordinators who regularly talked with patients and provided coordinated care between primary care physicians and specialists, as well as other services such as medication guidance. All positive results from phase I vanished in phase II and the authors test several hypotheses for why results did not replicate. 
They find that treatment group patients had similar hospitalization rates in phases I and II, but that control group patients had substantially lower hospitalization rates in phase II. Outcome differences between phase I and phase II were risk-adjusted, as phase II had an older population with higher severity of illness, and the authors also used propensity score re-weighting to further control for differences between the phase I and phase II populations. The Affordable Care Act promoted similar care management services through patient-centered medical homes and accountable care organizations, which likely contributed to improvements in the usual care received by control group patients. The authors also note that the effectiveness of care management may be sensitive to the complexity of the target population’s needs. For example, the phase II population was more homebound and was therefore unable to participate in group classes. The big lesson of this paper, though, is that demonstration results may not replicate for different populations or even different time periods.

A machine learning framework for plan payment risk adjustment. Health Services Research [PubMed] Published December 2016

Since my company has been subsumed under IBM Watson Health, I have been trying to wrap my head around this big data revolution and the potential of technological advances such as artificial intelligence and machine learning. While machine learning has infiltrated other disciplines, it is really just starting to influence health economics, so watch out! This paper by Sherri Rose is a nice introduction to a range of machine learning techniques, which she applies to the formulation of plan payment risk adjustment. In insurance systems where patients can choose from a range of insurance plans, there is a problem of adverse selection whereby some plans may attract an abundance of high-risk patients. To control for this, plans (e.g. in the Affordable Care Act marketplaces) with high percentages of high-risk consumers are compensated based on a formula that predicts spending from population characteristics, including diagnoses. Rose says that these formulas are still based on a 1970s framework of linear regression and may benefit from machine learning algorithms. Given that plan payment risk adjustments are essentially predictions, this does seem like a good application. In addition to testing the goodness of fit of machine learning algorithms, Rose is interested in whether such techniques can reduce the number of variable inputs. Without going into any detail, insurers have found ways to “game” the system, and fewer variable inputs would restrict this activity. Rose introduces a number of concepts in the paper (at least they were new to me) such as ensemble machine learning, discrete learning frameworks, and super learning frameworks. She uses a large private insurance claims dataset and breaks it into 10 “folds”, which allows her to run 5 prediction models, each with its own cross-validation dataset. Aside from one parametric regression model, she uses several penalized regression models, neural net, single-tree, and random forest models.
She describes machine learning as aiming to smooth over data in a similar manner to parametric regression but with fewer assumptions and allowing for more flexibility. To reduce the number of variables in models, she applies techniques that limit variables to, for example, just the 10 most influential. She concludes that applying machine learning to plan payment risk adjustment models can increase efficiencies and her results suggest that it is possible to get similar results even with a limited number of variables. It is curious that the parametric model performed as well as or better than many of the different machine learning algorithms. I’ll take that to mean we can continue using our trusted regression methods for at least a few more years.
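As a toy illustration of the fold-based idea (numpy only; this is not Rose’s super learner, and all data and dimensions are invented): screen down to the most “influential” inputs within each training fold and compare out-of-fold error against the full model.

```python
# Toy fold-based comparison: full OLS vs OLS on the 10 most influential
# predictors, with the screening redone inside each training fold. Simulated data.
import numpy as np

rng = np.random.default_rng(11)
n, p = 500, 100
X = rng.standard_normal((n, p))             # stand-in for diagnosis/plan features
beta = np.r_[np.ones(5), np.zeros(p - 5)]   # only a handful of variables matter
y = X @ beta + rng.standard_normal(n)       # stand-in for plan spending

folds = np.arange(n) % 10                   # ten folds, as in the paper

def cv_mse(n_keep):
    """Out-of-fold MSE of OLS on the n_keep highest-|X'y| predictors."""
    errs = []
    for k in range(10):
        tr, te = folds != k, folds == k
        keep = np.argsort(np.abs(X[tr].T @ y[tr]))[-n_keep:]
        b, *_ = np.linalg.lstsq(X[tr][:, keep], y[tr], rcond=None)
        errs.append(np.mean((y[te] - X[te][:, keep] @ b) ** 2))
    return float(np.mean(errs))

mse_full = cv_mse(p)      # all 100 inputs
mse_small = cv_mse(10)    # just the 10 most influential inputs
```

In this simulated setting the reduced model predicts at least as well as the full one, mirroring Rose’s finding that a limited set of inputs can deliver similar performance, which matters when extra inputs invite gaming.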