Sam Watson’s journal round-up for 26th June 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Future and potential spending on health 2015–40: development assistance for health, and government, prepaid private, and out-of-pocket health spending in 184 countries. The Lancet [PubMed] Published 20th May 2017

The colossal research collaboration that is the Global Burden of Disease Study is well known for producing estimates of deaths and DALYs lost across the world due to a huge range of diseases. These figures have proven invaluable as a source of information to inform disease modelling studies and to help guide the development of public health programs. In this study, the collaboration turn their hands to modelling future health care expenditure. Predicting the future of any macroeconomic variable is tricky, to say the least. The approach taken here is to (1) model GDP to 2040 using an ensemble method, taking the ‘best performing’ models from the over 1,000 used (134 were included); (2) model all-sector government spending, out-of-pocket spending, and private health spending as a proportion of GDP in the same way, but with GDP as an input; and then (3) using a stochastic frontier approach to model maximum ‘potential’ spending. This latter step is an attempt to make the results potentially more useful by analysing different scenarios that might change overall health care expenditure by considering different frontiers. All of these steps would conceptually add a lot of uncertainty: the different probability of each model in the ensemble and the prediction uncertainty from each model including uncertainty in inputs such as size of population and demographic structure, all of which is propagated through the three step process. And this is without taking into account that health care spending at a national level is the result of a complex political decision making process, which can impact national income and prioritisation of health care in unforeseen ways (Brexit anyone?). Despite this, the predictions seem quite certain: health spending per capita is predicted to rise from $1,279 in 2014 to $2,872 with a 95% confidence intervention (or do they mean prediction interval?) of $2,426 to $3,522. It may well be a good model for average spending, but I suspect uncertainty (at least of a Bayesian kind) should be higher for a predictive model for 25 years into the future based on 20 years of data. The non-standard use of stochastic frontier analysis, which is typically a way of estimating technical efficiency, is also tricky to follow. The frontier is argued in this paper to be the maximum amount a country of similar levels of development spends on health care. This would also suggest that it is assumed spending cannot go higher than a country’s highest spending peer. A potentially strong assumption. Needless to say, these are the best predictions we currently have for future health care expenditure.

Discovering effect modification in an observational study of surgical mortality at hospitals with superior nursing. Journal of the Royal Statistical Society: Series A [ArXivPublished June 2017

An applied econometrician can find endogeneity everywhere. Such is the complexity of the social, political, and economic world. Everything is connected in some way. It’s one of the reasons I’ve argued before against null hypothesis significance testing: no effect is going to be exactly zero. Our job is one of measurement of the size of an effect and, crucially for this paper, what might affect the magnitude of these effects. This might start with a graphical or statistical exploratory analysis before proceeding to a confirmatory analysis. This paper proposes a method of exploratory analysis for treatment effect modifiers and examines the effect of superior nursing on treatment outcomes, which an approach I think to be a sensible scientific approach. But how does it propose to do it? Null hypothesis significance testing! Oh no! Essentially, the method involves a novel procedure for testing if treatment effects differ by group allowing for potential unobserved confounding and where the groups are also formed in a novel way. For example, the authors ask how much bias would need to be present for their conclusions to change. In terms of the effects of superior nurse staffing, the authors estimates that its beneficial treatment effect is the least sensitive to bias in a group of patients with the most serious conditions.

Incorporation of a health economic modelling tool into public health commissioning: Evidence use in a politicised context. Social Science & Medicine [PubMedPublished June 2017

Last up, a qualitative research paper (on an economics blog! I know…). Many health economists are involved in trying to encourage the incorporation of research findings into health care decision making and commissioning. The political decision making process often ends in inefficient or inequitable outcomes despite good evidence on what makes good policy. This paper explored how commissioners in an English local authority viewed a health economics decision tool for planning diabetes services. This is a key bit of research if we are to make headway in designing tools that actually improve commissioning decisions. Two key groups of stakeholders were involved, public health managers and politicians. The latter prioritized intelligence, local opinion, and social care agendas over scientific evidence from research, which was preferred by the former group. The push and pull between the different approaches meant the health economics tool was used as a way of supporting the agendas of different stakeholders rather than as a means to addressing complex decisions. For a tool to be successful it would seem to need to speak to or about the local population to which it is going to be applied. Well, that’s my interpretation. I’ll leave you with this quote from an interview with a manager in the study:

Public health, what they bring is a, I call it a kind of education scholarly kind of approach to things … whereas ‘social care’ sometimes are not so evidence-based-led. It’s a bit ‘well I thought that’ or, it’s a bit more fly by the seat of pants in social care.


Chris Sampson’s journal round-up for 22nd May 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The effect of health care expenditure on patient outcomes: evidence from English neonatal care. Health Economics [PubMed] Published 12th May 2017

Recently, people have started trying to identify opportunity cost in the NHS, by assessing the health gains associated with current spending. Studies have thrown up a wide range of values in different clinical areas, including in neonatal care. This study uses individual-level data for infants treated in 32 neonatal intensive care units from 2009-2013, along with the NHS Reference Cost for an intensive care cot day. A model is constructed to assess the impact of changes in expenditure, controlling for a variety of variables available in the National Neonatal Research Database. Two outcomes are considered: the in-hospital mortality rate and morbidity-free survival. The main finding is that a £100 increase in the cost per cot day is associated with a reduction in the mortality rate of 0.36 percentage points. This translates into a marginal cost per infant life saved of around £420,000. Assuming an average life expectancy of 81 years, this equates to a present value cost per life year gained of £15,200. Reductions in the mortality rate are associated with similar increases in morbidity. The estimated cost contradicts a much higher estimate presented in the Claxton et al modern classic on searching for the threshold.

A comparison of four software programs for implementing decision analytic cost-effectiveness models. PharmacoEconomics [PubMed] Published 9th May 2017

Markov models: TreeAge vs Excel vs R vs MATLAB. This paper compares the alternative programs in terms of transparency and validation, the associated learning curve, capability, processing speed and cost. A benchmarking assessment is conducted using a previously published model (originally developed in TreeAge). Excel is rightly identified as the ‘ubiquitous workhorse’ of cost-effectiveness modelling. It’s transparent in theory, but in practice can include cell relations that are difficult to disentangle. TreeAge, on the other hand, includes valuable features to aid model transparency and validation, though the workings of the software itself are not always clear. Being based on programming languages, MATLAB and R may be entirely transparent but challenging to validate. The authors assert that TreeAge is the easiest to learn due to its graphical nature and the availability of training options. Save for complex VBA, Excel is also simple to learn. R and MATLAB are equivalently more difficult to learn, but clearly worth the time saving for anybody expecting to work on multiple complex modelling studies. R and MATLAB both come top in terms of capability, with Excel falling behind due to having fewer statistical facilities. TreeAge has clearly defined capabilities limited to the features that the company chooses to support. MATLAB and R were both able to complete 10,000 simulations in a matter of seconds, while Excel took 15 minutes and TreeAge took over 4 hours. For a value of information analysis requiring 1000 runs, this could translate into 6 months for TreeAge! MATLAB has some advantage over R in processing time that might make its cost ($500 for academics) worthwhile to some. Excel and TreeAge are both identified as particularly useful as educational tools for people getting to grips with the concepts of decision modelling. Though the take-home message for me is that I really need to learn R.

Economic evaluation of factorial randomised controlled trials: challenges, methods and recommendations. Statistics in Medicine [PubMed] Published 3rd May 2017

Factorial trials randomise participants to at least 2 alternative levels (for example, different doses) of at least 2 alternative treatments (possibly in combination). Very little has been written about how economic evaluations ought to be conducted alongside such trials. This study starts by outlining some key challenges for economic evaluation in this context. First, there may be interactions between combined therapies, which might exist for costs and QALYs even if not for the primary clinical endpoint. Second, transformation of the data may not be straightforward, for example, it may not be possible to disaggregate a net benefit estimation with its components using alternative transformations. Third, regression analysis of factorial trials may be tricky for the purpose of constructing CEACs and conducting value of information analysis. Finally, defining the study question may not be simple. The authors simulate a 2×2 factorial trial (0 vs A vs B vs A+B) to demonstrate these challenges. The first analysis compares A and B against placebo separately in what’s known as an ‘at-the-margins’ approach. Both A and B are shown to be cost-effective, with the implication that A+B should be provided. The next analysis uses regression, with interaction terms demonstrating the unlikelihood of being statistically significant for costs or net benefit. ‘Inside-the-table’ analysis is used to separately evaluate the 4 alternative treatments, with an associated loss in statistical power. The findings of this analysis contradict the findings of the at-the-margins analysis. A variety of regression-based analyses is presented, with the discussion focussed on the variability in the estimated standard errors and the implications of this for value of information analysis. The authors then go on to present their conception of the ‘opportunity cost of ignoring interactions’ as a new basis for value of information analysis. A set of 14 recommendations is provided for people conducting economic evaluations alongside factorial trials, which could be used as a bolt-on to CHEERS and CONSORT guidelines.


Geostatistical modelling with R and Stan

One of the advantages of writing blogs is that it can help to refresh and consolidate you thoughts on a topic. And when you spend a lot of time writing stats code, other people’s blogs that discuss how to code specific statistical models can be invaluable. I have recently found myself delving into spatial modelling and geostatistics with the aim of taking survey data from a number of discrete locations to model an underlying continuous distribution. This is typically for something like disease prevalence; the map featured in our post about slum health is an example. However, I have wanted to code my own Bayesian geostatistical model. There is a wealth of literature on these models: we recently featured one on a journal round up. But, moving from these models to usable code is not always obvious, so this post outlines an example using the example of mapping childhood stunting in Kenya with data from the Kenya Demographic and Health Survey.

A geostatistical model

A geostatistical model, in its most basic form, analyses spatially discrete data sampled across an area that are assumed to be sampled from some underlying and unobserved continuous process. We often want to model this continuous process to be able to predict outcomes at a new location.

The survey locations across a given area, A, can be denoted as x_i \in A \subset \mathbb{R}^2 for locations i=1,...,n. At each location there is an underlying prevalence or probability, p(x_i), the number of sampled individuals or households, m_i and the observed outcomes Y_i. The standard sampling model is then binomial:

Y_i \sim Bin(m_i,p(x_i))

To make inferences about the distribution of the prevalence of a disease or probability of some outcome across the area and to predict the prevalence at new locations, we need to model the link between the p(x_i) at different locations. The following logistic regression model is often used:

log\frac{p(x_i)}{1-p(x_i)} = z(x_i)'\beta + S(x_i) + \alpha_i

where z(x_i) is a vector of explanatory variables, \alpha_i are independent N(0,\tau^2) variates, and S(x_i) is a Gaussian process explained below.

Gaussian processes

A Gaussian process is a continuous stochastic process that can be interpreted as providing a probability distribution over functions. The key property is that the distribution of the function’s value at a finite number of points is multivariate normal. For our S(x_i) above we have a zero mean multivariate normal:

S(x) \sim N(\mathbf{0},k(x))

with covariance function k(x_i). A popular choice of correlation function is the Matérn function. But, we’ll use the following exponential function to keep things simple:

k(x_{ij}) = exp \left( -\phi d_{ij} \right) + \mathbf{1}_{i=j}\sigma^2

where d_{ij} = \lVert x_i - x_j \rVert is the Euclidean distance between two points, \mathbf{1}_{i=j} is the indicator function equal to one if indices are the same and zero otherwise, and \sigma^2 is a ‘noise’ term.


As this is a Bayesian model, we need to specify priors for our parameters and hyperparameters. For simplicities sake we are not going to include any covariates, so that z(x_i) is just a vector of ones, and drop the term \alpha_i for now. We’ll go for some weakly informative priors: half-normal(0,5) priors for \phi and \sigma^2 and a N(0,5) prior for our intercept \beta.

Bayesian approaches to geostatistical modelling

A great place to start reading about these sorts of models is a paper by Diggle, Tawn, and Moyeed (1998) and the example in this post is very similar to an example in their paper. They estimate a Bayesian model using MCMC. But one of the problems with MCMC is that the parameters in a Gaussian process model are often highly correlated resulting in very slow convergence of the chains. For this reason we’ll use Stan, which is described below. Another problem with fully Bayesian approaches to geostatistics, and other Gaussian process models, is that they can be prohibitively computationally intensive. For a given sample size n, the memory requirements scale with n^2 and the computational time scales with n^3. New approaches are being developed, such as the use of Kroenecker decompositions [PDF] of the covariance matrix, but these can’t be used for non-conjugate models such as ours. The DHS Program itself has a report on geostatistical modelling with DHS data. They use an INLA algorithm to estimate the models, which is another alternative to that employed here.

Coding the model

We are going to use Stan to code and sample from our model. Stan is a ‘probabilistic programming language’ that implements a ‘no-U-turn sampler’ – a kind of Hamiltonian Monte Carlo sampler that adaptively tunes its step parameters. We have mentioned Stan before on this blog. I think it’s great and have been adapting most of my models to it. The documentation, on the website, is in depth, thorough, and chock full of examples including, and thankfully for me, how to code Gaussian processes! Some of the online forums are also useful, including this discussion, which I should credit. The code for the model described above is as follows:

data {
 int<lower=1> N1; //number of data points
 int x1[N1]; //number of outcomes
 int n1[N1]; // number of observations
 int<lower=1> N2; // number of new points
 matrix[N1+N2,N1+N2] dist; //distances between points
transformed data{
 int<lower=1> N;
 N = N1 + N2;
 vector[N1] y1;
 vector[N2] y2;
 real beta;
 real  sigma_sq;
 real  phi;
transformed parameters{
 vector[N1+N2] mu;
 for(i in 1:N) mu[i] = beta;
 vector[N] y;
 matrix[N,N] Sigma;
 matrix[N,N] L;
 for(i in 1:N1) y[i] = y1[i];
 for(i in 1:N2) y[N1+i] = y2[i];
 for(i in 1:(N-1)){
   for(j in (i+1):N){
     Sigma[i,j] = exp((-1)*phi*dist[i,j]);
     Sigma[j,i] = Sigma[i,j];
 for(i in 1:N) Sigma[i,i] = sigma_sq;
 L = cholesky_decompose(Sigma);
 sigma_sq ~ normal(0, 5);
 phi ~ normal(0, 5);
 y ~ multi_normal_cholesky(mu,L);
 beta ~ normal(0,5);
 x1 ~ binomial_logit(n1,y1);
generated quantities{
 vector[N2] y_pred;
 for(i in 1:N2) y_pred[i] = inv_logit(beta+y2[i]);

Some notes:

  • We provide (below) the data and the locations of the survey clusters as well as the locations of all the points we want to predict. We also provide the matrix with the distance between all points both in the data and in the new locations.
  • The generated quantities block predict the probability at the new locations.
  • We have used the Cholesky decomposition of the covariance matrix, which provides a speed boost. (See the Stan manual for a nice explanation).


We are going to use the most recent Kenya DHS survey. Unfortunately, these data can’t be shared here, but access can be sought at the DHS website. There you’ll also find data definitions and further info. As discussed above, large sample sizes can be prohibitive in the fully Bayesian set up. So for this example we’ll focus on just one province of Kenya, the Western Province (or what was formerly the Western Province, as now Kenya is divided into counties).

We’ll use R for the analysis. We’ll need the individual and geographic survey data, which we’ll read into R

df<-read.dta("C:/individual data set file path")
geo<-read.dta("C:/geo dataset file path")

From the individual dataset we only need the cluster id (v001) and variables for child height for age as percentile (hw4). Stunting is defined as being in the fifth percentile or below for height for age, so we’ll convert to a binary variable:

for(i in vars){

and then we’ll aggregate all the children within each cluster by creating a new data frame (called dfc) with a row per cluster and then looping over each cluster id

for(i in unique(df$v001)){
 tmp<-df[ df$v001==i,2:7]

Now, we assign the longitude and latitude from the geographic dataset (note: the locations in the DHS are randomly displaced by up to 2km, we are assuming that this is small enough not to matter on the regional scale):


The next step is to construct the grid of new locations we want to estimate the prevalence at. Firstly, we’ll create a rectangular grid that covers the Western province, then we’ll remove the points outside of the province, which is a bit of a faff (although I’ll admit there may be a better way)! We’ll need a shapefile for Kenya, which I got from here, and we’ll use the code from here to put it into a useful format.

area@data$id = rownames(area@data)
area.points = fortify(area, region="id")
area.df = join(area.points, area@data, by="id")
area@data$id = rownames(area@data)
area.points = fortify(area, region="id")
area.df = join(area.points, area@data, by="id")

Then we can select the counties in our province and remove the rest


We can plot to make sure it’s right:

ggplot(area.df) + 
 aes(long,lat,group=group) + 
 geom_polygon() +


Looks good. Now, let’s create grid of new points based on the longitude and latitude at the corners and a step size of 0.06, which could be made smaller or larger,


Then we go through each of the counties and determine if each point is in it using the function of the sp package. We will also do the same for our data.

for(i in counties){
            area.df[ area.df$COUNTY==i,'long'],area.df[ area.df$COUNTY==i,'lat'])
            area.df[ area.df$COUNTY==i,'long'],area.df[ area.df$COUNTY==i,'lat'])

Almost there! We finally convert the longitude and latitude to eastings and northings

tmp <- projectMercator(fil$y,fil$x)

tmp<- projectMercator(dfc$lat,dfc$long)

and create our distance matrix (dividing by 10,000 so that we’re working on a scale of tens of kms)


and all that remains is to run the code! But before we do that let’s just take a look at the locations of the two data frames. I like to use maps with nice features, so we’ll plot a map of Kenya using the OpenStreetMap package and add on the survey locations (blue) and also look at our new grid of points (red) (hence why we needed the eastings and northings):

 c(-0.1,35.2), type="osm")


We may want to make the grid of points a little more dense, but you can see the principle. In the map, ‘p’ is the crude prevalence at each cluster.


The Stan code above, we have saved to, say, “C:/model.stan”, and it is then easy to run from R.

 N1 = length(dfc[,1]),
 x1 = dfc$x,
 n1 = dfc$n,
 N2 = length(fil[,1]),
 dist = d1

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

fit <- stan(
 data = dat,
 chains = 3,
 iter = 500


Now, let’s take a look at the output. Firstly, let’s look at the parameter estimates for \phi, \sigma^2, and \beta.


Inference for Stan model: geo4.
3 chains, each with iter=500; warmup=250; thin=1; 
post-warmup draws per chain=250, total post-warmup draws=750.

mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta -0.74 0.03 0.63 -2.03 -1.11 -0.74 -0.35 0.52 368 1.00
phi 9.02 0.30 3.20 3.65 6.67 8.92 10.87 15.69 114 1.03
sigma_sq 1.08 0.01 0.06 1.01 1.04 1.07 1.11 1.21 23 1.07

Samples were drawn using NUTS(diag_e) at Tue Dec 06 15:37:55 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The Rhat stat gives an indication of convergence of the chains, we’re looking for a value typically under 1.01. This results suggest we don’t quite have convergence for \phi and \sigma^2. We can also assess convergence by looking at trace plot of the chains.



The traceplot for \sigma^2 does suggest a lack of convergence, which may suggest there’s no unique posterior mode or another problem. We can try re-specifying the model or running for more iterations to see if things improve, but we’ll leave it for now.

Now let’s plot the output on our map of Kenya. There are lots of different ways we can visualise this: the predicted prevalence, the probability the prevalence is above or below a certain threshold, or a combination of the two. We’ll plot the estimated prevalence and add in contour lines for where there is a greater than 80% probability that the prevalence of stunting is greater than 10%, which is considered serious:

fil$prob<-apply(pred,2,function(x)I(length(x[x>0.10])/length(x) > 0.8)*1)

p2+ geom_raster(data=fil,aes(x=x2,y=y2,fill=est),alpha=0.8,interpolate=TRUE)+



Hopefully this post will be useful to those looking to do some geostatistical modelling or for an example of converting statistical models into usable code. I’ve opted for a Bayesian approach, which allows us to allow for and average over the uncertainty in the model parameters and hyperparameters. There are lots of other ways of estimating these sorts of models, for example, check out the R package PrevMap which implements Markov Chain maximum likelihood methods. Finally, check out these two recent papers (among many) for good, recent discussions of these methods.


Some may have wondered why the featured image for this post was of people panning for gold. Well, geostatistics has its origins in the South African gold mining industry. Gold occurs in thin seams between sedimentary strata and is often very deep and is of variable quality even within a seam. Geostatistics was developed in an attempt to predict the location of gold seams from a small number of samples. One of the pioneers was Danie Krige, hence why geostatistical estimation is often referred to as kriging.