Method of the month: custom likelihoods with Stan

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is custom likelihoods with Stan.


Regular readers of this blog will know that I am a fan of Bayesian methods. The exponential growth in personal computing power has opened up a whole new range of Bayesian models at home. WinBUGS and JAGS were the go-to pieces of software for estimating Bayesian models, both using Markov Chain Monte Carlo (MCMC) methods.  Theoretically, an MCMC chain will explore the posterior distribution. But MCMC has flaws. For example, if the target distribution has a high degree of curvature, such as many hierarchical models might exhibit, then MCMC chains can have trouble exploring it. To compensate, the chains stay in the ‘difficult’ bit of the space for longer before leaving to go elsewhere so its average oscillates around the true value. Asymptotically, these oscillations balance out, but in real, finite time, they ultimately lead to bias. And further, MCMC chains are very slow to converge to the target distribution, and for complex models can take a literal lifetime. An alternative, Hamiltonian Monte Carlo (HMC), provides a solution to these issues. Michael Betancourt’s introduction to HCM is great for anyone interested in the topic.

Stan is a ‘probabilistic programming language’ that implements HMC. A huge range of probability distributions are already implemented in the software, check out the manual for more information. And there is an R package, rstanarm, that estimates a number of standard models using normal R code that even means you can use these tools without learning the code. However, Stan may not have the necessary distributions for more complex econometric or statistical models. It used to be the case that you would have to build your own MCMC sampler – but given the problems with MCMC, this is now strongly advised against in lieu of HMC. Fortunately, we can implement our own probability density functions in Stan. So, if you can write down the (log) likelihood for your model, you can estimate it in Stan!

The aim of this post is to provide an example of implementing a custom probability function in Stan from the likelihood of our model. We will look at the nested logit model. These models have been widely used for multinomial choice problems. An area of interest among health economists is the choice of hospital provider. A basic multinomial choice model, such as a multinomial logit, requires an independence of irrelevant alternatives (IIA) assumption that says the odds of choosing one option over another is independent of any other alternative. For example, it would assume that the odds of me choosing the pharmacy in town over the hospital in town would be unaffected by a pharmacy opening on my own road. This is likely too strong. There are many ways to relax this assumption, the nested logit being one. The nested logit is useful when choices can be ‘nested’ in groups and assumes there is a correlation among choices with each group. For example, we can group health care providers into pharmacies, primary care clinics, hospitals, etc. such as this:



Econometric model

Firstly, we need a nesting structure for our choices, like that described above. We’ll consider a 2-level nesting structure, with branches and total choices, with Rt choices in each branch t. Like with most choice models we start from an additive random utility model, which is, for individual i=1,…,N, and with choice over branch and option:

U_{itr} = V_{itr} + \epsilon_{itr}

Then the chosen option is the one with the highest utility. The motivating feature of the nested logit is that the hierarchical structure allows us to factorise the joint probability of choosing branch and option r into a conditional and marginal model:

p_{itr} = p_{it} \times p_{ir|t}

Multinomial choice models arise when the errors are assumed to have a generalised extreme value (GEV) distribution, which gives use the multinomial logit model. We will model the deterministic part of the equation with branch-varying and option-varying variables:

V_{itr} = Z_{it}'\alpha + X_{itr}'\beta_t

Then the model can be written as:

p_{itr} = p_{it} \times p_{ir|t} = \frac{exp(Z_{it}'\alpha + \rho_t I_{it})}{\sum_{k \in T} exp(Z_{ik}'\alpha + \rho_k I_{ik})} \times \frac{exp(X_{itr}'\beta_t/\rho_t)}{\sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t) }

where \rho_t is variously called a scale parameter, correlation parameter, etc. and defines the within branch correlation (arising from the GEV distribution). We also have the log-sum, which is also called the inclusive value:

I_{it} = log \left(  \sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t)  \right).

Now we have our model setup, the log likelihood over all individuals is

\sum_{i=1}^N \sum_{k \in T} \sum_{m \in R_t} y_{itr} \left[ Z_{it}'\alpha + \rho_t I_{it} - log \left( \sum_{k \in T} exp(Z_{ik}'\alpha + \rho_k I_{ik}) \right) + X_{itr}'\beta_t/\rho_t - log \left(  \sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t) \right) \right]

As a further note, for the model to be compatible with an ARUM specification, a number of conditions need to be satisfied. One of these is satisfied is 0<\rho_t \leq 1, so we will make that restriction. We have also only included alternative-varying variables, but we are often interested in individual varying variables and allowing parameters to vary over alternatives, which can be simply added to this specification, but we will leave them out for now to keep things “simple”. We will also use basic weakly informative priors and leave prior specification as a separate issue we won’t consider further:

\alpha \sim normal(0,5), \beta_t \sim normal(0,5), \rho_t \sim Uniform(0,1)


DISCLAIMER: This code is unlikely to be the most efficient, nor can I guarantee it is 100% correct – use at your peril!

The following assumes a familiarity with Stan and R.

Stan programs are divided into blocks including data, parameters, and model. The functions block allows us to define custom (log) probability density functions. These take a form something like:

real xxx_lpdf(real y, ...){}

which says that the function outputs a real valued variable and take a real valued variable, y, as one of its arguments. The _lpdf suffix allows the function to act as a density function in the program (and equivalently _lpmf for log probability mass functions for discrete variables). Now we just have to convert the log likelihood above into a function. But first, let’s just consider what data we will be passing to the program:

  • N, the number of observations;
  • T, the number of branches;
  • P, the number of branch-varying variables;
  • Q, the number of choice-varying variables;
  • R, a T x 1 vector with the number of choices in each branch, from which we can also derive the total number of options as sum(R). We will call the total number of options Rk for now;
  • Y, a N x Rk vector, where Y[i,j] = 1 if individual i=1,…,N chose choice j=1,…,Rk;
  • Z, a N x T x P array of branch-varying variables;
  • X, a N x Rk x Q array of choice-varying variables.

And the parameters:

  • \rho , a T x 1 vector of correlation parameters;
  • \alpha , a P x 1 vector of branch-level covariates;
  • \beta , a P x T matrix of choice-varying covariates.

Now, to develop the code, we will specify the function for individual observations of Y, rather than the whole matrix, and then perform the sum over all the individuals in the model block. So we only need to feed in each individual’s observations into the function rather than the whole data set. The model is specified in blocks as follows (with all the data and parameter as arguments to the function):

 real nlogit_lpdf(real[] y, real[,] Z, real[,] X, int[] R, 
   vector alpha, matrix beta, vector tau){
//first define our additional local variables
 real lprob; //variable to hold log prob
 int count1; //keep track of which option in the loops
 int count2; //keep track of which option in the loops
 vector[size(R)] I_val; //inclusive values
 real denom; //sum denominator of marginal model
//for the variables appearing in sum loops, set them to zero
 lprob = 0;
 count1 = 0;
 count2 = 0;
 denom = 0;
 // determine the log-sum for each conditional model, p_ir|t, 
 //i.e. inclusive value
 for(k in 1:size(R)){
    I_val[k] = 0;
    for(m in 1:R[k]){
       count1 = count1 + 1;
       I_val[k] = I_val[k] + exp(to_row_vector(X[count1,])*
          beta[,k] /tau[k]);
    I_val[k] = log(I_val[k]);
 //determine the sum for the marginal model, p_it, denomininator
 for(k in 1:size(R)){
    denom = denom + exp(to_row_vector(Z[k,])*alpha + tau[k]*I_val[k]);
 //put everything together in the log likelihood
 for(k in 1:size(R)){
    for(m in 1:R[k]){
       count2 = count2 + 1;
       lprob = lprob + y[count2]*(to_row_vector(Z[k,])*alpha + 
         tau[k]*I_val[k] - log(denom) + 
         to_row_vector(X[count2,])*beta[,k] - I_val[k]);
// return the log likelihood value
 return lprob;
 int N; //number of observations
 int T; //number of branches
 int R[T]; //number of options per branch
 int P; //dim of Z
 int Q; //dim of X
 real y[N,sum(R)]; //outcomes array
 real Z[N,T,P]; //branch-varying variables array
 real X[N,sum(R),Q]; //option-varying variables array
 vector<lower=0, upper=1>[T] rho; //scale-parameters
 vector[P] alpha; //branch-varying parameters
 matrix[Q,T] beta; //option-varying parameters
//specify priors
 for(p in 1:P) alpha[p] ~ normal(0,5); 
 for(q in 1:Q) for(t in 1:T) beta[q,t] ~ normal(0,5);

//loop over all observations with the data 
 for(i in 1:N){
    y[i] ~ nlogit(Z[i,,],X[i,,],R,alpha,beta,rho);

Simulation model

To see whether our model is doing what we’re hoping it’s doing, we can run a simple test with simulated data. It may be useful to compare the result we get to those from other estimators; the nested logit is most frequently estimated using the FIML estimator. But, neither Stata nor R provide packages that estimate a model with branch-varying variables – another reason why we sometimes need to program our own models.

The code we’ll use to simulate the data is:

#### simulate 2-level nested logit data ###

N <- 300 #number of people
P <- 2 #number of branch variant variables
Q <- 2 #number of option variant variables
R <- c(2,2,2) #vector with number of options per branch
T <- length(R) #number of branches
Rk <- sum(R) #number of options

#simulate data

Z <- array(rnorm(N*T*P,0,0.5),dim = c(N,T,P))
X <- array(rnorm(N*Rk*Q,0,0.5), dim = c(N,Rk,Q))

rho <- runif(3,0.5,1)
beta <- matrix(rnorm(T*Q,0,1),c(Q,T))
alpha <- rnorm(P,0,1)

#option models #change beta indexing as required
vals_opt <- cbind(exp(X[,1,]%*%beta[,1]/rho[1]),exp(X[,2,]%*%beta[,1]/rho[1]),exp(X[,3,]%*%beta[,2]/rho[2]),

incl_val <- cbind(vals_opt[,1]+vals_opt[,2],vals_opt[,3]+vals_opt[,4],vals_opt[,5]+vals_opt[,6])

vals_branch <- cbind(exp(Z[,1,]%*%alpha + rho[1]*log(incl_val[,1])),
 exp(Z[,2,]%*%alpha + rho[2]*log(incl_val[,2])),
 exp(Z[,3,]%*%alpha + rho[3]*log(incl_val[,3])))

sum_branch <- rowSums(vals_branch)

probs <- cbind((vals_opt[,1]/incl_val[,1])*(vals_branch[,1]/sum_branch),

Y = t(apply(probs, 1, rmultinom, n = 1, size = 1))

Then we’ll put the data into a list and run the Stan program with 500 iterations and 3 chains:

data <- list(
 y = Y,
 X = X,
 Z = Z,
 R = R,
 T = T,
 N = N,
 P = P,
 Q = Q

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

fit <- stan("C:/Users/Samuel/Dropbox/Code/nlogit.stan",
 data = data,
 chains = 3,
 iter = 500)

Which gives results (with 25t and 75th percentiles dropped to fit on screen):

> print(fit)
Inference for Stan model: nlogit.
3 chains, each with iter=500; warmup=250; thin=1; 
post-warmup draws per chain=250, total post-warmup draws=750.

             mean se_mean   sd    2.5%     50%     75%   97.5% n_eff Rhat
rho[1]       1.00    0.00 0.00    0.99    1.00    1.00    1.00   750  1
rho[2]       0.87    0.00 0.10    0.63    0.89    0.95    1.00   750  1
rho[3]       0.95    0.00 0.04    0.84    0.97    0.99    1.00   750  1
alpha[1]    -1.00    0.01 0.17   -1.38   -0.99   -0.88   -0.67   750  1
alpha[2]    -0.56    0.01 0.16   -0.87   -0.56   -0.45   -0.26   750  1
beta[1,1]   -3.65    0.01 0.32   -4.31   -3.65   -3.44   -3.05   750  1
beta[1,2]   -0.28    0.01 0.24   -0.74   -0.27   -0.12    0.15   750  1
beta[1,3]    0.99    0.01 0.25    0.48    0.98    1.15    1.52   750  1
beta[2,1]   -0.15    0.01 0.25   -0.62   -0.16    0.00    0.38   750  1
beta[2,2]    0.28    0.01 0.24   -0.16    0.28    0.44    0.75   750  1
beta[2,3]    0.58    0.01 0.24    0.13    0.58    0.75    1.07   750  1
lp__      -412.84    0.14 2.53 -418.56 -412.43 -411.05 -409.06   326  1

Samples were drawn using NUTS(diag_e) at Sun May 06 14:16:43 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Which we can compare to the original parameters:

> beta
           [,1]       [,2]      [,3]
[1,] -3.9381389 -0.3476054 0.7191652
[2,] -0.1182806  0.2736159 0.5237470
> alpha
[1] -0.9654045 -0.6505002
> rho
[1] 0.9503473 0.9950653 0.5801372

You can see that the posterior means and quantiles of the distribution provide pretty good estimates of the original parameters. Convergence diagnostics such as Rhat and traceplots (not reproduced here) show good convergence of the chains. But, of course, this is not enough for us to rely on it completely – you would want to investigate further to ensure that the chains were actually exploring the posterior of interest.


I am not aware of any examples in health economics of using custom likelihoods in Stan. There are not even many examples of Bayesian nested logit models, one exception being a paper by Lahiri and Gao, who ‘analyse’ the nested logit using MCMC. But given the limitations of MCMC discussed above, one should prefer this implementation in the post rather than the MCMC samplers of that paper. It’s also a testament to computing advances and Stan that in 2001 an MCMC sampler and analysis could fill a paper in a decent econometrics journal and now we can knock one out for a blog post.

In terms of nested logit models in general in health economics, there are many examples going back 30 years (e.g. this article from 1987). More recent papers have preferred “mixed” or “random parameters” logit or probit specifications, which are much more flexible than the nested logit. We would advise these sorts of models for this reason. The nested logit was used as an illustrative example of estimating custom likelihoods for this post.



Brent Gibbons’s journal round-up for 9th April 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The effect of Medicaid on management of depression: evidence from the Oregon Health Insurance Experiment. The Milbank Quarterly [PubMed] Published 5th March 2018

For the first journal article of this week’s AHE round-up, I selected a follow-up study on the Oregon health insurance experiment. The Oregon Health Insurance Experiment (OHIE) used a lottery system to expand Medicaid to low-income uninsured adults (and their associated households) who were previously ineligible for coverage. Those interested in being part of the study had to sign up. Individuals were then randomly selected through the lottery, after which individuals needed to take further action to complete enrollment in Medicaid, which included showing that enrollment criteria were satisfied (e.g. income below 100% of poverty line). These details are important because many who were selected for the lottery did not complete enrollment in Medicaid, though being selected through the lottery was associated with a 25 percentage point increase in the probability of having insurance (which the authors confirm was overwhelmingly due to Medicaid and not other insurance). More details on the study and data are publicly available. The OHIE is a seminal study in that it allows researchers to study the effects of having insurance in an experimental design – albeit in the U.S. health care system’s context. The other study that comes to mind is of course the famous RAND health insurance experiment that allowed researchers to study the effects of different levels of health insurance coverage. For the OHIE, the authors importantly point out that it is not necessarily obvious what the impact of having insurance is. While we would expect increases in health care utilization, it is possible that increases in primary care utilization could result in offsetting reductions in other settings (e.g. hospital or emergency department use). Also, while we would expect increases in health as a result of increases in health care use, it is possible that by reducing adverse financial consequences (e.g. of unhealthy behavior), health insurance could discourage investments in health. Medicaid has also been criticized by some as not very good insurance – though there are strong arguments to the contrary. First-year outcomes were detailed in another paper. These included increased health care utilization (across all settings), decreased out-of-pocket medical expenditures, decreased medical debt, improvements in self-reported physical and mental health, and decreased probability of screening positive for depression. In the follow-up paper on management of depression, the authors further explore the causal effect and causal pathway of having Medicaid on depression diagnosis, treatment, and symptoms. Outcomes of interest are the effect of having Medicaid on the prevalence of undiagnosed and untreated depression, the use of depression treatments including medication, and on self-reported depressive symptoms. Where possible, outcomes are examined for those with a prior depression diagnosis and those without. In order to examine the effect of Medicaid insurance (vs. being uninsured), the authors needed to control for the selection bias introduced from uncompleted enrollment into Medicaid. Instrumental variable 2SLS was used with lottery selection as the sole instrument. Local average treatment effects were reported with clustered standard errors on the household. The effect of Medicaid on the management of depression was overwhelmingly positive. For those with no prior depression diagnosis, it increased the chance of receiving a diagnosis and decreased the prevalence of undiagnosed depression (those who scored high on study survey depression instrument but with no official diagnosis). As far as treatment, Medicaid reduced the share of the population with untreated depression, virtually eliminating untreated depression among those with pre-lottery depression. There was a large reduction in unmet need for mental health treatment and an increased share who received specific mental health treatments (i.e. prescription drugs and talk therapy). For self-reported symptoms, Medicaid reduced the overall rate screened for depression symptoms in the post-lottery period. All effects were relatively strong in magnitude, giving an overall convincing picture that Medicaid increased access to treatment, which improved depression symptoms. The biggest limitation of this study is its generalizability. Much of the results were focused on the city of Portland, which may not represent more rural parts of the state. More importantly, this was limited to the state of Oregon for low-income adults who not only expressed interest in signing up, but who were able to follow through to complete enrollment. Other limitations were that the study only looked at the first two years of outcomes and that there was limited information on the types of treatments received.

Tobacco regulation and cost-benefit analysis: how should we value foregone consumer surplus? American Journal of Health Economics [PubMed] [RePEcPublished 23rd January 2018

This second article addresses a very interesting theoretical question in cost-benefit analysis, that has emerged in the context of tobacco regulation. The general question is how should foregone consumer surplus, in the form of reduced smoking, be valued? The history of this particular question in the context of recent FDA efforts to regulate smoking is quite fascinating. I highly recommend reading the article just for this background. In brief, the FDA issued proposed regulations to implement graphic warning labels on cigarettes in 2010 and more recently proposed that cigars and e-cigarettes should also be subject to FDA regulation. In both cases, an economic impact analysis was required and debates ensued on if, and how, foregone consumer surplus should be valued. Economists on both sides weighed-in, some arguing that the FDA should not consider foregone consumer surplus because smoking behavior is irrational, others arguing consumers are perfectly rational and informed and the full consumer surplus should be valued, and still others arguing that some consumer surplus should be counted but there is likely bounded rationality and that it is methodologically unclear how to perform a valuation in such a case. The authors helpfully break down the debate into the following questions: 1) if we assume consumers are fully informed and rational, what is the right approach? 2) are consumers fully informed and rational? and 3) if consumers are not fully informed and rational, what is the right approach? The reason the first question is important is that the FDA was conducting the economic impact analysis by examining health gains and foregone consumer surplus separately. However, if consumers are perfectly rational and informed, their preferences already account for health impacts, meaning that only changes in consumer surplus should be counted. On the second question, the authors explore the literature on smoking behavior to understand “whether consumers are rational in the sense of reflecting stable preferences that fully take into account the available information on current and expected future consequences of current choices.” In general, the literature shows that consumers are pretty well aware of the risks, though they may underestimate the difficulty of quitting. On whether consumers are rational is a much harder question. The authors explore different rational addiction models, including quasi-rational addiction models that take into account more recent developments in behavioral economics, but declare that the literature at this point provides no clear answer and that no empirical test exists to distinguish between rational and quasi-rational models. Without answering whether consumers are fully informed and rational, the authors suggest that welfare analysis – even in the face of bounded rationality – can still use a similar valuation approach to consumer surplus as was recommended for when consumers are fully informed and rational. A series of simple supply and demand curves are presented where there is a biased demand curve (demand under bounded rationality) and an unbiased demand curve (demand where fully informed and rational) and different regulations are illustrated. The implication is that rather than trying to estimate health gains as a result of regulations, what is needed is to understand the amount of demand bias as result of bounded rationality. Foregone consumer surplus can then be appropriately measured. Of course, more research is needed to estimate if, and how much, ‘demand bias’ or bounded rationality exists. The framework of the paper is extremely useful and it pushes health economists to consider advances that have been made in environmental economics to account for bounded rationality in cost-benefit analysis.

2SLS versus 2SRI: appropriate methods for rare outcomes and/or rare exposures. Health Economics [PubMed] Published 26th March 2018

This third paper I will touch on only briefly, but I wanted to include it as it addresses an important methodological topic. The paper explores several alternative instrumental variable estimation techniques for situations when the treatment (exposure) variable is binary, compared to the common 2SLS (two-stage least squares) estimation technique which was developed for a linear setting with continuous endogenous treatments and outcome measures. A more flexible approach, referred to as 2SRI (two-stage residual inclusion) allows for non-linear estimation methods in the first stage (and second stage), including logit or probit estimation methods. As the title suggests, these alternative estimation methods may be particularly useful when treatment (exposure) and/or outcomes are rare (e.g below 5%). Monte Carlo simulations are performed on what the authors term ‘the simplest case’ where the outcome, treatment, and instrument are binary variables and a range of results are considered as the treatment and/or outcome become rarer. Model bias and consistency are assessed in the ability to produce average treatment effects (ATEs) and local average treatment effects (LATEs), comparing the 2SLS, several forms of probit-probit 2SRI models, and a bivariate probit model. Results are that the 2SLS produced biased estimates of the ATE, especially as treatment and outcomes become rarer. The 2SRI models had substantially higher bias than the bivariate probit in producing ATEs (though the bivariate probit requires the assumption of bivariate normality). For LATE, 2SLS always produces consistent estimates, even if the linear probability model produces out of range predictions. Estimates for 2SRI models and the bivariate probit model were biased in producing LATEs. An empirical example was also tested with data on the impact of long-term care insurance on long-term care use. Conclusions are that 2SRI models do not dependably produce unbiased estimates of ATEs. Among the 2SRI models though, there were varying levels of bias and the 2SRI model with generalized residuals appeared to produce the least ATE bias. For more rare treatments and outcomes, the 2SRI model with Anscombe residuals generated the least ATE bias. Results were similar to another simulation study by Chapman and Brooks. The study enhances our understanding of how different instrumental variable estimation methods may function under conditions where treatment and outcome variables have nonlinear distributions and where those same treatments and outcomes are rare. In general, the authors give a cautionary note to say that there is not one perfect estimation method in these types of conditions and that researchers should be aware of the potential pitfalls of different estimation methods.



Thesis Thursday: Francesco Longo

On the third Thursday of every month, we speak to a recent graduate about their thesis and their studies. This month’s guest is Dr Francesco Longo who has a PhD from the University of York. If you would like to suggest a candidate for an upcoming Thesis Thursday, get in touch.

Essays on hospital performance in England
Luigi Siciliani
Repository link

What do you mean by ‘hospital performance’, and how is it measured?

The concept of performance in the healthcare sector covers a number of dimensions including responsiveness, affordability, accessibility, quality, and efficiency. A PhD does not normally provide enough time to investigate all these aspects and, hence, my thesis mostly focuses on quality and efficiency in the hospital sector. The concept of quality or efficiency of a hospital is also surprisingly broad and, as a consequence, perfect quality and efficiency measures do not exist. For example, mortality and readmissions are good clinical quality measures but the majority of hospital patients do not die and are not readmitted. How well does the hospital treat these patients? Similarly for efficiency: knowing that a hospital is more efficient because it now has lower costs is essential, but how is that hospital actually reducing costs? My thesis tries to answer also these questions by analysing various quality and efficiency indicators. For example, Chapter 3 uses quality measures such as overall and condition-specific mortality, overall readmissions, and patient-reported outcomes for hip replacement. It also uses efficiency indicators such as bed occupancy, cancelled elective operations, and cost indexes. Chapter 4 analyses additional efficiency indicators, such as admissions per bed, the proportion of day cases, and proportion of untouched meals.

You dedicated a lot of effort to comparing specialist and general hospitals. Why is this important?

The first part of my thesis focuses on specialisation, i.e. an organisational form which is supposed to generate greater efficiency, quality, and responsiveness but not necessarily lower costs. Some evidence from the US suggests that orthopaedic and surgical hospitals had 20 percent higher inpatient costs because of, for example, higher staffing levels and better quality of care. In the English NHS, specialist hospitals play an important role because they deliver high proportions of specialised services, commonly low-volume but high-cost treatments for patients with complex and rare conditions. Specialist hospitals, therefore, allow the achievement of a critical mass of clinical expertise to ensure patients receive specialised treatments that produce better health outcomes. More precisely, my thesis focuses on specialist orthopaedic hospitals which, for instance, provide 90% of bone and soft tissue sarcomas surgeries, and 50% of scoliosis treatments. It is therefore important to investigate the financial viability of specialist orthopaedic hospitals relative to general hospitals that undertake similar activities, under the current payment system. The thesis implements weighted least square regressions to compare profit margins between specialist and general hospitals. Specialist orthopaedic hospitals are found to have lower profit margins, which are explained by patient characteristics such as age and severity. This means that, under the current payment system, providers that generally attract more complex patients such as specialist orthopaedic hospitals may be financially disadvantaged.

In what way is your analysis of competition in the NHS distinct from that of previous studies?

The second part of my thesis investigates the effect of competition on quality and efficiency under two different perspectives. First, it explores whether under competitive pressures neighbouring hospitals strategically interact in quality and efficiency, i.e. whether a hospital’s quality and efficiency respond to neighbouring hospitals’ quality and efficiency. Previous studies on English hospitals analyse strategic interactions only in quality and they employ cross-sectional spatial econometric models. Instead, my thesis uses panel spatial econometric models and a cross-sectional IV model in order to make causal statements about the existence of strategic interactions among rival hospitals. Second, the thesis examines the direct effect of hospital competition on efficiency. The previous empirical literature has studied this topic by focusing on two measures of efficiency such as unit costs and length of stay measured at the aggregate level or for a specific procedure (hip and knee replacement). My thesis provides a richer analysis by examining a wider range of efficiency dimensions. It combines a difference-in-difference strategy, commonly used in the literature, with Seemingly Unrelated Regression models to estimate the effect of competition on efficiency and enhance the precision of the estimates. Moreover, the thesis tests whether the effect of competition varies for more or less efficient hospitals using an unconditional quantile regression approach.

Where should researchers turn next to help policymakers understand hospital performance?

Hospitals are complex organisations and the idea of performance within this context is multifaceted. Even when we focus on a single performance dimension such as quality or efficiency, it is difficult to identify a measure that could work as a comprehensive proxy. It is therefore important to decompose as much as possible the analysis by exploring indicators capturing complementary aspects of the performance dimension of interest. This practice is likely to generate findings that are readily interpretable by policymakers. For instance, some results from my thesis suggest that hospital competition improves efficiency by reducing admissions per bed. Such an effect is driven by a reduction in the number of beds rather than an increase in the number of admissions. In addition, competition improves efficiency by pushing hospitals to increase the proportion of day cases. These findings may help to explain why other studies in the literature find that competition decreases length of stay: hospitals may replace elective patients, who occupy hospital beds for one or more nights, with day case patients, who are instead likely to be discharged the same day of admission.