OHE Lunchtime Seminar: What Can NHS Trusts Do to Reduce Cancer Waiting Times?

OHE Lunchtime Seminar with Sarah Karlsberg, Steven Paling, and Júlia González Esquerré on ‘What can NHS trusts do to reduce cancer waiting times?’, to be held on 14th November 2018 from 12 p.m. to 2 p.m.

Rapid diagnosis and access to treatment for cancer are vital for both clinical outcomes and patient experience of care. The NHS Constitution contains several waiting times targets, including that 85% of patients diagnosed with cancer should receive treatment within 62 days of referral. However, waiting times are increasing in England: the 62-day target has not been met since late 2013 and, in July 2018, the NHS recorded its worst performance since records began in October 2009.

This seminar will present evidence on where NHS trusts can take practical steps to reduce cancer waiting times. The work uses patient-level data (Hospital Episode Statistics) from 2016/17 and an econometric model to quantify the potential effects of several recommendations on the average length of patients’ cancer pathways. The project won the 2018 John Hoy Memorial Award for the best piece of economic analysis produced by government economists.

Sarah Karlsberg, Steven Paling, and Júlia González Esquerré work in the NHS Improvement Economics Team, which provides economics expertise to NHS Improvement (previously Monitor and the Trust Development Authority) and the provider sector. Their work covers all aspects of provider policy, including operational and financial performance, quality of care, leadership and strategic change. Sarah is also a Visiting Fellow at OHE.

Download the full seminar invite here.

The seminar will be held in the Sir Alexander Fleming Room, Southside, 7th Floor, 105 Victoria Street, London SW1E 6QT. A buffet lunch will be available from 12 p.m. The seminar will start promptly at 12:30 p.m. and finish promptly at 2 p.m.

If you would like to attend this seminar, please reply to ohegeneral@ohe.org.

Sam Watson’s journal round-up for 29th October 2018

Every Monday our authors provide a round-up of some of the most recently published peer-reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Researcher Requests for Inappropriate Analysis and Reporting: A U.S. Survey of Consulting Biostatisticians. Annals of Internal Medicine. [PubMed] Published October 2018.

I have spent a fair bit of time masquerading as a statistician. While I frequently try to push for Bayesian analyses where appropriate, I have still had to do Frequentist work, including power and sample size calculations. In principle these power calculations serve a good purpose: if a study is likely to produce very uncertain results, it won’t contribute much to scientific knowledge and so won’t justify its cost. They can indicate that a two-arm trial would be preferred over a three-arm trial, despite losing an important comparison. But many power analyses, I suspect, are purely for show; all that is wanted is the false assurance of some official-looking statistics to demonstrate that a particular design is good enough. Now, I’ve never worked on economic evaluation, but I can imagine that the same pressures to achieve a certain result can sometimes exist there. This study presents a survey of 400 US-based statisticians, which asks how frequently they are asked to do some inappropriate analysis or reporting and how egregious they rate each request. For example, the most severe request is thought to be falsifying statistical significance. But the list also includes more common requests: not showing plots because they don’t reveal an effect as significant as hoped, downplaying ‘insignificant’ findings, or dressing up post hoc power calculations as a priori analyses. I would think that those responding to this survey are less likely to be those who comply with such requests, and the survey does not ask them whether they did. But it wouldn’t be a big leap to suggest that there are those who do comply, career pressures being what they are. We already know that statistics are widely misused and misreported, especially p-values. Whether this is due to ignorance or malfeasance, I’ll let the reader decide.
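As a reminder of what an a priori power calculation actually involves, here is a minimal sketch using the standard normal-approximation formula for a two-arm comparison of means. This is a generic illustration of the technique, not anything from the paper being reviewed:

```python
from scipy.stats import norm

def n_per_arm(delta, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-arm trial comparing means,
    with standardized effect size delta (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)   # critical value for a two-sided test
    z_b = norm.ppf(power)           # quantile for the desired power
    return 2 * (z_a + z_b) ** 2 / delta ** 2

# A 'medium' standardized effect of 0.5 needs roughly 63 patients per arm
print(round(n_per_arm(0.5)))
```

Run before data collection, this justifies a design; run after the fact with the observed effect size, it becomes exactly the kind of post hoc calculation the paper describes being dressed up as a priori.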

Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Advances in Methods and Practices in Psychological Science. [PsyArXiv] Published August 2018.

Every data analysis requires a large number of decisions. From receiving the raw data, the analyst must decide what to do with missing or outlying values, which observations to include or exclude, whether any transformations of the data are required, how to code and combine categorical variables, how to define the outcome(s), and so forth. Each of these decisions leads to a different analysis, and if all possible analyses were enumerated there could be a myriad. Gelman and Loken called this the ‘garden of forking paths’, after the short story by Jorge Luis Borges that explores this idea, and they identify it as the source of the problem called p-hacking. It’s not that researchers are conducting thousands of analyses and publishing the one with the statistically significant result, but that each decision along the way may be favourable towards finding a statistically significant result. Do the outliers go against what you were hypothesising? Exclude them. Is there a nice long tail of the distribution in the treatment group? Don’t take logs.

This article explores the garden of forking paths by getting a number of analysts to try to answer the same question with the same data set. The question was: are darker skinned soccer players more likely to receive a red card than their lighter skinned counterparts? The data set provided had information on league, country, position, skin tone (based on subjective rating), and previous cards. Unsurprisingly, there was a wide range of results, with point estimates ranging from odds ratios of 0.89 to 2.93, and a similar range of standard errors. Looking at the list of analyses, I see a couple that I might have pursued, both producing vastly different results. The authors see this as demonstrating the usefulness of crowdsourcing analyses. At the very least it should be a stark warning to any analyst to be transparent about every decision and to consider its consequences.

Front-Door Versus Back-Door Adjustment With Unmeasured Confounding: Bias Formulas for Front-Door and Hybrid Adjustments With Application to a Job Training Program. Journal of the American Statistical Association. Published October 2018.

Econometricians love instrumental variables. Without any supporting evidence, I would be willing to conjecture that it is the most widely used type of analysis in empirical economic causal inference. When the assumptions are met it is a great tool, but decent instruments are hard to come by. We’ve covered a number of unconvincing applications on this blog where the instrument might be weak or not exogenous, and some of my own analyses have been criticised (rightfully) on these grounds. But, and we often forget, there are other causal inference techniques. One of these, which I think is unfamiliar to most economists, is the ‘front-door’ adjustment. Consider the following diagram:

[Figure: two causal diagrams, with the front-door model on the left and the instrumental variable model on the right.]

On the right is the instrumental variable type causal model. Provided Z satisfies an exclusion restriction, i.e. it is independent of U (and some other assumptions hold), it can be used to estimate the causal effect of A on Y. The front-door approach, on the left, shows a causal diagram where there is a post-treatment variable, M, unrelated to U, which causes the outcome Y. Pearl showed, under a set of assumptions similar to those for instrumental variables, that if the effect of A on Y is entirely mediated by M, and there are no common causes of A and M or of M and Y, then M can be used to identify the causal effect of A on Y. This article discusses the front-door approach in the context of estimating the effect of a jobs training program (a favourite of James Heckman). The instrumental variable approach uses random assignment to the program, while the front-door analysis, in the absence of randomisation, uses program enrollment as its mediating variable. The paper considers the effect of the assumptions breaking down, and shows the front-door estimator to be fairly robust.
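To make the idea concrete, here is a small linear-Gaussian simulation of the front-door adjustment. The variable names and coefficients are entirely hypothetical; this is a sketch of the general technique, not the paper’s analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Unmeasured confounder U affects both treatment A and outcome Y,
# but not the mediator M (the key front-door assumption).
U = rng.normal(size=n)
A = 0.8 * U + rng.normal(size=n)
M = 1.5 * A + rng.normal(size=n)             # effect of A on M
Y = 2.0 * M + 1.2 * U + rng.normal(size=n)   # true causal effect of A on Y = 1.5 * 2.0 = 3.0

def ols(y, X):
    """Least-squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(X))
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(Y, [A])[1]        # biased: ignores the confounder U
a_on_m = ols(M, [A])[1]       # A -> M (no back-door path to block)
m_on_y = ols(Y, [M, A])[1]    # M -> Y, adjusting for A blocks M <- A <- U -> Y
front_door = a_on_m * m_on_y  # product recovers the causal effect

print(round(naive, 2), round(front_door, 2))
```

The naive regression of Y on A overstates the effect because of U, while the front-door product recovers approximately the true value of 3, despite U being unobserved throughout.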



Method of the month: Shared parameter models

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is shared parameter models.


Missing data and data errors are an inevitability rather than a possibility. If data were missing as a result of a purely random computer error, then there would be no problem: estimators based on the observed data would be unbiased. But this is probably not why they’re missing. People often drop out of surveys and trials because they choose to, because they move away, or, worse, because they die. The trouble with this is that the factors influencing these decisions and events are typically also those that affect the outcomes of interest in our studies, thus leading to bias. Unfortunately, missing data are often improperly dealt with. For example, a study of randomised controlled trials (RCTs) in the big four medical journals found that 95% had some missing data, and around 85% of those did not deal with it in a suitable way. An instructive article in the BMJ illustrated the potentially massive biases that dropout in RCTs can generate. Similar effects should be expected from dropout in panel studies and other analyses. Now, if the data are missing at random – i.e. the probability of missing data or dropout is independent of the unobserved values conditional on the observed data – then we can base our inferences on just the observed data. But this is often not the case, so what do we do in these circumstances?
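A quick simulation illustrates why the mechanism matters. Under a purely random (‘computer error’) mechanism, the observed-data mean is fine; when dropout depends on the outcome itself, it is badly biased. All names and parameter values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True outcome (e.g. an underlying health score), mean 0 by construction
y = rng.normal(size=n)

# MCAR: dropout is a coin flip, unrelated to y
mcar_observed = rng.random(n) < 0.7

# MNAR: sicker people (low y) are more likely to drop out,
# via a logistic model for the probability of remaining observed
p_obs = 1 / (1 + np.exp(-(0.5 + 1.5 * y)))
mnar_observed = rng.random(n) < p_obs

print(round(y[mcar_observed].mean(), 2))  # close to the true mean of 0
print(round(y[mnar_observed].mean(), 2))  # biased upwards
```

Under the informative mechanism, the observed sample systematically over-represents the healthy, so a naive complete-case analysis overstates average health.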


If we have a full set of data Y and a set of indicators for whether each observation is missing R, plus some parameters \theta and \phi, then we can factorise their joint distribution, f(Y,R;\theta,\phi), in three ways:

Selection model

f(Y,R;\theta,\phi) = f(Y;\theta) f(R | Y;\phi)

Perhaps most familiar to econometricians, this factorisation involves the marginal distribution of the full data and the conditional distribution of missingness given the data. The Heckman selection model is an example of this factorisation. For example, one could specify a probit model for dropout and a normally distributed outcome, and then the full likelihood would involve the product of the two.
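As a sketch of how such a full likelihood fits together, the toy example below combines a normal outcome with a probit dropout model that depends on the outcome. The closed-form integral over the missing outcomes is a standard normal-probit result, but the specific model, names, and parameter values are my own illustration rather than any particular paper’s:

```python
import numpy as np
from scipy.stats import norm

# Toy selection model: y ~ N(x*beta, sigma^2), observed with probability
# Phi(g0 + g1*y), so missingness depends on the outcome itself (MNAR).

def log_lik(params, x, y, observed):
    beta, log_sigma, g0, g1 = params
    sigma = np.exp(log_sigma)
    mu = x * beta
    ll = 0.0
    # Observed units: density of y times probability of being observed given y
    yo, xo = y[observed], x[observed]
    ll += np.sum(norm.logpdf(yo, xo * beta, sigma) + norm.logcdf(g0 + g1 * yo))
    # Missing units: integrate y out; for a normal outcome and probit selection
    # E[Phi(g0 + g1*y)] = Phi((g0 + g1*mu) / sqrt(1 + g1^2 * sigma^2))
    mm = mu[~observed]
    p_obs = norm.cdf((g0 + g1 * mm) / np.sqrt(1 + g1**2 * sigma**2))
    ll += np.sum(np.log(1 - p_obs))
    return ll

# Simulate data and check the likelihood prefers the true parameters
rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)                      # beta = 1, sigma = 1
observed = rng.random(n) < norm.cdf(0.3 + 0.8 * y)    # g0 = 0.3, g1 = 0.8

true = np.array([1.0, 0.0, 0.3, 0.8])
wrong = np.array([0.0, 0.0, 0.3, 0.0])
print(log_lik(true, x, y, observed) > log_lik(wrong, x, y, observed))
```

In a real application the missing y values are of course unknown; note the likelihood above only ever uses the observed outcomes, with the missing ones integrated out analytically.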

Pattern-mixture model

f(Y,R;\theta,\phi) = f(R;\phi) f(Y | R;\theta)

This approach specifies a marginal distribution for the missingness or dropout mechanism and then the distribution of the data differs according to the type of missingness or dropout. The data are a mixture of different patterns, i.e. distributions. This type of model is implied when non-response is not considered missing data per se, and we’re interested in inferences within each sub-population. For example, when estimating quality of life at a given age, the quality of life of those that have died is not of interest, but their dying can bias the estimates.

Shared parameter model

f(Y,R;\theta,\phi) = \int f(Y | \alpha;\theta) f(R | \alpha;\phi) h(\alpha) d\alpha

Now, the final way we can model these data posits unobserved variables, \alpha, conditional on which Y and R are independent. These models are most appropriate when the dropout or missingness is attributable to some underlying process changing over time, such as disease progression or household attitudes, or an unobserved variable, such as health status.

At the simplest level, one could consider two separate models with correlated random effects. For example, adding in covariates x, we could specify a linear mixed model for the outcome and a probit selection model for person i at time t:

Y_{it} = x_{it}'\theta + \alpha_{1,i} + u_{it}

P(R_{it} = 1) = \Phi(x_{it}'\theta + \alpha_{2,i})

(\alpha_{1,i},\alpha_{2,i}) \sim MVN(0,\Sigma) and u_{it} \sim N(0,\sigma^2)

so that the random effects are multivariate normally distributed.
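A short simulation of this specification shows how correlated random effects make the observed data unrepresentative. All sample sizes and parameter values here are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, T = 20_000, 5

# Correlated random effects: alpha_1 enters the outcome model, alpha_2 the
# dropout model, and their correlation is what makes dropout informative.
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
alpha = rng.multivariate_normal([0.0, 0.0], cov, size=n)

x = rng.normal(size=(n, T))
theta = 0.5
y = x * theta + alpha[:, [0]] + rng.normal(size=(n, T))  # linear mixed model

# Probit model for being observed at each wave
p_obs = norm.cdf(x * theta + alpha[:, [1]])
observed = rng.random((n, T)) < p_obs

# Because alpha_1 and alpha_2 are positively correlated, the person-waves
# we actually observe have systematically higher outcomes
bias = y[observed].mean() - y.mean()
print(round(bias, 2))
```

Setting rho to zero in this simulation removes the bias entirely, which is exactly the sense in which the shared (here, correlated) random effects capture the informative dropout.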

A more complex and flexible specification for longitudinal settings would permit the random effects to vary over time, differently between models and individuals:

Y_{i}(t) = x_{i}(t)'\theta + z_{1,i} (t)\alpha_i + u_{it}

P(R_{i}(t) = 1) = G(x_{i}(t)'\theta + z_{2,i} (t)\alpha_i)

\alpha_i \sim h(.) and u_{it} \sim N(0,\sigma^2)

As an example, if time were discrete in this model, then z_{1,i} could be a series of parameters for each time period, z_{1,i} = [\lambda_1,\lambda_2,...,\lambda_T], which are often referred to as ‘factor loadings’ in the structural equation modelling literature. We will run up against identifiability problems with these more complex models. For example, if the random effect were normally distributed, i.e. \alpha_i \sim N(0,\sigma^2_\alpha), then we could multiply each factor loading by \rho and \alpha_i \sim N(0,\sigma^2_\alpha / \rho^2) would give us an equivalent model. So, we have to put restrictions on the parameters. We can set the variance of the random effect to one, i.e. \alpha_i \sim N(0,1). Alternatively, we can fix one of the factor loadings to one, without loss of generality, i.e. z_{1,i} = [1,\lambda_2,...,\lambda_T].
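The scaling argument above can be checked numerically: multiplying the loadings by \rho and dividing the random-effect variance by \rho^2 leaves the implied marginal covariance of the outcomes, and hence the likelihood, unchanged. A quick sketch, with all values illustrative:

```python
import numpy as np

# Marginal covariance of one person's outcome vector implied by the
# factor-loading model: Cov(Y_i) = var_alpha * lam lam' + var_u * I
def marginal_cov(lam, var_alpha, var_u=1.0):
    lam = np.asarray(lam, dtype=float)
    return var_alpha * np.outer(lam, lam) + var_u * np.eye(len(lam))

lam = np.array([1.0, 0.7, 0.4])   # loadings for T = 3 periods
rho = 2.5

# Scaling every loading by rho while dividing the random-effect variance
# by rho^2 gives exactly the same implied distribution:
c1 = marginal_cov(lam, var_alpha=2.0)
c2 = marginal_cov(rho * lam, var_alpha=2.0 / rho**2)
print(np.allclose(c1, c2))
```

Since the data can never distinguish the two parameterisations, a restriction such as fixing the first loading or the random-effect variance is needed before estimation.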

The distributional assumptions about the random effects can have potentially large effects on the resulting inferences. It is therefore possible to model these non-parametrically as well, e.g. using a mixture distribution. Ultimately, these models are a useful way to deal with data that are missing not at random, such as informative dropout from panel studies.


Estimation can be tricky with these models given the need to integrate out the random effects. For frequentist inference, expectation maximisation (EM) is one way of estimating them, but as far as I’m aware the algorithm would have to be coded for the specific problem in Stata or R. An alternative is some kind of quadrature-based method. The Stata package stjm fits shared parameter models for longitudinal and survival data, with similar specifications to those above.

Otherwise, Bayesian tools, such as Hamiltonian Monte Carlo, may have more luck with the more complex models. For the simpler correlated random effects specification given above, one can use the stan_mvmer command in the rstanarm package. For more complex models, one would need to code the model in something like Stan.


For a health economics specific discussion of these types of models, one can look to the chapter ‘Latent Factor and Latent Class Models to Accommodate Heterogeneity, Using Structural Equation’ in the Encyclopedia of Health Economics, although shared parameter models only get a brief mention. However, given that the book is currently on sale for £1,000, it may be beyond the wallet of the average researcher! Some health-related applications may be more helpful. Vonesh et al. (2011) used shared parameter models to look at the effects of diet and blood pressure control on renal disease progression. Wu and others (2011) look at how to model the effects of a ‘concomitant intervention’, which is applied when a patient’s health status deteriorates and so is confounded with health, using shared parameter models. And Baghfalaki and colleagues (2017) examine heterogeneous random effect specifications for shared parameter models and apply them to HIV data.