Sam Watson’s journal round-up for 16th April 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The impact of NHS expenditure on health outcomes in England: alternative approaches to identification in all‐cause and disease specific models of mortality. Health Economics [PubMedPublished 2nd April 2018

Studies looking at the relationship between health care expenditure and patient outcomes have exploded in popularity. A recent systematic review identified 65 studies by 2014 on the topic – and recent experience from these journal round-ups suggests this number has increased significantly since then. The relationship between national spending and health outcomes is important to inform policy and health care budgets, not least through the specification of a cost-effectiveness threshold. Karl Claxton and colleagues released a big study looking at all the programmes of care in the NHS in 2015 purporting to estimate exactly this. I wrote at the time that: (i) these estimates are only truly an opportunity cost if the health service is allocatively efficient, which it isn’t; and (ii) their statistical identification method, in which they used a range of socio-economic variables as instruments for expenditure, was flawed as the instruments were neither strong determinants of expenditure nor (conditionally) independent of population health. I also noted that their tests would be unlikely to be any good to detect this problem. In response to the first, Tony O’Hagan commented to say that that they did not assume NHS efficiency, nor even that it was assumed that the NHS is trying to maximise health. This may well have been the case, but I would still, perhaps pedantically, argue then that this is therefore not an opportunity cost. For the question of instrumental variables, an alternative method was proposed by Martyn Andrews and co-authors, using information that feeds into the budget allocation formula as instruments for expenditure. In this new article, Claxton, Lomas, and Martin adopt Andrews’s approach and apply it across four key programs of care in the NHS to try to derive cost-per-QALY thresholds. First off, many of my original criticisms I would also apply to this paper, to which I’d also add one: (Statistical significance being used inappropriately complaint alert!!!) The authors use what seems to be some form of stepwise regression by including and excluding regressors on the basis of statistical significance – this is a big no-no and just introduces large biases (see this article for a list of reasons why). Beyond that, the instruments issue – I think – is still a problem, as it’s hard to justify, for example, an input price index (which translates to larger budgets) as an instrument here. It is certainly correlated with higher expenditure – inputs are more expensive in higher price areas after all – but this instrument won’t be correlated with greater inputs for this same reason. Thus, it’s the ‘wrong kind’ of correlation for this study. Needless to say, perhaps I am letting the perfect be the enemy of the good. Is this evidence strong enough to warrant a change in a cost-effectiveness threshold? My inclination would be that it is not, but that is not to deny it’s relevance to the debate.

Risk thresholds for alcohol consumption: combined analysis of individual-participant data for 599 912 current drinkers in 83 prospective studies. The Lancet Published 14th April 2018

“Moderate drinkers live longer” is the adage of the casual drinker as if to justify a hedonistic pursuit as purely pragmatic. But where does this idea come from? Studies that have compared risk of cardiovascular disease to level of alcohol consumption have shown that disease risk is lower in those that drink moderately compared to those that don’t drink. But correlation does not imply causation – non-drinkers might differ from those that drink. They may be abstinent after experiencing health issues related to alcohol, or be otherwise advised to not drink to protect their health. If we truly believed moderate alcohol consumption was better for your health than no alcohol consumption we’d advise people who don’t drink to drink. Moreover, if this relationship were true then there would be an ‘optimal’ level of consumption where any protective effect were maximised before being outweighed by the adverse effects. This new study pools data from three large consortia each containing data from multiple studies or centres on individual alcohol consumption, cardiovascular disease (CVD), and all-cause mortality to look at these outcomes among drinkers, excluding non-drinkers for the aforementioned reasons. Reading the methods section, it’s not wholly clear, if replicability were the standard, what was done. I believe that for each different database a hazard ratio or odds ratio for the risk of CVD or mortality for eight groups of alcohol consumption was estimated, these ratios were then subsequently pooled in a random-effects meta-analysis. However, it’s not clear to me why you would need to do this in two steps when you could just estimate a hierarchical model that achieves the same thing while also propagating any uncertainty through all the levels. Anyway, a polynomial was then fitted through the pooled ratios – again, why not just do this in the main stage and estimate some kind of hierarchical semi-parametric model instead of a three-stage model to get the curve of interest? I don’t know. The key finding is that risk generally increases above around 100g/week alcohol (around 5-6 UK glasses of wine per week), below which it is fairly flat (although whether it is different to non-drinkers we don’t know). However, the picture the article paints is complicated, risk of stroke and heart failure go up with increased alcohol consumption, but myocardial infarction goes down. This would suggest some kind of competing risk: the mechanism by which alcohol works increases your overall risk of CVD and your proportional risk of non-myocardial infarction CVD given CVD.

Family ruptures, stress, and the mental health of the next generation [comment] [reply]. American Economic Review [RePEc] Published April 2018

I’m not sure I will write out the full blurb again about studies of in utero exposure to difficult or stressful conditions and later life outcomes. There are a lot of them and they continue to make the top journals. Admittedly, I continue to cover them in these round-ups – so much so that we could write a literature review on the topic on the basis of the content of this blog. Needless to say, exposure in the womb to stressors likely increases the risk of low birth weight birth, neonatal and childhood disease, poor educational outcomes, and worse labour market outcomes. So what does this new study (and the comments) contribute? Firstly, it uses a new type of stressor – maternal stress caused by a death in the family and apparently this has a dose-response as stronger ties to the deceased are more stressful, and secondly, it looks at mental health outcomes of the child, which are less common in these sorts of studies. The identification strategy compares the effect of the death on infants who are in the womb to those infants who experience it shortly after birth. Herein lies the interesting discussion raised in the above linked comment and reply papers: in this paper the sample contains all births up to one year post birth and to be in the ‘treatment’ group the death had to have occurred between conception and the expected date of birth, so those babies born preterm were less likely to end up in the control group than those born after the expected date. This spurious correlation could potentially lead to bias. In the authors’ reply, they re-estimate their models by redefining the control group on the basis of expected date of birth rather than actual. They find that their estimates for the effect of their stressor on physical outcomes, like low birth weight, are much smaller in magnitude, and I’m not sure they’re clinically significant. For mental health outcomes, again the estimates are qualitatively small in magnitude, but remain similar to the original paper but this choice phrase pops up (Statistical significance being used inappropriately complaint alert!!!): “We cannot reject the null hypothesis that the mental health coefficients presented in panel C of Table 3 are statistically the same as the corresponding coefficients in our original paper.” Statistically the same! I can see they’re different! Anyway, given all the other evidence on the topic I don’t need to explain the results in detail – the methods discussion is far more interesting.


Method of the month: Coding qualitative data

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is coding qualitative data.


Health economists are increasingly stepping away from quantitative datasets and conducting interviews and focus groups, as well as collecting free text responses. Good qualitative analysis requires thought and rigour. In this blog post, I focus on coding of textual data – a fundamental part of analysis in nearly all qualitative studies. Many textbooks deal with this in detail. I have drawn on three in particular in this blog post (and my research): Coast (2017), Miles and Huberman (1994), and Ritchie and Lewis (2003).

Coding involves tagging segments of the text with salient words or short phrases. This assists the researcher with retrieving the data for further analysis and is, in itself, the first stage of analysing the data. Ultimately, the codes will feed into the final themes or model resulting from the research. So the codes – and the way they are applied – are important!


There is no ‘right way’ to code. However, I have increasingly found it useful to think of two phases of coding. First, ‘open coding’, which refers to the initial exploratory process of identifying pertinent phrases and concepts in the data. Second, formal or ‘axial’ coding, involving the application of a clear, pre-specified coding framework consistently across the source material.

Open coding

Any qualitative analysis should start with the researcher being very familiar with both the source material (such as interview transcripts) and the study objectives. This sounds obvious, but it is easy, as a researcher, to get drawn into the narrative of an interview and forget what exactly you are trying to get out of the research and, by extension, the coding. Open coding requires the researcher to go through the text, carefully, line-by-line, tagging segments with a code to denote its meaning. It is important to be inquisitive. What is being said? Does this relate to the research question and, if so, how?

Take, for example, the excerpt below from a speech by the Secretary of State for Health, Jeremy Hunt, on safety and efficiency in the NHS in 2015:

Let’s look at those challenges. And I think we have good news and bad news. If I start with the bad news it is that we face a triple whammy of huge financial pressures because of the deficit that we know we have to tackle as a country, of the ageing population that will mean we have a million more over 70s by 2020, and also of rising consumer expectations, the incredible excitement that people feel when they read about immunotherapy in the newspapers that gives a heart attack to me and Simon Stevens but is very very exciting for the country. The desire for 24/7 access to healthcare. These are expectations that we have to recognise in the NHS but all of these add to a massive pressure on the system.

This excerpt may be analysed, for example, as part of a study into demand pressures on the NHS. And, in this case, codes such as “ageing population” “consumer expectations” “immunotherapy” “24/7 access to healthcare” might initially be identified. However, if the study was investigating the nature of ministerial responsibility for the NHS, one might pull out very different codes, such as “tackle as a country”, “public demands vs. government stewardship” and “minister – chief exec shared responsibility”.

Codes can be anything – attitudes, behaviours, viewpoints – so long as they relate to the research question. It is very useful to get (at least) one other person to also code some of the same source material. Comparing codes will provide new ideas for the coding framework, a different perspective of the meaning of the source material and a check that key sections of the source material have not been missed. Researchers shouldn’t aim to code all (or even most) of the text of a transcript – there is always some redundancy. And, in general, initial codes should be as close to the source text as possible – some interpretation is fine but it is important to not get too abstract too quickly!

Formal or ‘axial’ coding

When the researcher has an initial list of codes, it is a good time to develop a formal coding framework. The aim here is to devise an index of some sort to tag all the data in a logical, systematic and comprehensive way, and in a way that will be useful for further analysis.

One way to start is to chart how the initial codes can be grouped and relate to one another. For example, in analysing NHS demand pressures, a researcher may group “immunotherapy” with other medical innovations mentioned elsewhere in the study. It’s important to avoid having many disconnected codes, and at this stage, many codes will be changed, subdivided, or combined. Much like an index, the resulting codes could be organised into loose chapters (or themes) such as “1. Consumer expectations”, “2. Access” and/or there might be a hierarchical relationship between codes, for example, with codes relating to national and local demand pressures. A proper axial coding framework has categories and sub-categories of codes with interdependencies formally specified.

There is no right number of codes. There could be as few as 10, or as many as 50, or more. It is crucial however that the list of codes are logically organised (not alphabetically listed) and sufficiently concise, so that the researcher can hold them in their head while coding transcripts. Alongside the coding framework itself – which may only be a page – it can be very helpful to put together an explanatory document with more detail on the meaning of each code and possibly some examples.


Once the formal coding framework is finalised it can be applied to the source material. I find this a good stage to use software like Nvivo. While coding in Nvivo takes a similar amount of time to paper-based methods, it can help speed up the process of retrieving and comparing segments of the text later on. Other software packages are available and some researchers prefer to use computer packages earlier in the process or not all – it is a personal choice.

Again, it is a good idea to involve at least one other person. One possibility is for two researchers to apply the framework separately and code the first, say 5 pages of a transcript. Reliability between coders can then be compared, with any discrepancies discussed and used to adjust the coding framework accordingly. The researchers could then repeat the process. Once reliability is at an acceptable level, a researcher should be able to code the transcripts in a much more reproducible way.

Even at this stage, the formal coding framework does not need to be set in stone. If it is based on a subset of interviews, new issues are likely to emerge in subsequent transcripts and these may need to be incorporated. Additionally, analyses may be conducted with sub-samples of participants or the analysis may move from more descriptive to explanatory work, and therefore the coding needs may change.


Published qualitative studies will often mention that transcript data were coded, with few details to discern how this was done. In the study I worked on to develop the ICECAP-A capability measure, we coded to identify influences on quality of life in the first batch of interviews and dimensions of quality of life in later batches of interviews. A recent study into disinvestment decisions highlights how a second rater can be used in coding. Reporting guidelines for qualitative research papers highlight three important items related to coding – number of coders, description of the coding tree (framework), and derivation of the themes – that ought to be included in study write-ups.

Coding qualitative data can feel quite laborious. However, the real benefit of a well organised coding framework comes when reconstituting transcript data under common codes or themes. Codes that relate clearly to the research question, and one another, allow the researcher to reorganise the data with real purpose. Juxtaposing previously unrelated text and quotes sparks the discovery of exciting new links in the data. In turn, this spawns the interpretative work that is the fundamental value of the qualitative analysis. In economics parlance, good coding can improve both the efficiency of retrieving text for analysis and the quality of the analytical output itself.


Method of the month: Synthetic control

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is synthetic control.


Health researchers are often interested in estimating the effect of a policy of change at the aggregate level. This might include a change in admissions policies at a particular hospital, or a new public health policy applied to a state or city. A common approach to inference in these settings is difference in differences (DiD) methods. Pre- and post-intervention outcomes in a treated unit are compared with outcomes in the same periods for a control unit. The aim is to estimate a counterfactual outcome for the treated unit in the post-intervention period. To do this, DiD assumes that the trend over time in the outcome is the same for both treated and control units.

It is often the case in practice that we have multiple possible control units and multiple time periods of data. To predict the post-intervention counterfactual outcomes, we can note that there are three sources of information: i) the outcomes in the treated unit prior to the intervention, ii) the behaviour of other time series predictive of that in the treated unit, including outcomes in similar but untreated units and exogenous predictors, and iii) prior knowledge of the effect of the intervention. The latter of these only really comes into play in Bayesian set-ups of this method. With longitudinal data we could just throw all this into a regression model and estimate the parameters. However, generally, this doesn’t allow for unobserved confounders to vary over time. The synthetic control method does.


Abadie, Diamond, and Haimueller motivate the synthetic control method using the following model:

y_{it} = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \epsilon_{it}

where y_{it} is the outcome for unit i at time t, \delta_t are common time effects, Z_i are observed covariates with time-varying parameters \theta_t, \lambda_t are unobserved common factors with \mu_i as unobserved factor loadings, and \epsilon_{it} is an error term. Abadie et al show in this paper that one can derive a set of weights for the outcomes of control units that can be used to estimate the post-intervention counterfactual outcomes in the treated unit. The weights are estimated as those that would minimise the distance between the outcome and covariates in the treated unit and the weighted outcomes and covariates in the control units. Kreif et al (2016) extended this idea to multiple treated units.

Inference is difficult in this framework. So to produce confidence intervals, ‘placebo’ methods are proposed. The essence of this is to re-estimate the models, but using a non-intervention point in time as the intervention date to determine the frequency with which differences of a given order of magnitude are observed.

Brodersen et al take a different approach to motivating these models. They begin with a structural time-series model, which is a form of state-space model:

y_t = Z'_t \alpha_t + \epsilon_t

\alpha_{t+1} = T_t \alpha_t + R_t \eta_t

where in this case, y_t is the outcome at time t, \alpha_t is the state vector and Z_t is an output vector with \epsilon_t as an error term. The second equation is the state equation that governs the evolution of the state vector over time where T_t is a transition matrix, R_t is a diffusion matrix, and \eta_t is the system error.

From this setup, Brodersen et al expand the model to allow for control time series (e.g. Z_t = X'_t \beta), local linear time trends, seasonal components, and allowing for dynamic effects of covariates. In this sense the model is perhaps more flexible than that of Abadie et al. Not all of the large number of covariates may be necessary, so they propose a ‘slab and spike’ prior, which combines a point mass at zero with a weakly informative distribution over the non-zero values. This lets the data select the coefficients, as it were.

Inference in this framework is simpler than above. The posterior predictive distribution can be ‘simply’ estimated for the counterfactual time series to give posterior probabilities of differences of various magnitudes.



  • Synth Implements the method of Abadie et al.


  • Synth Implements the method of Abadie et al.
  • CausalImpact Implements the method of Brodersen et al.


Kreif et al (2016) estimate the effect of pay for performance schemes in hospitals in England and compare the synthetic control method to DiD. Pieters et al (2016) estimate the effects of democratic reform on under-five mortality. We previously covered this paper in a journal round-up and a subsequent post, for which we also used the Brodersen et al method described above. We recently featured a paper by Lépine et al (2017) in a discussion of user fees. The synthetic control method was used to estimate the impact that the removal of user fees had in various districts of Zambia on use of health care.