Method of the month: custom likelihoods with Stan

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is custom likelihoods with Stan.


Regular readers of this blog will know that I am a fan of Bayesian methods. The exponential growth in personal computing power has opened up a whole new range of Bayesian models at home. WinBUGS and JAGS were the go-to pieces of software for estimating Bayesian models, both using Markov chain Monte Carlo (MCMC) methods. Theoretically, an MCMC chain will explore the posterior distribution. But MCMC has flaws. For example, if the target distribution has a high degree of curvature, as many hierarchical models do, then MCMC chains can have trouble exploring it. To compensate, the chains stay in the ‘difficult’ bit of the space for longer before moving elsewhere, so their average oscillates around the true value. Asymptotically these oscillations balance out, but in real, finite time they lead to bias. Furthermore, MCMC chains can be very slow to converge to the target distribution, and for complex models this can take a literal lifetime. An alternative, Hamiltonian Monte Carlo (HMC), provides a solution to these issues. Michael Betancourt’s introduction to HMC is great for anyone interested in the topic.

Stan is a ‘probabilistic programming language’ that implements HMC. A huge range of probability distributions is already implemented in the software; check out the manual for more information. There is also an R package, rstanarm, which estimates a number of standard models using familiar R formula syntax, meaning you can use these tools without writing any Stan code at all. However, Stan may not have the necessary distributions for more complex econometric or statistical models. It used to be the case that you would have to build your own MCMC sampler, but given the problems with MCMC this is now strongly advised against in favour of HMC. Fortunately, we can implement our own probability density functions in Stan. So, if you can write down the (log) likelihood for your model, you can estimate it in Stan!
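To give a flavour of how little code a standard model needs, here is a minimal sketch using rstanarm (my own illustration, not from the post; the data frame df and variables y and x are hypothetical placeholders):

library(rstanarm)
#a Bayesian logistic regression fitted with HMC, using only R formula syntax
fit <- stan_glm(y ~ x, data = df, family = binomial(link = "logit"),
                prior = normal(0, 5), chains = 3, iter = 500)
summary(fit)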

The aim of this post is to provide an example of implementing a custom probability function in Stan from the likelihood of our model. We will look at the nested logit model. These models have been widely used for multinomial choice problems. An area of interest among health economists is the choice of hospital provider. A basic multinomial choice model, such as a multinomial logit, requires an independence of irrelevant alternatives (IIA) assumption, which says that the odds of choosing one option over another are independent of any other alternative. For example, it would assume that the odds of me choosing the pharmacy in town over the hospital in town would be unaffected by a pharmacy opening on my own road. This is likely too strong. There are many ways to relax this assumption, the nested logit being one. The nested logit is useful when choices can be ‘nested’ in groups, and it assumes there is a correlation among choices within each group. For example, we can group health care providers into pharmacies, primary care clinics, hospitals, etc., such as this:



Econometric model

Firstly, we need a nesting structure for our choices, like that described above. We’ll consider a 2-level nesting structure with T branches and R_t choices in each branch t. As with most choice models we start from an additive random utility model, which is, for individual i=1,…,N, choosing option r in branch t:

U_{itr} = V_{itr} + \epsilon_{itr}

Then the chosen option is the one with the highest utility. The motivating feature of the nested logit is that the hierarchical structure allows us to factorise the joint probability of choosing branch t and option r into a marginal and a conditional model:

p_{itr} = p_{it} \times p_{ir|t}

Multinomial choice models arise when the errors are assumed to have a generalised extreme value (GEV) distribution, which gives us the multinomial logit family of models. We will model the deterministic part of the utility with branch-varying and option-varying variables:

V_{itr} = Z_{it}'\alpha + X_{itr}'\beta_t

Then the model can be written as:

p_{itr} = p_{it} \times p_{ir|t} = \frac{exp(Z_{it}'\alpha + \rho_t I_{it})}{\sum_{k \in T} exp(Z_{ik}'\alpha + \rho_k I_{ik})} \times \frac{exp(X_{itr}'\beta_t/\rho_t)}{\sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t) }

where \rho_t is variously called a scale parameter, correlation parameter, etc., and defines the within-branch correlation (arising from the GEV distribution). We also have the log-sum, often called the inclusive value:

I_{it} = log \left(  \sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t)  \right).

Now we have our model set up, the log likelihood over all individuals is:

\sum_{i=1}^N \sum_{t \in T} \sum_{r \in R_t} y_{itr} \left[ Z_{it}'\alpha + \rho_t I_{it} - log \left( \sum_{k \in T} exp(Z_{ik}'\alpha + \rho_k I_{ik}) \right) + X_{itr}'\beta_t/\rho_t - log \left(  \sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t) \right) \right]

where y_{itr} = 1 if individual i chose option r in branch t, and zero otherwise.

As a further note, for the model to be compatible with an ARUM specification, a number of conditions need to be satisfied. One of these is 0<\rho_t \leq 1, so we will make that restriction. We have also only included alternative-varying variables, but we are often interested in individual-varying variables and in allowing parameters to vary over alternatives; these can simply be added to this specification, but we will leave them out for now to keep things “simple”. We will also use basic weakly informative priors, and leave prior specification as a separate issue we won’t consider further:

\alpha \sim Normal(0,5), \beta_t \sim Normal(0,5), \rho_t \sim Uniform(0,1)


DISCLAIMER: This code is unlikely to be the most efficient, nor can I guarantee it is 100% correct – use at your peril!

The following assumes a familiarity with Stan and R.

Stan programs are divided into blocks including data, parameters, and model. The functions block allows us to define custom (log) probability density functions. These take a form something like:

real xxx_lpdf(real y, ...){}

which says that the function outputs a real-valued variable and takes a real-valued variable, y, as one of its arguments. The _lpdf suffix allows the function to act as a log probability density function in the program (and equivalently _lpmf for log probability mass functions for discrete variables). Now we just have to convert the log likelihood above into a function.
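To make the _lpdf mechanism concrete, here is a toy illustration (mine, not from the post or the Stan manual): a hand-coded normal log density. Stan already provides normal_lpdf, so this is purely for exposition; note that additive constants can be dropped, as they do not affect the posterior.

functions{
  real my_normal_lpdf(real y, real mu, real sigma){
    //log of the normal density, dropping the -0.5*log(2*pi) constant
    return -log(sigma) - 0.5 * square((y - mu) / sigma);
  }
}

A variable could then be given this distribution in the model block with y ~ my_normal(mu, sigma). With that pattern in mind, let’s consider what data we will be passing to the program: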

  • N, the number of observations;
  • T, the number of branches;
  • P, the number of branch-varying variables;
  • Q, the number of choice-varying variables;
  • R, a T x 1 vector with the number of choices in each branch, from which we can also derive the total number of options as sum(R). We will call the total number of options Rk for now;
  • Y, a N x Rk vector, where Y[i,j] = 1 if individual i=1,…,N chose choice j=1,…,Rk;
  • Z, a N x T x P array of branch-varying variables;
  • X, a N x Rk x Q array of choice-varying variables.

And the parameters:

  • \rho , a T x 1 vector of correlation parameters;
  • \alpha , a P x 1 vector of parameters on the branch-varying variables;
  • \beta , a Q x T matrix of parameters on the choice-varying variables.

Now, to develop the code, we will specify the function for individual observations of Y, rather than the whole matrix, and then perform the sum over all the individuals in the model block. So we only need to feed each individual’s observations into the function rather than the whole data set. The full Stan program is specified in blocks as follows (with all the data and parameters as arguments to the function):

functions{
 real nlogit_lpdf(real[] y, real[,] Z, real[,] X, int[] R, 
   vector alpha, matrix beta, vector tau){
 //first define our additional local variables
  real lprob; //variable to hold log prob
  int count1; //keep track of which option in the loops
  int count2; //keep track of which option in the loops
  vector[size(R)] I_val; //inclusive values
  real denom; //sum denominator of marginal model
 //for the variables appearing in sum loops, set them to zero
  lprob = 0;
  count1 = 0;
  count2 = 0;
  denom = 0;
 //determine the log-sum for each conditional model, p_ir|t,
 //i.e. inclusive value
  for(k in 1:size(R)){
    I_val[k] = 0;
    for(m in 1:R[k]){
       count1 = count1 + 1;
       I_val[k] = I_val[k] + exp(to_row_vector(X[count1,])*
          beta[,k] /tau[k]);
    }
    I_val[k] = log(I_val[k]);
  }
 //determine the sum for the marginal model, p_it, denominator
  for(k in 1:size(R)){
    denom = denom + exp(to_row_vector(Z[k,])*alpha + tau[k]*I_val[k]);
  }
 //put everything together in the log likelihood
  for(k in 1:size(R)){
    for(m in 1:R[k]){
       count2 = count2 + 1;
       lprob = lprob + y[count2]*(to_row_vector(Z[k,])*alpha + 
         tau[k]*I_val[k] - log(denom) + 
         to_row_vector(X[count2,])*beta[,k]/tau[k] - I_val[k]); //note the /tau[k], matching the likelihood above
    }
  }
 //return the log likelihood value
  return lprob;
 }
}
data{
 int N; //number of observations
 int T; //number of branches
 int R[T]; //number of options per branch
 int P; //dim of Z
 int Q; //dim of X
 real y[N,sum(R)]; //outcomes array
 real Z[N,T,P]; //branch-varying variables array
 real X[N,sum(R),Q]; //option-varying variables array
}
parameters{
 vector<lower=0, upper=1>[T] rho; //scale parameters
 vector[P] alpha; //branch-varying parameters
 matrix[Q,T] beta; //option-varying parameters
}
model{
//specify priors (rho gets an implicit Uniform(0,1) prior from its declared bounds)
 for(p in 1:P) alpha[p] ~ normal(0,5); 
 for(q in 1:Q) for(t in 1:T) beta[q,t] ~ normal(0,5);

//loop over all observations with the data 
 for(i in 1:N){
    y[i] ~ nlogit(Z[i,,],X[i,,],R,alpha,beta,rho);
 }
}
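As a side note, the sampling statement in the loop could equivalently be written with Stan’s target += syntax, which makes explicit that our custom function simply adds its value to the log posterior:

 for(i in 1:N){
    target += nlogit_lpdf(y[i] | Z[i,,], X[i,,], R, alpha, beta, rho);
 }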

Simulation model

To see whether our model is doing what we’re hoping it’s doing, we can run a simple test with simulated data. It may be useful to compare the results we get to those from other estimators; the nested logit is most frequently estimated using a full information maximum likelihood (FIML) estimator. But neither Stata nor R provides a package that estimates a model with branch-varying variables – another reason why we sometimes need to program our own models.

The code we’ll use to simulate the data is:

#### simulate 2-level nested logit data ###

N <- 300 #number of people
P <- 2 #number of branch variant variables
Q <- 2 #number of option variant variables
R <- c(2,2,2) #vector with number of options per branch
T <- length(R) #number of branches
Rk <- sum(R) #number of options

#simulate data

Z <- array(rnorm(N*T*P,0,0.5),dim = c(N,T,P))
X <- array(rnorm(N*Rk*Q,0,0.5), dim = c(N,Rk,Q))

rho <- runif(3,0.5,1)
beta <- matrix(rnorm(T*Q,0,1),nrow = Q,ncol = T)
alpha <- rnorm(P,0,1)

#option models #change beta indexing as required
vals_opt <- cbind(exp(X[,1,]%*%beta[,1]/rho[1]),exp(X[,2,]%*%beta[,1]/rho[1]),
 exp(X[,3,]%*%beta[,2]/rho[2]),exp(X[,4,]%*%beta[,2]/rho[2]),
 exp(X[,5,]%*%beta[,3]/rho[3]),exp(X[,6,]%*%beta[,3]/rho[3]))

incl_val <- cbind(vals_opt[,1]+vals_opt[,2],vals_opt[,3]+vals_opt[,4],vals_opt[,5]+vals_opt[,6])

vals_branch <- cbind(exp(Z[,1,]%*%alpha + rho[1]*log(incl_val[,1])),
 exp(Z[,2,]%*%alpha + rho[2]*log(incl_val[,2])),
 exp(Z[,3,]%*%alpha + rho[3]*log(incl_val[,3])))

sum_branch <- rowSums(vals_branch)

probs <- cbind((vals_opt[,1]/incl_val[,1])*(vals_branch[,1]/sum_branch),
 (vals_opt[,2]/incl_val[,1])*(vals_branch[,1]/sum_branch),
 (vals_opt[,3]/incl_val[,2])*(vals_branch[,2]/sum_branch),
 (vals_opt[,4]/incl_val[,2])*(vals_branch[,2]/sum_branch),
 (vals_opt[,5]/incl_val[,3])*(vals_branch[,3]/sum_branch),
 (vals_opt[,6]/incl_val[,3])*(vals_branch[,3]/sum_branch))

Y = t(apply(probs, 1, rmultinom, n = 1, size = 1))
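As a quick sanity check on the simulated data (my addition, not in the original code), the choice probabilities should sum to one for each individual:

#each row of probs should be a valid probability distribution
summary(rowSums(probs)) #all values should be (numerically) equal to 1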

Then we’ll put the data into a list and run the Stan program with 500 iterations and 3 chains:

data <- list(
 y = Y,
 X = X,
 Z = Z,
 R = R,
 T = T,
 N = N,
 P = P,
 Q = Q
)

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

fit <- stan("C:/Users/Samuel/Dropbox/Code/nlogit.stan",
 data = data,
 chains = 3,
 iter = 500)

Which gives the following results (with the 25th percentile column dropped to fit on screen):

> print(fit)
Inference for Stan model: nlogit.
3 chains, each with iter=500; warmup=250; thin=1; 
post-warmup draws per chain=250, total post-warmup draws=750.

             mean se_mean   sd    2.5%     50%     75%   97.5% n_eff Rhat
rho[1]       1.00    0.00 0.00    0.99    1.00    1.00    1.00   750  1
rho[2]       0.87    0.00 0.10    0.63    0.89    0.95    1.00   750  1
rho[3]       0.95    0.00 0.04    0.84    0.97    0.99    1.00   750  1
alpha[1]    -1.00    0.01 0.17   -1.38   -0.99   -0.88   -0.67   750  1
alpha[2]    -0.56    0.01 0.16   -0.87   -0.56   -0.45   -0.26   750  1
beta[1,1]   -3.65    0.01 0.32   -4.31   -3.65   -3.44   -3.05   750  1
beta[1,2]   -0.28    0.01 0.24   -0.74   -0.27   -0.12    0.15   750  1
beta[1,3]    0.99    0.01 0.25    0.48    0.98    1.15    1.52   750  1
beta[2,1]   -0.15    0.01 0.25   -0.62   -0.16    0.00    0.38   750  1
beta[2,2]    0.28    0.01 0.24   -0.16    0.28    0.44    0.75   750  1
beta[2,3]    0.58    0.01 0.24    0.13    0.58    0.75    1.07   750  1
lp__      -412.84    0.14 2.53 -418.56 -412.43 -411.05 -409.06   326  1

Samples were drawn using NUTS(diag_e) at Sun May 06 14:16:43 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Which we can compare to the original parameters:

> beta
           [,1]       [,2]      [,3]
[1,] -3.9381389 -0.3476054 0.7191652
[2,] -0.1182806  0.2736159 0.5237470
> alpha
[1] -0.9654045 -0.6505002
> rho
[1] 0.9503473 0.9950653 0.5801372

You can see that the posterior means and quantiles of the distribution provide pretty good estimates of the original parameters. Convergence diagnostics such as Rhat and traceplots (not reproduced here) show good convergence of the chains. But, of course, this is not enough for us to rely on it completely – you would want to investigate further to ensure that the chains were actually exploring the posterior of interest.
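For anyone wanting to reproduce those checks, something along these lines with rstan should produce the standard diagnostics (a sketch; fit is the stanfit object created above, and check_hmc_diagnostics requires a reasonably recent version of rstan):

#traceplots for the main parameters and HMC-specific diagnostics
traceplot(fit, pars = c("rho", "alpha", "beta"))
check_hmc_diagnostics(fit) #divergences, treedepth, and E-BFMI warnings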


I am not aware of any examples in health economics of using custom likelihoods in Stan. There are not even many examples of Bayesian nested logit models, one exception being a paper by Lahiri and Gao, who ‘analyse’ the nested logit using MCMC. But given the limitations of MCMC discussed above, one should prefer the implementation in this post to the MCMC samplers of that paper. It’s also a testament to computing advances and Stan that in 2001 an MCMC sampler and analysis could fill a paper in a decent econometrics journal and now we can knock one out for a blog post.

In terms of nested logit models in general in health economics, there are many examples going back 30 years (e.g. this article from 1987). More recent papers have preferred “mixed” or “random parameters” logit or probit specifications, which are much more flexible than the nested logit. We would advise these sorts of models for this reason. The nested logit was used as an illustrative example of estimating custom likelihoods for this post.



Sam Watson’s journal round-up for 30th April 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The Millennium Villages Project: a retrospective, observational, endline evaluation. The Lancet Global Health [PubMed] Published May 2018

There are some clinical researchers who would have you believe observational studies are completely useless. The clinical trial is king, they might say; observational studies are just too biased. And while it’s true that observational studies are difficult to do well and convincingly, they can be a reliable and powerful source of evidence. Similarly, randomised trials are frequently flawed: for example, there’s often missing data that hasn’t been dealt with, or a lack of allocation concealment, and many researchers forget that randomisation does not guarantee a balance of covariates, it merely increases the probability of it. I bring this up because this study is a particularly carefully designed observational study that I think serves as a good example to other researchers. The paper is an evaluation of the Millennium Villages Project, an integrated intervention programme designed to help rural villages across sub-Saharan Africa meet the Millennium Development Goals over the ten years between 2005 and 2015. Initial before-after evaluations of the project were criticised for inferring causal “impacts” from before and after data (for example, this Lancet paper had to be corrected after some criticism). To address these concerns, this new paper is incredibly careful about choosing appropriate control villages against which to evaluate the intervention. Their method is too long to summarise here, but in essence they match intervention villages to other villages on the basis of district, agroecological zone, and a range of variables from the DHS – matches were then reviewed for face validity and revised until a satisfactory matching was complete. The wide range of outcomes were all scaled to a standard normal and made to “point” in the same direction, i.e. so that an increase indicated economic development. Then, to avoid multiple comparisons problems, a Bayesian hierarchical model is used to pool data across countries and outcomes. Cost data were also reported. Even better, “statistical significance” is barely mentioned at all! All in all, a neat and convincing evaluation.

Reconsidering the income‐health relationship using distributional regression. Health Economics [PubMed] [RePEc] Published 19th April 2018

The relationship between health and income has long been of interest to health economists. But it is a complex relationship. Increases in income may change consumption behaviours and the use of time, promoting health, while improvements to health may lead to increases in income. Similarly, people who are more likely to earn higher incomes may also be those who look after themselves, or maybe not. Disentangling these various factors has generated a pretty sizeable literature, but almost all of the empirical papers in this area (and indeed most empirical papers in general) use modelling techniques to estimate the effect of something on the expected value, i.e. mean, of some outcome. But the rest of the distribution is also of interest – the mean effect of income may not be very large, but a small increase in income for poorer individuals may have a relatively large effect on the risk of very poor health. This article looks at the relationship between income and the conditional distribution of health using something called “structured additive distribution regression” (SADR). My interpretation of SADR is that one would model the outcome y ~ g(a,b) as being distributed according to some distribution g(.) indexed by parameters a and b; for example, a normal or Gamma distribution has two parameters. One would then specify a generalised linear model for each of a and b, e.g. a = f(X’B). I’m not sure this is a completely novel method, as people already use this kind of approach to, for example, model heteroscedasticity. But that’s not to detract from the paper itself. The findings are very interesting – increases to income have a much greater effect on health at the lower end of the spectrum.
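As a rough illustration of the idea (my own sketch, not the authors’ method or code), the R package gamlss fits models in this spirit, with a separate linear predictor for each parameter of the outcome distribution; here df, health, and income are hypothetical placeholders:

library(gamlss)
#model a positive-valued health score as Gamma distributed, letting both
#the mean (mu) and the scale (sigma) depend on income
fit <- gamlss(health ~ income, sigma.formula = ~ income,
              family = GA, data = df)
summary(fit)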

Ask your doctor whether this product is right for you: a Bayesian joint model for patient drug requests and physician prescriptions. Journal of the Royal Statistical Society: Series C Published April 2018.

When I used to take econometrics tutorials for undergraduates, one of the sessions involved going through coursework about the role of advertising. To set the scene, I would talk about the work of Alfred Marshall, the influential economist from the late 1800s/early 1900s. He described two roles for advertising: constructive and combative. The former is when advertising grows the market as a whole, increasing everyone’s revenues, and the latter is when ads just steal market share from rivals without changing the size of the market. Later economists would go on to thoroughly develop theories around advertising, exploring such things as the power of ads to distort preferences, the supply of ads and their complementarity with the product they’re selling, or seeing ads as a source of consumer information. Nevertheless, Marshall’s distinction is still a key consideration, although often phrased in different terms. This study examines a lot of things, but one of its key objectives is to explore the role of direct-to-consumer advertising on prescriptions of brands of drugs. The system is clearly complex: drug companies advertise both to consumers and physicians, consumers may request the drug from the physician, and the physician may or may not prescribe it. Further, there may be correlated unobservable differences between physicians and patients, and the choice to advertise to particular patients may not be exogenous. The paper does a pretty good job of dealing with each of these issues, but it is dense and took me a couple of reads to work out what was going on, especially with the mix of Bayesian and frequentist terms. Examining the erectile dysfunction drug market, the authors reckon that direct-to-consumer advertising reduces drug requests across the category, while increasing the proportion of requests for the advertised drug – potentially suggesting a “combative” role. However, it’s more complex than that: patient requests and doctors’ prescriptions seem to be influenced by a multitude of factors.


Sam Watson’s journal round-up for 16th April 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The impact of NHS expenditure on health outcomes in England: alternative approaches to identification in all‐cause and disease specific models of mortality. Health Economics [PubMed] Published 2nd April 2018

Studies looking at the relationship between health care expenditure and patient outcomes have exploded in popularity. A recent systematic review identified 65 studies on the topic by 2014 – and recent experience from these journal round-ups suggests this number has increased significantly since then. The relationship between national spending and health outcomes is important to inform policy and health care budgets, not least through the specification of a cost-effectiveness threshold. In 2015, Karl Claxton and colleagues released a big study looking at all the programmes of care in the NHS, purporting to estimate exactly this. I wrote at the time that: (i) these estimates are only truly an opportunity cost if the health service is allocatively efficient, which it isn’t; and (ii) their statistical identification method, in which they used a range of socio-economic variables as instruments for expenditure, was flawed as the instruments were neither strong determinants of expenditure nor (conditionally) independent of population health. I also noted that their tests would be unlikely to detect this problem. In response to the first point, Tony O’Hagan commented to say that they did not assume NHS efficiency, nor even that the NHS was assumed to be trying to maximise health. This may well have been the case, but I would still, perhaps pedantically, argue that this is therefore not an opportunity cost. For the question of instrumental variables, an alternative method was proposed by Martyn Andrews and co-authors, using information that feeds into the budget allocation formula as instruments for expenditure. In this new article, Claxton, Lomas, and Martin adopt Andrews’s approach and apply it across four key programmes of care in the NHS to try to derive cost-per-QALY thresholds. First off, many of my original criticisms would also apply to this paper, to which I’d add one more: (Statistical significance being used inappropriately complaint alert!!!) The authors use what seems to be some form of stepwise regression, including and excluding regressors on the basis of statistical significance – this is a big no-no and just introduces large biases (see this article for a list of reasons why). Beyond that, the instruments issue – I think – is still a problem, as it’s hard to justify, for example, an input price index (which translates to larger budgets) as an instrument here. It is certainly correlated with higher expenditure – inputs are more expensive in higher price areas after all – but it won’t be correlated with greater inputs for this same reason. Thus, it’s the ‘wrong kind’ of correlation for this study. Needless to say, perhaps I am letting the perfect be the enemy of the good. Is this evidence strong enough to warrant a change in a cost-effectiveness threshold? My inclination would be that it is not, but that is not to deny its relevance to the debate.

Risk thresholds for alcohol consumption: combined analysis of individual-participant data for 599 912 current drinkers in 83 prospective studies. The Lancet Published 14th April 2018

“Moderate drinkers live longer” is the adage of the casual drinker, as if to justify a hedonistic pursuit as purely pragmatic. But where does this idea come from? Studies that have compared the risk of cardiovascular disease to the level of alcohol consumption have shown that disease risk is lower in those that drink moderately compared to those that don’t drink. But correlation does not imply causation – non-drinkers might differ from those that drink. They may be abstinent after experiencing health issues related to alcohol, or be otherwise advised not to drink to protect their health. If we truly believed moderate alcohol consumption was better for your health than no alcohol consumption, we’d advise people who don’t drink to drink. Moreover, if this relationship were true then there would be an ‘optimal’ level of consumption where any protective effect was maximised before being outweighed by the adverse effects. This new study pools data from three large consortia, each containing data from multiple studies or centres, on individual alcohol consumption, cardiovascular disease (CVD), and all-cause mortality to look at these outcomes among drinkers, excluding non-drinkers for the aforementioned reasons. Reading the methods section, it’s not wholly clear what was done, at least not if replicability were the standard. I believe that for each database a hazard ratio or odds ratio for the risk of CVD or mortality was estimated for eight groups of alcohol consumption; these ratios were then pooled in a random-effects meta-analysis. However, it’s not clear to me why you would need to do this in two steps when you could just estimate a hierarchical model that achieves the same thing while also propagating any uncertainty through all the levels. Anyway, a polynomial was then fitted through the pooled ratios – again, why not just do this in the main stage and estimate some kind of hierarchical semi-parametric model instead of a three-stage model to get the curve of interest? I don’t know. The key finding is that risk generally increases above around 100g/week of alcohol (around 5-6 UK glasses of wine per week), below which it is fairly flat (although whether it differs from non-drinkers we don’t know). However, the picture the article paints is complicated: the risk of stroke and heart failure goes up with increased alcohol consumption, but the risk of myocardial infarction goes down. This would suggest some kind of competing risk: the mechanism by which alcohol works increases your overall risk of CVD and your proportional risk of non-myocardial-infarction CVD given CVD.

Family ruptures, stress, and the mental health of the next generation [comment] [reply]. American Economic Review [RePEc] Published April 2018

I’m not sure I will write out the full blurb again about studies of in utero exposure to difficult or stressful conditions and later life outcomes. There are a lot of them and they continue to make the top journals. Admittedly, I continue to cover them in these round-ups – so much so that we could write a literature review on the topic on the basis of the content of this blog. Needless to say, exposure in the womb to stressors likely increases the risk of low birth weight, neonatal and childhood disease, poor educational outcomes, and worse labour market outcomes. So what does this new study (and the comments) contribute? Firstly, it uses a new type of stressor – maternal stress caused by a death in the family, which apparently has a dose-response element, as stronger ties to the deceased are more stressful – and secondly, it looks at mental health outcomes of the child, which are less commonly examined in these sorts of studies. The identification strategy compares the effect of the death on infants who are in the womb to infants who experience it shortly after birth. Herein lies the interesting discussion raised in the above linked comment and reply papers: in this paper the sample contains all births up to one year post birth, and to be in the ‘treatment’ group the death had to have occurred between conception and the expected date of birth, so babies born preterm were less likely to end up in the control group than those born after the expected date. This spurious correlation could potentially lead to bias. In the authors’ reply, they re-estimate their models by redefining the control group on the basis of expected date of birth rather than actual date of birth. They find that their estimates for the effect of the stressor on physical outcomes, like low birth weight, are much smaller in magnitude, and I’m not sure they’re clinically significant. For mental health outcomes, the estimates are again small in magnitude but remain similar to those in the original paper, and this choice phrase pops up (Statistical significance being used inappropriately complaint alert!!!): “We cannot reject the null hypothesis that the mental health coefficients presented in panel C of Table 3 are statistically the same as the corresponding coefficients in our original paper.” Statistically the same! I can see they’re different! Anyway, given all the other evidence on the topic I don’t need to explain the results in detail – the methods discussion is far more interesting.