# Rita Faria’s journal round-up for 4th November 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The marginal benefits of healthcare spending in the Netherlands: estimating cost-effectiveness thresholds using a translog production function. Health Economics [PubMed] Published 30th August 2019

The marginal productivity of the healthcare sector or, as it is commonly known, the supply-side cost-effectiveness threshold, is a hot topic right now. A few years ago, we could only guess at the magnitude of health that was displaced by reimbursing expensive and not-that-beneficial drugs. Since the seminal work by Karl Claxton and colleagues, we have started to have a pretty good idea of what we’re giving up.

This paper by Niek Stadhouders and colleagues adds to this literature by estimating the marginal productivity of hospital care in the Netherlands. Spoiler alert: they estimate that hospital care generates 1 QALY for around €74,000 at the margin, with a 95% confidence interval ranging from €53,000 to €94,000. Remarkably, this is close to the Dutch upper reference value for the cost-effectiveness threshold of €80,000!

The estimation approach is quite elaborate, as it required constructing QALY and cost measures and accounting for the effect of mortality on costs. The diagram in Figure 1 explains it excellently. Their approach differs from the Claxton et al method in that they corrected for the cost changes due to mortality directly rather than via an instrumental variable analysis. To estimate the marginal effect of spending on health, they use a translog production function. The confidence intervals are generated by Monte Carlo simulation, and various robustness checks are presented.
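To make the mechanics concrete, here is a toy sketch of how a marginal cost per QALY falls out of a translog health production function. The functional form and coefficients below are invented for illustration and are not the paper's specification or estimates:

```python
import numpy as np

# Toy translog health production function (illustrative only):
#   ln(QALYs) = a + b*ln(spend) + c*ln(spend)^2
a, b, c = 2.0, 0.30, -0.01  # hypothetical coefficients

def qalys(spend):
    ls = np.log(spend)
    return np.exp(a + b * ls + c * ls**2)

spend = 1_000_000  # hypothetical hospital budget in euros
# elasticity of health w.r.t. spending: d ln(Q) / d ln(S) = b + 2c*ln(S)
elasticity = b + 2 * c * np.log(spend)
# marginal QALYs per euro = elasticity * Q/S; the threshold is its reciprocal,
# i.e. the cost of one extra QALY at the margin
marginal_qaly_per_euro = elasticity * qalys(spend) / spend
threshold = 1 / marginal_qaly_per_euro
print(f"cost per QALY at the margin: {threshold:,.0f} euros")
```

The point is only that, once the translog coefficients are estimated, the threshold is a simple function of the fitted elasticity, which is why the uncertainty in the coefficients (propagated here by Monte Carlo in the paper) drives the width of the reported interval.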

This is a fantastic paper, which will be sure to have important policy implications. Analysts conducting cost-effectiveness analysis in the Netherlands, do take note.

Mixed-effects models for health care longitudinal data with an informative visiting process: a Monte Carlo simulation study. Statistica Neerlandica Published 5th September 2019

Electronic health records are the current big thing in health economics research, but they’re not without challenges. One issue is that the data reflect clinical management rather than a trial protocol, which means that doctors may test more severely ill patients more often. For example, people with higher cholesterol may get more frequent cholesterol tests. The challenge is that traditional methods for longitudinal data assume independence between the observation times and disease severity.

Alessandro Gasparini and colleagues set out to solve this problem. They propose using inverse intensity of visit weighting within a mixed-effects model framework. Importantly, they provide a Stata package that implements the method. It’s part of the wide-ranging and super-useful merlin package.

It was great to see how the method works with the directed acyclic graph. Essentially, after controlling for confounders, the longitudinal outcome and the observation process are associated through shared random effects. By assuming a distribution for the shared random effects, the model blocks the path between the outcome and the observation process. It makes it sound easy!
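As a rough illustration of why weighting by the visiting process helps, here is a simplified, discrete-time simulation. It uses "oracle" weights computed from the true visit probabilities, rather than the authors' estimated, shared-random-effects approach, and all names and numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Severity drifts over time; sicker patients are seen more often, so a naive
# average over observed visits over-represents high severity.
n, t_max = 2000, 20
severity = 1.0 + 0.2 * rng.normal(size=(n, t_max)).cumsum(axis=1)

# visit probability increases with current severity
p_visit = 1 / (1 + np.exp(-(severity - 1.0)))
visited = rng.uniform(size=(n, t_max)) < p_visit

true_mean = severity.mean()                               # over all person-times
naive_mean = severity[visited].mean()                     # biased upwards
w = 1 / p_visit[visited]                                  # oracle inverse-intensity weights
weighted_mean = np.average(severity[visited], weights=w)  # bias largely removed

print(true_mean, naive_mean, weighted_mean)
```

Weighting each observed visit by the inverse of its probability of occurring recovers (approximately) the mean over all person-times, which is the intuition behind inverse intensity of visit weighting; the paper's contribution is handling the case where the visiting process must itself be modelled, via shared random effects.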

The paper goes through the method, compares it with other methods from the literature in a simulation study, and applies it to a real case study. It’s a brilliant paper that deserves a close look from anyone using electronic health records.

Would you like to use a propensity score method but don’t know where to start? Look no further! This paper by Rishi Desai and Jessica Franklin provides a practical guide to propensity score methods.

They start by explaining what a propensity score is and how it can be used, from matching to reweighting and regression adjustment. I particularly enjoyed reading about the importance of conceptualising the target of inference, that is, which treatment effect we are trying to estimate. In the medical literature, it is rare to see a paper that is clear on whether it estimates the average treatment effect or the average treatment effect among the treated.
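The distinction matters whenever treatment effects are heterogeneous. A small simulated example using the standard inverse probability weighting formulas (the data and numbers are invented, not taken from the paper) shows the two estimands diverging:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
x = rng.normal(size=n)                      # a confounder
ps = 1 / (1 + np.exp(-x))                   # true propensity score
treated = rng.uniform(size=n) < ps
# the treatment effect is larger for high-x (i.e. likely-treated) patients
y = 2 * x + np.where(treated, 1 + 0.5 * x, 0) + rng.normal(size=n)

# ATE weights: 1/ps for treated, 1/(1-ps) for controls
w_ate = np.where(treated, 1 / ps, 1 / (1 - ps))
# ATT weights: 1 for treated, ps/(1-ps) for controls
w_att = np.where(treated, 1.0, ps / (1 - ps))

def wdiff(w):
    t, c = treated, ~treated
    return np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c])

ate_est = wdiff(w_ate)   # ~ E[1 + 0.5x] = 1, the population-average effect
att_est = wdiff(w_att)   # ~ E[1 + 0.5x | treated] > 1, the effect in the treated
print("ATE estimate:", ate_est)
print("ATT estimate:", att_est)
```

Because the treated have higher x on average, the effect among the treated exceeds the population-average effect, and the choice of weights determines which one you estimate.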

I found the algorithm for method selection really useful. Here, Rishi and Jessica describe the steps in the choice of the propensity score method and recommend their preferred method for each situation. The paper also includes the application of each method to the example of dabigatran versus warfarin for atrial fibrillation. Thanks to the graphs, we can visualise how the distribution of the propensity score changes for each method and depending on the target of inference.

This is an excellent paper for those starting their propensity score analyses, or for those who would like a refresher. It’s a keeper!


# Sam Watson’s journal round-up for 10th September 2018


Probabilistic sensitivity analysis in cost-effectiveness models: determining model convergence in cohort models. PharmacoEconomics [PubMed] Published 27th July 2018

Probabilistic sensitivity analysis (PSA) is rightfully a required component of economic evaluations. Deterministic sensitivity analyses are generally biased: the output of a model evaluated at a single choice of values from a complex joint distribution is unlikely to be a good reflection of the true model mean. PSA involves repeatedly sampling parameters from their respective distributions and analysing the resulting model outputs. But how many times should you do this? Usually, an arbitrary number that seems “big enough” is selected, say 1,000 or 10,000. But these simulations themselves exhibit variance; so-called Monte Carlo error. This paper discusses making the choice of the number of simulations more formal by assessing the “convergence” of the simulation output.

In the same way as sample sizes are chosen for trials, the number of simulations should provide an adequate level of precision; anything more wastes resources without improving inferences. For example, if the statistic of interest is the net monetary benefit, then we would want the confidence interval (CI) to exclude zero, as this should be a sufficient level of certainty for an investment decision. The paper therefore proposes conducting a number of simulations, examining whether the CI is ‘narrow enough’, and conducting further simulations if it is not. However, I see a problem with this proposal: the variance of a statistic from a sequence of simulations itself has variance. The stopping points at which we might check the CI are themselves arbitrary: additional simulations can increase the width of the CI as well as reduce it. Consider a set of simulations from a simple ratio of random variables, $ICER = Gamma(1, 0.01) / Normal(0.01, 0.01)$. The proposed “stopping rule” therefore doesn’t necessarily indicate “convergence”, as a few more simulations could lead to a wider, as well as a narrower, CI. The heuristic approach is undoubtedly an improvement on the current way things are usually done, but I think there is scope for a more rigorous method of assessing convergence in PSA.
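To see the problem numerically, here is a quick simulation of that ratio (reading the gamma's second parameter as a rate; either reading gives a heavy-tailed ratio, because the normal denominator has mass near zero), tracking the width of the 95% CI for the mean as simulations accumulate:

```python
import numpy as np

rng = np.random.default_rng(42)

# ICER = Gamma(1, 0.01) / Normal(0.01, 0.01): the denominator is often close
# to zero, so the ratio is extremely heavy-tailed
n = 100_000
icer = rng.gamma(1, 1 / 0.01, size=n) / rng.normal(0.01, 0.01, size=n)

def ci_width(k):
    # width of the normal-approximation 95% CI for the mean after k simulations
    s = icer[:k]
    return 2 * 1.96 * s.std(ddof=1) / np.sqrt(k)

widths = np.array([ci_width(k) for k in range(1_000, n + 1, 1_000)])
# if Monte Carlo error behaved nicely, widths would fall monotonically at rate
# 1/sqrt(k); with heavy tails, a single extreme draw makes them jump back up
print("CI width increased at", int((np.diff(widths) > 0).sum()), "of",
      len(widths) - 1, "checkpoints")
```

A stopping rule that checks the CI at an arbitrary point can therefore declare "convergence" just before a draw that would have widened the interval again.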

Mortality due to low-quality health systems in the universal health coverage era: a systematic analysis of amenable deaths in 137 countries. The Lancet [PubMed] Published 5th September 2018

Richard Horton, the oracular editor-in-chief of the Lancet, tweeted last week:

There is certainly an argument that academic journals are good forums to make advocacy arguments. Who better to interpret the analyses presented in these journals than the authors and audiences themselves? But, without a strict editorial bulkhead between analysis and opinion, we run the risk that the articles and their content are influenced or dictated by the political whims of editors rather than scientific merit. Unfortunately, I think this article is evidence of that.

No-one debates that improving health care quality will improve patient outcomes and experience. It is in the very definition of ‘quality’. This paper aims to estimate the number of deaths each year due to ‘poor quality’ in low- and middle-income countries (LMICs). The trouble with this is two-fold. First, given the number of unknown quantities required to get a handle on this figure, the definition of quality notwithstanding, the uncertainty around it should be incredibly high (see below). Second, attributing these deaths in a causal way to a nebulous definition of ‘quality’ is tenuous at best. The approach of the article is, in essence, to assume that the differences in fatality rates for treatable conditions between LMICs and the best-performing health systems on Earth, among people who attend health services, are entirely caused by ‘poor quality’. This definition of quality would therefore seem to encompass low resourcing, poor supply of human resources, lack of access to medicines, and everything else that differs between health systems. To get to this figure, the authors face multiple sources of uncertainty, including:

• Using a range of proxies for health care utilisation;
• Using global burden of disease epidemiology estimates, which have associated uncertainty;
• A number of data slicing decisions, such as truncating case fatality rates;
• Estimating utilisation rates based on a predictive model;
• Estimating the case-fatality rate for non-users of health services based on other estimated statistics.

Despite this, the authors claim a 95% uncertainty interval with a width of only 300,000 deaths around a mean estimate of 5.0 million deaths due to ‘poor quality’. This seems highly implausible, and yet it is claimed to be a causal effect of an undefined ‘poor quality’. The timing of this article coincides with the Lancet Commission on care quality in LMICs and, one suspects, had it not been for the advocacy angle on care quality, it would not have been published in this journal.

Embedding as a pitfall for survey‐based welfare indicators: evidence from an experiment. Journal of the Royal Statistical Society: Series A Published 4th September 2018

Health economists will be well aware of the various measures used to evaluate welfare and well-being. Surveys are typically used comprising questions relating to a number of different dimensions, such as emotional and social well-being or physical functioning. Similar surveys are also used to collect population preferences over states of the world or policy options; for example, Kahneman and Knetsch conducted a survey of WTP for different environmental policies. These surveys can exhibit what is called an ’embedding effect’, which Kahneman and Knetsch described as when the value of a good varies “depending on whether the good is assessed on its own or embedded as part of a more inclusive package.” That is to say, the way people value single-dimensional attributes or qualities can be distorted when they’re embedded as part of a multi-dimensional choice. This article reports the results of an experiment in which students were asked to weight the relative importance of different dimensions of the Better Life Index, including jobs, housing, and income. The randomised treatment was whether they rated ‘jobs’ as a single category or were presented with individual dimensions, such as the unemployment rate and job security. The experiment shows strong evidence of embedding – the overall weighting substantially differed by treatment. This, the authors conclude, means that the Better Life Index fails to accurately capture preferences and is subject to manipulation should a researcher be so inclined – if you want evidence to say your policy is the most important, just change the way the dimensions are presented.


# Method of the month: custom likelihoods with Stan

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is custom likelihoods with Stan.

# Principles

Regular readers of this blog will know that I am a fan of Bayesian methods. The exponential growth in personal computing power has opened up a whole new range of Bayesian models at home. WinBUGS and JAGS were the go-to pieces of software for estimating Bayesian models, both using Markov chain Monte Carlo (MCMC) methods. Theoretically, an MCMC chain will explore the posterior distribution. But MCMC has flaws. For example, if the target distribution has a high degree of curvature, such as many hierarchical models exhibit, then MCMC chains can have trouble exploring it: they stay in the ‘difficult’ part of the space for longer before moving elsewhere, so their averages oscillate around the true values. Asymptotically these oscillations balance out, but in real, finite time they lead to bias. Furthermore, MCMC chains can be very slow to converge to the target distribution, and for complex models this can take a (literal) lifetime. An alternative, Hamiltonian Monte Carlo (HMC), addresses these issues. Michael Betancourt’s introduction to HMC is great for anyone interested in the topic.

Stan is a ‘probabilistic programming language’ that implements HMC. A huge range of probability distributions are already implemented in the software; check out the manual for more information. There is also an R package, rstanarm, that estimates a number of standard models using normal R syntax, meaning you can use these tools without learning the Stan language. However, Stan may not have the necessary distributions for more complex econometric or statistical models. It used to be the case that you would have to build your own MCMC sampler, but given the problems with MCMC this is now strongly advised against in favour of HMC. Fortunately, we can implement our own probability density functions in Stan. So, if you can write down the (log) likelihood for your model, you can estimate it in Stan!

The aim of this post is to provide an example of implementing a custom probability function in Stan from the likelihood of our model. We will look at the nested logit model. These models have been widely used for multinomial choice problems. An area of interest among health economists is the choice of hospital provider. A basic multinomial choice model, such as the multinomial logit, requires an independence of irrelevant alternatives (IIA) assumption, which says that the odds of choosing one option over another are independent of any other alternative. For example, it would assume that the odds of me choosing the pharmacy in town over the hospital in town would be unaffected by a pharmacy opening on my own road. This is likely too strong. There are many ways to relax this assumption, the nested logit being one. The nested logit is useful when choices can be ‘nested’ in groups, and it assumes there is correlation among choices within each group. For example, we can group health care providers into pharmacies, primary care clinics, hospitals, etc. such as this:
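As an aside, the IIA property of the basic multinomial logit is easy to see numerically. In this sketch, the utilities are invented for illustration:

```python
import numpy as np

# IIA in the multinomial logit: adding a new alternative changes all the
# probabilities, but leaves the odds between existing options unchanged.
def mnl_probs(v):
    e = np.exp(v)
    return e / e.sum()

v = np.array([1.0, 0.5])           # utilities: hospital, pharmacy in town
p2 = mnl_probs(v)
p3 = mnl_probs(np.append(v, 0.4))  # a new pharmacy opens on my road

odds_before = p2[0] / p2[1]
odds_after = p3[0] / p3[1]
print(odds_before, odds_after)     # identical: exp(1.0 - 0.5) in both cases
```

The nested logit breaks exactly this property across branches, while retaining it within a branch.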

# Implementation

## Econometric model

Firstly, we need a nesting structure for our choices, like that described above. We’ll consider a 2-level nesting structure with $T$ branches and $R_t$ choices within each branch $t$. As with most choice models, we start from an additive random utility model (ARUM), which is, for individual $i=1,\ldots,N$ choosing option $r$ within branch $t$:

$U_{itr} = V_{itr} + \epsilon_{itr}$

Then the chosen option is the one with the highest utility. The motivating feature of the nested logit is that the hierarchical structure allows us to factorise the joint probability of choosing option $r$ in branch $t$ into a conditional and a marginal model:

$p_{itr} = p_{it} \times p_{ir|t}$

Multinomial choice models arise when the errors are assumed to have a generalised extreme value (GEV) distribution, which gives us the multinomial logit model. We will model the deterministic part of the utility with branch-varying and option-varying variables:

$V_{itr} = Z_{it}'\alpha + X_{itr}'\beta_t$

Then the model can be written as:

$p_{itr} = p_{it} \times p_{ir|t} = \frac{exp(Z_{it}'\alpha + \rho_t I_{it})}{\sum_{k \in T} exp(Z_{ik}'\alpha + \rho_k I_{ik})} \times \frac{exp(X_{itr}'\beta_t/\rho_t)}{\sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t) }$

where $\rho_t$ is variously called a scale parameter, correlation parameter, etc. and defines the within branch correlation (arising from the GEV distribution). We also have the log-sum, which is also called the inclusive value:

$I_{it} = log \left( \sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t) \right)$.

Now that we have our model set up, the log-likelihood over all individuals is

$\sum_{i=1}^N \sum_{t \in T} \sum_{r \in R_t} y_{itr} \left[ Z_{it}'\alpha + \rho_t I_{it} - log \left( \sum_{k \in T} exp(Z_{ik}'\alpha + \rho_k I_{ik}) \right) + X_{itr}'\beta_t/\rho_t - log \left( \sum_{m \in R_t} exp( X_{itm}'\beta_t/\rho_t) \right) \right]$
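As a cross-check, the probabilities above can be transcribed directly into NumPy. The shapes follow the data description used in the post (Z: branch-varying covariates, X: option-varying covariates for one individual), and the inputs in the sanity check are arbitrary:

```python
import numpy as np

# Nested logit choice probabilities for one individual.
# Z: (T, P) branch-varying covariates; X: (sum(R), Q) option-varying
# covariates; alpha: (P,); beta: (Q, T); rho: (T,) with 0 < rho <= 1.
def nested_logit_probs(Z, X, R, alpha, beta, rho):
    T = len(R)
    starts = np.concatenate(([0], np.cumsum(R)))
    # inclusive values: I_t = log sum_m exp(X_m' beta_t / rho_t)
    I = np.array([
        np.log(np.exp(X[starts[t]:starts[t+1]] @ beta[:, t] / rho[t]).sum())
        for t in range(T)
    ])
    # marginal (branch) probabilities: p_t ∝ exp(Z_t' alpha + rho_t I_t)
    branch_util = np.exp(Z @ alpha + rho * I)
    p_branch = branch_util / branch_util.sum()
    # conditional (option within branch) probabilities, then the product
    probs = []
    for t in range(T):
        u = np.exp(X[starts[t]:starts[t+1]] @ beta[:, t] / rho[t])
        probs.append(p_branch[t] * u / u.sum())
    return np.concatenate(probs)

# quick sanity check with arbitrary values: probabilities should sum to 1
rng = np.random.default_rng(0)
R = [2, 2, 2]
p = nested_logit_probs(rng.normal(size=(3, 2)), rng.normal(size=(6, 2)),
                       R, rng.normal(size=2), rng.normal(size=(2, 3)),
                       np.array([0.9, 0.8, 0.7]))
print(p.sum())
```

Evaluating the log of these probabilities at the observed choices reproduces the log-likelihood above, which is useful for checking a hand-written sampler or Stan function against known inputs.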

As a further note, for the model to be compatible with an ARUM specification, a number of conditions need to be satisfied. One of these is that $0<\rho_t \leq 1$, so we will impose that restriction. We have also only included alternative-varying variables, but we are often interested in individual-varying variables and in allowing parameters to vary over alternatives; these can simply be added to this specification, but we will leave them out for now to keep things “simple”. We will also use basic weakly informative priors, leaving prior specification as a separate issue we won’t consider further:

$\alpha \sim normal(0,5), \beta_t \sim normal(0,5), \rho_t \sim Uniform(0,1)$

## Software

DISCLAIMER: This code is unlikely to be the most efficient, nor can I guarantee it is 100% correct – use at your peril!

The following assumes a familiarity with Stan and R.

Stan programs are divided into blocks including data, parameters, and model. The functions block allows us to define custom (log) probability density functions. These take a form something like:

real xxx_lpdf(real y, ...){}

which says that the function outputs a real-valued variable and takes a real-valued variable, y, as one of its arguments. The _lpdf suffix allows the function to act as a log probability density function in the program (and equivalently _lpmf for log probability mass functions for discrete variables). Now we just have to convert the log-likelihood above into a function. But first, let’s consider what data we will be passing to the program:

• N, the number of observations;
• T, the number of branches;
• P, the number of branch-varying variables;
• Q, the number of choice-varying variables;
• R, a T x 1 vector with the number of choices in each branch, from which we can also derive the total number of options as sum(R). We will call the total number of options Rk for now;
• Y, an N x Rk matrix, where Y[i,j] = 1 if individual i=1,…,N chose option j=1,…,Rk;
• Z, a N x T x P array of branch-varying variables;
• X, a N x Rk x Q array of choice-varying variables.

And the parameters:

• $\rho$, a T x 1 vector of correlation parameters;
• $\alpha$, a P x 1 vector of coefficients on the branch-varying variables;
• $\beta$, a Q x T matrix of coefficients on the choice-varying variables.

Now, to develop the code, we will specify the function for individual observations of Y rather than for the whole matrix, and then perform the sum over all individuals in the model block. This means we only need to feed each individual’s observations into the function rather than the whole data set. The model is specified in blocks as follows (with all the data and parameters as arguments to the function):

functions{
  real nlogit_lpdf(real[] y, real[,] Z, real[,] X, int[] R,
                   vector alpha, matrix beta, vector tau){
    //first define our additional local variables
    real lprob; //variable to hold the log prob
    int count1; //keep track of which option in the loops
    int count2; //keep track of which option in the loops
    vector[size(R)] I_val; //inclusive values
    real denom; //sum denominator of marginal model
    //for the variables appearing in sum loops, set them to zero
    lprob = 0;
    count1 = 0;
    count2 = 0;
    denom = 0;

    //determine the log-sum for each conditional model, p_ir|t,
    //i.e. the inclusive value
    for(k in 1:size(R)){
      I_val[k] = 0;
      for(m in 1:R[k]){
        count1 = count1 + 1;
        I_val[k] = I_val[k] + exp(to_row_vector(X[count1,])*beta[,k]/tau[k]);
      }
      I_val[k] = log(I_val[k]);
    }

    //determine the sum for the marginal model, p_it, denominator
    for(k in 1:size(R)){
      denom = denom + exp(to_row_vector(Z[k,])*alpha + tau[k]*I_val[k]);
    }

    //put everything together in the log likelihood;
    //note the division by tau[k], matching the conditional model above
    for(k in 1:size(R)){
      for(m in 1:R[k]){
        count2 = count2 + 1;
        lprob = lprob + y[count2]*(to_row_vector(Z[k,])*alpha +
          tau[k]*I_val[k] - log(denom) +
          to_row_vector(X[count2,])*beta[,k]/tau[k] - I_val[k]);
      }
    }
    //return the log likelihood value
    return lprob;
  }
}
data{
  int N; //number of observations
  int T; //number of branches
  int R[T]; //number of options per branch
  int P; //dim of Z
  int Q; //dim of X
  real y[N,sum(R)]; //outcomes array
  real Z[N,T,P]; //branch-varying variables array
  real X[N,sum(R),Q]; //option-varying variables array
}
parameters{
  vector<lower=0, upper=1>[T] rho; //scale parameters
  vector[P] alpha; //branch-varying parameters
  matrix[Q,T] beta; //option-varying parameters
}
model{
  //specify priors
  for(p in 1:P) alpha[p] ~ normal(0,5);
  for(q in 1:Q) for(t in 1:T) beta[q,t] ~ normal(0,5);

  //loop over all observations in the data
  for(i in 1:N){
    y[i] ~ nlogit(Z[i,,], X[i,,], R, alpha, beta, rho);
  }
}

## Simulation model

To see whether our model is doing what we hope, we can run a simple test with simulated data. It may be useful to compare the results to those from other estimators; the nested logit is most frequently estimated using the full information maximum likelihood (FIML) estimator. But neither Stata nor R provides a package that estimates a model with branch-varying variables – another reason why we sometimes need to program our own models.

The code we’ll use to simulate the data is:

#### simulate 2-level nested logit data ###

N <- 300 #number of people
P <- 2 #number of branch variant variables
Q <- 2 #number of option variant variables
R <- c(2,2,2) #vector with number of options per branch
T <- length(R) #number of branches
Rk <- sum(R) #number of options

#simulate data

Z <- array(rnorm(N*T*P,0,0.5),dim = c(N,T,P))
X <- array(rnorm(N*Rk*Q,0,0.5), dim = c(N,Rk,Q))

#parameters
rho <- runif(3,0.5,1)
beta <- matrix(rnorm(T*Q,0,1), nrow = Q, ncol = T)
alpha <- rnorm(P,0,1)

#option models #change beta indexing as required
vals_opt <- cbind(exp(X[,1,]%*%beta[,1]/rho[1]),exp(X[,2,]%*%beta[,1]/rho[1]),exp(X[,3,]%*%beta[,2]/rho[2]),
exp(X[,4,]%*%beta[,2]/rho[2]),exp(X[,5,]%*%beta[,3]/rho[3]),exp(X[,6,]%*%beta[,3]/rho[3]))

incl_val <- cbind(vals_opt[,1]+vals_opt[,2],vals_opt[,3]+vals_opt[,4],vals_opt[,5]+vals_opt[,6])

vals_branch <- cbind(exp(Z[,1,]%*%alpha + rho[1]*log(incl_val[,1])),
exp(Z[,2,]%*%alpha + rho[2]*log(incl_val[,2])),
exp(Z[,3,]%*%alpha + rho[3]*log(incl_val[,3])))

sum_branch <- rowSums(vals_branch)

probs <- cbind((vals_opt[,1]/incl_val[,1])*(vals_branch[,1]/sum_branch),
(vals_opt[,2]/incl_val[,1])*(vals_branch[,1]/sum_branch),
(vals_opt[,3]/incl_val[,2])*(vals_branch[,2]/sum_branch),
(vals_opt[,4]/incl_val[,2])*(vals_branch[,2]/sum_branch),
(vals_opt[,5]/incl_val[,3])*(vals_branch[,3]/sum_branch),
(vals_opt[,6]/incl_val[,3])*(vals_branch[,3]/sum_branch))

Y = t(apply(probs, 1, rmultinom, n = 1, size = 1))

Then we’ll put the data into a list and run the Stan program with 500 iterations and 3 chains:

data <- list(
y = Y,
X = X,
Z = Z,
R = R,
T = T,
N = N,
P = P,
Q = Q
)

require(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

fit <- stan("C:/Users/Samuel/Dropbox/Code/nlogit.stan",
data = data,
chains = 3,
iter = 500)

Which gives the following results (with the 25th percentile column dropped to fit on screen):

> print(fit)
Inference for Stan model: nlogit.
3 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=750.

mean se_mean   sd    2.5%     50%     75%   97.5% n_eff Rhat
rho[1]       1.00    0.00 0.00    0.99    1.00    1.00    1.00   750  1
rho[2]       0.87    0.00 0.10    0.63    0.89    0.95    1.00   750  1
rho[3]       0.95    0.00 0.04    0.84    0.97    0.99    1.00   750  1
alpha[1]    -1.00    0.01 0.17   -1.38   -0.99   -0.88   -0.67   750  1
alpha[2]    -0.56    0.01 0.16   -0.87   -0.56   -0.45   -0.26   750  1
beta[1,1]   -3.65    0.01 0.32   -4.31   -3.65   -3.44   -3.05   750  1
beta[1,2]   -0.28    0.01 0.24   -0.74   -0.27   -0.12    0.15   750  1
beta[1,3]    0.99    0.01 0.25    0.48    0.98    1.15    1.52   750  1
beta[2,1]   -0.15    0.01 0.25   -0.62   -0.16    0.00    0.38   750  1
beta[2,2]    0.28    0.01 0.24   -0.16    0.28    0.44    0.75   750  1
beta[2,3]    0.58    0.01 0.24    0.13    0.58    0.75    1.07   750  1
lp__      -412.84    0.14 2.53 -418.56 -412.43 -411.05 -409.06   326  1

Samples were drawn using NUTS(diag_e) at Sun May 06 14:16:43 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Which we can compare to the original parameters:

> beta
[,1]       [,2]      [,3]
[1,] -3.9381389 -0.3476054 0.7191652
[2,] -0.1182806  0.2736159 0.5237470
> alpha
[1] -0.9654045 -0.6505002
> rho
[1] 0.9503473 0.9950653 0.5801372

You can see that the posterior means and quantiles of the distribution provide pretty good estimates of the original parameters. Convergence diagnostics such as Rhat and traceplots (not reproduced here) show good convergence of the chains. But, of course, this is not enough for us to rely on it completely – you would want to investigate further to ensure that the chains were actually exploring the posterior of interest.

# Applications

I am not aware of any examples in health economics of using custom likelihoods in Stan. There are not even many examples of Bayesian nested logit models; one exception is a paper by Lahiri and Gao, who ‘analyse’ the nested logit using MCMC. Given the limitations of MCMC discussed above, one would now prefer the HMC implementation in this post to the MCMC samplers of that paper. It’s also a testament to advances in computing, and to Stan, that in 2001 an MCMC sampler and analysis could fill a paper in a decent econometrics journal, and now we can knock one out for a blog post.

In terms of nested logit models in general in health economics, there are many examples going back 30 years (e.g. this article from 1987). More recent papers have preferred “mixed” or “random parameters” logit or probit specifications, which are much more flexible than the nested logit. We would advise these sorts of models for this reason. The nested logit was used as an illustrative example of estimating custom likelihoods for this post.
