Thesis Thursday: Andrea Gabrio

On the third Thursday of every month, we speak to a recent graduate about their thesis and their studies. This month’s guest is Dr Andrea Gabrio, who has a PhD from University College London. If you would like to suggest a candidate for an upcoming Thesis Thursday, get in touch.

Full Bayesian methods to handle missing data in health economic evaluation
Gianluca Baio, Alexina Mason, Rachael Hunter
Repository link

What kind of assumptions about missing data are made in trial-based economic evaluations?

Any analysis with missing data makes assumptions about the values that are not observed. Since the final results may depend on these assumptions, it is important that they are as plausible as possible in the context considered. For example, in trial-based economic evaluations, missing values often occur when data are collected through self-reported patient questionnaires, and in many cases it is plausible that patients with unobserved responses differ from the others (e.g. have worse health states). In general, it is very important that a range of plausible scenarios (defined according to the available information) is considered, and that the robustness of our conclusions across them is assessed in sensitivity analysis. Often, however, analysts prefer to ignore this uncertainty and rely on ‘default’ approaches (e.g. removing the missing data from the analysis), which implicitly make unrealistic assumptions and may lead to biased results. For a more in-depth overview of current practice, see my published review.
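To make the bias point concrete, here is a small simulated sketch in R. All numbers and the assumed missingness model are hypothetical. Utilities are made more likely to be missing when they are low. The complete-case mean, which is what simply removing the missing data gives, is then compared with the mean of all values.

```r
# Hypothetical example: worse health states are more likely to go unreported
set.seed(1)
n <- 10000
utility <- pmin(1, rnorm(n, mean = 0.7, sd = 0.2))  # capped at 1 (full health)
# Assumed missingness model (missing not at random): the probability of a
# missing questionnaire falls as the (unobserved) utility rises
p_missing <- plogis(1 - 4 * utility)
missing <- runif(n) < p_missing
observed <- ifelse(missing, NA, utility)

true_mean <- mean(utility)                          # uses every value
complete_case_mean <- mean(observed, na.rm = TRUE)  # 'default' approach: drop missing
round(c(true = true_mean, complete_case = complete_case_mean), 3)
```

Because low values are preferentially lost, the complete-case mean overstates average health. The size of the bias depends entirely on the assumed mechanism, and the data at hand cannot reveal that mechanism.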

Because assumptions about the missing values cannot be checked using the data at hand, an ideal approach to handling missing data should combine a well-defined model for the observed data with explicit assumptions about missingness.

What do you mean by ‘full Bayesian’?

The term ‘full Bayesian’ is a technicality. It typically indicates that the prior distributions in the Bayesian analysis are freely specified by the analyst, rather than being estimated from the data (as in ‘empirical Bayes’ methods). Being ‘fully’ Bayesian has some key advantages for handling missingness compared to other approaches, especially in small samples. First, a flexible choice of priors may help to stabilise inference and avoid giving too much weight to implausible parameter values. Second, external information about missingness (e.g. expert opinion) can easily be incorporated into the model through the priors. This is essential when performing sensitivity analysis to missingness. It allows the robustness of the results to be assessed across a range of assumptions, with the uncertainty in any unobserved quantity (parameters or missing data) fully propagated and quantified in the posterior distribution.

How did you use case studies to support the development of your methods?

In my PhD I had access to economic data from two small trials, both with considerable amounts of missing outcome values, which I used as motivating examples for implementing my methods. In particular, individual-level economic data present a series of complexities that make it difficult to justify the use of more ‘standardised’ methods and which, if not taken into account, may lead to biased results.

Examples of these include the correlation between effectiveness and costs, the skewness in the empirical distributions of both outcomes, the presence of identical values for many individuals (e.g. excess zeros or ones), and, on top of that, missingness. In many cases, the implementation of methods to handle these issues is not straightforward, especially when multiple types of complexities affect the data.

The flexibility of the Bayesian framework allows the specification of a model whose complexity can be increased relatively easily to handle all these problems simultaneously, while also providing a natural way to perform probabilistic sensitivity analysis. See my published work for an example of how Bayesian models can be implemented to handle trial-based economic data.

How does your framework account for longitudinal data?

Since the data collected within a trial are longitudinal (i.e. collected at different times), it is important that methods for handling missingness in trial-based economic evaluations take this feature into account. I therefore developed a Bayesian parametric model for a bivariate health economic longitudinal response. As well as accounting for the typical complexities of the data (e.g. skewness), this model can be fitted to all the effectiveness and cost variables in a trial.

Time dependence between the responses is formally taken into account by means of a series of regressions, where each variable can be modelled conditionally on other variables collected at the same or at previous time points. This also offers an efficient way to handle missingness, as the available evidence at each time is included in the model, which may provide valuable information for imputing the missing data and therefore improve the confidence in the final results. In addition, sensitivity analysis to a range of missingness assumptions can be performed using a ‘pattern mixture’ approach. This allows the identification of certain parameters, known as sensitivity parameters, on which priors can be specified to incorporate external information and quantify its impact on the conclusions. A detailed description of the longitudinal model and the missing data analyses explored is also available online.
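The R sketch below is a rough illustration of the two ingredients described above. The first is a regression of a follow-up outcome on an earlier one, fitted to completers, which is valid under MAR. The second is a pattern-mixture shift of the imputations for non-responders by a sensitivity parameter delta with an assumed prior. It is not Dr Gabrio's model. The data, the prior on delta and the missingness mechanism are all hypothetical. For brevity, the regression coefficients are fixed at their least-squares estimates rather than given priors, so only delta and residual uncertainty are propagated. A full Bayesian version, e.g. in JAGS, would sample the coefficients too. The sketch also ignores the upper bound of 1 on utilities.

```r
set.seed(2)
n <- 200
# Hypothetical two-wave utility data
u1 <- pmin(1, rnorm(n, mean = 0.65, sd = 0.2))
u2 <- pmin(1, 0.2 + 0.7 * u1 + rnorm(n, sd = 0.1))
# Assumed MAR mechanism: follow-up is missing more often when baseline utility is low
miss2 <- runif(n) < plogis(0.5 - 3 * u1)
u2_obs <- ifelse(miss2, NA, u2)

# Regression of follow-up on baseline, fitted to completers (lm drops NA rows)
fit <- lm(u2_obs ~ u1)
b <- unname(coef(fit))
sigma_hat <- summary(fit)$sigma
pred <- b[1] + b[2] * u1[miss2]

# MAR point estimate (delta = 0): plug in the predicted values
mar_mean <- mean(c(u2_obs[!miss2], pred))

# Pattern-mixture step: non-responders' imputations are shifted by delta,
# drawn from a hypothetical prior saying they are slightly worse off
n_draws <- 1000
mean_u2 <- numeric(n_draws)
for (s in seq_len(n_draws)) {
  delta <- rnorm(1, mean = -0.05, sd = 0.02)
  u2_full <- u2_obs
  u2_full[miss2] <- pred + rnorm(sum(miss2), sd = sigma_hat) + delta
  mean_u2[s] <- mean(u2_full)
}
c(mar = mar_mean, quantile(mean_u2, c(0.025, 0.5, 0.975)))
```

Setting the prior on delta to a point mass at zero recovers the MAR analysis. Comparing the two shows how much the conclusion depends on the departure from MAR that the prior encodes.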

Are your proposed methods easy to implement?

Most of the methods that I developed in my project were implemented in JAGS, a program specifically designed for the analysis of Bayesian models using Markov chain Monte Carlo simulation. Like other Bayesian software (e.g. OpenBUGS and Stan), JAGS is freely available and can be interfaced with different statistical programs, such as R, SAS and Stata. Therefore, I believe that, once people are willing to overcome the initial barrier of getting familiar with a new software language, these programs provide extremely powerful tools for implementing Bayesian methods. In economic evaluations, analysts are typically more familiar with frequentist methods (e.g. multiple imputation). However, as the complexity of the analysis increases, implementing these methods requires tailor-made routines for optimising non-standard likelihood functions. A full Bayesian approach is then likely to be preferable, as it naturally propagates uncertainty to the wider economic model and allows sensitivity analysis to be performed.

Jason Shafrin’s journal round-up for 9th September 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Price effects of a hospital merger: heterogeneity across health insurers, hospital products, and hospital locations. Health Economics [PubMed] [RePEc] Published 1st July 2019

Most of the economics literature indicates that hospital mergers typically result in higher prices. But what do higher prices mean? Do they mean higher prices for all services? Higher prices for all health insurers?

Many economic models assume that hospitals charge a standard base rate and that the price of each procedure is a fixed multiple of that base rate, with the same multiple at every hospital. This approach would make sense in a DRG-based system, where prices are proportional to the product of a hospital’s base rate and the Medicare Severity DRG weight for a given hospitalization.
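In symbols, as a sketch in our own notation rather than the paper's model, let p_{hd} be the price at hospital h for DRG d, b_h the hospital's base rate and w_d the DRG weight:

p_{hd} = b_h w_d \quad \Rightarrow \quad \frac{p_{hd}}{p_{hd'}} = \frac{w_d}{w_{d'}}

Under this model, the relative price of any two procedures is the same at every hospital. A merger that raises b_h would therefore raise all of that hospital's prices by the same proportion.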

In practice, however, prices can vary across procedures, across different negotiated contracts with insurers, and even across different locations within the same hospital system. For instance, the economic theory in this paper shows that a hospital merger increases prices most when an insurer’s bargaining power is high. Why? If the insurer has weak bargaining power, the hospital already charges high prices, so the merger's marginal impact is felt only when insurers had market power to begin with. Another interesting theoretical prediction concerns services with different degrees of substitution. If substitution between hospitals is stronger for service A than for service B, prices will increase more for service A. This is because the merger reduces the number of competing hospitals and so decreases consumers' ability to substitute across them.

In their empirical application, the authors use a comprehensive nationwide patient-level data set from the Netherlands on hospital admissions and prices. The study looks at three separate services: hip replacement, knee replacement, and cataract surgery. They use a difference-in-differences approach to measure the impact of a merger on prices for different services and across payers.
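A difference-in-differences estimate compares the change in prices at merging hospitals with the change at non-merging hospitals over the same period. The R sketch below uses simulated data: 20 hypothetical hospitals, two periods and an assumed 8% effect on log prices. It is not the authors' specification. It shows that the simple difference of mean changes equals the interaction coefficient from a regression with group, period and group-by-period terms.

```r
set.seed(3)
# Hypothetical hospital-by-period panel: 20 hospitals, 2 periods; hospitals 1-10 merge
panel <- expand.grid(hospital = 1:20, post = c(0, 1))
panel$merged <- as.integer(panel$hospital <= 10)
true_effect <- 0.08                 # assumed effect of the merger on log price
hospital_fe <- rnorm(20, sd = 0.1)  # time-invariant hospital price levels
panel$log_price <- 8 + hospital_fe[panel$hospital] + 0.03 * panel$post +
  true_effect * panel$merged * panel$post + rnorm(nrow(panel), sd = 0.02)

# Change for merging hospitals minus change for the rest
did_means <- with(panel,
  (mean(log_price[merged == 1 & post == 1]) - mean(log_price[merged == 1 & post == 0])) -
  (mean(log_price[merged == 0 & post == 1]) - mean(log_price[merged == 0 & post == 0])))

# Regression form: the merged:post coefficient is the same estimate
fit <- lm(log_price ~ merged * post, data = panel)
did_reg <- unname(coef(fit)["merged:post"])
c(did_means = did_means, did_reg = did_reg)
```

The common time trend (0.03) and the hospital-specific price levels both cancel in the double difference. This cancellation is what lets the design attribute the remaining change to the merger, provided that trends would otherwise have been parallel.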

Although the authors replicated earlier findings that prices generally rise after a merger, they also found significant heterogeneity. For instance, prices rose for hip replacements but not for knee replacements or cataract surgery. Prices rose for four health insurers but not for a fifth. In short, while previous findings about average prices still hold, in the real world the price impact is much more heterogeneous than previous models would predict.

The challenges of universal health insurance in developing countries: evidence from a large-scale randomized experiment in Indonesia. NBER Working Paper [RePEc] Published August 2019

In 2014, the Indonesian government launched Jaminan Kesehatan Nasional (JKN), a national, contributory health insurance program that aimed to provide universal health coverage by 2019. The program requires individuals to pay premiums for coverage but there is an insurance mandate. JKN, however, faced two key challenges: low enrollment and high cost. Only 20% of eligible individuals enrolled. Further, the claims paid exceeded premiums received by a factor of more than 6 to 1.

This working paper by Banerjee et al. describes a large-scale, multi-arm experiment examining three interventions that might address these issues: (i) premium subsidies, (ii) transaction cost reduction, and (iii) information dissemination. For the first intervention, individuals received either a 50% or a 100% premium subsidy if they signed up within a limited time frame. For the second, households received at-home assistance to enroll in plans through the online registration system, rather than traveling to a distant insurance office to enroll. For the third, the authors randomized some individuals to receive various informational items. The real benefit of this study is that people were randomized to these different interventions.

Using this study design, the authors found that premium assistance did increase enrollment. Further, premium assistance did not raise per-person costs, since the individuals who enrolled were healthier on average. Thus, the fear that subsidies would increase adverse selection was unfounded. The authors also found that offering help with registering for insurance increased enrollment. This suggests that the ‘hassle cost’ of signing up for a government program is a real barrier with tangible consequences. However, the additional insurance information provided had no effect on enrollment.

These results are both encouraging and discouraging. Premium subsidies work and do not drive up cost per person. However, even with a 100% premium subsidy and help registering for insurance, enrollment reached only 30%. This figure is far better than the baseline figure of 8%, but far from the ‘universal’ coverage envisioned by the creators of JKN.


Method of the month: constrained randomisation

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is constrained randomisation.


Randomised experimental studies are one of the best ways of estimating the causal effects of an intervention. They have become more and more widely used in economics; Banerjee and Duflo are often credited with popularising them among economists. When done well, randomly assigning a treatment ensures both observable and unobservable factors are independent of treatment status and likely to be balanced between treatment and control units.

Many of the interventions economists are interested in operate at a ‘cluster’ level, be it a school, hospital, village, or otherwise. So the appropriate experimental design is a cluster randomised controlled trial (cRCT). In a cRCT, clusters are randomised to treatment or control, and individuals within each cluster are observed either cross-sectionally or longitudinally. But, except in cases of large budgets, the number of participating clusters can be fairly small. When randomising a relatively small number of clusters, we could by chance end up with quite a severe imbalance in key covariates between trial arms. This is a problem if we suspect a priori that these covariates influence key outcomes.
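A quick R simulation can show how likely chance imbalance is with few clusters. The setting is hypothetical: one standard-normal cluster-level covariate, clusters split equally between two arms by simple randomisation, and an arbitrary threshold of 0.5 standard deviations for a "severe" difference in means.

```r
set.seed(4)
# Absolute standardised difference in covariate means between arms,
# over many simple randomisations of n_clusters clusters
std_diff <- function(n_clusters, n_sims = 5000) {
  replicate(n_sims, {
    x <- rnorm(n_clusters)
    treat <- sample(rep(c(TRUE, FALSE), each = n_clusters / 2))
    abs(mean(x[treat]) - mean(x[!treat])) / sd(x)
  })
}
# Share of randomisations with an imbalance above 0.5 standard deviations
small <- mean(std_diff(8) > 0.5)
large <- mean(std_diff(80) > 0.5)
c(clusters_8 = small, clusters_80 = large)
```

With 8 clusters, a large share of allocations exceed the threshold. With 80 clusters, very few do, because the difference in means shrinks roughly with one over the square root of the number of clusters.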

One solution to the problem of potential imbalance is covariate-based constrained randomisation. The principle here is to conduct a large number of randomisations and assess the balance of covariates in each one using some balance metric. We then randomly choose one of the most balanced according to this metric. This method preserves the important random treatment assignment while ensuring covariate balance. Stratified randomisation has a similar goal, but in many cases it may not be feasible if there are continuous covariates of interest or too few clusters to distribute among many strata.


Conducting covariate constrained randomisation is straightforward and involves the following steps:

  1. Specifying the important baseline covariates to balance the clusters on. For each cluster j we have L covariates x_{jl}; l = 1,...,L.
  2. Characterising each cluster in terms of these covariates, i.e. creating the x_{jl}.
  3. Enumerating all potential randomisation schemes or simulating a large number of them. For each one, we will need to measure the balance of the x_{jl} between trial arms.
  4. Selecting a candidate set of randomisation schemes that are sufficiently balanced according to some pre-specified criterion, from which we can randomly choose our treatment allocation.
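When enumeration is infeasible, step 3 can be done by simulation. Below is a minimal R sketch under several assumptions. It uses 30 hypothetical clusters with two standard-normal covariates, an equal split between arms, and the Raab and Butcher score (described below) with inverse-variance weights. Duplicated schemes are dropped so that no allocation is over-represented in the candidate set.

```r
set.seed(5)
n_clusters <- 30
# Hypothetical cluster-level covariates (step 2)
covs <- cbind(size = rnorm(n_clusters), baseline = rnorm(n_clusters))
n_schemes <- choose(n_clusters, n_clusters / 2)  # far too many to enumerate
n_sims <- 20000
# Step 3: each column is one simulated allocation (1 = treatment, 0 = control)
sims <- replicate(n_sims, sample(rep(c(1L, 0L), each = n_clusters / 2)))
sims <- unique(sims, MARGIN = 2)  # drop repeated schemes
score <- function(alloc) {
  d <- colMeans(covs[alloc == 1, , drop = FALSE]) -
    colMeans(covs[alloc == 0, , drop = FALSE])
  sum(d^2 / apply(covs, 2, var))  # inverse-variance weights (one common choice)
}
scores <- apply(sims, 2, score)
# Step 4: keep the 15% most balanced schemes and pick one at random
candidates <- which(scores <= quantile(scores, 0.15))
chosen <- sims[, candidates[sample.int(length(candidates), 1)]]
```

With 30 clusters there are choose(30, 15), about 155 million, equal allocations. Sampling a few thousand of them is usually enough to characterise the distribution of the score.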

Balance scores

A key ingredient in the above steps is the balance score. This score needs to be some univariate measure of potentially multivariate imbalance between two (or more) groups. A commonly used score is that proposed by Raab and Butcher:

\sum_{l=1}^{L} \omega_l (\bar{x}_{1l}-\bar{x}_{0l})^2

where \bar{x}_{1l} and \bar{x}_{0l} are the mean values of covariate l in the treatment and control groups respectively, and \omega_l is some weight. The weight is often the inverse of the covariate's variance, in which case the score is a sum of squared standardised differences in means, so lower values indicate greater balance. But other scores would also work. Indeed, any statistic that measures the distance between the distributions of two variables could be computed for each covariate and combined across them. This could include the maximum distance:

\max_{l} |\bar{x}_{1l} - \bar{x}_{0l}|

the Manhattan distance:

\sum_{l=1}^{L} |\bar{x}_{1l}-\bar{x}_{0l}|

or even the Symmetrised Bayesian Kullback-Leibler divergence (I can’t be bothered to type this one out). Grischott has developed a Shiny application to estimate all these distances in a constrained randomisation framework, detailed in this paper.
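The three two-arm scores above can be computed side by side for a single allocation. Below is an R sketch with a hypothetical helper, balance_scores(). The maximum and Manhattan distances are applied to raw differences in means, as in the formulas. In practice, one might standardise the covariates first if they are on different scales.

```r
# Hypothetical helper: three balance scores for one two-arm allocation.
# covs: clusters x covariates matrix; treat: logical vector (TRUE = treatment)
balance_scores <- function(covs, treat) {
  d <- colMeans(covs[treat, , drop = FALSE]) - colMeans(covs[!treat, , drop = FALSE])
  w <- 1 / apply(covs, 2, var)  # inverse-variance weights (one common choice)
  c(raab_butcher = sum(w * d^2),
    maximum = max(abs(d)),
    manhattan = sum(abs(d)))
}
set.seed(6)
covs <- matrix(rnorm(16), ncol = 2)  # 8 clusters, 2 covariates
treat <- rep(c(TRUE, FALSE), 4)
balance_scores(covs, treat)
```

All three scores are zero for a perfectly balanced allocation and grow with imbalance. They differ only in how they aggregate the per-covariate differences.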

Things become more complex if there are more than two trial arms, because all of the above scores can only compare two groups. However, a number of univariate measures of multivariate balance already exist in the form of MANOVA (multivariate analysis of variance) test statistics. For example, suppose we have G trial arms with N_g clusters in arm g. Let X_{jg} = \left[ x_{jg1},...,x_{jgL} \right]' be the vector of covariates for cluster j in arm g, with \bar{X}_{.g} the mean vector in arm g and \bar{X}_{..} the overall mean vector. Then the between group covariance matrix is:

B = \sum_{g=1}^G N_g(\bar{X}_{.g} - \bar{X}_{..})(\bar{X}_{.g} - \bar{X}_{..})'

and the within group covariance matrix is:

W = \sum_{g=1}^G \sum_{j=1}^{N_g} (X_{jg}-\bar{X}_{.g})(X_{jg}-\bar{X}_{.g})'

which we can use in a variety of statistics including Wilks’ Lambda, for example:

\Lambda = \frac{\det(W)}{\det(W+B)}
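Here is a sketch of the Wilks' Lambda calculation in R for G arms, using a hypothetical helper, wilks_lambda(), that builds B and W exactly as defined above. Note the direction of the statistic. When all arm means coincide, B = 0 and \Lambda = 1, and \Lambda falls as the arms drift apart. So for balance purposes, larger values are better, and 1 - \Lambda could serve as a score where lower is better, like the others.

```r
# Hypothetical helper: Wilks' Lambda from the between (B) and within (W) matrices.
# covs: clusters x L matrix of covariates; arm: arm label for each cluster
wilks_lambda <- function(covs, arm) {
  arm <- factor(arm)
  grand <- colMeans(covs)
  L <- ncol(covs)
  B <- matrix(0, L, L)
  W <- matrix(0, L, L)
  for (g in levels(arm)) {
    xg <- covs[arm == g, , drop = FALSE]
    mg <- colMeans(xg)
    B <- B + nrow(xg) * tcrossprod(mg - grand)  # N_g (mean_g - grand)(mean_g - grand)'
    W <- W + crossprod(sweep(xg, 2, mg))        # sum of within-arm cross-products
  }
  det(W) / det(W + B)
}
set.seed(7)
covs <- matrix(rnorm(24), ncol = 2)  # 12 clusters, 2 covariates
arm <- rep(1:3, each = 4)            # 3 arms of 4 clusters
wilks_lambda(covs, arm)
```

As a check, the value should agree with the Wilks statistic that R's manova() reports for the same data.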

As far as I am aware, no trial has yet used covariate constrained randomisation with more than two groups. This is the subject of an ongoing paper investigating these scores, so watch this space!

Once the scores have been calculated for all possible schemes, or for a very large number of simulated ones, we select from among the most balanced. These are defined according to some quantile of the balance score, say the 15% of schemes with the lowest scores.

As a simple simulated example of how this might be coded in R, let’s consider a trial of 8 clusters with two standard-normally distributed covariates. We’ll use the Raab and Butcher score from above:

# simulate the covariates
n <- 8
x1 <- rnorm(n)
x2 <- rnorm(n)
x <- cbind(x1, x2)
# enumerate all possible schemes - you'll need the partitions package here;
# each column labels every cluster as belonging to block 1 or block 2
schemes <- partitions::setparts(c(n/2, n/2))
# write a function that will estimate the score
# for each scheme, which we can apply over our
# set of schemes
balance_score <- function(scheme, covs) {
  treat.idx <- scheme == 2
  control.idx <- scheme == 1
  treat.means <- colMeans(covs[treat.idx, , drop = FALSE])
  control.means <- colMeans(covs[control.idx, , drop = FALSE])
  cov.vars <- apply(covs, 2, var)
  # Raab-Butcher score with inverse-variance weights
  sum((treat.means - control.means)^2 / cov.vars)
}
# apply the function
scores <- apply(schemes, 2, function(s) balance_score(s, x))
# find top 15% of schemes (lowest scores)
scheme.set <- which(scores <= quantile(scores, 0.15))
# choose one at random (indexing avoids sample()'s odd behaviour
# when scheme.set has length 1)
scheme.number <- scheme.set[sample.int(length(scheme.set), 1)]
scheme.chosen <- schemes[, scheme.number]
# the blocks from setparts() are not tied to treatment labels,
# so toss a coin to decide which block is treated (2 = treatment)
if (runif(1) < 0.5) scheme.chosen <- 3 - scheme.chosen


A common way to analyse cluster trials is to estimate a mixed model, i.e. a hierarchical model with cluster-level random effects. Two key questions are whether to control for the covariates used in the randomisation, and which test to use for treatment effects. Fan Li has two great papers answering these questions for linear models and binomial models. One key conclusion is that the nominal type I error rate is only achieved in models adjusted for the covariates used in the randomisation. For non-linear models, type I error rates can be way off for many estimators. This is especially true with small numbers of clusters, which is often the reason for doing constrained randomisation in the first place, so a careful choice is needed here. If in doubt, I would recommend adjusted permutation tests to ensure appropriate type I error rates. Of course, one could take a Bayesian approach to the analysis, although I am not aware of any study of how such models perform in this setting (another case of “watch this space!”).
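Below is a minimal sketch of the general idea behind an adjusted permutation test, in R. It is not a reproduction of Fan Li's procedures and rests on several assumptions. The data are analysed at the cluster level with 8 clusters, and the covariates and outcome are hypothetical. The Raab and Butcher score with inverse-variance weights defines a 15% constrained space. The test statistic is the treatment-control difference in residuals from a covariate-only regression fitted under the null. Its reference distribution is taken over the same constrained set of schemes the allocation was drawn from. Base R's combn() enumerates the schemes, so the partitions package is not needed. Each scheme and its mirror image have the same score, so they enter the constrained set together.

```r
set.seed(8)
n <- 8
x <- cbind(rnorm(n), rnorm(n))  # hypothetical cluster-level covariates
# All ways to choose 4 treatment clusters out of 8, as an n x 70 logical matrix
treat_sets <- combn(n, n / 2)
alloc <- apply(treat_sets, 2, function(idx) seq_len(n) %in% idx)
rb <- function(a) {
  d <- colMeans(x[a, , drop = FALSE]) - colMeans(x[!a, , drop = FALSE])
  sum(d^2 / apply(x, 2, var))
}
scores <- apply(alloc, 2, rb)
constrained <- alloc[, scores <= quantile(scores, 0.15), drop = FALSE]
# Randomise within the constrained space, then simulate a cluster-level outcome
treat <- constrained[, sample.int(ncol(constrained), 1)]
y <- as.vector(1 + x %*% c(0.5, -0.3)) + 0.4 * treat + rnorm(n, sd = 0.5)
# Adjusted permutation test: residuals from the covariate-only (null) model
res <- residuals(lm(y ~ x))
stat <- function(a) mean(res[a]) - mean(res[!a])
observed <- stat(treat)
null_dist <- apply(constrained, 2, stat)  # re-randomise over the constrained set
p_value <- mean(abs(null_dist) >= abs(observed))
p_value
```

With only 8 clusters, the constrained set holds about a dozen schemes. The smallest attainable p-value, 1/ncol(constrained), is therefore roughly 0.1, a reminder of how little randomisation-based inference can say in a very small trial.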


Many trials have used this procedure, and listing even a fraction would be a daunting task. But I would be remiss not to note a trial of my own that uses covariate constrained randomisation. It is investigating the effect of providing an incentive to small and medium-sized enterprises to adhere to a workplace well-being programme. Fan Li's papers mentioned above also use good applications as examples. A trial that featured in a journal round-up in February used covariate constrained randomisation to balance a very small number of clusters in a trial of a medicines access programme in Kenya.