Method of the month: constrained randomisation

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is constrained randomisation.

Principle

Randomised experimental studies are one of the best ways of estimating the causal effects of an intervention. They have become more and more widely used in economics; Banerjee and Duflo are often credited with popularising them among economists. When done well, randomly assigning a treatment ensures both observable and unobservable factors are independent of treatment status and likely to be balanced between treatment and control units.

Many of the interventions economists are interested in are at a ‘cluster’ level, be it a school, hospital, village, or otherwise. So the appropriate experimental design would be a cluster randomised controlled trial (cRCT), in which the clusters are randomised to treatment or control and individuals within each cluster are observed either cross-sectionally or longitudinally. But, except in cases of large budgets, the number of clusters participating can be fairly small. When randomising a relatively small number of clusters we could, by chance, end up with a severe imbalance in key covariates between trial arms. This presents a problem if we suspect a priori that these covariates have an influence on key outcomes.

One solution to the problem of potential imbalance is covariate-based constrained randomisation. The principle here is to conduct a large number of randomisations, assess the balance of covariates in each one using some balance metric, and then randomly choose one of the most balanced according to this metric. This method preserves the important random treatment assignment while ensuring covariate balance. Stratified randomisation has a similar goal, but in many cases may not be possible if there are continuous covariates of interest or too few clusters to distribute among many strata.

Implementation

Conducting covariate constrained randomisation is straightforward and involves the following steps:

  1. Specifying the important baseline covariates to balance the clusters on. For each cluster j we have L covariates x_{jl}, l = 1, ..., L.
  2. Characterising each cluster in terms of these covariates, i.e. creating the x_{jl}.
  3. Enumerating all potential randomisation schemes or simulating a large number of them. For each one, we will need to measure the balance of the x_{jl} between trial arms.
  4. Selecting a candidate set of randomisation schemes that are sufficiently balanced according to some pre-specified criterion from which we can randomly choose our treatment allocation.
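When there are too many clusters to enumerate every scheme (with 30 clusters split evenly there are choose(30, 15), roughly 155 million), step 3 can be done by simulation instead. Below is a minimal sketch in R under my own assumptions: two hypothetical cluster-level covariates, 10,000 simulated allocations with exactly half the clusters treated, and the Raab and Butcher score (introduced below) with inverse-variance weights. The names and settings are illustrative, not recommendations.

```r
# Sketch: constrained randomisation by simulating allocations (steps 3 and 4)
set.seed(2018)
n_clusters <- 30                                  # too many to enumerate comfortably
n_sims     <- 10000                               # number of simulated allocations
x <- cbind(size = rnorm(n_clusters, 100, 20),     # hypothetical cluster covariates
           rate = runif(n_clusters))
weights <- 1 / apply(x, 2, var)                   # inverse-variance weights

# Weighted sum of squared differences in arm means
rb_score <- function(alloc, covs, w) {
  diffs <- colMeans(covs[alloc == 1, , drop = FALSE]) -
           colMeans(covs[alloc == 0, , drop = FALSE])
  sum(w * diffs^2)
}

# Step 3: each column is one allocation with exactly half the clusters treated
allocs <- replicate(n_sims, sample(rep(0:1, each = n_clusters / 2)))
scores <- apply(allocs, 2, rb_score, covs = x, w = weights)

# Step 4: keep the most balanced 15% and pick one of them at random
candidates <- which(scores <= quantile(scores, 0.15))
chosen <- allocs[, candidates[sample.int(length(candidates), 1)]]
```

Simulated allocations can occasionally repeat, but with this many possible schemes duplicates are rare; with only a handful of clusters, enumerating every scheme is the cleaner option.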

Balance scores

A key ingredient in the above steps is the balance score. This score needs to be some univariate measure of potentially multivariate imbalance between two (or more) groups. A commonly used score is that proposed by Raab and Butcher:

\sum_{l=1}^{L} \omega_l (\bar{x}_{1l}-\bar{x}_{0l})^2

where \bar{x}_{1l} and \bar{x}_{0l} are the mean values of covariate l in the treatment and control groups respectively, and \omega_l is some weight, often the inverse of the covariate's variance. With that choice the score is a sum of squared standardised differences in means, so lower values indicate greater balance. But other scores would also work. Indeed, any statistic that measures the distance between the distributions of two variables would work and could be summed over the covariates. This could include the maximum distance:

\max_l |\bar{x}_{1l} - \bar{x}_{0l}|

the Manhattan distance:

\sum_{l=1}^{L} |\bar{x}_{1l}-\bar{x}_{0l}|

or even the Symmetrised Bayesian Kullback-Leibler divergence (I can’t be bothered to type this one out). Grischott has developed a Shiny application to estimate all these distances in a constrained randomisation framework, detailed in this paper.
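To make the two-arm scores concrete, here is a small R sketch that computes the Raab and Butcher score, the maximum distance and the Manhattan distance for a single allocation (the Kullback-Leibler version is left out). One assumption of mine: the covariates are standardised first, so that the maximum and Manhattan distances are not dominated by covariates measured on large scales. The function name `balance_scores` is hypothetical.

```r
# Sketch: three two-arm balance scores for one allocation.
# Assumption: covariates are standardised first.
balance_scores <- function(alloc, covs) {
  z <- scale(covs)                                  # centre each column, divide by its SD
  diffs <- colMeans(z[alloc == 1, , drop = FALSE]) -
           colMeans(z[alloc == 0, , drop = FALSE])  # standardised mean differences
  c(raab_butcher = sum(diffs^2),   # equals raw differences with inverse-variance weights
    maximum      = max(abs(diffs)),
    manhattan    = sum(abs(diffs)))
}

# Example: 8 simulated clusters, the last four treated
set.seed(1)
x <- matrix(rnorm(16), ncol = 2)
alloc <- rep(0:1, each = 4)
balance_scores(alloc, x)
```

Standardising and then using unit weights gives exactly the inverse-variance-weighted Raab and Butcher score, which is why one helper can return all three.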

Things become more complex if there are more than two trial arms, since all of the above scores compare only two groups. However, there already exist a number of univariate measures of multivariate balance in the form of MANOVA (multivariate analysis of variance) test statistics. For example, suppose we have G trial arms with N_g clusters in arm g, and let X_{jg} = \left[ x_{jg1},...,x_{jgL} \right]' be the covariate vector for cluster j in arm g, with \bar{X}_{.g} the mean vector in arm g and \bar{X}_{..} the overall mean vector. Then the between-group sum of squares and cross-products matrix is:

B = \sum_{g=1}^G N_g(\bar{X}_{.g} - \bar{X}_{..})(\bar{X}_{.g} - \bar{X}_{..})'

and the within-group sum of squares and cross-products matrix is:

W = \sum_{g=1}^G \sum_{j=1}^{N_g} (X_{jg}-\bar{X}_{.g})(X_{jg}-\bar{X}_{.g})'

which we can use in a variety of statistics, including Wilks’ Lambda:

\Lambda = \frac{\det(W)}{\det(W+B)}
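A sketch of how Wilks' Lambda could serve as a balance score for G arms, building B and W directly from the definitions above. Note the direction: Λ equals 1 when the arm means coincide and falls towards 0 as they diverge, so the most balanced schemes are those with the largest Λ (equivalently, one could use 1 − Λ as a "lower is better" score). W must be non-singular, which needs at least L + G clusters in total. The function name `wilks_lambda` is my own.

```r
# Sketch: Wilks' Lambda as a balance score for G >= 2 arms.
# alloc: arm label for each cluster; covs: clusters x covariates matrix.
wilks_lambda <- function(alloc, covs) {
  covs <- as.matrix(covs)
  grand_mean <- colMeans(covs)
  L <- ncol(covs)
  B <- matrix(0, L, L)
  W <- matrix(0, L, L)
  for (g in unique(alloc)) {
    xg <- covs[alloc == g, , drop = FALSE]
    mg <- colMeans(xg)
    B <- B + nrow(xg) * tcrossprod(mg - grand_mean)  # between-arm contribution
    W <- W + crossprod(sweep(xg, 2, mg))             # within-arm contribution
  }
  det(W) / det(W + B)                                # 1 = perfectly balanced means
}

# Example: 12 simulated clusters, 2 covariates, 3 arms of 4
set.seed(42)
x <- matrix(rnorm(24), ncol = 2)
alloc <- sample(rep(1:3, each = 4))
wilks_lambda(alloc, x)
```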

No trial has previously used covariate constrained randomisation with multiple groups, as far as I am aware, but this is the subject of an ongoing paper investigating these scores – so watch this space!

Once the scores have been calculated for all possible schemes, or for a very large number of simulated ones, we select from among those which are most balanced. The most balanced are defined according to some quantile of the balance score, for example the 15% of schemes with the best scores.

As a simple simulated example of how this might be coded in R, let’s consider a trial of 8 clusters with two standard-normally distributed covariates. We’ll use the Raab and Butcher score from above:

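#simulate the covariates
set.seed(1234)                       #for a reproducible example
n <- 8
x <- cbind(rnorm(n), rnorm(n))       #8 clusters x 2 covariates
#enumerate all possible schemes - you'll need the partitions package here
#each column labels the clusters 1 or 2
schemes <- partitions::setparts(c(n/2, n/2))
#write a function that will estimate the score
#for each scheme which we can apply over our
#set of schemes
balance_score <- function(scheme, covs){
  treat.idx <- scheme == 2
  control.idx <- scheme == 1
  treat.means <- colMeans(covs[treat.idx, , drop = FALSE])
  control.means <- colMeans(covs[control.idx, , drop = FALSE])
  cov.vars <- apply(covs, 2, var)
  #Raab-Butcher score, weighting each squared difference by the
  #inverse variance of the covariate
  score <- sum((treat.means - control.means)^2 / cov.vars)
  return(score)
}
#apply the function
scores <- apply(schemes, 2, function(i) balance_score(i, x))
#find top 15% of schemes (lowest scores)
scheme.set <- which(scores <= quantile(scores, 0.15))
#choose one at random; index with sample.int() because sample() on a
#single number k would draw from 1:k instead
scheme.number <- scheme.set[sample.int(length(scheme.set), 1)]
scheme.chosen <- schemes[, scheme.number]
#setparts() does not distinguish equal-sized blocks, so randomly
#decide which block is the treatment arm (label 2)
if (runif(1) < 0.5) scheme.chosen <- 3 - scheme.chosen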
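One check that is sometimes recommended is whether the candidate set is so restrictive that particular pairs of clusters are almost always, or almost never, allocated to the same arm, which would undermine the randomisation. The sketch below redoes the 8-cluster example with base R's `combn()` (so the partitions package is not needed) and tabulates, for every pair of clusters, the proportion of candidate schemes in which they share an arm. Under unconstrained randomisation every off-diagonal entry would be 30/70 = 3/7, about 0.43; entries near 0 or 1 would be a warning sign. The variable names are illustrative.

```r
# Sketch: pairwise co-allocation within the constrained candidate set
set.seed(123)
n <- 8
x <- matrix(rnorm(2 * n), ncol = 2)                   # simulated covariates
treated_sets <- combn(n, n / 2)                        # all 70 ways to pick 4 treated clusters
allocs <- apply(treated_sets, 2, function(s) as.integer(seq_len(n) %in% s))
scores <- apply(allocs, 2, function(a) {
  d <- colMeans(x[a == 1, , drop = FALSE]) - colMeans(x[a == 0, , drop = FALSE])
  sum(d^2 / apply(x, 2, var))                          # Raab-Butcher score
})
candidates <- allocs[, scores <= quantile(scores, 0.15), drop = FALSE]

# Proportion of candidate schemes in which clusters i and j share an arm
same_arm <- matrix(NA_real_, n, n)
for (i in seq_len(n)) for (j in seq_len(n)) {
  same_arm[i, j] <- mean(candidates[i, ] == candidates[j, ])
}
round(same_arm, 2)
```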

Analyses

A commonly used method of cluster trial analysis is to estimate a mixed model, i.e. a hierarchical model with cluster-level random effects. Two key questions are whether to control for the covariates used in the randomisation, and which test to use for treatment effects. Fan Li has two great papers answering these questions for linear models and binomial models. One key conclusion is that appropriate type I error rates are only achieved in models adjusted for the covariates used in the randomisation. For non-linear models, type I error rates can be way off for many estimators, especially with small numbers of clusters; since a small number of clusters is often the reason for doing constrained randomisation in the first place, a careful choice is needed here. If in doubt, I would recommend adjusted permutation tests to ensure appropriate type I error rates. Of course, one could take a Bayesian approach to analysis, although I am not aware of any study of how such models perform in this setting (another case of “watch this space!”).
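The idea behind an adjusted permutation test, as I understand it, is to remove the effect of the randomisation covariates from the outcome and then compare the observed treatment contrast with its distribution over the schemes that could actually have been chosen, i.e. the constrained candidate set rather than all possible allocations. Below is a minimal cluster-level sketch. Using cluster means as outcomes, a linear covariate adjustment, and a difference in mean residuals as the test statistic are my simplifying assumptions, not necessarily the exact procedure in those papers; the outcome data and the treatment effect of 0.4 are made up.

```r
# Sketch: covariate-adjusted permutation test over the constrained set
adjusted_perm_test <- function(y, covs, observed, candidates) {
  # y: cluster-level outcomes; covs: cluster covariates (matrix)
  # observed: 0/1 allocation used; candidates: one allocation per column
  resid <- residuals(lm(y ~ covs))          # adjust for covariates, ignoring treatment
  stat <- function(a) mean(resid[a == 1]) - mean(resid[a == 0])
  t_obs  <- stat(observed)
  t_perm <- apply(candidates, 2, stat)
  mean(abs(t_perm) >= abs(t_obs))           # two-sided permutation p-value
}

# Example with 8 simulated clusters and a fully enumerated scheme space
set.seed(7)
n <- 8
x <- matrix(rnorm(2 * n), ncol = 2)
allocs <- apply(combn(n, n / 2), 2, function(s) as.integer(seq_len(n) %in% s))
scores <- apply(allocs, 2, function(a) {
  d <- colMeans(x[a == 1, , drop = FALSE]) - colMeans(x[a == 0, , drop = FALSE])
  sum(d^2 / apply(x, 2, var))
})
candidates <- allocs[, scores <= quantile(scores, 0.15), drop = FALSE]
observed <- candidates[, sample.int(ncol(candidates), 1)]
y <- as.vector(1 + x %*% c(0.5, -0.3) + 0.4 * observed + rnorm(n, sd = 0.5))
adjusted_perm_test(y, x, observed, candidates)
```

A practical caveat this makes visible: with two equal arms, each allocation's mirror image has the same balance score and the same absolute statistic, so with K candidate schemes the smallest attainable p-value is 2/K. With very few clusters and a tight candidate set, conventional significance thresholds may simply be unreachable.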

Application

There are many trials that have used this procedure, and listing even a fraction would be a daunting task. But I would be remiss not to note a trial of my own that uses covariate constrained randomisation. It is investigating the effect of providing an incentive to small and medium-sized enterprises to adhere to a workplace well-being programme. There are good applications used as examples in Fan Li’s papers mentioned above. A trial that featured in a journal round-up in February used covariate constrained randomisation to balance a very small number of clusters in a trial of a medicines access programme in Kenya.


Transformative treatments: a big methodological challenge for health economics

Social scientists, especially economists, are concerned with causal inference: understanding whether and how an event causes a certain effect. Typically, we subscribe to the view that causal relations are reducible to sets of counterfactuals, and we use ever more sophisticated methods, such as instrumental variables and propensity score matching, to estimate these counterfactuals. Under the right set of assumptions, like that unobserved differences between study subjects are time invariant or that a treatment causes its effect through a certain mechanism, we can derive estimators for average treatment effects. All uncontroversial stuff indeed.

A recent paper from L.A. Paul and Kieran Healy introduces an argument of potential importance to how we interpret studies investigating causal relations. In particular, they argue that we don’t know whether individual preferences persist through treatment in a study. In general, it is not possible to distinguish between the case where a treatment has satisfied an underlying, revealed preference and the case where it has transformed an individual’s preferences. If preferences are transformed rather than revealed, then the treated individuals are, in effect, a different population and, in a causal inference study, no longer comparable to the control population.

To quote their thought experiment:

Vampires: In the 21st century, vampires begin to populate North America. Psychologists decide to study the implications this could have for the human population. They put out a call for undergraduates to participate in a randomized controlled experiment, and recruit a local vampire with scientific interests. After securing the necessary permissions, they randomize and divide their population of undergraduates into a control group and a treatment group. At t1, members of each group are given standard psychological assessments measuring their preferences about vampires in general and about becoming a vampire in particular. Then members of the experimental group are bitten by the lab vampire.

Members of both groups are left to go about their daily lives for a period of time. At t2, they are assessed. Members of the control population do not report any difference in their preferences at t2. All members of the treated population, on the other hand, report living richer lives, enjoying rewarding new sensory experiences, and having a new sense of meaning at t2. As a result, they now uniformly report very strong pro-vampire preferences. (Some members of the treatment group also expressed pro-vampire preferences before the experiment, but these were a distinct minority.) In exit interviews, all treated subjects also testify that they have no desire to return to their previous condition.

Should our psychologists conclude that being bitten by a vampire somehow satisfies people’s underlying, previously unrecognized, preferences to become vampires? No. They should conclude that being bitten by a vampire causes you to become a vampire (and thus, to prefer being one). Being bitten by a vampire and then being satisfied with the result does not satisfy or reveal your underlying preference to be a vampire. Being bitten by a vampire transforms you: it changes your preferences in a deep and fundamental way, by replacing your underlying human preferences with vampire preferences, no matter what your previous preferences were.

In our latest journal round-up, I featured a paper that used German reunification in 1990 as a natural experiment to explore the impact of novel food items entering the market on consumption and weight gain. The transformative treatments argument comes into play here. Did reunification reveal East Germans’ preferences for the novel foodstuffs, or did the significant cultural change alter their preferences for foodstuffs overall? If the latter is true, then West Germans do not constitute an appropriate control group. The causal mechanism at play also matters for the development of policy: for example, without reunification there may not have been any impact from novel food products.

This argument is also sometimes skirted around with regard to the valuation of health states. Should it be the preferences of healthy people, or the experienced utility of sick people, that determine health state values? Do physical trauma and disease reveal our underlying preferences for different health states, or do they transform us to have different preferences entirely? Any study looking at the effect of disease on health status or quality of life could not distinguish between the two. Yet the two cases are akin to using the same or different groups of people to value health states.

Consider also something like estimating the impact of retirement on health and quality of life. If self-reported quality of life is observed to improve in one of these studies, we don’t know if that is because retirement has satisfied a pre-existing preference for the retired lifestyle, or retirement has transformed a person’s preferences. In the latter case, the appropriate control group to evaluate the causal effect of retirement is not non-retired persons.

Paul and Healy do not make their argument to prevent or undermine research in the social sciences; they interpret their conclusion as a “methodological challenge”. The full implications of the argument have not been explored but could be considerable, and new methodological innovations for estimating average causal effects may be warranted. How this may be achieved, I’ll have to admit, I do not know.
