R for trial and model-based cost-effectiveness analysis: workshop

Background and objectives

It is our pleasure to announce a workshop and training event on the use of R for trial and model-based cost-effectiveness analysis (CEA). This follows our successful workshop on R for CEA in 2018.

Our event will begin with a half-day short course on R for decision trees and Markov models and the use of the BCEA package for graphical and statistical analysis of results; this will be delivered by Gianluca Baio of UCL and Howard Thom of Bristol University.

This will be followed by a one-day workshop in which we will present a wide variety of technical aspects by experts from academia, industry, and government institutions (including NICE). Topics will include decision trees, Markov models, discrete event simulation, integration of network meta-analysis, extrapolation of survival curves, and development of R packages.

We will include a pre-workshop virtual code challenge on a problem set by our scientific committee. This will take place over Github and a Slack channel with participants encouraged to submit final R code solutions for peer review on efficiency, flexibility, elegance and transparency. Prizes will be provided for the best entry.

Participants are also invited to submit abstracts for potential oral presentations. An optional dinner and networking event will be held on the evening of 8th July.

To submit an abstract, please send it to howard.thom@bristol.ac.uk with the subject “R for CEA abstract”. The word limit is 300. Abstract submission deadline is 15 May 2019 and the scientific committee will make decisions on acceptance by 1st June 2018.

Preliminary Programme

Day 2: Workshop. Tuesday 9th July.

• 9:30-9:45. Howard Thom. Welcome
• 9:45-10:15. Nathan Green. Imperial College London. _Simple, pain-free decision trees in R for the Excel user
• 10:15-10:35 Pedro Saramago. Centre for Health Economics, University of York. Using R for Markov modelling: an introduction
• 10:35-10:55. Alison Smith. University of Leeds. Discrete event simulation models in R
• 10:55-11:10. Coffee
• 11:10-12:20. Participants oral presentation session (4 speakers, 15 minutes each)
• 12:20-13:45. Lunch
• 13:45-14:00. Gianluca Baio. University College London. Packing up, shacking up’s (going to be) all you wanna do!. Building packages in R and Github
• 14:00-14:15. Jeroen Jansen. Innovation and Value Initiative. State transition models and integration with network meta-analysis
• 14:15-14:25. Ash Bullement. Delta Hat Analytics, UK. Fitting and extrapolating survival curves for CEA models
• 14:25-14:45. Iryna Schlackow. Nuffield Department of Public Health, University of Oxford. Generic R methods to prepare routine healthcare data for disease modelling
• 14:45-15:00. Coffee
• 15:00-15:15. Initiatives for the future and challenges in gaining R acceptance (ISPOR Taskforce, ISPOR Special Interest Group, future of the R for CEA workshop)
• 15:15-16:30. Participant discussion.
• 16:30-16:45. Anthony Hatswell. Close and conclusions

R for trial and model-based cost-effectiveness analysis: short course

Background and objectives

It is our pleasure to announce a workshop and training event on the use of R for trial and model-based cost-effectiveness analysis (CEA). This follows our successful workshop on R for CEA in 2018.

Our event will begin with a half-day short course on R for decision trees and Markov models and the use of the BCEA package for graphical and statistical analysis of results; this will be delivered by Gianluca Baio of UCL and Howard Thom of Bristol University.

This will be followed by a one-day workshop in which we will present a wide variety of technical aspects by experts from academia, industry, and government institutions (including NICE). Topics will include decision trees, Markov models, discrete event simulation, integration of network meta-analysis, extrapolation of survival curves, and development of R packages.

We will include a pre-workshop virtual code challenge on a problem set by our scientific committee. This will take place over Github and a Slack channel with participants encouraged to submit final R code solutions for peer review on efficiency, flexibility, elegance and transparency. Prizes will be provided for the best entry.

Participants are also invited to submit abstracts for potential oral presentations. An optional dinner and networking event will be held on the evening of 8th July.

Preliminary Programme

Day 1: Introduction to R for Cost-Effectiveness Modelling. Monday 8th July.

• 13:00-13:15. Howard Thom. Welcome and introductions
• 13:15-13:45. Howard Thom. Building a decision tree in R
• 13:45-14:15. Gianluca Baio. Using BCEA to summarise outputs of an economic model
• 14:15-14:45. Practical 1 (Decision trees)
• 14:45-15:00. Coffee break
• 15:00-15:45. Howard Thom. R for building Markov models
• 15:45-16:15. Gianluca Baio. Further use of BCEA
• 16:15-17:00. Practical 2 (Markov models)

Method of the month: Permutation tests

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is permutation tests.

Principle

One of the main objections to the use of p-values for statistical inference is that they are often misunderstood. They can be interpreted as the probability a null hypothesis is true, which they are not. Part of the cause of this problem is the black-box approach to statistical software. You can plug in data to Stata, R, or any other package, ask it to run a regression or test with ease, and it will spit out a load of p-values. Many people will just take it on trust that the software is returning the test of the hypothesis of interest and that the method has the correct properties, like type one error rate. But if one had to go through the process to obtain a test-statistic and the relevant distribution to compare it to, perhaps then the p-value wouldn’t be so misunderstood. For trials involving randomisation, permutation tests (or randomisation tests or exact tests) are just such a process.

Permutation tests were first proposed by Ronald Fisher in the early 20th Century. The basic principle is that to test differences between two groups assigned at random we can determine the exact distribution of a test statistic (such as a difference in means) under the null hypothesis by calculating the value of the test statistic for all possible ways of arranging our units into the two groups. The value of the test statistic for the actual assignment can be compared to this distribution to determine the p-value.

The simplest example would be to test a difference in means for a continuous outcome between two groups assigned in a randomised controlled trial. Let’s generate some data (we’ll do this in R) from a simple trial with 50 individuals per arm. In the control arm the data come from a $N(0,1)$ distribution and in the treatment arm they come from a $N(0.5,1)$ distribution:

n <- 100 #number of individuals
D <- sample(c(rep(0,n/2),rep(1,n/2)),n) #treatment assignment
y <- rnorm(n,mean=D*0.5,sd=1) #generate normal outcomes
T.diff <- mean(y[(D*1:n)])-mean(y[(-1*(D-1)*1:n)]) #actual difference in means

Now let’s add a function to randomly re-assign units to treatment and control and calculate the difference in means we would observe under the null of no difference based on our generated data. We will then plot this distribution and add a line showing our where the actual difference in means lies.

#function to generate differences in means
permStat <- function(n,y){
D <- sample(c(rep(0,n/2),rep(1,n/2)),n) #generate new assignment
T.diff <- mean(y[(D*1:n)])-mean(y[(-1*(D-1)*1:n)])
return(T.diff)
}
T.dist <- sapply(1:500,function(i)permStat(n,y)) #apply it 500 times
qplot(T.dist)+geom_vline(xintercept=T.diff,col="red") #plot

Our 2-sided p-value here is 0.04, i.e. the proportion of values at least as extreme as our test statistic.

For a more realistic example we can consider a cluster randomised trial with a binary outcome. The reason for choosing this example is that estimating non-linear mixed models is difficult. Calculating test statistics, especially when the number of clusters is relatively small, is even harder. The methods used in most statistics packages have inflated type one errors, unbeknownst to many. So let’s set up the following trial: two-arms with 8 clusters per arm, and 100 patients per cluster, which is representative of trials of, say, hospitals. The data generating mechanism is for patient $i$ in cluster $j$

$y_{ij} = Bernoulli(p_{ij})$

$p_{ij} = Logit(\alpha_j + x_j'\beta + D_{j}\gamma)$

So no individual level covariates, four Bernoulli(0.3) covariates $x_j$ with $\beta = [1,1,1,1]$, and a treatment indicator $D_j$ with treatment effect $\gamma=0$ (to look at type one error rates). The cluster effect is $\alpha_j \sim N(0,\sigma^2_\alpha)$ and $\sigma^2_\alpha$ is chosen to give an intraclass correlation coefficient of 0.05. We’ll simulate data from this model and then estimate the model above and test the null hypothesis $H_0:\gamma=0$ in two ways. First, we’ll use the popular R package lme4 and the command glmer, which uses adaptive Gaussian quadrature to estimate the parameters and covariance matrix; the built in p-values are derived from standard Wald t-statistics. Second, we’ll use our permutation tests.

Gail et al. (1996) examine permutation tests for these kinds of models. They propose the following residual-based test (although one can use other tests based on the likelihood): (i) estimate the simple model under the null with no treatment effect and no hierarchical effect, i.e. $p_{ij}=Logit(\alpha+x_{ij}'\beta)$; (ii) for each individual generate their predicted values and residuals $r_{ij}$; (iii) generate the cluster average residuals $\bar{r}_{.j}=N_j^{-1}\sum_{i=1}^{N_j} r_{ij}$. Then the test statistic is

$U=N^{-1}_{j} \left( \sum_{j=1}^{2N_j}D_{jg}\bar{r}_{.j} - \sum_{j=1}^{2N_j}(D_{j}-1)\bar{r}_{.j} \right) = N^{-1}_{j} \sum_{j=1}^{2N_j}(2D_{j}-1)\bar{r}_{.j}$

Under the null and given equal cluster sizes, the residual means are exchangeable. So the exact distribution of $U$ can be obtained by calculating it under all possible randomisation schemes. The p-value is then the quantile of this distribution under which the test statistic falls for the actual randomisation scheme. For larger numbers of clusters, it is not feasible to permute every possible randomisation scheme, so we approximate the distribution of $U$ using 500 randomly generated schemes. The following figure shows the estimated type one error rates using the two different methods (and 200 simulations):

The figure clearly shows an inflated type one error rates for the standard p-values reported by glmer especially for smaller numbers of clusters per arm. By contrast the residual permutation test shows approximately correct type one error rates (given more simulations there should be less noise in these estimates).

Implementation

Implementation of these tests is straightforward in different software packages. In Stata, one can use the command permute, for which you specify the different groups, number of permutations and command to estimate the treatment effect. In R, there are various packages, like coin, that perform a similar function. For more complex models particular non-linear ones and ones involving adjustment, one has to be careful about how to specify the appropriate test statistic and model under the null hypothesis, which may involve a little programming, but it is relatively straightforward to do so.

Applications

These methods have widespread applications for anyone looking to use null hypothesis significance testing. So a complete overview of the literature is not possible. Instead, we highlight a few uses of these methods.

In a previous post in this series we covered synthetic control methods; one of the ways of computing test statistics for this method has been called ‘placebo tests’, which are an exact parallel to the permutation tests discussed here. Krief and others discuss the use of these methods for evaluating health policies. Another example from a regression-based analysis is provided by Dunn and Shapiro. And Jacob, Ludwig, and Miller examine the impact of a lottery for vouchers to move to another area and employ these tests.

Sugar et al derive health states for depression from the SF-12 and use permutation test methods to validate the health states. Barber and Thompson use these tests to examine costs data from an RCT.

Credit