Heterogeneity and Markov models

The big appeal of Markov models is their relative simplicity: they focus on what happens to a whole cohort rather than to individual patients. Because of this, they are relatively bad at taking patient heterogeneity into account (true differences in outcomes between patients, which can be explained by, for example, disease severity, age or biomarkers). Several ways of dealing with patient heterogeneity have been used in the past. Earlier this year, my co-authors Dr. Lucas Goossens and Prof.Dr. Maureen Rutten-van Mölken and I published a study comparing the outcomes of these different approaches. We show that three of the four methods are useful, each in different circumstances. The fourth should no longer be used.

In practice, heterogeneity is often ignored. An average value for the patient population is then used for any variable representing a patient characteristic in the model, and the cost-effectiveness outcomes for this ‘average patient’ are assumed to represent the entire patient population. In addition to ignoring available evidence, the results are difficult to interpret, since the ‘average patient’ does not exist. With non-linearity being the rule rather than the exception in Markov modelling, heterogeneity should be taken into account explicitly in order to obtain a correct cost-effectiveness estimate over a heterogeneous population. This method can therefore be useful only if there is little heterogeneity, or if heterogeneity is not expected to influence the cost-effectiveness outcomes.
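To see why non-linearity matters, here is a minimal sketch (not from our study; the model, function and numbers are made up purely for illustration) in which the outcome for the ‘average patient’ differs from the average outcome over a heterogeneous population:

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical non-linear relationship between baseline severity and QALYs.
# The specific function and numbers are invented; any non-linear model behaves this way.
def qalys(severity):
    return 10.0 * np.exp(-0.05 * severity)

severities = rng.normal(50, 15, size=100_000)   # a heterogeneous patient population

print(qalys(severities.mean()))    # outcome for the 'average patient'
print(qalys(severities).mean())    # average outcome over the population: clearly different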

An alternative is to define several subgroups of patients, each characterised by a different combination of patient characteristics, and to calculate the outcomes for each of these. The comparison of subgroups allows for the exploration of the effect that differences between patients have on cost-effectiveness outcomes. In our study, subgroup analyses did lead to insight into the differences between the different types of patients, but not all outcomes were useful for decision makers. After all, policy and reimbursement decisions are commonly made for an entire patient population, not for subgroups. If a decision maker wants to use the subgroup analyses for decisions regarding specific subgroups, equity concerns are always an issue. Patient heterogeneity in clinical characteristics, such as starting FEV1% in our study, may be acceptable as a basis for subgroup-specific recommendations. Other input parameters, such as gender, race or, in our case, age, are not. This part of the existing heterogeneity has to be ignored when subgroup analyses are used.
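As a rough sketch of what this can look like (hypothetical subgroups, population shares and incremental outcomes; none of these numbers come from our study), subgroup-specific ICERs can be reported alongside the population-weighted result that a population-wide decision would rely on:

# Made-up subgroups defined by combinations of patient characteristics,
# with invented population shares and incremental costs/QALYs per subgroup.
subgroups = {
    "severe":   {"share": 0.3, "d_cost": 12000.0, "d_qaly": 0.40},
    "moderate": {"share": 0.5, "d_cost":  9000.0, "d_qaly": 0.20},
    "mild":     {"share": 0.2, "d_cost":  8000.0, "d_qaly": 0.05},
}

# Subgroup-specific ICERs, which may inform subgroup-specific recommendations.
for name, g in subgroups.items():
    print(name, "ICER:", round(g["d_cost"] / g["d_qaly"]))

# A decision for the whole population needs the population-weighted outcomes instead.
d_cost = sum(g["share"] * g["d_cost"] for g in subgroups.values())
d_qaly = sum(g["share"] * g["d_qaly"] for g in subgroups.values())
print("population ICER:", round(d_cost / d_qaly))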

In some cases, heterogeneity has been handled by simply combining it with parameter uncertainty in a probabilistic sensitivity analysis (PSA), drawing both in a single loop. The expected outcome of this Single Loop PSA is correct for the population, but the distribution around that expected outcome (which reflects the uncertainty many decision makers are interested in) is not. The outcomes ignore the fundamental difference between patient heterogeneity and parameter uncertainty. In our study, this even influenced the shape of the cost-effectiveness plane, leading to an overestimation of uncertainty. In our opinion, this method should never be used any more.
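A minimal sketch of such a Single Loop PSA (again with a made-up model and numbers, not those from our study): patient characteristics and uncertain parameters are drawn together in one loop, so the spread of the resulting outcomes mixes heterogeneity into what is then presented as decision uncertainty:

import numpy as np

rng = np.random.default_rng(2)

# Made-up model: incremental net benefit depends on an uncertain treatment effect
# (parameter uncertainty) and on baseline severity (patient heterogeneity).
def inb(effect, severity):
    return 20_000 * effect * np.exp(-0.05 * severity) - 500

single_loop = []
for _ in range(5_000):                 # one loop over everything
    effect = rng.normal(0.5, 0.1)      # parameter uncertainty
    severity = rng.normal(50, 15)      # patient heterogeneity, drawn in the same loop
    single_loop.append(inb(effect, severity))

single_loop = np.array(single_loop)
print(single_loop.mean())   # the point estimate is correct for the population
print(single_loop.std())    # but this spread mixes heterogeneity with parameter uncertainty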

In order to correctly separate parameter uncertainty and heterogeneity, the analysis requires a nested Monte Carlo simulation, in which a number of individual patients is drawn within each PSA iteration. In this way, parameter uncertainty can be investigated while still accounting for patient heterogeneity. This method accounts sufficiently for heterogeneity, is easily interpretable and can be performed using existing software. In essence, this ‘Double Loop PSA’ uses the existing Expected Value of Partial Perfect Information (EVPPI) methodology with a different goal.
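A minimal sketch of this nested approach, reusing the same made-up model as in the Single Loop sketch above: the outer loop draws the uncertain parameters, the inner loop draws a sample of patients and averages their outcomes within that parameter draw:

import numpy as np

rng = np.random.default_rng(3)

# Same made-up model as in the Single Loop sketch above.
def inb(effect, severity):
    return 20_000 * effect * np.exp(-0.05 * severity) - 500

n_psa, n_patients = 5_000, 30
double_loop = []
for _ in range(n_psa):                                  # outer loop: parameter uncertainty
    effect = rng.normal(0.5, 0.1)
    severities = rng.normal(50, 15, size=n_patients)    # inner sample of patients (heterogeneity)
    double_loop.append(inb(effect, severities).mean())  # average over the sampled patients

double_loop = np.array(double_loop)
print(double_loop.mean())   # expected outcome for the heterogeneous population
print(double_loop.std())    # spread reflects parameter uncertainty (plus some inner-sample noise)

Compared with the Single Loop sketch, the point estimate is essentially the same, but the spread of the outer-loop results no longer mixes between-patient differences into the decision uncertainty.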

Calculation time may be a burden for this method, compared to the other options. In our study, we chose a small sample of 30 randomly drawn patients within each PSA draw, to keep the computation time from increasing too rapidly. After testing, we concluded that 30 patients is a good middle ground between accuracy and runtime. In our case, the calculation time was 9 hours (one overnight calculation), which in our opinion is not a huge obstacle. Fortunately, since computational speed increases rapidly, using faster, more modern computers is likely to decrease the necessary time.

To conclude, we think that three of the methods discussed can be useful in cost-effectiveness research, each in different circumstances. When little or no heterogeneity is expected, or when it is not expected to influence the cost-effectiveness results, disregarding heterogeneity may be correct; in our case study, however, heterogeneity did have an impact. Subgroup analyses may inform policy decisions on each subgroup, as long as the subgroups are well defined and the characteristics that define a subgroup truly represent the patients within it. Despite the necessary calculation time, the Double Loop PSA is a viable alternative that leads to better results, and better policy decisions, when accounting for heterogeneity in a Markov model. Directly combining patient heterogeneity with parameter uncertainty in a PSA can only be used to calculate the point estimate of the expected outcome; it disregards the fundamental difference between heterogeneity and sampling uncertainty and overestimates uncertainty as a result.

Solution for Trap 66 (Postcondition violated)

WinBUGS is a widely used free software program in health economics. It allows for Bayesian statistical modelling using Gibbs sampling (hence the name: the Windows version of Bayesian inference Using Gibbs Sampling). One of the drawbacks of WinBUGS is the notoriously uninformative error messages it can produce. While Google is usually a Fountain of Knowledge on solving errors, where WinBUGS is concerned it often only serves up other people asking the same question, and hardly any answers. This post is about one such error message, the solution that is sometimes offered (which I think only partly solves the problem), and the solution I found, which solves it completely.

The error message itself is “Trap 66 (postcondition violated)”. Variance priors have been identified as the culprits. The suggested solutions I could find (for example here, here and here) all point towards those priors being too big. The standard advice is then to reduce the upper bound of the prior (for example from dunif(0,100) to dunif(0,10)) and rerun the model. This usually solves the problem.

However, to me this doesn’t make a whole lot of sense theoretically. And, in a rare case of theory and practice aligning, it also didn’t solve my problem in practice. I have been performing a simulation study with about 8,000 similar, but different, data sets (8 scenarios, 1,000 repetitions each). They all represent mixed treatment comparisons (MTC), which are analysed in WinBUGS. I used SAS to create the data, send it to WinBUGS, and collect and analyse the outcomes. When I started the random-effects MTC, the “Trap 66 (postcondition violated)” error popped up around data set 45. Making the priors smaller, as suggested, solved the problem for this particular data set, but it came back at data set 95. The funny thing is that making the priors larger also solved the problem for the original data set, but once again the same problem arose at a different data set (this time number 16).

Whenever I tried to recreate the problem, it would give the same error message at exactly the same point, even though it’s a random sampler. From this, it seems that the reason the suggested solution works for a given data set is that the generated ‘chains’, as they are called in WinBUGS, are identical when the priors and initial values are the same. Defining a smaller prior will give a different chain, which is likely not to cause problems, but so will a larger prior or a different initial value. In other words, the suggested fix only sidesteps the problem; it doesn’t really solve it.

The solution I have found to work for all 8,000 data sets is to look not at the maximum value of the prior, but at the minimum value. The prior that is given for a standard error usually looks something like dunif(0,X). In my case there is an extra step: I specify a uniform prior on a variable called tau, and the precision (one divided by the variance) that goes into the link function is defined by

prec <- pow(tau,-2)

This extra step makes no difference for the problem or the solution. My hypothesis is that when Trap 66 comes up, the chain has generated a tau (or standard error, if that’s what you modelled directly) equal to 0, which resolves into a precision of 1 divided by 0, or infinity. The solution is to let the prior start not at 0, but at a small epsilon. I used dunif(0.001,10), which solved all my problems.
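As a quick numerical illustration of the hypothesis (in Python rather than WinBUGS, and purely for intuition):

# As tau approaches 0, the precision 1/tau^2 explodes; at exactly 0 it is undefined.
for tau in (0.1, 0.001, 1e-10):
    print(tau, "->", tau ** -2)

# With dunif(0.001,10), tau never falls below 0.001, so the precision is capped at
# roughly 1e6 and the sampler never reaches the singularity.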

This solution is related to a different problem I once had when modelling a probability. I had mistakenly used a dunif(0,1) prior. Every now and then the chain would generate a value of exactly 0 or 1, which does not sit well with the binomial link function. The error message is different (“Undefined real result”), but the solution is again to use a prior that does not include the extremes. In my case, using a flat dbeta instead (which I should have done to begin with) solved that problem.

Any suggestions and comments are much appreciated. You can download WinBUGS, the free immortality key and lots of examples from the BUGS project website here. It also contains a list of common error messages and their solutions. Not Trap 66, obviously.