The big appeal of Markov models is their relative simplicity: they focus on what happens to a whole cohort instead of to individual patients. Because of this, they are relatively poor at taking patient heterogeneity into account (true differences in outcomes between patients, which can be explained by, for example, disease severity, age or biomarkers). Several ways of dealing with patient heterogeneity have been used in the past. Earlier this year, my co-authors Dr. Lucas Goossens and Prof. Dr. Maureen Rutten-van Mölken and I published a study showing how these different approaches affect the outcomes. We show that three of the four methods are useful in different circumstances. The fourth one should not be used anymore.

In practice, heterogeneity is often ignored. An average value of the patient population will then be used for any variables representing patient characteristics in the model. The cost-effectiveness outcomes for this ‘average patient’ are then assumed to represent the entire patient population. Besides ignoring available evidence, this approach produces results that are difficult to interpret, since the ‘average patient’ does not exist. With non-linearity being the rule rather than the exception in Markov modelling, the outcome for the average patient generally differs from the average outcome across all patients. Heterogeneity should therefore be taken into account explicitly in order to obtain a correct cost-effectiveness estimate over a heterogeneous population. Ignoring heterogeneity can be useful only if there is little of it, or if it is not expected to influence the cost-effectiveness outcomes.
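
To make the non-linearity point concrete, here is a minimal sketch in Python. It is not the model from our study: the two-state (alive/dead) cohort model, the function `life_years` and the four annual death risks are all hypothetical and chosen only for illustration. It compares the outcome for the ‘average patient’ with the average outcome over the patients.

```python
def life_years(p_death, cycles=20):
    """Expected life years in a toy two-state (alive/dead) Markov cohort model.

    p_death is the annual probability of dying; one cycle is one year.
    """
    alive = 1.0
    total = 0.0
    for _ in range(cycles):
        total += alive            # count the fraction alive at the start of the cycle
        alive *= 1.0 - p_death    # apply the annual death probability
    return total


# Hypothetical heterogeneous population: four patient types with
# different annual death risks (illustrative numbers only).
risks = [0.02, 0.05, 0.10, 0.30]

# Approach 1: run the model once for the 'average patient'.
mean_risk = sum(risks) / len(risks)
average_patient = life_years(mean_risk)

# Approach 2: run the model for every patient type and average the outcomes.
population_mean = sum(life_years(p) for p in risks) / len(risks)

print(f"average patient: {average_patient:.2f} life years")
print(f"population mean: {population_mean:.2f} life years")
```

Because life years are a non-linear function of the death risk in this toy model, the two numbers differ. With a linear model they would coincide, which is why ignoring heterogeneity is only safe when its influence on the outcomes is small.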

An alternative is to define several subgroups of patients, each with a different combination of patient characteristics, and to calculate the outcomes for each of these. Comparing subgroups makes it possible to explore the effect that differences between patients have on cost-effectiveness outcomes. In our study, subgroup analyses did give insight into the differences between the types of patients, but not all outcomes were useful for decision makers. After all, policy and reimbursement decisions are commonly made for an entire patient population, not for subgroups. If a decision maker wants to use the subgroup analyses for decisions regarding specific subgroups, equity concerns are always an issue. Patient heterogeneity in clinical characteristics, such as starting FEV1% in our study, may be acceptable as a basis for subgroup-specific recommendations. Other input parameters, such as gender, race or, in our case, age, are not. This part of the existing heterogeneity has to be ignored if you use subgroup analyses.
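
A subgroup analysis of this kind can be sketched as follows. This is an illustrative toy, not our study model: the severity and age categories, their risks, the relative risk, the treatment cost and the helper names are all assumptions made up for the example.

```python
from itertools import product


def life_years(p_death, cycles=20):
    """Expected life years in a toy two-state (alive/dead) Markov cohort model."""
    alive, total = 1.0, 0.0
    for _ in range(cycles):
        total += alive
        alive *= 1.0 - p_death
    return total


# Hypothetical patient characteristics that define the subgroups.
SEVERITY_RISK = {"mild": 0.03, "moderate": 0.08, "severe": 0.20}  # annual death risk
AGE_MULTIPLIER = {"under_65": 1.0, "65_plus": 1.5}                # multiplies the risk

# Hypothetical treatment: lowers the death risk, costs a fixed amount per year alive.
RELATIVE_RISK = 0.8
TREATMENT_COST_PER_YEAR = 1000.0


def subgroup_icer(severity, age_band):
    """Incremental cost per life year gained for one subgroup."""
    p = SEVERITY_RISK[severity] * AGE_MULTIPLIER[age_band]
    ly_control = life_years(p)
    ly_treated = life_years(p * RELATIVE_RISK)
    incremental_cost = TREATMENT_COST_PER_YEAR * ly_treated  # control arm has no drug cost
    return incremental_cost / (ly_treated - ly_control)


# One model run per combination of characteristics.
icers = {
    (severity, age_band): subgroup_icer(severity, age_band)
    for severity, age_band in product(SEVERITY_RISK, AGE_MULTIPLIER)
}

for subgroup, icer in icers.items():
    print(subgroup, round(icer))
```

Each combination of characteristics gets its own result, which is informative but leaves open how to arrive at one decision for the whole population.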

In some cases, heterogeneity has been handled by simply combining it with parameter uncertainty in a probabilistic sensitivity analysis (PSA), so that each PSA iteration draws one set of parameter values and one patient at the same time. The expected outcome of this ‘Single Loop PSA’ is correct for the population, but the distribution of the expected outcome (which reflects the uncertainty in which many decision makers are interested) is not. The outcomes ignore the fundamental difference between patient heterogeneity and parameter uncertainty. In our study, it even influenced the shape of the scatter on the cost-effectiveness plane, leading to an overestimation of uncertainty. In our opinion, this method should never be used anymore.

Correctly separating parameter uncertainty and heterogeneity requires a nested Monte Carlo simulation, in which a number of individual patients is drawn within each PSA iteration. In this way you can investigate parameter uncertainty while still accounting for patient heterogeneity. This method accounts sufficiently for heterogeneity, is easily interpretable and can be performed using existing software. In essence, this ‘Double Loop PSA’ uses the existing Expected Value of Partial Perfect Information (EVPPI) methodology with a different goal.
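
The difference between mixing both sources of variation in one loop and nesting them can be sketched in a few lines of Python. Again, this is a hedged toy example and not the model from our study: the outcome function, the lognormal distribution for the relative risk, the uniform distribution for the baseline risk, the seed and the iteration counts are all assumptions for illustration.

```python
import random
import statistics


def incremental_life_years(baseline_risk, relative_risk, cycles=20):
    """Life years gained by treatment in a toy two-state Markov cohort model."""
    def life_years(p_death):
        alive, total = 1.0, 0.0
        for _ in range(cycles):
            total += alive
            alive *= 1.0 - p_death
        return total
    return life_years(baseline_risk * relative_risk) - life_years(baseline_risk)


def draw_parameter(rng):
    """Parameter uncertainty: relative risk of death under treatment (hypothetical)."""
    return rng.lognormvariate(-0.25, 0.05)


def draw_patient(rng):
    """Patient heterogeneity: baseline annual death risk (hypothetical)."""
    return rng.uniform(0.02, 0.30)


def single_loop_psa(rng, n_iterations):
    """One patient AND one parameter draw per iteration: both sources are mixed."""
    return [
        incremental_life_years(draw_patient(rng), draw_parameter(rng))
        for _ in range(n_iterations)
    ]


def double_loop_psa(rng, n_outer, n_inner):
    """Outer loop: parameter draws. Inner loop: patients, averaged per parameter draw."""
    results = []
    for _ in range(n_outer):
        relative_risk = draw_parameter(rng)
        inner = [
            incremental_life_years(draw_patient(rng), relative_risk)
            for _ in range(n_inner)
        ]
        results.append(statistics.fmean(inner))
    return results


rng = random.Random(2024)
single = single_loop_psa(rng, 2000)
double = double_loop_psa(rng, 2000, 30)

print("single loop: mean", statistics.fmean(single), "sd", statistics.stdev(single))
print("double loop: mean", statistics.fmean(double), "sd", statistics.stdev(double))
```

In this sketch the two means are close, while the spread of the single-loop results is wider, because it contains the variation between patients on top of the parameter uncertainty. The double-loop spread mostly reflects parameter uncertainty, plus some residual Monte Carlo noise from the finite inner sample.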

Calculation time may be a burden for this method compared to the other options, because the number of model runs is the number of PSA draws multiplied by the number of patients per draw. In our study, we chose a small sample of 30 randomly drawn patients within each PSA draw to limit the rapidly increasing computation time. After testing, we concluded that 30 would be a good middle ground between accuracy and runtime. In our case, the calculation time was 9 hours (one overnight calculation), which is not a huge obstacle in our opinion. Fortunately, since computational speed increases rapidly, it is likely that faster, more modern computers would decrease the necessary time.
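
The trade-off behind the inner sample size can be illustrated with some simple arithmetic. The sketch below rests on stated assumptions: the function name, the 1,000 PSA draws and the between-patient standard deviation of 1.0 are hypothetical and not taken from our study. It uses the standard result that the standard error of a sample mean shrinks with the square root of the sample size, while the number of model runs grows linearly.

```python
import math


def inner_loop_tradeoff(n_outer, n_inner, sd_between_patients):
    """Return (number of model runs, standard error of each inner-loop mean)."""
    model_runs = n_outer * n_inner
    standard_error = sd_between_patients / math.sqrt(n_inner)
    return model_runs, standard_error


# Hypothetical setting: 1,000 PSA draws, between-patient SD of 1.0 (arbitrary units).
for n_inner in (1, 30, 120):
    runs, se = inner_loop_tradeoff(1000, n_inner, 1.0)
    print(f"{n_inner:>4} patients per draw: {runs:>7} runs, standard error {se:.3f}")
```

Going from 30 to 120 patients per draw quadruples the number of runs but only halves the Monte Carlo error of each inner mean, which is why a modest inner sample can be a reasonable compromise.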

To conclude, we think that three of the methods discussed can be useful in cost-effectiveness research, each in different circumstances. When little or no heterogeneity is expected, or when it is not expected to influence the cost-effectiveness results, disregarding heterogeneity may be correct. In our case study, heterogeneity did have an impact. Subgroup analyses may inform policy decisions on each subgroup, as long as they are well defined and the characteristics of the cohort that define a subgroup truly represent the patients within that subgroup. Despite the necessary calculation time, the Double Loop PSA is a viable alternative which leads to better results and better policy decisions, when accounting for heterogeneity in a Markov model. Directly combining patient heterogeneity with parameter uncertainty in a PSA can only be used to calculate the point estimate of the expected outcome. It disregards the fundamental differences between heterogeneity and sampling uncertainty and overestimates uncertainty as a result.

I really enjoyed this review. I wonder if we should have a general discussion of non-linearity and Markov and similar linear models.

Harold