Chris Sampson’s journal round-up for 14th November 2016

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Weighing clinical evidence using patient preferences: an application of probabilistic multi-criteria decision analysis. PharmacoEconomics [PubMed] Published 10th November 2016

There are at least two ways in which preferences determine the allocation of health care resources (in a country with an HTA agency, at least). One of them we think about a lot: the (societal) valuation of health states as defined by a multi-attribute measure (like the EQ-5D). The other relates to patient preferences that determine whether or not a specific individual (and their physician) will choose to use a particular technology, given its expected clinical outcomes for that individual. A drug may very well make sense at the aggregate level but be a very bad choice for a particular individual when compared with alternatives. It’s right that this process should be deliberative and not solely driven by an algorithm, but it’s also important to maintain transparent and consistent decision making. Multi-criteria decision analysis (MCDA) has been proposed as a means of achieving this, and it can be used to take into account the uncertainty associated with clinical outcomes. In this study the authors present an approach that also incorporates random preference variation along with parameter uncertainty in both preferences and clinical evidence. The model defines a value function and estimates the impact of uncertainty using probabilistic Monte Carlo simulation, which in turn estimates the mean value of each possible treatment in the population. Treatments can therefore be ranked according to patients’ preferences, along with an estimate of the uncertainty associated with this ranking. To demonstrate the utility of the model it is applied to an example of the relative value of HAARTs for HIV, with parameters derived from clinical evaluations and stated preference studies. It’s nice to see that the authors also provide their R script. One headline finding seems to be that this approach is likely to reveal just how much uncertainty is involved, much of which might not previously have been given much attention. It could therefore help steer us towards more valuable research in the future. And it could be used to demonstrate that optimal decisions might change when all sources of uncertainty are considered. Clearly a potential application of this method is in the realm of personalised medicine, which is slowly but inevitably reaching beyond the confines of pharmacogenomics.
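To make the mechanics concrete, here is a minimal sketch of the general idea rather than the authors’ actual R implementation; the treatments, criteria, weights and outcome distributions below are entirely hypothetical.

```python
# Minimal sketch of probabilistic MCDA: draw preference weights and clinical
# outcomes, compute a linear additive value for each treatment, and summarise
# the resulting ranking over many Monte Carlo iterations. All numbers are
# hypothetical, not taken from the paper.
import numpy as np

rng = np.random.default_rng(1)
treatments = ["treatment_A", "treatment_B", "treatment_C"]

# Hypothetical clinical evidence: mean and standard error for each criterion
# (efficacy, side effects, convenience), rescaled so that 1 = best.
means = np.array([[0.70, 0.50, 0.90],
                  [0.80, 0.40, 0.60],
                  [0.60, 0.70, 0.80]])
ses = np.full_like(means, 0.05)

n_sim = 10_000
values = np.zeros((n_sim, len(treatments)))
ranked_first = np.zeros(len(treatments))

for i in range(n_sim):
    weights = rng.dirichlet([4.0, 2.0, 1.0])   # preference uncertainty/variation
    outcomes = rng.normal(means, ses)          # parameter uncertainty in the evidence
    values[i] = outcomes @ weights             # linear additive value function
    ranked_first[np.argmax(values[i])] += 1

for t, name in enumerate(treatments):
    print(f"{name}: mean value {values[:, t].mean():.3f}, "
          f"P(ranked first) {ranked_first[t] / n_sim:.2f}")
```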

Communal sharing and the provision of low-volume high-cost health services: results of a survey. PharmacoEconomics – Open Published 4th November 2016

One of the distributional concerns we might have about the QALY-maximisation approach is its implications for people with rare diseases. Drugs for rare diseases are often expensive (because the marginal cost is likely to be higher) and therefore less cost-effective. There is mixed evidence about whether or not people exhibit a preference for redistributive allocation of QALY-creating resources according to rarity. Of course, the result you get from such studies depends on the question you ask. In order to ask the right question it’s important to understand the mechanisms by which people might prefer allocation of additional resources to services for rare diseases. One suggestion in the literature is the preservation of hope. This study presents another, based on the number of people sharing the cost. So imagine a population of 1000 people, all of whom share the cost of health care. For a rare disease, more people will share the cost of the treatment per person treated. So if 10 people have the disease, that’s 100 payers per recipient; if 100 people have the disease, it’s just 10 payers per recipient. The idea is that people prefer a situation in which more people share the cost, and on that basis prefer to allocate resources to rare diseases. A web-based survey was conducted in Australia in which 702 people were asked to divide a budget between a small patient group with a high-cost illness and a large patient group with a low-cost illness. There was also a set of questions in which respondents indicated the importance of 6 possible influences on their decisions. The findings show that people did choose to allocate more funds to the rarer disease, despite the reduced overall health gain. This suggests that people do have a preference for wider cost sharing, which could explain extra weight being given to rare diseases. I think it’s a good idea that deserves more research, but for me there are a few problems with the study. Much of the effect could be explained by people’s non-linear valuations of risk, as the scenario highlighted that the respondents themselves would be at risk of the disease. We also can’t clearly differentiate between an effect due to the rarity of the disease (and associated cost sharing) and an effect due to the severity of the disease.

The challenge of conditional reimbursement: stopping reimbursement can be more difficult than not starting in the first place! Value in Health Published 3rd November 2016

If anything’s going to make me read a paper, it’s an exclamation mark! Conditional reimbursement of technologies that are probably effective but probably not cost-effective can be conducted in a rational way in order to generate research findings and benefit social welfare in the long run. But that can only hold true if those technologies subsequently found (through more research) to be ineffective or too costly are then made unavailable. Otherwise conditional reimbursement agreements will do more harm than good. This study uses discrete choice experiments to compare public (n=1169) and potential policymaker (n=90) values associated with the removal of an available treatment compared with non-reimbursement of a new treatment. The results showed (in addition to some other common findings) that both the public and policymakers preferred reimbursement of an existing treatment over reimbursement of a new treatment, and were willing to accept an ICER more than €7,000 higher for the existing treatment. Though the DCE found reimbursement status to be a significant determinant, 60% of policymakers reported that they thought it was unimportant, so there may be some cognitive dissonance going on there. The most obvious (and probably most likely) explanation for the observed preference for currently reimbursed treatments is loss aversion. But it could also be that people recognise real costs associated with ending reimbursement that are not reflected in either the QALY estimates or the costs to the health system. Whatever the explanation, HTA agencies need to bear this in mind when using conditional reimbursement agreements.

Head-to-head comparison of health-state values derived by a probabilistic choice model and scores on a visual analogue scale. The European Journal of Health Economics [PubMed] Published 2nd November 2016

I’ve always had a fondness for a good old VAS as a direct measure of health state (dare we say utility) values, despite the limitations of the approach. This study compares discrete choices for EQ-5D-5L states with VAS valuations – thus comparing indirect and direct health state valuations – in Canada, the USA, England and The Netherlands (n=1775). Each respondent had to make a forced choice between two EQ-5D-5L health states and then assess both states on a single VAS. Ten different pairs were completed by each respondent. The two different approaches correlated strongly within and across countries, as we might expect. And pairs of EQ-5D-5L states that were valued relatively low or high in the discrete choice model were also valued accordingly in the VAS. But the relationship between the two approaches was non-linear in that values differed more at the ends of the scale, with poor health states valued more differently in the choice model and good health states valued more differently on the VAS. This probably just reflects some of the biases observed in the use of VAS that are already well-documented, particularly context bias and end-state aversion. This study clearly suggests (though does not by itself prove) that discrete choice models are a better choice for health state valuation… but the VAS ain’t dead yet.


Biased towards bias?

A six-hour delay flying home from the Health Economists’ Study Group conference in Gran Canaria is providing me with ample time to mull over the great issues in life. One of these big issues is of course the trade-off between bias and variance.

Typically, the discussion of an empirical economics paper at a conference will focus heavily on the model and estimation method. Often the word ‘endogenous’ echoes round the room as the discussion considers whether the estimator employed is biased or not. This is of course an important consideration for any empirical work, but the question of efficiency (essentially the variance) of the estimator rarely comes up. Indeed, Andrew Gelman has discussed this predilection among economists elsewhere. So why don’t we prefer to think in terms of overall error?

As an example, consider that fixed-effects (FE) models are almost always preferred to random-effects (RE) models among economists. (Although the meaning of these terms varies widely!) This is for reasons of unbiasedness; we teach undergraduates to choose FE if the Hausman test rejects the null of no systematic difference between the FE and RE estimates. But RE is more efficient. So the question should be: under what conditions is the overall error smaller in the RE model? If much of the variation is between individuals (or whatever the unit of the panel is) rather than within individuals, then the efficiency gains of RE may outweigh the error due to bias.
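A small simulation makes the point. This is just a sketch with made-up numbers, not an analysis from any of the papers above: the individual effect is uncorrelated with x (so both estimators are consistent), most of the variation in x is between individuals, and, for simplicity, the RE quasi-demeaning uses the true variance components rather than estimating them.

```python
# Sketch: overall error (MSE) of the FE (within) estimator vs the RE
# (quasi-demeaned GLS) estimator when most variation in x is between
# individuals and x is uncorrelated with the individual effect.
# Assumption: the RE transform uses the true variance components.
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 4                         # individuals, time periods
beta = 1.0
sigma_alpha, sigma_eps = 1.0, 1.0     # s.d. of individual effect and noise

def one_draw():
    x = rng.normal(0, 2.0, (N, 1)) + rng.normal(0, 0.2, (N, T))  # mostly between-variation
    alpha = rng.normal(0, sigma_alpha, (N, 1))                   # uncorrelated with x
    y = beta * x + alpha + rng.normal(0, sigma_eps, (N, T))

    # FE: within transformation (demean within each individual)
    xd, yd = x - x.mean(1, keepdims=True), y - y.mean(1, keepdims=True)
    b_fe = (xd * yd).sum() / (xd * xd).sum()

    # RE: quasi-demeaning with theta built from the (true) variance components
    theta = 1 - np.sqrt(sigma_eps**2 / (sigma_eps**2 + T * sigma_alpha**2))
    xq, yq = x - theta * x.mean(1, keepdims=True), y - theta * y.mean(1, keepdims=True)
    b_re = (xq * yq).sum() / (xq * xq).sum()
    return b_fe, b_re

draws = np.array([one_draw() for _ in range(2000)])
print("MSE around beta:  FE =", ((draws[:, 0] - beta) ** 2).mean().round(5),
      "  RE =", ((draws[:, 1] - beta) ** 2).mean().round(5))
```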

To give a more mathematically explicit example, consider the use of an ordinary least squares (OLS) estimator versus a two stage least squares (2SLS) estimator. If we have the simple linear model y = xβ + u, the OLS estimator is biased if Corr(x, u) = ρ ≠ 0. In such cases, if an instrumental variable, say z, is available, one which is correlated with x but not with u, then 2SLS is a consistent estimator. But what of overall error? If λ is the correlation between z and x, and n is the sample size, then 2SLS has a lower mean squared error than OLS (approximately) if

ρ²λ²n / (1 − λ²) > 1

Thus, if the correlation between x and u is low or the instruments are weak, then OLS should be preferred. In many cases it comes down to whether the sample size is sufficient.
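As a quick check on that rule of thumb, here is a small simulation (a sketch with made-up values of ρ, λ and n, not taken from any paper discussed here). It compares the empirical mean squared error of OLS and 2SLS as the sample grows; note that just-identified 2SLS has heavy tails when the instrument is weak, so the small-n estimates are noisy.

```python
# Sketch: MSE of OLS vs 2SLS in y = x*beta + u, with Corr(x, u) = rho and a
# single instrument z with Corr(z, x) = lam. Values of rho, lam and n are
# made up to illustrate the rho^2 * lam^2 * n / (1 - lam^2) > 1 condition.
import numpy as np

rng = np.random.default_rng(42)
beta, rho, lam = 1.0, 0.15, 0.4

def mse(n, reps=5000):
    err = np.zeros((reps, 2))
    a = rho / np.sqrt(1 - lam**2)               # gives Corr(x, u) = rho exactly
    for r in range(reps):
        z, v, e = rng.normal(size=(3, n))
        x = lam * z + np.sqrt(1 - lam**2) * v   # Corr(z, x) = lam, z exogenous
        u = a * v + np.sqrt(1 - a**2) * e       # endogeneity enters through v
        y = beta * x + u
        b_ols = (x @ y) / (x @ x)
        b_2sls = (z @ y) / (z @ x)              # just-identified IV estimator
        err[r] = (b_ols - beta) ** 2, (b_2sls - beta) ** 2
    return err.mean(axis=0)

for n in (50, 100, 500, 2000):
    condition = rho**2 * lam**2 * n / (1 - lam**2)
    m_ols, m_2sls = mse(n)
    print(f"n={n:4d}  rho^2*lam^2*n/(1-lam^2)={condition:5.2f}  "
          f"MSE OLS={m_ols:.4f}  MSE 2SLS={m_2sls:.4f}")
```

With these values the condition crosses 1 at around n = 230, and the simulated MSEs switch over in roughly the same region.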

The same considerations could be made of predictive models for economic evaluation. An ambitious young student (as I was) may want to create an ever more complex model that captures this and that ambiguity in the world. While each addition may reduce bias in the prediction of the outcome, it will also increase the variance. Thus, beyond a certain point, we will just increase the uncertainty in our predictions.

It could be argued that one of the key goals of research is to inform a decision. Minimising the error in the estimator that informs the decision will lead to a lower probability of making the wrong decision. We should therefore consider overall error. This could be a plug for Bayesian methods; the posterior mean is the estimator that minimises the expected squared error. But I don’t think Bayesianism is implied by the premises; we should just be less biased towards bias.


Heterogeneity and Markov models

The big appeal of Markov models is their relative simplicity, with their focus on what happens to a whole cohort rather than to individual patients. Because of this, they are relatively bad at taking into account patient heterogeneity (true differences in outcomes between patients, which can be explained by, for example, disease severity, age or biomarkers). Several ways of dealing with patient heterogeneity have been used in the past. Earlier this year, my co-authors Dr. Lucas Goossens and Prof. Dr. Maureen Rutten-van Mölken and I published a study showing the outcomes of these different approaches. We show that three of the four methods are useful in different circumstances, and that the fourth should no longer be used.

In practice, heterogeneity is often ignored: an average value for the patient population is used for any variables representing patient characteristics in the model, and the cost-effectiveness outcomes for this ‘average patient’ are then assumed to represent the entire patient population. In addition to ignoring available evidence, the results are difficult to interpret, since the ‘average patient’ does not exist. With non-linearity being the rule rather than the exception in Markov modelling, heterogeneity should be taken into account explicitly in order to obtain a correct cost-effectiveness estimate for a heterogeneous population. Ignoring heterogeneity can therefore be appropriate only if there is little of it, or if it is not expected to influence the cost-effectiveness outcomes.
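The non-linearity point is easy to see in a toy example (illustrative only, not the model from our paper): run a simple alive/dead Markov calculation once at the mean of a patient characteristic, and once averaged over the distribution of that characteristic, and the two answers differ because f(E[age]) ≠ E[f(age)].

```python
# Toy illustration (not the model from the paper): expected life-years from a
# two-state (alive/dead) Markov model in which the per-cycle death probability
# depends non-linearly (logistically) on a patient characteristic, here age.
import numpy as np

rng = np.random.default_rng(7)

def life_years(age, cycles=20):
    """Undiscounted expected life-years over a 20-cycle horizon."""
    p_death = 1 / (1 + np.exp(-(0.08 * age - 7)))   # hypothetical risk equation
    alive, total = 1.0, 0.0
    for _ in range(cycles):
        total += alive
        alive *= 1 - p_death
    return total

ages = rng.normal(70, 10, size=10_000)              # heterogeneous cohort
print("Run at the mean age:            ", round(life_years(ages.mean()), 2))
print("Mean of runs over all patients: ", round(float(np.mean([life_years(a) for a in ages])), 2))
```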

An alternative is to define several subgroups of patients, using different combinations of patient characteristics, and to calculate the outcomes for each of these. The comparison of subgroups allows for the exploration of the effect that differences between patients have on cost-effectiveness outcomes. In our study, the subgroup analyses did give insight into the differences between the different types of patients, but not all of the outcomes were useful for decision makers. After all, policy and reimbursement decisions are commonly made for an entire patient population, not for subgroups. And if a decision maker wants to use the subgroup analyses for decisions regarding specific subgroups, equity concerns become an issue. Patient heterogeneity in clinical characteristics, such as starting FEV1% in our study, may be acceptable as a basis for subgroup-specific recommendations; other input parameters, such as gender, race or, in our case, age, are not. This part of the existing heterogeneity has to be ignored if you use subgroup analyses.

In some cases, heterogeneity has been handled by simply combining it with parameter uncertainty in a probabilistic sensitivity analysis (PSA). The expected outcome of this ‘Single Loop PSA’ is correct for the population, but the distribution of the expected outcome (which reflects the uncertainty in which many decision makers are interested) is not. The outcomes ignore the fundamental difference between patient heterogeneity and parameter uncertainty. In our study, it even influenced the shape of the cost-effectiveness plane, leading to an overestimation of uncertainty. In our opinion, this method should no longer be used.

In order to correctly separate parameter uncertainty and heterogeneity, the analysis requires a nested Monte Carlo simulation, drawing a number of individual patients within each PSA iteration. In this way you can investigate sampling uncertainty while still accounting for patient heterogeneity. This method accounts sufficiently for heterogeneity, is easily interpretable and can be performed using existing software. In essence, this ‘Double Loop PSA’ uses the existing Expected Value of Partial Perfect Information (EVPPI) methodology with a different goal.
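The structure is easy to sketch in code. Below is a schematic with a dummy cost-and-QALY function and made-up distributions (not our actual model or parameters): the outer loop draws a parameter set, the inner loop draws patients and runs the model for each, and outcomes are averaged over patients within each outer draw before the cost-effectiveness plane is built.

```python
# Schematic double loop PSA with a dummy model: the outer loop samples
# parameter uncertainty, the inner loop samples patient heterogeneity, and
# costs/QALYs are averaged over patients within each parameter draw.
import numpy as np

rng = np.random.default_rng(123)
N_PSA, N_PATIENTS = 1000, 30        # outer (parameter) and inner (patient) draws

def run_model(params, patient):
    """Placeholder for the cost/QALY model; a real Markov model would go here."""
    effect = params["treatment_effect"] * (1 - 0.01 * (patient["age"] - 65))
    qalys = 5 + effect
    costs = 20_000 + params["unit_cost"] * effect
    return costs, qalys

results = np.zeros((N_PSA, 2))
for i in range(N_PSA):
    # Outer loop: one draw from the parameter (uncertainty) distributions.
    params = {"treatment_effect": rng.normal(0.5, 0.1),
              "unit_cost": rng.gamma(shape=100, scale=50)}
    # Inner loop: a sample of heterogeneous patients for this parameter set.
    patients = [{"age": rng.normal(65, 8)} for _ in range(N_PATIENTS)]
    results[i] = np.mean([run_model(params, p) for p in patients], axis=0)

# Each row of `results` is a population-average (cost, QALY) pair reflecting
# parameter uncertainty only, ready for a cost-effectiveness plane or CEAC.
print("Mean costs:", results[:, 0].mean().round(0), " Mean QALYs:", results[:, 1].mean().round(2))
```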

Calculation time may be a burden for this method, compared with the other options. In our study, we chose a small sample of 30 randomly drawn patients within each PSA draw, to avoid a rapidly increasing computation time. After testing, we concluded that 30 was a good middle ground between accuracy and runtime. In our case, the calculation time was 9 hours (one overnight calculation), which is not a huge obstacle in our opinion. Fortunately, since computational speed increases rapidly, using faster, more modern computers is likely to decrease the necessary time.

To conclude, we think that three of the methods discussed can be useful in cost-effectiveness research, each in different circumstances. When little or no heterogeneity is expected, or when it is not expected to influence the cost-effectiveness results, disregarding heterogeneity may be correct. In our case study, heterogeneity did have an impact. Subgroup analyses may inform policy decisions on each subgroup, as long as they are well defined and the characteristics of the cohort that define a subgroup truly represent the patients within that subgroup. Despite the necessary calculation time, the Double Loop PSA is a viable alternative which leads to better results and better policy decisions, when accounting for heterogeneity in a Markov model. Directly combining patient heterogeneity with parameter uncertainty in a PSA can only be used to calculate the point estimate of the expected outcome. It disregards the fundamental differences between heterogeneity and sampling uncertainty and overestimates uncertainty as a result.