Chris Sampson’s journal round-up for 11th March 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Identification, review, and use of health state utilities in cost-effectiveness models: an ISPOR Good Practices for Outcomes Research Task Force report. Value in Health [PubMed] Published 1st March 2019

When modellers select health state utility values to plug into their models, they often do it in an ad hoc and unsystematic way. This ISPOR Task Force report seeks to address that.

The authors discuss the process of searching, reviewing, and synthesising utility values. Searches need to use iterative techniques because evidence requirements develop as a model develops. Given the scope of a model, it may be necessary to develop multiple search strategies (for example, for different aspects of disease pathways). Searches needn’t be exhaustive, but they should be systematic and transparent. The authors provide a list of factors that should be considered in defining search criteria. In reviewing utility values, both quality and appropriateness should be considered. Quality is indicated by the precision of the evidence, the response rate, and missing data. Appropriateness relates to the extent to which the evidence being reviewed conforms to the context of the model in which it is to be used. This includes factors such as the characteristics of the study population, the measure used, the value set used, and the timing of data collection. When it comes to synthesis, the authors suggest it might not be meaningful in most cases, because of variation in methods. We can’t pool values if they aren’t (at least roughly) equivalent. Therefore, one approach is to employ strict inclusion criteria (e.g. only EQ-5D, only a particular value set), but this isn’t likely to leave you with much. Meta-regression can be used to analyse more dissimilar utility values and provide insight into the impact of methodological differences. But the extent to which this can provide pooled values for a model is questionable, and the authors concede that more research is needed.
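To make the meta-regression option concrete, here is a minimal sketch of an inverse-variance-weighted meta-regression of published utility values on methodological covariates, in the spirit of (but not prescribed by) the report. All data, covariates, and names are invented for illustration.

```python
# Minimal sketch of a meta-regression of published utility values.
# All numbers and covariates below are invented, not taken from the paper.
import numpy as np
import statsmodels.api as sm

# One row per published value: mean utility, its standard error,
# and study-level dummies coding methodological differences.
utility   = np.array([0.71, 0.68, 0.75, 0.62, 0.66, 0.70])
se        = np.array([0.02, 0.03, 0.02, 0.04, 0.03, 0.02])
is_eq5d   = np.array([1, 1, 0, 0, 1, 0])  # 1 = EQ-5D, 0 = other instrument
uk_tariff = np.array([1, 0, 0, 0, 1, 0])  # 1 = UK value set

X = sm.add_constant(np.column_stack([is_eq5d, uk_tariff]))
# Inverse-variance weights, as in a fixed-effect meta-regression
model = sm.WLS(utility, X, weights=1.0 / se**2).fit()
print(model.summary())
```

The coefficients on the dummies estimate the systematic effect of methodological choices, which is the insight the authors have in mind; whether the fitted predictions are then fit for use as pooled model inputs is exactly the open question.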

This paper can inform that future research. Not least in its attempt to specify minimum reporting standards. We have another checklist, with another acronym (SpRUCE). The idea isn’t so much that this will guide publications of systematic reviews of utility values, but rather that modellers (and model reviewers) can use it to assess whether the selection of utility values was adequate. The authors then go on to offer methodological recommendations for using utility values in cost-effectiveness models, considering issues such as modelling technique, comorbidities, adverse events, and sensitivity analysis. It’s early days, so the recommendations in this report ought to be changed as methods develop. Still, it’s a first step away from the ad hoc selection of utility values that (no doubt) drives the results of many cost-effectiveness models.

Estimating the marginal cost of a life year in Sweden’s public healthcare sector. The European Journal of Health Economics [PubMed] Published 22nd February 2019

It’s only recently that health economists have gained access to data that enable the estimation of the opportunity cost of health care expenditure at the national level – what is sometimes referred to as a supply-side threshold. We’ve seen studies in the UK, Spain, and Australia, and here we have one from Sweden.

The authors use data on health care expenditure at the national (1970-2016) and regional (2003-2016) level, alongside estimates of remaining life expectancy by age and gender (1970-2016). First, they try a time series analysis, testing the direction of causality. Finding that causality apparently runs from longevity to expenditure, rather than the other way around, the authors don’t take it any further. Instead, the results are based on a panel data analysis, employing similar methods to estimates generated in other countries. The authors propose a conceptual model to support their analysis, which distinguishes it from other studies. In particular, the authors assert that the majority of the impact of expenditure on mortality operates through morbidity, which changes how the model should be specified. The number of newly graduated nurses is used as an instrument indicative of a supply shift at the national rather than regional level. The models control for socioeconomic and demographic factors and for morbidity not amenable to health care.
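For a flavour of the kind of instrumented panel regression this involves, here is a rough sketch using the linearmodels package. The variable names, simulated data, and specification are all invented; the paper’s model is more elaborate (not least in routing expenditure through morbidity).

```python
# Illustrative 2SLS sketch: expenditure elasticity of life expectancy,
# instrumented by newly graduated nurses. Data are simulated.
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(0)
n = 21 * 14  # e.g. 21 regions observed over 2003-2016
nurses = rng.normal(0.0, 1.0, n)                            # instrument
log_spend = 10.0 + 0.3 * nurses + rng.normal(0, 0.1, n)     # endogenous ln(spend)
income = rng.normal(0.0, 1.0, n)                            # stand-in control
log_le = 4.4 + 0.02 * log_spend + 0.01 * income + rng.normal(0, 0.01, n)

df = pd.DataFrame({"log_le": log_le, "log_spend": log_spend,
                   "nurses": nurses, "income": income, "const": 1.0})

# Log-log specification, so the coefficient on log_spend is the
# expenditure elasticity of life expectancy
res = IV2SLS(df["log_le"], df[["const", "income"]],
             df["log_spend"], df["nurses"]).fit()
print(res.params["log_spend"])
```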

The authors estimate the marginal cost of a life year by dividing health care expenditure by the expenditure elasticity of life expectancy, finding an opportunity cost of €38,812 (with a massive 95% confidence interval). Using Swedish population norms for utility values, this would translate into around €45,000/QALY.
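To see how the headline numbers hang together, assuming the elasticity is defined in the usual way, the arithmetic runs roughly as follows (the 0.86 is the utility weight implied by the two reported figures, not a number quoted from the paper):

```latex
% If \varepsilon is the expenditure elasticity of life expectancy,
% X is per-capita expenditure and LE is life expectancy, then
\[
\varepsilon = \frac{\partial LE}{\partial X}\cdot\frac{X}{LE}
\quad\Longrightarrow\quad
\underbrace{\frac{\partial X}{\partial LE}}_{\text{cost per life year}}
= \frac{X}{\varepsilon \cdot LE}
\]
% Dividing the cost per life year by an average utility weight \bar{u}
% (from population norms) gives the cost per QALY:
\[
\text{cost per QALY} = \frac{\text{cost per life year}}{\bar{u}}
\approx \frac{38{,}812}{0.86} \approx 45{,}000 \ \text{(EUR)}
\]
```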

The analysis is considered and makes plain the difficulty of estimating the marginal productivity of health care expenditure. It looks like a nail in the coffin for the idea of estimating opportunity costs using time series. For now, at least, estimates of opportunity cost will be based on variation according to geography, rather than time. In their excellent discussion, the authors are candid about the limitations of their model. Their instrument wasn’t perfect and it looks like there may have been important confounding variables that they couldn’t control for.

Frequentist and Bayesian meta‐regression of health state utilities for multiple myeloma incorporating systematic review and analysis of individual patient data. Health Economics [PubMed] Published 20th February 2019

The first paper in this round-up was about improving practice in the systematic review of health state utility values, and it indicated the need for more research on the synthesis of values. Here, we have some. In this study, the authors conduct a meta-analysis of utility values alongside an analysis of registry and clinical study data for multiple myeloma patients.

A literature search identified 13 ‘methodologically appropriate’ papers, providing 27 health state utility values. The EMMOS registry included data for 2,445 patients in 22 countries, and the APEX clinical study included 669 patients, all with EQ-5D-3L data. The authors implement both a frequentist meta-regression and a Bayesian model. In both cases, the models were run including all values and then with a limited set of only EQ-5D values. These models predicted utility values based on the number of treatment classes received and the rate of stem cell transplant in the sample. The priors used in the Bayesian model were based on studies that reported general utility values for the presence of disease (rather than according to treatment).
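For a flavour of the Bayesian arm of such an analysis, here is a toy sketch in PyMC. The data, variable names, and priors are all illustrative stand-ins; the authors’ actual likelihoods and prior sources differ.

```python
# Toy Bayesian meta-regression of utility values on treatment history.
# All numbers and priors are invented for illustration.
import numpy as np
import pymc as pm

utility      = np.array([0.65, 0.62, 0.58, 0.55, 0.70])  # published values
se           = np.array([0.02, 0.03, 0.03, 0.04, 0.02])  # their standard errors
n_treatments = np.array([1, 2, 3, 4, 1])                 # treatment classes received
sct_rate     = np.array([0.4, 0.3, 0.2, 0.1, 0.6])       # stem cell transplant rate

with pm.Model():
    # Prior centred on a 'presence of disease' utility value, standing in
    # for the disease-level priors the authors describe
    alpha = pm.Normal("alpha", mu=0.70, sigma=0.05)
    b_tx  = pm.Normal("b_tx", mu=0.0, sigma=0.1)   # effect per treatment line
    b_sct = pm.Normal("b_sct", mu=0.0, sigma=0.1)  # effect of SCT rate

    mu = alpha + b_tx * (n_treatments - 1) + b_sct * sct_rate
    pm.Normal("obs", mu=mu, sigma=se, observed=utility)

    idata = pm.sample(1000, tune=1000, chains=2, random_seed=1)
```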

The frequentist models showed that utility was low at diagnosis, higher at first treatment, and lower at each subsequent treatment. Stem cell transplant had a positive impact on utility values independent of the number of previous treatments. The results of the Bayesian analysis were very similar, which the authors suggest is due to weak priors. An additional Bayesian model was run with preferred data but vague priors, to assess the sensitivity of the model to the priors. At later stages of disease (for which data were more sparse), there was greater uncertainty. The authors provide predicted values from each of the five models, according to the number of treatment classes received. The models provide slightly different results, except in the case of newly diagnosed patients (where the difference was 0.001). For example, the ‘EQ-5D only’ frequentist model gave a value of 0.659 for one treatment, while the Bayesian model gave a value of 0.620.

I’m not sure that the study satisfies the recommendations outlined in the ISPOR Task Force report described above (though that would be an unfair challenge, given the timing of publication). We’re told very little about the nature of the studies that are included, so it’s difficult to judge whether they should have been combined in this way. However, the authors state that they have made their data extraction and source code available online, which means I could check that out (though, having had a look, I can’t find the material that the authors refer to, reinforcing my hatred for the shambolic ‘supplementary material’ ecosystem). The main purpose of this paper is to progress the methods used to synthesise health state utility values, and it does that well. Predictably, the future is Bayesian.


Chris Sampson’s journal round-up for 23rd October 2017


What is the evidence from past National Institute for Health and Care Excellence single-technology appraisals regarding company submissions with base-case incremental cost-effectiveness ratios of less than £10,000/QALY? Value in Health Published 18th October 2017

NICE have been looking into diversifying their HTA processes of late. One of the newly proposed rules is that technologies with a base-case ICER estimate of less than £10,000 per QALY should be eligible for a fast-track appraisal, so that patients can benefit as early as possible from a therapy that does not pose a great risk of wasting NHS resources. But what have NICE been doing up to this point for such technologies? For this study, the researchers analysed content from all NICE single technology appraisals (STAs) between 2009 and 2016, of which there were 171 with final reports available that reported a base-case ICER. 15% (26) of the STAs reported all base-case ICERs to be below £10,000, and of these 73% (19) received a positive recommendation at the first appraisal committee meeting. A key finding is that 7 of the 26 received a ‘Minded No’ judgment in the first instance due in part to inadequate evidence and – though all got a positive decision in the end – some recommendations were restricted to subgroups. The authors also had a look at STAs with base-case ICERs up to £15,000, of which there were 5 more. All of these received a positive recommendation at the first appraisal committee meeting. Another group of (28) STAs reported multiple ICERs that included estimates both below and above £10,000. These tell a different story. Only 13 received an unrestricted positive recommendation at the first appraisal committee. Positive recommendations eventually followed for all 28, but 7 were on the basis of patient access schemes. There are a few things to consider in light of these findings. It may not be possible for NICE to adequately fast-track some sub-£10k submissions because the ICERs are not estimated on the basis of appropriate comparisons, or because the evidence is otherwise inadequate. But there may be good grounds for extending the fast-track threshold to £15,000. The study also highlights some indicators of complexity (such as the availability of patient access scheme discounts) that might be used as a basis for excluding submissions from the fast-track process.

EQ-5D-5L versus EQ-5D-3L: the impact on cost-effectiveness in the United Kingdom. Value in Health Published 18th October 2017

Despite some protest from NICE, most UK health economists working on trial-based economic evaluations are probably getting on with using the new EQ-5D-5L (and associated value set) over its 3L predecessor. This shift could bring important changes to the distribution of cost-effectiveness results for evaluated technologies. In this study, the researchers sought to identify what these changes might be, by examining a couple of datasets which included both 3L and 5L response data. One dataset was produced by the EuroQol group, with 3,551 individuals from across Europe with a range of health states, and the other was a North American dataset collected from 5,205 patients with rheumatoid disease, which switched from 3L to 5L with a wave of overlap. The analysis employs a previously developed method with a series of ordinal regressions, in which 3L-5L pairs are predicted using a copula approach. The first thing to note is that there was variation in the distribution of responses between the different dimensions and between the two datasets, and so a variety of model specifications are needed. To investigate the implications of using the 5L instead of the 3L, the authors considered 9 cost-effectiveness analysis case studies. The 9 studies reported 13 comparisons. In almost all cases where 3L was replaced with the 5L, the intervention resulted in a smaller QALY gain and higher ICER. The only study in which use of the 5L increased the incremental QALYs was one in which life extension was the key driver of QALY gains. Generally speaking, use of the 5L increases index values and reduces the range, so quality of life improvements are ‘more difficult’ to achieve, while life extension is relatively more valuable than on the 3L. Several technologies move from being clearly cost-effective within NICE’s £20,000-£30,000 threshold to being borderline cases. Different technologies for different diseases will be impacted differently by the move from the 3L to the 5L. So while we should probably still start using the 5L and its value set (because it’s methodologically superior), we mustn’t forget how different our findings might be in comparison to our old ways.
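The underlying mapping idea can be sketched as a single-dimension ordinal regression, though the published method goes further and models the 3L-5L pairs jointly via copulas. The data below are simulated:

```python
# Sketch: ordered logit predicting 5L responses from 3L responses for one
# dimension. Data are simulated; the published method additionally models
# the joint 3L-5L distribution with a copula.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500
three_l = rng.integers(1, 4, n)                              # 3L levels 1-3
latent = 1.2 * three_l + rng.logistic(0.0, 1.0, n)           # latent severity
five_l = np.digitize(latent, bins=[2.0, 3.2, 4.4, 5.6]) + 1  # 5L levels 1-5

model = OrderedModel(five_l, pd.DataFrame({"three_l": three_l}), distr="logit")
res = model.fit(method="bfgs", disp=False)

# Predicted probabilities over the five 5L levels for each 3L response
print(res.predict(pd.DataFrame({"three_l": [1, 2, 3]})))
```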

Experience-based utility and own health state valuation for a health state classification system: why and how to do it. The European Journal of Health Economics [PubMed] Published 11th October 2017

There’s debate around whose values we ought to be using to estimate QALYs when making resource allocation decisions. Generally we use societal values, but some researchers think we should be using values from people actually in those health states. I’ve written before about some of the problems with this debate. In this study, the authors try to bring some clarity to the discussion. Four types of values are considered, defined by two distinctions: hypothetical vs own current state and general public vs patient values. The notion of experienced utility is introduced and the authors explain why this cannot be captured by (for example) a TTO exercise, because such exercises require hypothetical future scenarios of health improvement. Thus, the preferred terminology becomes ‘own health state valuation’. The authors summarise some of the research that has sought to compare the 4 types of values specified, highlighting that own health state valuations tend to give higher values for dysfunctional health states than do general population hypothetical valuations. The main point is that valuations can differ systematically according to whose values are being elicited. The authors describe some reasons why these values may differ. These could include i) poor descriptions of hypothetical states, ii) changing internal standards (e.g. response shift), and iii) adaptation. Next, the authors consider how to go about collecting own health state values. Two key challenges are specified: i) respondents may be unwilling to take part where questions are complex or intrusive, and ii) there may be ethical concerns, particularly where people are in terminal conditions. It is therefore difficult to sample for all possible health states. Selection bias may also rear its head. The tendency for milder health states to be observed creates problems for the econometricians trying to model value sets. The authors propose some ways forward for identifying own health state value sets. One way would be to purposively sample each EQ-5D health state, with values elicited from people representative of those in the state. However, some states are rarely observed, so we’d be looking at screening millions of people to identify the necessary participants from a general survey. So the authors suggest targeting people via other methods. Though this may still prove very difficult. A more effective (and favourable) approach – the authors suggest – could be to try and obtain better informed general population values. This could involve improving descriptive systems and encouraging deliberation. Evidence suggests that this can reduce the discrepancy between hypothetical and own state valuations. In particular, the authors recommend the use of citizens’ juries and multi-criteria decision analysis. This isn’t something we see being done in the literature, and so may be a fruitful avenue for future research.


Chris Sampson’s journal round-up for 19th December 2016


Discounting the recommendations of the Second Panel on Cost-Effectiveness in Health and Medicine. PharmacoEconomics [PubMed] Published 9th December 2016

I do enjoy a bit of academic controversy. In this paper, renowned troublemakers Paulden, O’Mahony and McCabe do what they do best. Their target is the approach to discounting recommended by the report from the new Panel on Cost-Effectiveness, which I briefly covered in a recent round-up. This paper starts out by describing what – exactly – the Panel recommends. The real concerns lie with the approach recommended for analyses from the societal perspective. According to the authors, the problems start when the Panel conflates the marginal utility of income and that of consumption, and confusingly labels it with our old friend the lambda. The confusion continues with the use of other imprecise terminology. And then there are some aspects of the Panel’s calculations that just seem to be plain old errors, resulting in illogical results – for example, that future consumption should be discounted more heavily if associated with higher marginal utility. Eh? The core criticism is that the Panel recommends the same discount rate for both costs and the consumption value of health, and that this contradicts recent developments. The Panel fails to clearly explain the basis for its recommendation. Helpfully, the authors outline an alternative (correct?) approach. The 3% rate for costs and health effects that the Panel recommends is not justified. The criticisms made in this paper are technical ones. That doesn’t mean they are any less important, but all we can see is that use of the Panel’s recommended decision rule results in some vague threat to utility-maximisation. Whether or not the conflation of consumption and utility value would actually result in bad decisions is not clear. Nevertheless, considering that the Second Panel will presumably enjoy the same massive influence as the original ‘Gold’ Panel, extreme scrutiny is needed. I hope Basu and Ganiats see fit to respond. I also wonder whether Paulden, O’Mahony and McCabe might have other chapters in their crosshairs.
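For context, the ‘recent developments’ in question concern differential discounting. A minimal statement of the standard result, assuming the consumption value of health grows at a constant rate, is:

```latex
% d_c: discount rate for costs (consumption); g_v: growth rate of the
% consumption value of health; d_h: implied discount rate for health effects
\[
1 + d_h = \frac{1 + d_c}{1 + g_v}
\qquad\Longrightarrow\qquad
d_h \approx d_c - g_v
\]
% If g_v > 0, health effects should be discounted at a lower rate than
% costs, so a single common rate (e.g. 3% for both) needs justification.
```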

Is best–worst scaling suitable for health state valuation? A comparison with discrete choice experiments. Health Economics [PubMed] Published 4th December 2016

BWS is gaining favour as a means of valuing health states. In this paper, team DCE throw down the gauntlet to team BWS. The study uses data collected during the development of a ‘glaucoma utility index’ in which DCE and BWS exercises were completed. The first question is, do DCE and BWS give the same results? The answer is no. The models indicate relatively weak correlation. For most dimensions, the BWS gave values for different severity levels that were closer together than in the DCE. This means that large improvements in health might be associated with smaller utility gains using BWS values than using DCE values. BWS is also identified as being more prone to decision biases. The second question is, which technique is best ‘to develop health utility indices’ (as the authors put it)? We need to bear in mind that this may in part be moot. Proponents of BWS have often claimed that they are not even trying to measure utility, so to judge BWS on this basis may not be appropriate. Anyway, set aside for now the fact that your own definition of utility might be (and that the authors’ almost certainly is) at odds with the BWS approach. No surprise that the authors suggest that DCE is superior. The bases on which this judgement is made are stability, monotonicity, continuity and completeness. All of these relate to whether the respondents make the kinds of responses we might expect. BWS answers are found to be less stable, more likely to be non-continuous and tend not to satisfy monotonicity. Personally I don’t see these as objective identifiers of goodness or ability of the technique to identify ‘true’ preferences. Also, I don’t know anything about how the glaucoma measure was developed, but if the health states it defines aren’t very informative then the results of this study won’t be either. Nevertheless, the findings do indicate to me that health state valuation using BWS might be subject to more caveats that need investigating before we start to make greater use of the technique. The much larger body of research behind DCE counts in its favour. Over to you, team BWS.
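To make the ‘levels closer together’ point concrete, here is a trivial numerical illustration; the values are invented, not taken from the glaucoma index:

```python
# Invented disutility values for one dimension, levels 1-5
import numpy as np

dce = np.array([0.00, 0.05, 0.12, 0.30, 0.45])  # DCE-derived
bws = np.array([0.00, 0.03, 0.08, 0.14, 0.22])  # BWS-derived: levels bunched

# Utility gain from moving a patient from level 5 to level 1
print(dce[4] - dce[0], bws[4] - bws[0])  # 0.45 vs 0.22: the same health
# improvement implies a smaller QALY gain under the BWS-based values
```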

Preference weighting of health state values: what difference does it make, and why? Value in Health Published 23rd November 2016

When non-economists ask about the way we measure health outcomes, the crux of it all is that the EQ-5D et al are preference-based. We think – or at least have accepted – that preferences must be really very serious and important. Equal weighting of dimensions? Nothing but meaningless nonsense! That may well be true in theory, but what if our approach to preference-elicitation is actually providing us with much the same results as if we were using equal weighting? Much research energy (and some money) goes into the preference weighting project, but could it be a waste of time? I had hoped that this paper might answer that question, but while it’s a useful study I didn’t find it quite so enlightening. The authors look at the EQ-5D-5L and 15D and compared the usual preference-based index for each with one constructed using an equal weighting, rescaled to the 0-1 dead-full health scale. The rescaling takes into account the differences in scale length for the 15D (0 to 1, 1.000) and the EQ-5D-5L (-0.281 to 1, 1.281). Data are from the Multi-Instrument Comparison (MIC) study, which includes healthy people as well as subsamples with a range of chronic diseases. The authors look at the correlations between the preference-based and equal weighted index values. They find very high correlation, especially for the 15D, and agreement on the EQ-5D increases when adjusted for the scale length. Furthermore, the results are investigated for known group validity alongside a depression-specific outcome measure. The EQ-5D performs a little better. But the study doesn’t really tell me what I want to know: would the use of equal-weighting normally give us the same results, and in what cases might it not? The MIC study includes a whole range of generic and condition-specific measures and I can’t see why the study didn’t look at all of them. It also could have used alternative preference weights to see how they differ. And it could have looked at all of the different disease-based subgroups in the sample to try and determine under what circumstances preference weighting might approach equal weighting. I hope to see more research on this issue, not to undermine preference weighting but to inform its improvement.
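As a sketch of the kind of rescaling involved (assuming the equal-weighted index is simply the mean dimension score on a 0-1 scale; the paper’s exact construction may differ):

```python
# Equal-weighted EQ-5D-5L index, rescaled for scale length. The 1.281
# scale length is from the paper; the response profile is invented.
import numpy as np

responses = np.array([2, 1, 3, 2, 1])      # EQ-5D-5L profile, levels 1-5
raw = 1 - (responses - 1).sum() / (5 * 4)  # equal weights on a 0-1 scale

# Rescale so the worst profile maps to the value set's minimum (-0.281)
scale_length = 1.281
rescaled = 1 - (1 - raw) * scale_length
print(raw, rescaled)  # 0.8 on 0-1 becomes ~0.744 on the -0.281 to 1 scale
```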
