Bad reasons not to use the EQ-5D-5L

We’ve seen a few editorials and commentaries popping up about the EQ-5D-5L recently, in Health Economics, PharmacoEconomics, and PharmacoEconomics again. All of these articles have – to varying extents – acknowledged the need for NICE to exercise caution in the adoption of the EQ-5D-5L. I don’t get it. I see no good reason not to use the EQ-5D-5L.

If you’re not familiar with the story of the EQ-5D-5L in England, read any of the linked articles, or see an OHE blog post summarising the tale. The important part of the story is that NICE has effectively recommended the use of the EQ-5D-5L descriptive system (the questionnaire), but not the new EQ-5D-5L value set for England. Of the new editorials and commentaries, Devlin et al are vaguely pro-5L, Round is vaguely anti-5L, and Brazier et al are vaguely on the fence. NICE has manoeuvred itself into a situation where it has to make a binary decision. 5L, or no 5L (which means sticking with the old EQ-5D-3L value set). Yet nobody seems keen to lay down their view on what NICE ought to decide. Maybe there’s a fear of being proven wrong.

So, herewith a list of reasons for exercising caution in the adoption of the EQ-5D-5L, which are either explicitly or implicitly cited by recent commentators, and why they shouldn’t determine NICE’s decision. The EQ-5D-5L value set for England should be recommended without hesitation.

We don’t know if the descriptive system is valid

Round argues that while the 3L has been validated in many populations, the 5L has not. Diabetes, dementia, deafness and depression are presented as cases where the 3L has been validated but the 5L has not. But the same goes for the reverse. There are plenty of situations in which the 3L has been shown to be problematic and the 5L has not. It’s simply a matter of time. This argument should only hold sway if we expect there to be more situations in which the 5L lacks validity, or if those violations are in some way more serious. I see no evidence of that. In fact, we see measurement properties improved with the 5L compared with the 3L. Devlin et al put the argument to bed in highlighting the growing body of evidence demonstrating that the 5L descriptive system is better than the 3L descriptive system in a variety of ways, without any real evidence that there are downsides to the descriptive expansion. And this – the comparison of the 3L and the 5L – is the correct comparison to be making, because the use of the 3L represents current practice. More fundamentally, it’s hard to imagine how the 5L descriptive system could be less valid than the 3L descriptive system. That there are only a limited number of validation studies using the 5L is only a problem if we can hypothesise reasons for the 5L to lack validity where the 3L held it. I can’t think of any. And anyway, NICE is apparently satisfied with the descriptive system; it’s the value set they’re worried about.

We don’t know if the preference elicitation methods are valid for states worse than dead

This argument is made by Brazier et al. The value set for England uses lead time TTO, which is a relatively new (and therefore less-tested) method. The problem is that we don’t know if any methods for valuing states worse than dead are valid because valuing states worse than dead makes no real sense. Save for pulling out a Ouija board, or perhaps holding a gun to someone’s head, we can never find out what is the most valid approach to valuing states worse than dead. And anyway, this argument fails on the same basis as the previous one: where is the evidence to suggest that the MVH approach to valuing states worse than dead (for the EQ-5D-3L) holds more validity than lead time TTO?

We don’t know if the EQ-VT was valid

As discussed by Brazier et al, it looks like there may have been some problems in the administration of the EuroQol valuation protocol (the EQ-VT) for the EQ-5D-5L value set. As a result, some of the data look a bit questionable, including large spikes in the distribution of values at 1.0, 0.5, 0.0, and -1.0. Certainly, this justifies further investigation. But it shouldn’t stall adoption of the 5L value set unless this constitutes a greater concern than the distributional characteristics of the 3L, and that’s not an argument I see anybody making. Perhaps there should have been more piloting of the EQ-VT, but that should (in itself) have no bearing on the decision of whether to use the 3L value set or the 5L value set. If the question is whether we expect the EQ-VT protocol to provide a more accurate estimation of health preferences than the MVH protocol – and it should be – then as far as I can tell there is no real basis for preferring the MVH protocol.

We don’t know if the value set (for England) is valid

Devlin et al state that, with respect to whether differences in the value sets represent improvements, “Until the external validation of the England 5L value set concludes, the jury is still out.” I’m not sure that’s true. I don’t know what the external validation is going to involve, but it’s hard to imagine a one-off piece of work that could demonstrate the ‘betterness’ of the 5L value set compared with the 3L value set. Yes, a validation exercise could tell us whether the value set is replicable. But unless validation of the comparator (i.e. the 3L value set) is also attempted and judged on the same basis, it won’t be at all informative to NICE’s decision. Devlin et al state that there is a governmental requirement to validate the 5L value set for England. But beyond checking the researchers’ sums, it’s difficult to understand what that could even mean. Given that nobody seems to have defined ‘validity’ in this context, this is a very dodgy basis for determining adoption or non-adoption of the 5L.

5L-based evaluations will be different to 3L-based evaluations

Well, yes. Otherwise, what would be the point? Brazier et al present this as a justification for a ‘pause’ for an independent review of the 5L value set. The authors present the potential shift in priority from life-improving treatments to life-extending treatments as a key reason for a pause. But this is clearly a circular argument. Pausing to look at the differences will only bring those (and perhaps new) differences into view (though notably at a slower rate than if the 5L was more widely adopted). And then what? We pause for longer? Round also mentions this point as a justification for further research. This highlights a misunderstanding of what it means for NICE to be consistent. NICE has no responsibility to make decisions in 2018 precisely as it would have in 2008. That would be foolish and ignorant of methodological and contextual developments. What NICE needs to provide is consistency in the present – precisely what is precluded by the current semi-adoption of the EQ-5D-5L.

5L data won’t be comparable to 3L data

Round mentions this. But why does it matter? This is nothing compared to the trickery that goes on in economic modelling. The whole point of modelling is to do the best we can with the data we’ve got. If we have to compare an intervention for which outcomes are measured in 3L values with an intervention for which outcomes are measured in 5L values, then so be it. That is not a problem. It is only a problem if manufacturers strategically use 3L or 5L values according to whichever provides the best results. And you know what facilitates that? A pause, where nobody really knows what is going on and NICE has essentially said that the use of both 3L and 5L descriptive systems is acceptable. If you think mapping from 5L to 3L values is preferable to consistently using the 5L values then, well, I can’t reason with you, because mapping is never anything but a fudge (albeit a useful one).

There are problems with the 3L, so we shouldn’t adopt the 5L

There’s little to say on this point beyond asserting that we mustn’t let perfect be the enemy of the good. Show me what else you’ve got that could be more readily and justifiably introduced to replace the 3L. Round suggests that shifting from the 3L to the 5L is no different to shifting from the 3L to an entirely different measure, such as the SF-6D. That’s wrong. There’s a good reason that NICE should consider the 5L as the natural successor to the 3L. And that’s because it is. This is exactly what it was designed to be: a methodological improvement on the same conceptual footing. The key point here is that the 3L and 5L contain the same domains. They’re trying to capture health-related quality of life in a consistent way; they refer to the same evaluative space. Shifting to the SF-6D (for example) would be a conceptual shift, whereas shifting to the 5L from the 3L is nothing but a methodological shift (with the added benefit of more up-to-date preference data).

To sum up

Round suggests that the pause is because of “an unexpected set of results” arising from the valuation exercise. That may be true in part. But I think it’s more likely the fault of dodgy public sector deals with the likes of Richard Branson and a consequently algorithm-fearing government. I totally agree with Round that, if NICE is considering a new outcome measure, they shouldn’t just be considering the 5L. But given that right now they are only considering the 5L, and that the decision is explicitly whether or not to adopt the 5L, there are no reasons not to do so.

The new value set is only a step change because we spent the last 25 years idling. Should we really just wait for NICE to assess the value set, accept it, and then return to our see-no-evil position for the next 25 years? No! The value set should be continually reviewed and redeveloped as methods improve and societal preferences evolve. The best available value set for England (and Wales) should be regularly considered by NICE as part of a review of the reference case. A special ‘pause’ for the new 5L value set will only serve to reinforce the longevity of compromised value sets in the future.

Yes, the EQ-5D-3L and its associated value set for the UK have been brilliantly useful over the years, but the 3L now has a successor that – as far as we can tell – is better in many ways and at least as good in the rest. As a public body, NICE is conservative by nature. But researchers needn’t be.

Chris Sampson’s journal round-up for 19th March 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Using HTA and guideline development as a tool for research priority setting the NICE way: reducing research waste by identifying the right research to fund. BMJ Open [PubMed] Published 8th March 2018

As well as the cost-effectiveness of health care, economists are increasingly concerned with the cost-effectiveness of health research. This makes sense, given that both are usually publicly funded and so spending on one (in principle) limits spending on the other. NICE exists in part to prevent waste in the provision of health care – seeking to maximise benefit. In this paper, the authors (all current or ex-employees of NICE) consider the extent to which NICE processes are also used to prevent waste in health research. The study focuses on the processes underlying NICE guideline development and HTA, and the work by NICE’s Science Policy and Research (SP&R) programme. Through systematic review and (sometimes) economic modelling, NICE guidelines identify research needs, and NICE works with the National Institute for Health Research to get their recommended research commissioned, with some research fast-tracked as ‘NICE Key Priorities’. Sometimes, it’s also necessary to prioritise research into methodological development, and NICE have conducted reviews to address this, with the Internal Research Advisory Group established to ensure that methodological research is commissioned. The paper also highlights the roles of other groups such as the Decision Support Unit, Technical Support Unit and External Assessment Centres. This paper is useful for two reasons. First, it gives a clear and concise explanation of NICE’s processes with respect to research prioritisation, and maps out the working groups involved. This will provide researchers with an understanding of how their work fits into this process. Second, the paper highlights NICE’s current research priorities and provides insight into how these develop. This could be helpful to researchers looking to develop new ideas and proposals that will align with NICE’s priorities.

The impact of the minimum wage on health. International Journal of Health Economics and Management [PubMed] Published 7th March 2018

The minimum wage is one of those policies that is so far-reaching, and with such ambiguous implications for different people, that research into its impact can deliver dramatically different conclusions. This study uses American data and takes advantage of the fact that different states have different minimum wage levels. The authors try to look at a broad range of mechanisms by which minimum wage can affect health. A major focus is on risky health behaviours. The study uses data from the Behavioral Risk Factor Surveillance System, which includes around 300,000 respondents per year across all states. Relevant variables from these data characterise smoking, drinking, and fruit and vegetable consumption, as well as obesity. There are also indicators of health care access and self-reported health. The authors cut their sample to include 21-64-year-olds with no more than a high school degree. Difference-in-differences are estimated by OLS according to individual states’ minimum wage changes. As is often the case for minimum wage studies, the authors find several non-significant effects: smoking and drinking don’t seem to be affected. Similarly, there isn’t much of an impact on health care access. There seems to be a small positive impact of minimum wage on the likelihood of being obese, but no impact on BMI. I’m not sure how to interpret that, but there is also evidence that a minimum wage increase leads to a reduction in fruit and vegetable consumption, which adds credence to the obesity finding. The results also demonstrate that a minimum wage increase can reduce the number of days that people report to be in poor health. But generally – on aggregate – there isn’t much going on at all. So the authors look at subgroups. Smoking is found to increase (and BMI decrease) with minimum wage for younger non-married white males. Obesity is more likely to be increased by minimum wage hikes for people who are white or married, and especially for those in older age groups. 
Women seem to benefit from fewer days with mental health problems. The main concerns identified in this paper are that minimum wage increases could increase smoking in young men and could reduce fruit and veg consumption. But I don’t think we should overstate it. There’s a lot going on in the data, and though the authors do a good job of trying to identify the effects, other explanations can’t be excluded. Minimum wage increases probably don’t have a major direct impact on health behaviours – positive or negative – but policymakers should take note of the potential value in providing public health interventions to those groups of people who are likely to be affected by the minimum wage.
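For anyone unfamiliar with the mechanics, here is a minimal sketch of a difference-in-differences estimate by OLS. It is not the authors' specification: the paper exploits variation in minimum wage changes across states, whereas this is the textbook two-group, two-period case. The function name `did_estimate`, the variable names, and all of the numbers are made up for illustration.

```python
import numpy as np


def did_estimate(y, treated, post):
    """Return the OLS coefficient on the treated x post interaction.

    y       : outcome (e.g. a hypothetical health behaviour score)
    treated : 1 if the unit is in a group exposed to the policy change
    post    : 1 if the observation falls after the policy change
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=float)
    post = np.asarray(post, dtype=float)
    # Design matrix: intercept, group effect, period effect, interaction.
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[3]


# Simulated (entirely hypothetical) data with a known effect of -0.2.
rng = np.random.default_rng(0)
n = 4000
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
true_effect = -0.2
y = (3.0 + 0.5 * treated + 0.1 * post
     + true_effect * treated * post
     + rng.normal(0.0, 1.0, n))

estimate = did_estimate(y, treated, post)
print(f"estimated effect: {estimate:.3f} (simulated truth: {true_effect})")
```

The interaction coefficient is the change over time in the treated group minus the change over time in the comparison group. It only has a causal reading if the two groups would have followed parallel trends without the policy change, which is one reason other explanations are hard to rule out in studies like this.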

Aligning policy objectives and payment design in palliative care. BMC Palliative Care [PubMed] Published 7th March 2018

Health care at the end of life – including palliative care – presents challenges in evaluation. The focus is on improving patients’ quality of life, but it’s also about satisfying preferences for processes of care, the experiences of carers, and providing a ‘good death’. And partly because these things can be difficult to measure, it can be difficult to design payment mechanisms to achieve desirable outcomes. Perhaps that’s why there is no current standard approach to funding for palliative care, with a lot of variation between countries, despite the common aspiration for universality. This paper tackles the question of payment design with a discussion of the literature. Traditionally, palliative care has been funded by block payments, per diems, or fee-for-service. The author starts with the acknowledgement that there are two challenges to ensuring value for money in palliative care: moral hazard and adverse selection. Providers may over-supply because of fee-for-service funding arrangements, or they may ‘cream-skim’ patients. Adverse selection may arise in an insurance-based system, with demand from high-risk people causing the market to fail. These problems could potentially be solved by capitation-based payments and risk adjustment. The market could also be warped by blunt eligibility restrictions and funding caps. Another difficulty is the challenge of achieving allocative efficiency between home-based and hospital-based services, made plain by the fact that, in many countries, a majority of people die in hospital despite a preference for dying at home. The author describes developments (particularly in Australia) in activity-based funding for palliative care. An interesting proposal – though not discussed in enough detail – is that payments could be made for each death (per mortems?). 
Capitation-based payment models are considered and the extent to which pay-for-performance could be incorporated is also discussed – the latter being potentially important in achieving those process outcomes that matter so much in palliative care. Yet another challenge is the question of when palliative care should come into play, because, in some cases, it’s a matter of sooner being better, because the provision of palliative care can give rise to less costly and more preferred treatment pathways. Thus, palliative care funding models will have implications for the funding of acute care. Throughout, the paper includes examples from different countries, along with a wealth of references to dig into. Helpfully, the author explicitly states in a table the models that different settings ought to adopt, given their prevailing model. As our population ages and the purse strings tighten, this is a discussion we can expect to be having more and more.

Chris Sampson’s journal round-up for 5th March 2018

Healthy working days: the (positive) effect of work effort on occupational health from a human capital approach. Social Science & Medicine Published 28th February 2018

If you look at the literature on the determinants of subjective well-being (or happiness), you’ll see that unemployment is often cited as having a big negative impact. The same sometimes applies for its impact on health, but here – of course – the causality is difficult to tease apart. Then, in research that digs deeper, looking at hours worked and different types of jobs, we see less conclusive results. In this paper, the authors start by asserting that the standard approach in labour economics (on which I’m not qualified to comment) is to assume that there is a negative association between work effort and health. This study extends the framework by allowing for positive effects of work that are related to individuals’ characteristics and working conditions, and where health is determined in a Grossman-style model of health capital that accounts for work effort in the rate of health depreciation. This model is used to examine health as a function of work effort (as indicated by hours worked) in a single wave of the European Working Conditions Survey (EWCS) from 2010 for 15 EU member states. Key items from the EWCS included in this study are questions such as “does your work affect your health or not?”, “how is your health in general?”, and “how many hours do you usually work per week?”. Working conditions are taken into account by looking at data on shift working and the need to wear protective equipment. One of the main findings of the study is that – with good working conditions – greater work effort can improve health. The Marxist in me is not very satisfied with this. We need to ask the question, compared to what? Working fewer hours? For most people, that simply isn’t an option. Aren’t the people who work fewer hours the people who can afford to work fewer hours? No attention is given to the sociological aspects of employment, which are clearly important. The study also shows that overworking or having poorer working conditions reduces health. 
We also see that, for many groups, longer hours do not negatively impact on health until we reach around 120 hours a week. This fails a good sense check. Who are these people?! I’d be very interested to see if these findings hold for academics. That the key variables are self-reported undermines the conclusions somewhat, as we can expect people to adjust their expectations about work effort and health in accordance with their colleagues. It would be very difficult to avoid a type 2 error (with respect to the negative impact of effort on health) using these variables to represent health and the role of work effort.

Agreement between retrospectively and contemporaneously collected patient-reported outcome measures (PROMs) in hip and knee replacement patients. Quality of Life Research [PubMed] Published 26th February 2018

The use of patient-reported outcome measures (PROMs) in elective care in the NHS has been a boon for researchers in our field, providing before-and-after measurement of health-related quality of life so that we can look at the impact of these interventions. But we can’t do this in emergency care because the ‘before’ is never observed – people only show up when they’re in the middle of the emergency. But what if people could accurately recall their pre-emergency health state? There’s some evidence to suggest that people can, so long as the recall period is short. This study looks at NHS PROMs data (n=443), with generic and condition-specific outcomes collected from patients having hip or knee replacements. Patients included in the study were additionally asked to recall their health state 4 weeks prior to surgery. The authors assess the extent to which the contemporary PROM measurements agree with the retrospective measurements, and the extent to which any disagreement relates to age, socioeconomic status, or the length of time to recall. There wasn’t much difference between contemporary and retrospective measurements, though patients reported slightly lower health on the retrospective questionnaires. And there weren’t any compelling differences associated with age or socioeconomic status or the length of recall. These findings are promising, suggesting that we might be able to rely on retrospective PROMs. But the elective surgery context is very different to the emergency context, and I don’t think we can expect the two types of health care to impact recollection in the same way. In this study, responses may also have been influenced by participants’ memories of completing the contemporary questionnaire, and the recall period was very short. But the only way to find out more about the validity of retrospective PROM collection is to do more of it, so hopefully we’ll see more studies asking this question.

Adaptation or recovery after health shocks? Evidence using subjective and objective health measures. Health Economics [PubMed] Published 26th February 2018

People’s expectations about their health can influence their behaviour and determine their future health, so it’s important that we understand people’s expectations and any ways in which they diverge from reality. This paper considers the effect of a health shock on people’s expectations about how long they will live. The authors focus on survival probability, measured objectively (i.e. what actually happens to these patients) and subjectively (i.e. what the patients expect), and the extent to which the latter corresponds to the former. The arguments presented are couched within the concept of hedonic adaptation. So the question is – if post-shock expectations return to pre-shock expectations after a period of time – whether this is because people are recovering from the disease or because they are moving their reference point. Data are drawn from the Health and Retirement Study. Subjective survival probability is scaled to whether individuals expect to survive for 2 years. Cancer, stroke, and myocardial infarction are the health shocks used. The analysis uses some lagged regression models, separate for each of the three diagnoses, with objective and subjective survival probability as the dependent variable. There’s a bit of a jumble of things going on in this paper, with discussions of adaptation, survival, self-assessed health, optimism, and health behaviours. So it’s a bit difficult to see the wood for the trees. But the authors find the effect they’re looking for. Objective survival probability is negatively affected by a health shock, as is subjective survival probability. But then subjective survival starts to return to pre-shock trends whereas objective survival does not. The authors use this finding to suggest that there is adaptation. I’m not sure about this interpretation. To me it seems as if subjective life expectancy is only weakly responsive to changes in objective life expectancy. 
The findings seem to have more to do with how people process information about their probability of survival than with how they adapt to a situation. So while this is an interesting study about how people process changes in survival probability, I’m not sure what it has to do with adaptation.

3L, 5L, what the L? A NICE conundrum. PharmacoEconomics [PubMed] Published 26th February 2018

In my last round-up, I said I was going to write a follow-up blog post to an editorial on the EQ-5D-5L. I didn’t get round to it, but that’s probably best as there has since been a flurry of other editorials and commentaries on the subject. Here’s one of them. This commentary considers the perspective of NICE in deciding whether to support the use of the EQ-5D-5L and its English value set. The authors point out the differences between the 3L and 5L, namely the descriptive systems and the value sets. Examples of the 5L descriptive system’s advantages are provided: a reduced ceiling effect, reduced clustering, better discriminative ability, and the benefits of doing away with the ‘confined to bed’ level of the mobility domain. Great! On to the value set. There are lots of differences here, with 3 main causes: the data, the preference elicitation methods, and the modelling methods. We can’t immediately determine whether these differences are improvements or not. The authors stress the point that any differences observed will be in large part due to quirks in the original 3L value set rather than in the 5L value set. Nevertheless, the commentary is broadly supportive of a cautionary approach to 5L adoption. I’m not. Time for that follow-up blog post.
