Paul Mitchell’s journal round-up for 6th November 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A longitudinal study to assess the frequency and cost of antivascular endothelial therapy, and inequalities in access, in England between 2005 and 2015. BMJ Open [PubMed] Published 22nd October 2017

I am breaking one of my unwritten rules in a journal paper round-up by talking about colleagues’ work, but I feel it is too important not to provide a summary for a number of reasons. The study highlights the problems faced by regional healthcare purchasers in England when implementing national guideline recommendations on the cost-effectiveness of new treatments. The paper focuses on anti-vascular endothelial growth factor (anti-VEGF) medicines in particular, with two drugs, ranibizumab and aflibercept, offered to patients with a range of eye conditions, costing £550-800 per injection. Another drug, bevacizumab, that is closely related to ranibizumab and performs similarly in trials, could be provided at a fraction of the cost (£50-100 per injection), but it is currently unlicensed for eye conditions in the UK. Using administrative data from Hospital Episode Statistics in England for 2005-2015, this study investigates how regions in England have coped with trying to provide the recommended drugs, tracking their use since they were recommended for a number of different eye conditions over the past decade. In 2014/15 the cost of these two new drugs for treating eye conditions alone was estimated at £447 million nationally. The drugs are not provided equally across the country: provision varies widely across regions after controlling for socio-demographics, suggesting an inequality of access associated with the introduction of these high-cost drugs over the past decade at a time of relatively low growth in national health spending. Although there are limitations associated with using data not intended for research purposes, the study shows how the most can be made of routinely collected data. On a public policy level, it raises questions over the provision of such high-cost drugs, for which the authors state the NHS is currently paying more than US insurers. 
Although it is important to be careful when comparing to unlicensed drugs, the authors point to clear evidence in the paper as to why their comparison is a reasonable one in this scenario, with a large opportunity cost associated with not including this option in national guidelines. If national recommendations continue to insist that such drugs be provided, clearer guidance is also required on how to disinvest from existing services at a regional level to reduce further examples of inequality in access in the future.

In search of a common currency: a comparison of seven EQ-5D-5L value sets. Health Economics [PubMed] Published 24th October 2017

For those of us out there who like a good valuation study, you will need to set aside a good piece of time to work your way through this one. The new EQ-5D-5L measure of health status, whose primary purpose is to generate quality-adjusted life years (QALYs) for economic evaluations, now has valuation studies emerging from different countries, in which the relative importance of each of the measure’s dimensions and levels is quantified based on general population preferences. This study offers the first comparison of value sets across seven countries: three Western European (England, Netherlands, Spain), one North American (Canada), one South American (Uruguay), and two East Asian (Japan and South Korea). The authors aim to describe methodological differences between the seven value sets, compare the relative importance of dimensions, level decrements and scale length (i.e. quality/quantity trade-offs for QALYs), and develop a common (Western) currency across four of the value sets. In brief, there do appear to be similar trends across the three Western European countries: level decrements from levels 3 to 4 have the largest value, followed by levels 1 to 2. There is also a pattern in these three countries’ dimensions, whereby the two “symptom” dimensions (i.e. pain/discomfort, anxiety/depression) have equal importance to the other three “functioning” dimensions (i.e. mobility, self-care and usual activities). There are also clear differences with the other four value sets. Canada, although it also has the highest level decrements between levels 3 and 4 (49%), unusually has equal decrements for the remainder (17% x 3). For the other three countries, greater weight is attached to the three functioning dimensions relative to the two symptom dimensions. Although South Korea also has the greatest level decrements between levels 3 and 4, the greatest decrement was between levels 4 and 5 in Uruguay and between levels 1 and 2 in Japan. 
Although the authors give a number of plausible reasons as to why these differences may occur, less justification is given for the choice of the four value sets they offer as a common currency, beyond the need to have a value set for countries that do not have one already. The value sets with the most in common were those of the three Western European countries, so a Western European value set may have been more appropriate if the criterion was to have comparable values across countries. If the aim was really for a more international common currency, there are issues with the exclusion of non-Western countries’ value sets from their common currency version. Surely a common currency should reflect differences that are apparent across cultures and settings. A common currency should also have a better geographical spread: no country from Africa, the Middle East, or Central and South Asia is represented in this study, nor are any lower- and middle-income countries, though this final criticism is outside the authors’ control given current data availability.
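To make the idea of “level decrement” shares concrete, here is a minimal sketch of how the share of each step between adjacent levels could be calculated for one dimension. The cumulative decrements below are hypothetical numbers chosen only to reproduce the Canadian pattern quoted above (17%, 17%, 49%, 17%); they are not coefficients from any published value set, and the function name is my own.

```python
def step_shares(cumulative):
    """Share of the total level-1-to-level-5 decrement taken by each
    step between adjacent levels (1->2, 2->3, 3->4, 4->5)."""
    total = cumulative[-1] - cumulative[0]
    return [(b - a) / total for a, b in zip(cumulative, cumulative[1:])]

# Hypothetical cumulative decrements for levels 1..5 of one dimension,
# rescaled so that the worst level equals 1.0 (an assumption for illustration).
canada_like = [0.00, 0.17, 0.34, 0.83, 1.00]

shares = step_shares(canada_like)
print([round(s, 2) for s in shares])
```

The step from level 3 to level 4 takes about half of the total decrement here, with the remaining three steps sharing the rest equally.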

Quantifying the relationship between capability and health in older people: can’t map, won’t map. Medical Decision Making [PubMed] Published 23rd October 2017

The EQ-5D is one of many ways quality of life can be measured within economic evaluations. A more recent line of work, based on Amartya Sen’s capability approach, has attempted to develop outcome measures that move beyond the health-related aspects of quality of life captured by EQ-5D and similar measures used in the generation of QALYs. This study examines the relationship between the EQ-5D and the ICECAP-O capability measure in three different patient populations included in the Medical Crises in Older People programme in England. The authors propose a reasonable hypothesis that health could be considered a conversion factor for a person’s broader capability set, and so it is plausible to test how well the EQ-5D-3L dimension values and overall score can map onto the ICECAP-O overall score. Across the numerous regressions performed, the strongest relationship between the two measures in this sample had an R-squared of 0.35. Interestingly, the dimensions on the EQ-5D that had a significant relationship with the ICECAP-O score were a mix of dimensions with a focus on functioning (i.e. self-care, usual activities) and symptoms (anxiety/depression), so overall capability on ICECAP-O appears to be related, at least to a small degree, to both health components of EQ-5D discussed in this round-up’s previous paper. The authors suggest it provides further evidence of the complementary data provided by EQ-5D and ICECAP-O, but the causal relationship between the two measures, as the authors acknowledge, remains under-researched. Longitudinal data analysis would provide a more definitive answer to the question of how much interaction there is between these two measures and their dimensions as health and capability change over time in response to different treatments and care provision.
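For readers less familiar with mapping studies, the following is a minimal sketch of what an R-squared means in this setting. It assumes a single-predictor ordinary least squares regression of a capability score on a health index, which is much simpler than the regressions reported in the paper. The data are made up for illustration and the function name is hypothetical.

```python
from statistics import mean

def r_squared(x, y):
    """R-squared from a simple one-predictor OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Made-up scores for eight respondents (not data from the study).
eq5d_index = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
icecap_score = [0.7, 0.5, 0.8, 0.6, 0.9, 0.7, 0.8, 0.9]

r2 = r_squared(eq5d_index, icecap_score)
print(f"R-squared: {r2:.2f}")
```

An R-squared of 0.35 would mean that roughly a third of the variation in capability scores is accounted for by the health measure, leaving most of it unexplained.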

Chris Sampson’s journal round-up for 23rd October 2017

What is the evidence from past National Institute of Health and Care Excellence single-technology appraisals regarding company submissions with base-case incremental cost-effectiveness ratios of less than £10,000/QALY? Value in Health Published 18th October 2017

NICE have been looking into diversifying their HTA processes of late. One of the newly proposed rules is that technologies with a base-case ICER estimate of less than £10,000 per QALY should be eligible for a fast-track appraisal, so that patients can benefit as early as possible from a therapy that does not pose a great risk of wasting NHS resources. But what have NICE been doing up to this point for such technologies? For this study, the researchers analysed content from all NICE single technology appraisals (STAs) between 2009 and 2016, of which there were 171 with final reports available that reported a base-case ICER. 15% (26) of the STAs reported all base-case ICERs to be below £10,000, and of these 73% (19) received a positive recommendation at the first appraisal committee meeting. A key finding is that 7 of the 26 received a ‘Minded No’ judgment in the first instance due in part to inadequate evidence and – though all got a positive decision in the end – some recommendations were restricted to subgroups. The authors also had a look at STAs with base-case ICERs up to £15,000, of which there were 5 more. All of these received a positive recommendation at the first appraisal committee meeting. Another group of 28 STAs reported multiple ICERs that included estimates both below and above £10,000. These tell a different story. Only 13 received an unrestricted positive recommendation at the first appraisal committee. Positive recommendations eventually followed for all 28, but 7 were on the basis of patient access schemes. There are a few things to consider in light of these findings. It may not be possible for NICE to adequately fast-track some sub-£10k submissions because the ICERs are not estimated on the basis of appropriate comparisons, or because the evidence is otherwise inadequate. But there may be good grounds for extending the fast-track threshold to £15,000. 
The study also highlights some indicators of complexity (such as the availability of patient access scheme discounts) that might be used as a basis for excluding submissions from the fast-track process.
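The proportions quoted above can be checked with a few lines of arithmetic. This sketch uses only the counts reported in the summary; the variable names are my own.

```python
# Counts as reported in the summary above.
total_stas = 171          # STAs with a reported base-case ICER, 2009-2016
below_10k = 26            # all base-case ICERs below £10,000
positive_first = 19       # of those, positive at first committee meeting
minded_no_first = 7       # of those, 'Minded No' at first meeting
mixed_icers = 28          # ICERs both below and above £10,000
mixed_unrestricted = 13   # of those, unrestricted positive at first meeting

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

print(pct(below_10k, total_stas), "% of STAs were below £10,000")
print(pct(positive_first, below_10k), "% of those were positive first time")
print(pct(mixed_unrestricted, mixed_icers), "% of mixed-ICER STAs were unrestricted positives first time")
```

Fewer than half of the mixed-ICER group received an unrestricted positive recommendation first time, which is the contrast the paragraph draws with the sub-£10,000 group.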

EQ-5D-5L versus EQ-5D-3L: the impact on cost-effectiveness in the United Kingdom. Value in Health Published 18th October 2017

Despite some protest from NICE, most UK health economists working on trial-based economic evaluations are probably getting on with using the new EQ-5D-5L (and associated value set) over its 3L predecessor. This shift could bring important changes to the distribution of cost-effectiveness results for evaluated technologies. In this study, the researchers sought to identify what these changes might be, by examining a couple of datasets which included both 3L and 5L response data. One dataset was produced by the EuroQol group, with 3,551 individuals from across Europe with a range of health states, and the other was a North American dataset collected from 5,205 patients with rheumatoid disease, which switched from 3L to 5L with a wave of overlap. The analysis employs a previously developed method with a series of ordinal regressions, in which 3L-5L pairs are predicted using a copula approach. The first thing to note is that there was variation in the distribution of responses between the different dimensions and between the two datasets, and so a variety of model specifications are needed. To investigate the implications of using the 5L instead of the 3L, the authors considered 9 cost-effectiveness analysis case studies. The 9 studies reported 13 comparisons. In almost all cases where 3L was replaced with the 5L, the intervention resulted in a smaller QALY gain and higher ICER. The only study in which use of the 5L increased the incremental QALYs was one in which life extension was the key driver of QALY gains. Generally speaking, use of the 5L increases index values and reduces the range, so quality of life improvements are ‘more difficult’ to achieve, while life extension is relatively more valuable than on the 3L. Several technologies move from being clearly cost-effective within NICE’s £20,000-£30,000 threshold to being borderline cases. Different technologies for different diseases will be impacted differently by the move from the 3L to the 5L. 
So while we should probably still start using the 5L and its value set (because it’s methodologically superior), we mustn’t forget how different our findings might be in comparison to our old ways.
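A small worked example may help show why a narrower range of index values makes quality of life gains smaller while making life extension more valuable. The index values below are hypothetical and chosen only to illustrate the direction of the effect; they are not taken from either value set or from the study.

```python
def qaly_gain_quality(u_before, u_after, years):
    """QALYs gained by improving quality of life over a fixed period."""
    return (u_after - u_before) * years

def qaly_gain_extension(u_state, extra_years):
    """QALYs gained by extending life in a given health state."""
    return u_state * extra_years

# Hypothetical index values for the same two health states.
# Assumption: the 5L values are higher and closer together than the 3L values.
values_3l = {"worse": 0.50, "better": 0.70}
values_5l = {"worse": 0.65, "better": 0.80}

quality_3l = qaly_gain_quality(values_3l["worse"], values_3l["better"], years=10)
quality_5l = qaly_gain_quality(values_5l["worse"], values_5l["better"], years=10)
extension_3l = qaly_gain_extension(values_3l["worse"], extra_years=2)
extension_5l = qaly_gain_extension(values_5l["worse"], extra_years=2)

print(f"Quality improvement: 3L {quality_3l:.2f} vs 5L {quality_5l:.2f} QALYs")
print(f"Life extension:      3L {extension_3l:.2f} vs 5L {extension_5l:.2f} QALYs")
```

With the same costs, a smaller QALY gain means a higher ICER, which is how a technology can drift from clearly cost-effective to borderline.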

Experience-based utility and own health state valuation for a health state classification system: why and how to do it. The European Journal of Health Economics [PubMed] Published 11th October 2017

There’s debate around whose values we ought to be using to estimate QALYs when making resource allocation decisions. Generally we use societal values, but some researchers think we should be using values from people actually in those health states. I’ve written before about some of the problems with this debate. In this study, the authors try to bring some clarity to the discussion. Four types of values are considered, defined by two distinctions: hypothetical vs own current state and general public vs patient values. The notion of experienced utility is introduced and the authors explain why this cannot be captured by (for example) a TTO exercise, because such exercises require hypothetical future scenarios of health improvement. Thus, the preferred terminology becomes ‘own health state valuation’. The authors summarise some of the research that has sought to compare the 4 types of values specified, highlighting that own health state valuations tend to give higher values associated with dysfunctional health states than do general population hypothetical valuations. The main point is that valuations can differ systematically according to whose values are being elicited. The authors describe some reasons why these values may differ. These could include i) poor descriptions of hypothetical states, ii) changing internal standards (e.g. response shift), and iii) adaptation. Next, the authors consider how to go about collecting own health state values. Two key challenges are specified: i) respondents may be unwilling where questions are complex or intrusive, and ii) there may be ethical concerns, particularly where people are in terminal conditions. It is therefore difficult to sample for all possible health states. Selection bias may also rear its head. The tendency for more mild health states to be observed creates problems for the econometricians trying to model value sets. The authors propose some ways forward for identifying own health state value sets. 
One way would be to purposively sample people who are representative of those living in each EQ-5D health state. However, some states are rarely observed, so we’d be looking at screening millions of people to identify the necessary participants from a general survey. The authors therefore suggest targeting people via other methods, though this may still prove very difficult. A more effective (and favourable) approach – the authors suggest – could be to try and obtain better informed general population values. This could involve improving descriptive systems and encouraging deliberation. Evidence suggests that this can reduce the discrepancy between hypothetical and own state valuations. In particular, the authors recommend the use of citizens’ juries and multi-criteria decision analysis. This isn’t something we see being done in the literature, and so may be a fruitful avenue for future research.

Journal Club Briefing: Dolan and Kahneman (2008)

Today’s Journal Club Briefing comes from the Academic Unit of Health Economics at the University of Leeds. At their journal club on 2nd August 2017, they discussed Dolan and Kahneman’s 2008 article from The Economic Journal: ‘Interpretations of utility and their implications for the valuation of health’. If you’ve discussed an article at a recent journal club meeting at your own institution and would like to write a briefing for the blog, get in touch.

Why this paper?

Dolan and Kahneman (2008) is a paper which was published nearly ten years ago, was written several years before that, and was not published in a health-related journal. It’s hence, at first sight, a slightly curious choice for a health economics journal club. However, it raises issues which are at the heart of health economics practice. The questions raised by this article have not as yet been answered, and don’t look likely to be answered anytime soon.

Summary

Experienced vs. decision utility

The article’s point of departure is the distinction between experienced utility and decision utility, often a source of fruitful research in behavioural economics. Experienced utility is utility in the Benthamite sense, meaning the hedonic experience in the current moment: the pleasure and/or pain felt by a person at any given point in time. Decision utility is utility as taught in undergraduate economics textbooks: an objective function which the individual dispassionately acts to maximise. In the neoclassical framework of said undergraduate textbooks, this is a distinction without a difference. The individual correctly forecasts the expected flow of experienced utility given the available information and her actions, forms a decision utility function from it and acts to maximise it.

However, Thaler and Sunstein wouldn’t have sold as many books if things were so simple. Many systematic and significant instances of divergences between experienced and decision utility have been well documented, and several people (including one of the authors of this paper) have won Nobel prizes for it. The one which this article focuses on is adaptation.

Adaptation

The authors summarise a large body of evidence that shows that individuals suffer a large loss of utility after a traumatic event (e.g. the loss of a limb or loss of function), but that for many conditions they will adapt to their new situation and recover much of their utility loss. After as little as a year, their valuation of their health is very similar to that of the general population. Furthermore, the authors review various studies which show that individuals routinely and drastically underestimate the amount of adaptation that would occur should such a traumatic event befall them.

This improvement over time in the health-related utility experienced by people with many conditions is partly due to hedonic adaptation – the internal scale of pleasure/pain re-calibrates to their new situation – and partly due to behavioural change, such as finding new pastimes to replace those ruled out by their condition. While the causes of adaptation are fascinating, the focus here is not on the mechanisms behind it, but rather on the consequences for measuring utility and the implications for resource allocation.

Health valuation and adaptation

The methods health economists use to evaluate the utility of being in a given health state, such as time trade-off, standard gamble or discrete choice experiments, will tend to elicit decision utility. They are based on choices between hypothetical states and so will not capture the changes in experienced utility due to adaptation. Thus valuations of health states from the general public will tend to be lower than the valuations from people actually living in the health state.

At first glance, the consequences for resource allocation may not appear to be particularly severe. It may lead to more resources being devoted to healthcare as a whole (at least for life-improving treatments – life-extending treatments are a different case), but the overall healthcare budget is in practice largely a political decision. And if adaptation were the same for every condition, it would not distort allocation between treatments for different conditions.

Yet adaptation is not a universal phenomenon. There are conditions for which little or no adaptation is seen (for example unexplained pain), and when it occurs, it occurs at different speeds and to differing extents for different conditions. The authors show that a condition with a greater initial utility loss but a high degree of adaptation is valued lower than a condition with a lesser initial loss but a lower degree of adaptation, and thus will receive a greater level of resources, despite the sum of experienced utility being the same for both. The authors argue that this is unfair, and that health economists should update their practices to better capture experienced utility.
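The argument can be illustrated with a toy calculation. The sketch below assumes two hypothetical conditions tracked over ten years, and assumes that a hypothetical valuation reflects only the initial loss because adaptation is not anticipated. All of the numbers and function names are invented for illustration and do not come from the paper.

```python
import math

YEARS = 10

# Hypothetical yearly losses of experienced utility (0.0 = no loss).
# Condition A: large initial loss, followed by strong adaptation.
loss_a = [0.6, 0.3, 0.1] + [0.0] * (YEARS - 3)
# Condition B: small initial loss, with no adaptation at all.
loss_b = [0.1] * YEARS

def experienced_loss(path):
    """Sum over time of the experienced utility lost."""
    return sum(path)

def hypothetical_valuation(path):
    """Assumption: respondents imagine only the initial loss and do not
    anticipate adaptation, so the state is valued at 1 minus that loss."""
    return 1.0 - path[0]

print("Total experienced loss:", round(experienced_loss(loss_a), 2),
      "vs", round(experienced_loss(loss_b), 2))
print("Hypothetical valuation:", hypothetical_valuation(loss_a),
      "vs", hypothetical_valuation(loss_b))
```

Both conditions cost the person the same amount of experienced utility over the decade, yet condition A is valued far lower and so would attract more resources under current practice.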

Public vs. patient preference

A common argument in favour of the status quo is that (in many countries at least) it is public resources which are being allocated, and thus it is public preferences which should be respected. It appears legitimate to allocate resources to assuage public fears of health states, even if those health states are worse in their imagination than in reality. The authors consider this argument and reply that, in this case, the instruments of health economists are still not fit for purpose. General measures of health states, such as EQ-5D, go out of their way to describe states in abstract terms and to separate them from causes, such as cancer, which may carry an emotional affect. It cannot be argued that public valuations are justified because resources should be allocated according to public fears if the measurement of valuation deliberately tries not to elicit those fears.

The argument that adaptation causes serious problems for valuing health and for allocation of health resources is a persuasive one. It is undoubtedly true that changes in utility over time, and other violations of the neoclassical economic paradigm such as reference dependence, do not presently receive sufficient attention in health economics and policy decisions in general.

Discussion

Which yardstick?

Despite the stimulating discussion and the overall brilliance of the paper, there are some elements which can be challenged. One of them is that throughout, the authors’ arguments and recommendations are made from the standpoint that the sum over time of the flow of experienced utility from a health state is to be used as the sole measure of value. In practice this would be measured with what one of the authors calls the day reconstruction method (DRM), which consists of rating a range of feelings including happiness, worry, and frustration.

Despite the acknowledgement of some philosophical difficulties, the sum of the flow of experienced utility is treated as if it is the only true yardstick with which to measure health, without a convincing justification and no discussion on the qualitative aspect of the measurement as opposed to a truly cardinal measure of health allowing ranking of individuals’ health states.

Public vs. private preferences revisited

The authors raise the question of whether current practice can be justified by a desire to soothe public fears, and dismiss it since the elicitation tools are not suitable. However, they do not address the question of whether allocating public resources according to the public’s (incorrect) fears of given diseases or health states could be a legitimate health policy aim. One could imagine, for example, a discrete choice experiment eliciting how much more the general public dreads cancer than other diseases, and then argue that the welfare of the public is improved by allocating resources based on these results. There are myriad problems with such an approach, of course, but there seem to be no fewer problems with alternative approaches.

Intertemporal welfare

Intertemporal welfare judgements are notoriously difficult once the exponential discounting framework is left. It seems just as legitimate to base valuations on the ex post judgement of individuals who have fully adjusted to a health state as on an integration of past feelings, most of which are now distant memories. Most people would agree that the time to value their experience of a marathon is after completing it, not during the twenty-fifth mile or at the start line.

Indeed, this appears to be the position tacitly taken elsewhere by Kahneman in his work on the peak-end rule. In Redelmeier et al. (2003), it was found that the retrospective rating of the pain of a colonoscopy was based almost exclusively on the peak intensity of pain and on the pain felt at the end. Thus procedures which were extended by an extra three minutes were remembered as less painful than standard procedures, even though the total pain experienced was greater. Furthermore, those who underwent the extended procedure were more likely to state they would undergo it again. It would seem strange, in this case, to judge them as worse off.
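The peak-end finding can be sketched numerically. The example below assumes that remembered pain is the simple average of the peak rating and the final rating, which is a common simplification rather than the model estimated in the study. The minute-by-minute pain ratings are invented for illustration.

```python
def total_pain(ratings):
    """Sum of moment-by-moment pain ratings: the experienced total."""
    return sum(ratings)

def remembered_pain(ratings):
    """Assumption: retrospective rating is the mean of peak and final pain."""
    return (max(ratings) + ratings[-1]) / 2

# Invented pain ratings on a 0-10 scale, one per minute.
standard = [2, 5, 8, 7]
# The same procedure extended by three minutes of milder discomfort.
extended = standard + [4, 3, 2]

print("Total pain:     ", total_pain(standard), "vs", total_pain(extended))
print("Remembered pain:", remembered_pain(standard), "vs", remembered_pain(extended))
```

The extended procedure involves more pain in total but ends on a milder note, so under this rule it is remembered as less painful.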

Schelling (1984) ends his superlative discussion of the problems of intertemporal decision making with the following thought experiment. Just as with valuing health, there are no easy answers.

[S]ome anesthetics block transmission of the nervous impulses that constitute pain; others have the characteristic that the patient responds to the pain as if feeling it fully but has utterly no recollection afterwards. One of these is sodium pentothal. In my imaginary experiment we wish to distinguish the effects of the drug from the effects of the unremembered pain, and we want a healthy control subject in parallel with some painful operations that will be performed with the help of this drug. For a handsome fee you will be knocked out for an hour or two, allowed to sleep it off, then tested before you go home. You do this regularly, and one afternoon you walk into the lab a little early and find the experimenters viewing some videotape. On the screen is an experimental subject writhing, and though the audio is turned down the shrieks are unmistakably those of a person in pain. When the pain stops the victim pleads, “Don’t ever do that again. Please.”

The person is you.

Do you care?

Do you walk into your booth, lie on the couch, and hold out your arm for today’s injection?

Should I let you?
