Alastair Canaway’s journal round-up for 27th November 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Elevated mortality among weekend hospital admissions is not associated with adoption of seven day clinical standards. Emergency Medicine Journal [PubMed] Published 8th November 2017

Our esteemed colleagues in Manchester have brought more evidence to the seven-day NHS debate (debacle?). Patients who are admitted to hospital in an emergency at weekends have higher mortality rates than those admitted during the week. Despite what our Secretary of State would have you believe, there is an increasing body of evidence suggesting that once case-mix is adequately adjusted for, the ‘weekend effect’ becomes negligible. This paper takes a slightly different angle on the same phenomenon. It harnesses the introduction of four priority clinical standards in England, which aim to reduce the number of deaths associated with the weekend effect. These are time to first consultant review; access to diagnostics; access to consultant-directed interventions; and on-going consultant review. The study uses publicly available data on the performance of NHS Trusts in relation to these four priority clinical standards. For the latest financial year (2015/16), Trusts’ weekend effect odds ratios were compared to their achievement against the four clinical standards. Data were available for 123 Trusts. The authors found that adoption of the four clinical standards was not associated with the extent to which mortality was elevated for patients admitted at the weekend. Furthermore, they found no association between the Trusts’ performance against any of the four standards and the magnitude of the weekend effect. The authors offer three reasons as to why this may be the case. First, data quality could be poor. Second, the standards themselves could be inadequate for reducing mortality. Finally, weekend mortality may be the wrong metric by which to judge the benefits of a seven-day service.
They note that their previous research demonstrated that the weekend effect is driven by admission volumes at the weekend rather than the number of deaths, so it will not be affected by care provision; this is consistent with the findings in this study. The spectre of opportunity cost looms over the implementation of these standards: although no direct harm may arise from their introduction, resources will be diverted away from potentially more beneficial alternatives, which is a serious concern. The seven-day debate continues.

The effect of level overlap and color coding on attribute non-attendance in discrete choice experiments. Value in Health Published 16th November 2017

I think discrete choice experiments (DCEs) are difficult to complete. That may be due to me not being the sharpest knife in the drawer, or it could be due to the nature of DCEs, or a bit of both. For this reason, I like best-worst scaling (BWS). BWS aside, DCEs are a common tool used in health economics research to assess and understand preferences. Given the difficulty of DCEs, respondents often resort to heuristics: they simplify choice tasks by taking shortcuts, e.g. ignoring one or more attributes (attribute non-attendance) or always selecting the option with the highest level of a certain attribute. This has downstream consequences, leading to bias within preference estimates. Furthermore, difficulty with comprehension leads to high attrition rates. This RCT sought to examine whether participant dropout and attribute non-attendance could be reduced through two methods: level overlap and colour coding. Level overlap refers to a DCE design whereby, in each choice task, a certain number of attributes are presented with the same level in both options; in different choice tasks different attributes are overlapped. The idea is to prevent dominant attribute strategies, whereby participants always choose the option with the highest level of one specific attribute, and to force them to evaluate all attributes. The second method involves colour coding and the provision of other visual cues to reduce task complexity, e.g. colour coding levels to make it easy to see which levels are equal. There were five trial arms. The control arm featured no colour coding and no attribute overlap. The other four arms featured either colour coding (two different types were tested), attribute overlap, or a combination of both. A nationally (Dutch) representative sample in relation to age, gender, education and geographic region was recruited online. In total 3394 respondents were recruited and each arm contained over 500 respondents.
Familiarisation and warm-up questions were followed by 21 pairwise choice tasks in a randomised order. In the control arm (no overlap, no colour coding), 13.9% of respondents dropped out, and respondents attended to only 2.1 of the five attributes on average. Colour coding reduced dropout to 9.6%, with 2.8 attributes attended. Combining level overlap with intensity colour coding reduced dropout further to 7.2% whilst increasing attribute attendance to four out of five. Thus, the combination of level overlap and colour coding nearly halved dropout and roughly doubled attribute attendance within the DCE task. An additional, and perhaps the most important, benefit of the improvement in attribute attendance is that it reduces the need to model potential attribute non-attendance post hoc. Given the difficulty of DCE completion, it seems colour coding in combination with level overlap should be employed in future DCE tasks.
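As a quick sanity check on the "halved" and "doubled" claims, the ratios can be computed from the figures quoted above. This is only arithmetic on the rounded numbers reported in this round-up, not a reanalysis of the trial; the variable names are my own, and "four out of five" is assumed to mean exactly 4.0.

```python
# Figures as quoted in the round-up (rounded).
# Assumption: "four out of five" attributes is treated as exactly 4.0.
control = {"dropout_pct": 13.9, "attributes_attended": 2.1}
overlap_plus_colour = {"dropout_pct": 7.2, "attributes_attended": 4.0}

# Ratio of the combined arm to the control arm for each outcome.
dropout_ratio = overlap_plus_colour["dropout_pct"] / control["dropout_pct"]
attendance_ratio = (
    overlap_plus_colour["attributes_attended"] / control["attributes_attended"]
)

print(f"Dropout relative to control: {dropout_ratio:.2f}")        # 0.52
print(f"Attendance relative to control: {attendance_ratio:.2f}")  # 1.90
```

So dropout fell to about half of the control level, and attribute attendance rose to a little under twice the control level.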

Evidence on the longitudinal construct validity of major generic and utility measures of health-related quality of life in teens with depression. Quality of Life Research [PubMed] Published 17th November 2017

There appears to be increasing recognition of the prevalence and seriousness of youth mental health problems. Nearly 20% of young people will suffer depression during their adolescent years. To facilitate cost-utility analysis it is necessary to have a preference-based measure of health-related quality of life (HRQL). However, there are few measures designed for use in adolescents. This study sought to examine various existing HRQL measures in relation to their responsiveness for the evaluation of interventions targeting depression in young people. This builds on previous work conducted by Brazier et al. that found the EQ-5D and SF-6D performed adequately for depression in adults. In total 392 adolescents aged between 13 and 17 years joined the study; 376 of these completed follow-up assessments. Assessments were taken at baseline and 12 weeks. The justification for 12 weeks is that it represented the modal time to clinical change. The following utility instruments were included: the HUI suite, the EQ-5D-3L, the Quality of Well-Being Scale (QWB), and the SF-6D (derived from the SF-36). Other non-preference-based HRQL measures were also included: disease-specific ratings and scales, and the PedsQL 4.0. All (yes, you read that correctly) measures were found to be responsive to change in depression symptomatology over the 12-week follow-up period and each of the multi-attribute utility instruments was able to detect clinically meaningful change. In terms of comparing the utility instruments, the HUI-3, the QWB and the SF-6D were the most responsive whilst the EQ-5D-3L was the least responsive. In summary, any of the utility instruments could be used. One area of disappointment for me was that the CHU-9D was not included within this study – it’s one of the few instruments that has been developed by and for children and would have been a very worthy addition. Regardless, this is an informative study for those of us working within the youth mental health sphere.

Paul Mitchell’s journal round-up for 6th November 2017

A longitudinal study to assess the frequency and cost of antivascular endothelial therapy, and inequalities in access, in England between 2005 and 2015. BMJ Open [PubMed] Published 22nd October 2017

I am breaking one of my unwritten rules in a journal round-up by talking about colleagues’ work, but I feel it is too important not to summarise, for a number of reasons. The study highlights the problems faced by regional healthcare purchasers in England when implementing national guideline recommendations on the cost-effectiveness of new treatments. The paper focuses on anti-vascular endothelial growth factor (anti-VEGF) medicines in particular, with two drugs, ranibizumab and aflibercept, offered to patients with a range of eye conditions, costing £550-800 per injection. Another drug, bevacizumab, which is closely related to ranibizumab and performs similarly in trials, could be provided at a fraction of the cost (£50-100 per injection), but it is currently unlicensed for eye conditions in the UK. Using administrative data from Hospital Episode Statistics in England between 2005 and 2015, this study investigates how regions in England have coped with trying to provide the recommended drugs, tracking their use as they have been recommended for a number of different eye conditions over the past decade. In 2014/15 the cost of these two new drugs for treating eye conditions alone was estimated at £447 million nationally. Provision of these drugs is not evenly distributed: it varies widely across regions after controlling for socio-demographics, suggesting an inequality of access associated with the introduction of these high-cost drugs over the past decade, at a time of relatively low growth in national health spending. Although there are limitations associated with using data not intended for research purposes, the study shows how the most can be made of routinely collected data. On a public policy level, it raises questions over the provision of such high-cost drugs, for which the authors state the NHS is currently paying more than US insurers.
Although it is important to be careful when comparing to unlicensed drugs, the authors point to clear evidence in the paper as to why their comparison is a reasonable one in this scenario, with a large opportunity cost associated with not including this option in national guidelines. If national recommendations continue to insist that such drugs be provided, clearer guidance is also required on how to disinvest from existing services at a regional level to reduce further examples of inequality in access in the future.

In search of a common currency: a comparison of seven EQ-5D-5L value sets. Health Economics [PubMed] Published 24th October 2017

Those of us who like a good valuation study will need to set aside a good chunk of time to work through this one. Valuation studies for the new EQ-5D-5L measure of health status, whose primary purpose is generating quality-adjusted life years (QALYs) for economic evaluations, are now starting to emerge from different countries. In these studies, the relative importance of each of the measure’s dimensions and levels is quantified based on general population preferences. This study offers the first comparison of value sets across seven countries: three Western European (England, Netherlands, Spain), one North American (Canada), one South American (Uruguay), and two East Asian (Japan and South Korea). The authors aim to describe methodological differences between the seven value sets, compare the relative importance of dimensions, level decrements and scale length (i.e. quality/quantity trade-offs for QALYs), and develop a common (Western) currency across four of the value sets. In brief, there do appear to be similar trends across the three Western European countries: level decrements from levels 3 to 4 have the largest value, followed by levels 1 to 2. There is also a pattern in these three countries’ dimensions, whereby the two “symptom” dimensions (i.e. pain/discomfort, anxiety/depression) have equal importance to the other three “functioning” dimensions (i.e. mobility, self-care and usual activities). There are also clear differences with the other four value sets. Canada, although it also has the largest level decrement between levels 3 and 4 (49%), unusually has equal decrements for the remainder (17% x 3). For the other three countries, greater weight is attached to the three functioning dimensions relative to the two symptom dimensions. Although South Korea also has its largest level decrement between levels 3 and 4, the largest decrement was between levels 4 and 5 in Uruguay and between levels 1 and 2 in Japan.
Although the authors give a number of plausible reasons as to why these differences may occur, less justification is given for the choice of the four value sets they offer as a common currency, beyond the need to have a value set for countries that do not have one already. The value sets with the most in common were those of the three Western European countries, so a Western European value set may have been more appropriate if the criterion was to have comparable values across countries. If the aim was really for a more international common currency, there are issues with the exclusion of non-Western countries’ value sets from their common currency version. Surely a common currency should reflect differences across cultures and settings where these are apparent. A common currency should also have a better geographical spread: no country from Africa, the Middle East, or Central and South Asia is represented in this study, nor are any lower- and middle-income countries, though this final criticism is outside the authors’ control given current data availability.

Quantifying the relationship between capability and health in older people: can’t map, won’t map. Medical Decision Making [PubMed] Published 23rd October 2017

The EQ-5D is one of many ways quality of life can be measured within economic evaluations. A more recent strand of work, based on Amartya Sen’s capability approach, has attempted to develop outcome measures that move beyond the health-related aspects of quality of life captured by the EQ-5D and similar measures used in the generation of QALYs. This study examines the relationship between the EQ-5D and the ICECAP-O capability measure in three different patient populations included in the Medical Crises in Older People programme in England. The authors propose a reasonable hypothesis that health could be considered a conversion factor for a person’s broader capability set, and so it is plausible to test how well the EQ-5D-3L dimension values and overall score can map onto the ICECAP-O overall score. Across the numerous regressions performed, the strongest relationship between the two measures in this sample had an R-squared of 0.35. Interestingly, the dimensions on the EQ-5D that had a significant relationship with the ICECAP-O score were a mix of dimensions with a focus on functioning (i.e. self-care, usual activities) and symptoms (anxiety/depression), so overall capability on ICECAP-O appears to be related, at least to a small degree, to both health components of EQ-5D discussed in this round-up’s previous paper. The authors suggest the study provides further evidence that EQ-5D and ICECAP-O provide complementary data, but, as they acknowledge, the causal relationship between the two measures remains under-researched. Longitudinal data analysis would provide a more definitive answer to the question of how much interaction there is between these two measures and their dimensions as health and capability change over time in response to different treatments and care provision.

Chris Sampson’s journal round-up for 23rd October 2017

What is the evidence from past National Institute of Health and Care Excellence single-technology appraisals regarding company submissions with base-case incremental cost-effectiveness ratios of less than £10,000/QALY? Value in Health Published 18th October 2017

NICE have been looking into diversifying their HTA processes of late. One of the newly proposed rules is that technologies with a base-case ICER estimate of less than £10,000 per QALY should be eligible for a fast-track appraisal, so that patients can benefit as early as possible from a therapy that does not pose a great risk of wasting NHS resources. But what have NICE been doing up to this point for such technologies? For this study, the researchers analysed content from all NICE single technology appraisals (STAs) between 2009 and 2016, of which there were 171 with final reports available that reported a base-case ICER. Of these, 15% (26) reported all base-case ICERs to be below £10,000, and of these 73% (19) received a positive recommendation at the first appraisal committee meeting. A key finding is that 7 of the 26 received a ‘Minded No’ judgment in the first instance due in part to inadequate evidence and – though all got a positive decision in the end – some recommendations were restricted to subgroups. The authors also had a look at STAs with base-case ICERs up to £15,000, of which there were 5 more. All of these received a positive recommendation at the first appraisal committee meeting. Another group of 28 STAs reported multiple ICERs that included estimates both below and above £10,000. These tell a different story. Only 13 received an unrestricted positive recommendation at the first appraisal committee meeting. Positive recommendations eventually followed for all 28, but 7 were on the basis of patient access schemes. There are a few things to consider in light of these findings. It may not be possible for NICE to adequately fast-track some sub-£10k submissions because the ICERs are not estimated on the basis of appropriate comparisons, or because the evidence is otherwise inadequate. But there may be good grounds for extending the fast-track threshold to £15,000.
The study also highlights some indicators of complexity (such as the availability of patient access scheme discounts) that might be used as a basis for excluding submissions from the fast-track process.

EQ-5D-5L versus EQ-5D-3L: the impact on cost-effectiveness in the United Kingdom. Value in Health Published 18th October 2017

Despite some protest from NICE, most UK health economists working on trial-based economic evaluations are probably getting on with using the new EQ-5D-5L (and associated value set) over its 3L predecessor. This shift could bring important changes to the distribution of cost-effectiveness results for evaluated technologies. In this study, the researchers sought to identify what these changes might be, by examining a couple of datasets which included both 3L and 5L response data. One dataset was produced by the EuroQol group, with 3,551 individuals from across Europe with a range of health states, and the other was a North American dataset collected from 5,205 patients with rheumatoid disease, which switched from 3L to 5L with a wave of overlap. The analysis employs a previously developed method with a series of ordinal regressions, in which 3L-5L pairs are predicted using a copula approach. The first thing to note is that there was variation in the distribution of responses between the different dimensions and between the two datasets, and so a variety of model specifications are needed. To investigate the implications of using the 5L instead of the 3L, the authors considered 9 cost-effectiveness analysis case studies. The 9 studies reported 13 comparisons. In almost all cases where 3L was replaced with the 5L, the intervention resulted in a smaller QALY gain and higher ICER. The only study in which use of the 5L increased the incremental QALYs was one in which life extension was the key driver of QALY gains. Generally speaking, use of the 5L increases index values and reduces the range, so quality of life improvements are ‘more difficult’ to achieve, while life extension is relatively more valuable than on the 3L. Several technologies move from being clearly cost-effective within NICE’s £20,000-£30,000 threshold to being borderline cases. Different technologies for different diseases will be impacted differently by the move from the 3L to the 5L. 
So while we should probably still start using the 5L and its value set (because it’s methodologically superior), we mustn’t forget how different our findings might be in comparison to our old ways.
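To make the mechanism concrete, here is a minimal sketch of how a compressed utility range can shrink a QALY gain and push up an ICER. All numbers are hypothetical and chosen purely for illustration; they are not taken from the paper or from any value set. The function names are my own, and discounting is deliberately ignored.

```python
def qaly_gain(utility_control: float, utility_treatment: float, years: float) -> float:
    """Undiscounted QALY gain from a constant utility difference over `years`."""
    return (utility_treatment - utility_control) * years


def icer(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if incremental_qalys <= 0:
        raise ValueError("ICER is not meaningful for a non-positive QALY gain")
    return incremental_cost / incremental_qalys


# Hypothetical example: same patients, same treatment, same extra cost.
incremental_cost = 4_000.0  # GBP, assumed
years = 5.0                 # assumed time horizon, no life extension

# Hypothetical index values. The second pair is higher and closer together,
# mimicking the "higher values, narrower range" pattern described above.
gain_3l = qaly_gain(utility_control=0.50, utility_treatment=0.70, years=years)
gain_5l = qaly_gain(utility_control=0.60, utility_treatment=0.75, years=years)

icer_3l = icer(incremental_cost, gain_3l)
icer_5l = icer(incremental_cost, gain_5l)

print(f"3L-style: QALY gain {gain_3l:.2f}, ICER £{icer_3l:,.0f}/QALY")
print(f"5L-style: QALY gain {gain_5l:.2f}, ICER £{icer_5l:,.0f}/QALY")
```

With these made-up inputs the quality-of-life gain is smaller on the compressed scale, so the same cost buys fewer QALYs and the ICER rises. A treatment that mainly extends life would behave differently, because each added year would be weighted by a higher index value.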

Experience-based utility and own health state valuation for a health state classification system: why and how to do it. The European Journal of Health Economics [PubMed] Published 11th October 2017

There’s debate around whose values we ought to be using to estimate QALYs when making resource allocation decisions. Generally we use societal values, but some researchers think we should be using values from people actually in those health states. I’ve written before about some of the problems with this debate. In this study, the authors try to bring some clarity to the discussion. Four types of values are considered, defined by two distinctions: hypothetical vs own current state and general public vs patient values. The notion of experienced utility is introduced and the authors explain why this cannot be captured by (for example) a TTO exercise, because such exercises require hypothetical future scenarios of health improvement. Thus, the preferred terminology becomes ‘own health state valuation’. The authors summarise some of the research that has sought to compare the four types of values specified, highlighting that own health state valuations tend to give higher values for dysfunctional health states than do general population hypothetical valuations. The main point is that valuations can differ systematically according to whose values are being elicited. The authors describe some reasons why these values may differ. These could include i) poor descriptions of hypothetical states, ii) changing internal standards (e.g. response shift), and iii) adaptation. Next, the authors consider how to go about collecting own health state values. Two key challenges are specified: i) respondents may be unwilling where questions are complex or intrusive, and ii) there may be ethical concerns, particularly where people are in terminal conditions. It is therefore difficult to sample for all possible health states. Selection bias may also rear its head. The tendency for milder health states to be observed creates problems for the econometricians trying to model value sets. The authors propose some ways forward for identifying own health state value sets.
One way would be to purposively sample EQ-5D health states, recruiting people who are representative of those in each state. However, some states are rarely observed, so we’d be looking at screening millions of people to identify the necessary participants from a general survey. The authors therefore suggest targeting people via other methods, though this may still prove very difficult. A more effective (and favourable) approach – the authors suggest – could be to try to obtain better informed general population values. This could involve improving descriptive systems and encouraging deliberation. Evidence suggests that this can reduce the discrepancy between hypothetical and own state valuations. In particular, the authors recommend the use of citizens’ juries and multi-criteria decision analysis. This isn’t something we see being done in the literature, and so may be a fruitful avenue for future research.
