Simon McNamara’s journal round-up for 24th June 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Manipulating the 5 dimensions of the EuroQol instrument: the effects on self-reporting actual health and valuing hypothetical health states. Medical Decision Making [PubMed] Published 4th June 2019

EQ-5D is the Rocky Balboa of health economics. A left-hook here, a jab there, vicious uppercuts straight to the chin – it takes the hits, it never stays down. Every man and his dog is ganging up on it, yet it still stands, proudly resolute in its undefeated record.

“When you are the champ”, it thinks to itself, “everyone wants a piece of you”. The door opens. Out of the darkness emerge four mysterious figures. “No… not…”, the instrument stumbles over its words. A bead of sweat rolls slowly down its glistening forehead. Its thumping heartbeat pierces the silence like a drum being thrashed by spear-wielding members of an ancient tribe. “It can’t be… No.” A clear, precise voice emerges from the darkness. “Taken at face value”, it states, “our results suggest that economic evaluations that use EQ-5D-5L are systematically biased.” EQ-5D stares blankly, its pupils dilated. It responds, “I’ve been waiting for you”. The gloom clears. Tsuchiya et al (2019) stand there proudly: “bring it on… punk”.

The first paper in this week’s round-up is a surgical probing of a sample of potential issues with EQ-5D. Whilst the above paragraph contains a fair amount of poetic license (read: this is the product of an author who would rather be writing dystopian health-economics short stories than doing their actual work), this paper by Tsuchiya et al. does seem to land a number of strong blows squarely on the chin of EQ-5D. The authors employ a large discrete choice experiment (n=2,494 members of the UK general public) in order to explore the impact of three issues on the way people both report and value health. Specifically: (1) the order in which the five dimensions are presented; (2) the use of composite dimensions (dimensions that pool two things – e.g. pain or discomfort) rather than separate dimensions; (3) “bolting-off” dimensions (the reverse of a bolt-on: removing dimensions from the EQ-5D).

If you are interested in these issues, I suggest you read the paper in full. In brief, the authors find that splitting anxiety/depression into two dimensions had a significant effect on the way people reported their health; that splitting level 5 of the pain/discomfort and anxiety/depression dimensions (e.g. “I have extreme pain or discomfort”) into individual dimensions significantly impacted the way people valued health; and that “bolting off” dimensions impacted valuation of the remaining dimensions. Personally, I think the composite dimension findings are the most interesting here. The authors find that extreme pain/discomfort is perceived as being a more severe state than extreme discomfort alone and, similarly, that being extremely anxious/depressed is perceived as a more severe state than simply being extremely anxious. The authors suggest this means the EQ-5D-5L may be systematically biased, as an individual who reports extreme discomfort (or anxiety) will have their health state valued based upon the composite dimension for each of these, and subsequently have the severity of their health state over-estimated.

I like this paper, and think it has a lot to contribute to the refinement of EQ-5D and the development of new instruments. I suggest the champ uses Tsuchiya et al. as a sparring partner, gets back to the gym, and works on some new moves – I sense a training montage coming on.

Methods for public health economic evaluation: A Delphi survey of decision makers in English and Welsh local government. Health Economics [PubMed] Published 7th June 2019

Imagine the government in your local city is considering a major new public health initiative. Politicians plan to demolish a number of out-of-date social housing blocks in deprived communities and build 10,000 new high-quality homes in their place. This will cost a significant amount of money and, as a result, you have been asked to conduct an economic evaluation of the intervention. How would you go about doing this?

This is clearly a complicated task. You are unlikely to find a randomised controlled trial on which to base your evaluation, the costs and benefits of the programme are likely to fall on multiple sectors, and you will have to balance health gains against a wide range of non-health outcomes (e.g. reductions in crime). If you somehow managed to model the impact of the intervention perfectly, you would then be faced with the challenge of how to value these benefits. Equally, you would have to consider whether or not to weight the benefits of this programme more highly than those of programmes in other parts of the city, because it benefits people in deprived communities – note that inequalities in health seem to be a much larger issue in public health than in ‘normal health’ (i.e. the bread and butter of health economic evaluation). This complexity, and concern for inequalities, makes public health economic evaluation a completely different beast to traditional economic evaluation. This has led some to question the value of QALY-based cost-utility analysis in public health, and to calls for methods that better meet the needs of the field.

The second paper in this week’s round-up contributes to the development of these methods, by providing information on what public health decision makers in England and Wales think about different economic evaluation methodologies. The authors fielded an online, two-round Delphi-panel study, featuring 26 statements in round 1 and 36 in round 2. For each statement, participants were asked to rate their level of agreement on a five-point scale (where 1 = strongly agree and 5 = strongly disagree). In the first round, participants (n=66) simply responded to the statements; in the second, participants (n=29) were presented with the median response from the prior round and asked to reconsider their response in light of this feedback. The statements tested covered a wide range of issues, including: the role distributional concerns should play in public health economic evaluation (e.g. economic evaluation should formally weight outcomes by population subgroup); the type of outcomes considered (e.g. economic evidence should use a single outcome that captures length of life and quality of life); and the budgets to be considered (e.g. economic evaluation should take account of the multi-sectoral budgets available).
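
For the mechanically minded, the round-to-round feedback step is simple enough to sketch in a few lines of Python. This is only a toy illustration of the structure described above – the ratings are simulated and the statement labels invented, not the study’s data.

```python
# Toy sketch of the two-round Delphi feedback loop (simulated ratings,
# invented statement labels -- not the study's data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
statements = [f"statement_{i}" for i in range(1, 27)]   # 26 round-1 statements

# Round 1: 66 participants rate agreement from 1 (strongly agree)
# to 5 (strongly disagree).
round1 = pd.DataFrame(rng.integers(1, 6, (66, len(statements))),
                      columns=statements)

# Round 2: each returning participant (n=29) re-rates every statement
# after seeing the round-1 median for it.
feedback = round1.median()
print(feedback.head())
```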

Interestingly, the decision-makers rejected the idea of focusing solely on maximising outcomes (the current norm for health economic evaluations), and supported placing an equal focus on minimising inequality and maximising outcomes. Furthermore, they supported formal weighting of outcomes by population subgroup and the use of multiple outcomes to capture health, wellbeing and broader outcomes, but did not support the use of a single outcome that captures wellbeing gain. These findings suggest cost-consequence analysis may be a better fit for the needs of these decision makers than simply attempting to apply the QALY model in public health – particularly if augmented by some form of multi-criteria decision analysis (MCDA) that can reflect distributional concerns and allow comparison across outcome types. I think this is a great paper and expect to be citing it for years to come.

I AM IMMORTAL. Economic Inquiry [RePEc] Published 16th November 2016

I love this paper. It isn’t a recent one, but it hasn’t been covered on the AHE blog before, and I think everyone should know about it, so – luckily for you – it has made it into this week’s round-up.

In this groundbreaking work, Riccardo Trezzi fits a series of “state of the art”, complex, econometric models to his own electrocardiogram (ECG) signal – a measure of the electrical function of the heart. He then compares these models, identifies the one that best fits his data, and uses that model to predict his future ECG signal, and subsequently his life expectancy. This provides an astonishing result – “the n steps ahead forecast remains bounded and well above zero even after one googol period, implying that my life expectancy tends to infinite. I therefore conclude that I am immortal”.

I think this is genius. If you haven’t already realised the point of the paper by the time you have reached this part of my write-up, I suggest you think very carefully about the face validity of this result. If you still don’t get it after that, have a look at the note on the front page – specifically the bit that says “this paper is intended to be a joke”. If you still don’t get it – the author measured his heart activity for 10 seconds, and then applied lots of complex statistical methods, which (obviously) when extrapolated suggested his heart would keep beating forever, and subsequently that he would live forever.

Whilst the paper is a parody, it makes an important point. If we fit models to data and attempt to predict the future without considering external evidence, we may well make a hash of that prediction – despite the apparent sophistication of our econometric methods. This is clearly an extreme example, but it resonates with me, because this is what many people continue to do when extrapolating survival data in oncology models. The practice is certainly less prevalent than it was a few years ago, and I expect it will become a thing of the past, but for now, whenever I meet someone who does this, I will be sure to send them a copy of this paper. That being said, as far as I am aware the author is still alive, so maybe he will have the last laugh – perhaps even the last laugh of all of humankind, if his model is to be believed.
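
To see the trick at work, here is a minimal sketch of my own – a simulated signal and a bog-standard AR(2) fit, not Trezzi’s ECG or his “state of the art” models – showing how a stationary time-series model, fitted to a few seconds of a periodic ‘heartbeat’ and extrapolated mechanically, produces a forecast that never goes to zero:

```python
# Toy re-enactment of the gag (simulated signal, not Trezzi's ECG, and a
# plain AR(2) rather than his actual models).
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(1000)                        # '10 seconds' sampled at 100 Hz
ecg = 1.0 + 0.5 * np.sin(2 * np.pi * t / 80) + rng.normal(0, 0.05, t.size)

# Least-squares AR(2) fit: y[t] = c + a1*y[t-1] + a2*y[t-2]
X = np.column_stack([np.ones(t.size - 2), ecg[1:-1], ecg[:-2]])
c, a1, a2 = np.linalg.lstsq(X, ecg[2:], rcond=None)[0]

# Measurement noise nudges the fitted roots inside the unit circle, so the
# estimated model is stationary...
print("root moduli:", np.abs(np.roots([1.0, -a1, -a2])))

# ...and a stationary model's long-horizon point forecast reverts to the
# unconditional mean: bounded, well above zero, at any horizon you like.
print("forecast as the horizon -> 'one googol':", c / (1 - a1 - a2))
```

No matter how far ahead you push the forecast, the ‘heart’ never flatlines – which is exactly the joke, and exactly the danger of extrapolating without external evidence.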


Paul Mitchell’s journal round-up for 25th December 2017

Consensus-based cross-European recommendations for the identification, measurement and valuation of costs in health economic evaluations: a European Delphi study. European Journal of Health Economics [PubMed] Published 19th December 2017

The primary aim of this study was to develop guidelines for costing in economic evaluation studies conducted across more than one European country. The societal perspective serves as the starting point and benchmark for costing, though this was not entirely obvious from the abstract, as this broadest approach to costing is not recommended uniformly across all European countries. Recommendations following this starting point covered the identification, measurement and valuation of resource use, the discount rate, and the discounting of future costs. A three-step Delphi study was used to gain consensus on what should be included in an economic evaluation from a societal perspective, based initially on findings from a review of the costing methodologies adopted across European country-specific guidelines. Consensus required at least two-thirds (67%) agreement among those participating in the Delphi study at all three stages. Where no agreement was reached after the three stages, a panel of four of the co-authors made a final decision on what should be recommended. In total, 26 of the 110 people invited to participate completed at least one Delphi round, with every round having at least 16 participants. It remains unclear to me whether 16 participants in a Delphi round is sufficient to reach a Europe-wide consensus on costing methodologies. There were a number of key areas where no consensus was reached (e.g. the inclusion of costs unrelated to the intervention, the measurement of resource use and absenteeism, and the valuation of the opportunity costs of patient time and informal care), so the four-strong author panel had a leading role in some of the main recommendations. Notwithstanding the limitations associated with the reference perspective taken and the samples for the Delphi study and panel, the paper provides a useful illustration of the different approaches to costing across European countries. It also provides good coverage of the costing issues that need to be explained in detail in economic evaluations, to allow for a clear understanding of the methods used and the rationale underpinning those decisions where a choice of costing methodology is required.

A (five-)level playing field for mental health conditions?: exploratory analysis of EQ-5D-5L derived utility values. Quality of Life Research [PubMed] Published 16th December 2017

The UK health economics community has been reeling from the decision made earlier this year by the UK guidelines developer, the National Institute for Health and Care Excellence (NICE), which recommended not adopting the new population values developed for the EQ-5D-5L when calculating QALYs, and instead relying on a crosswalk to the values developed over 20 years ago for the 3-level EQ-5D. This paper provides a timely comparison of how these two value sets perform for the EQ-5D-5L descriptive system in patient groups with mental health conditions – groups often thought to be disadvantaged by the physical-functioning focus of the EQ-5D descriptive system. Using baseline data from three trials, the authors find that the new utility values produce a mean EQ-5D score 0.08 higher than the old crosswalk values, with a difference of 0.225 for those reporting extreme problems on the anxiety/depression dimension. Although the authors highlight, using scenario analysis, that the new values would increase cost-per-QALY results in this sample, improvements in the anxiety/depression dimension alone fare relatively better than improvements across the whole EQ-5D-5L descriptive system, due to the additional relative value placed on the anxiety/depression dimension in the new value set. This paper makes for interesting reading, and is one that NICE should take into consideration when reviewing their decision on this issue next year. Although I would disagree with the authors when they state that this study provides a primary reason for revising the NICE cost-effectiveness threshold (there are more compelling arguments for that elsewhere, in my view), it does clearly highlight the influence of the choice of descriptive system, and of the values used, on the outcomes produced for economic analysis, such as QALYs – even when the two descriptive systems in question (EQ-5D-3L and EQ-5D-5L) are roughly the same.

What characteristics of nursing homes are most valued by customers? A discrete choice experiment with residents and family members. Value in Health Published 1st December 2017

Our final paper for review in 2017 looks at the characteristics that are of most importance to individuals and their family members when it comes to nursing home provision. The authors conducted a valuation exercise using a discrete choice experiment (DCE) to calculate the relative importance of the attributes contained in the Consumer Choice Index-Six Dimension (CCI-6D), a measure developed to assess the quality of nursing home care across three levels on six dimensions: (1) the level of time care staff spend with residents; (2) the homeliness of shared spaces; (3) the homeliness of the room setup; (4) access to outside and garden; (5) the frequency of meaningful activities; and (6) flexibility with care routines. Residents who had lived in a nursing home for at least a year and had low levels of cognitive impairment completed the DCE themselves, whereas family members were asked to act as proxies for close relatives with more severe cognitive impairment. In total, 126 residents and 416 family member proxies completed the DCE comparisons of nursing homes with different qualities in these six areas. The results of the DCE show differences in preferences across the two groups. Although similar importance is placed on some dimensions by both groups (“homeliness of room setup” ranked highly, whereas “frequency of meaningful activities” ranked lower), residents value access to outside and garden four times as much as the family proxies do (the second most important dimension for residents, the least important for family proxies), while family members value the level of time care staff spend with residents twice as much as residents themselves do (the most important attribute for family proxies, the third most important for residents). Although the residents represented by the two groups may have important differences in characteristics that might explain some of this divergence, it is probably a good time of year to remember that family preferences may be inconsistent with those of the individuals within the family, so make sure to take account of this variation when preparing those Christmas dinners.
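
As an aside, the basic mechanics of recovering attribute weights from a DCE are easy to sketch. Below is a toy example – simulated choices over two hypothetical profiles per task, with made-up coefficients rather than the CCI-6D data – using the standard result that, with two alternatives, the conditional logit likelihood reduces to a logistic model on attribute differences.

```python
# Toy DCE estimation (simulated choices, invented coefficients -- not the
# CCI-6D data). With two alternatives per task, the conditional logit
# likelihood reduces to a logistic model on attribute differences.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, xlogy

rng = np.random.default_rng(1)
true_beta = np.array([0.8, 0.5, 0.5, 0.9, 0.3, 0.4])   # six attributes
n_tasks, n_attr = 5_000, 6

# Each task: choose between two nursing-home profiles with attribute
# levels 0-2 (mirroring the three levels per dimension).
xa = rng.integers(0, 3, (n_tasks, n_attr))
xb = rng.integers(0, 3, (n_tasks, n_attr))
ua = xa @ true_beta + rng.gumbel(size=n_tasks)         # random utility
ub = xb @ true_beta + rng.gumbel(size=n_tasks)
chose_a = (ua > ub).astype(float)

def neg_loglik(beta):
    p_a = expit((xa - xb) @ beta)                      # P(choose profile A)
    return -np.sum(xlogy(chose_a, p_a) + xlogy(1 - chose_a, 1 - p_a))

beta_hat = minimize(neg_loglik, np.zeros(n_attr)).x
print("relative importance:", np.round(beta_hat / beta_hat.sum(), 2))
```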

Happy holidays all.


Chris Sampson’s journal round-up for 11th September 2017

Core items for a standardized resource use measure (ISRUM): expert Delphi consensus survey. Value in Health Published 1st September 2017

Trial-based collection of resource use data, for the purpose of economic evaluation, is wild. Lots of studies use bespoke questionnaires. Some use off-the-shelf measures, but many of these are altered to suit the context. Validity rarely gets a mention. Some of you may already be aware of this research; I’m sure I’m not the only one here who participated. The aim of the study is to establish a core set of resource use items that should be included in all studies, to aid comparability, consistency and validity. The researchers identified a long list of 60 candidate items for inclusion, through a review of 59 resource use instruments. An NHS and personal social services perspective was adopted, and any similar items were merged. This list was built into a Delphi survey. Members of the HESG mailing list – as well as 111 other identified experts – were invited to complete the survey, which ran over two rounds. The first round asked participants to rate the importance of including each item in the core set, using a scale from 1 (not important) to 9 (very important). Participants were then asked to select their ‘top 10’. Items survived round 1 if they were rated at least 7 by more than 50% of respondents, and less than 3 by no more than 15%, either overall or within two or more participant subgroups. In round 2, participants were presented with the results of round 1 and asked to re-rate the 34 remaining items. There were 45 usable responses in round 1 and 42 in round 2. Comments could also be provided, which were subsequently subject to content analysis. After all was said and done, a meeting was held for final item selection based on the findings, to which some survey participants were invited, but only one attended (sorry I couldn’t make it). The final 10 items were: i) hospital admissions, ii) length of stay, iii) outpatient appointments, iv) A&E visits, v) A&E admissions, vi) number of appointments in the community, vii) type of appointments in the community, viii) number of home visits, ix) type of home visits and x) name of medication. The measure isn’t ready to use just yet; there is still research to be conducted to identify the ideal wording for each item. But it looks promising. Hopefully, this work will trigger a whole stream of research to develop bolt-ons in specific contexts for a modular system of resource use measurement. I also think this work should form the basis of alignment between costing and resource use measurement, as resource use is often collected in a way that is very difficult to ‘map’ onto costs or prices. I’m sure the good folk at the PSSRU are paying attention to this work, and I hope they might help us all out by estimating unit costs for each of the core items (as well as any bolt-ons, once they’re developed). There’s some interesting discussion in the paper about the parallels between this work and the development of core outcome sets. Maybe the analysis of resource use can be as interesting as the analysis of quality of life outcomes.
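
The round-1 survival rule is essentially an algorithm, so here is how I would code it up – with simulated ratings, and with the subgroup clause implemented as I read it, not as the authors actually analysed it:

```python
# Sketch of the round-1 survival rule (simulated ratings; the subgroup
# clause is my interpretation, not the authors' code).
import numpy as np

rng = np.random.default_rng(3)
n_resp, n_items = 45, 60
weights = np.array([1, 1, 1, 2, 2, 3, 5, 5, 5]) / 25   # skew towards 'important'
ratings = rng.choice(np.arange(1, 10), size=(n_resp, n_items), p=weights)
subgroup = rng.integers(0, 3, n_resp)                  # e.g. three expert types

def survives(r):
    """>50% rate the item at least 7, and no more than 15% rate it below 3."""
    return (np.mean(r >= 7) > 0.5) and (np.mean(r < 3) <= 0.15)

kept = [j for j in range(n_items)
        if survives(ratings[:, j])                     # overall...
        or sum(survives(ratings[subgroup == g, j])
               for g in range(3)) >= 2]                # ...or in >=2 subgroups

print(f"{len(kept)} of {n_items} items survive to round 2")
```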

A call for open-source cost-effectiveness analysis. Annals of Internal Medicine [PubMed] Published 29th August 2017

Yes, this paper is behind a paywall. Yes, it is worth pointing out this irony over and over again until we all start practising what we preach. We’re all guilty; we all need to keep on keeping on at each other. Now, on to the content. The authors argue in favour of making cost-effectiveness analysis (and model-based economic evaluation in particular) open to scrutiny. The key argument is that there is value in transparency, and analogies are drawn with clinical trial reporting and epidemiological studies. This potential additional value is thought to derive from i) easy updating of models with new data and ii) less duplication of effort. The main challenges are thought to be the need for new infrastructure – technical and regulatory – and the preservation of intellectual property. Recently, I discussed similar issues in a call for a model registry. I’m clearly in favour of cost-effectiveness analyses being ‘open source’. My only gripe is that the authors aren’t the first to suggest this, and should have done some homework before publishing this call. Nevertheless, it is good to see this issue being raised in a journal such as Annals of Internal Medicine, which could be an indication that the tide is turning.

Differential item functioning in quality of life measurement: an analysis using anchoring vignettes. Social Science & Medicine [PubMed] [RePEc] Published 26th August 2017

Differential item functioning (DIF) occurs when different groups of people have different interpretations of response categories. For example, in response to an EQ-5D questionnaire, the way that two groups of people understand ‘slight problems in walking about’ might not be the same. If that were the case, the groups wouldn’t be truly comparable. That’s a big problem for resource allocation decisions, which rely on trade-offs between different groups of people. This study uses anchoring vignettes to test for DIF, whereby respondents are asked to rate their own health alongside some health descriptions for hypothetical individuals. The researchers conducted two online surveys, which together recruited a representative sample of 4,300 Australians. Respondents completed the EQ-5D-5L, some vignettes, some other health outcome measures, and a bunch of sociodemographic questions. The analysis uses an ordered probit model to predict responses to the EQ-5D dimensions, with the vignettes used to identify the model’s thresholds. This is estimated for each dimension of the EQ-5D-5L, in the hope that the model can produce coefficients that facilitate ‘correction’ for DIF. But this isn’t a guaranteed approach to identifying the effect of DIF. Two important assumptions are inherent: first, that individuals rate the hypothetical vignette states on the same latent scale as they rate their own health (AKA response consistency) and, second, that everyone values the vignettes on an equivalent latent scale (AKA vignette equivalence). Only if these assumptions hold can anchoring vignettes be used to adjust for DIF and make different groups comparable. The researchers dedicate a lot of effort to testing these assumptions. To test response consistency, separate (condition-specific) measures are used to assess each domain of the EQ-5D. The findings suggest that responses are consistent. Vignette equivalence is assessed by the significance of individual characteristics in determining vignette values. In this study, the vignette equivalence assumption didn’t hold, which prevents the authors from drawing generalisable conclusions. However, the researchers looked at whether the assumptions were satisfied in particular age groups. For 55- to 65-year-olds (n=914), they were – for all dimensions except anxiety/depression. That might be because older people are better at understanding health problems, having had more experience of them. So the authors can tell us about DIF in this older group. Having corrected for DIF, the mean health state value in this group increases from 0.729 to 0.806. Various characteristics explain the heterogeneous response behaviour. After correcting for DIF, the difference in EQ-5D index values between high and low education groups increased from 0.049 to 0.095. The difference between employed and unemployed respondents increased from 0.077 to 0.256. In some cases, the rankings changed: the difference between those divorced or widowed and those never married changed from -0.028 to 0.060. The findings hint at a trade-off between giving personalised vignettes to facilitate response consistency and generalisable vignettes to facilitate vignette equivalence. It may be that DIF can only be assessed within particular groups (such as the older sample in this study). But then, if that’s the case, what hope is there for correcting DIF in high-level resource allocation decisions? Clearly, DIF in the EQ-5D could be a big problem; accounting for it could flip resource allocation decisions. But this study shows that there isn’t an easy answer.
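
The identifying logic is easier to see with a simulation. The sketch below is a deliberately crude toy – two groups with shifted response thresholds rating one common vignette – and not the vignette-anchored ordered probit the authors estimate; it just shows why a shared vignette reveals reporting differences that self-reports alone cannot.

```python
# Toy illustration of the anchoring-vignette logic (simulated data; not
# the authors' model). Both groups rate one vignette whose true severity
# is identical for everyone, so any gap in ratings reflects reporting
# behaviour (DIF), not health.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                    # two respondent groups

latent = 1.0 + rng.normal(0, 1, n)               # common vignette severity
cuts_g0 = np.array([-0.5, 0.5, 1.5, 2.5])        # group 0's category cut-points
cuts_g1 = cuts_g0 - 0.6                          # group 1 reports problems sooner

def categorise(x, cuts):
    return np.searchsorted(cuts, x) + 1          # response categories 1-5

rating = np.where(group == 0,
                  categorise(latent, cuts_g0),
                  categorise(latent, cuts_g1))

# Identical health, different reports: the gap identifies the threshold
# shift, which a vignette-anchored ordered probit would use to 're-anchor'
# each group's self-reported EQ-5D responses.
print("mean vignette rating, group 0:", rating[group == 0].mean())
print("mean vignette rating, group 1:", rating[group == 1].mean())
```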

How to design the cost-effectiveness appraisal process of new healthcare technologies to maximise population health: a conceptual framework. Health Economics [PubMed] Published 22nd August 2017

The starting point for this paper is that, when it comes to reimbursement decisions, the more time and money spent on the appraisal process, the more precise the cost-effectiveness estimates are likely to be. So the question is: how much, in the way of resources, should be committed to the appraisal process? The authors set up a framework in which to consider a variety of alternatively defined appraisal processes, how these might maximise population health, and which factors are the key drivers in this. The appraisal process is conceptualised as a diagnostic tool to identify which technologies are cost-effective (true positives) and which aren’t (true negatives). The framework builds on the fact that manufacturers can present a claimed ICER that makes their technology more attractive, but that the true ICER can never be known with certainty. As a diagnostic test, there are four possible outcomes: true positive, false positive, true negative, or false negative. Each outcome is associated with an expected payoff in terms of population health and producer surplus. Payoffs depend on the accuracy of the appraisal process (sensitivity and specificity), the incremental net benefit per patient, disease incidence, the time for which an approval is relevant, the cost of the process, and the price of the technology. The accuracy of the process can be affected by altering the time and resources dedicated to it, or by adjusting the definition of cost-effectiveness in terms of the acceptable level of uncertainty around the ICER. So, what determines an optimal level of accuracy in the appraisal process, assuming that producers’ price setting is exogenous? Generally, the process should have greater sensitivity (at the expense of specificity) when there is more to gain: when a greater proportion of technologies are cost-effective, or when the population or time of relevance is greater. There is no fixed optimum for all situations. If we relax the assumption of exogenous pricing decisions, and allow pricing to be partly determined by the appraisal process, we can see that a more accurate process incentivises cost-effective price setting. The authors also consider the possibility of there being multiple stages of appraisal, with appeals, re-submissions and price agreements. The take-home message is that the appraisal process should be re-defined over time and with respect to the range of technologies being assessed – perhaps even an individualised process for each technology in each setting. At least, it seems clear that technologies with exceptional characteristics (with respect to their potential impact on population health) should be given a bespoke appraisal. NICE is already onto these ideas – they recently introduced a fast-track process for technologies with a claimed ICER below £10,000, and now give extra attention to technologies with a major budget impact.
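
The framework lends itself to a back-of-the-envelope calculation. Here is a stylised sketch with toy numbers of my own (the function name and parameterisation are mine, not the authors’): the appraisal process is treated as a diagnostic test, and expected population health is the payoff-weighted sum of true and false positives, net of the cost of running the process.

```python
# Stylised sketch of the diagnostic-test framing, with toy numbers of my
# own (not the authors' parameterisation or notation).
def expected_health_payoff(sens, spec, p_ce, inb_true, inb_false,
                           incidence, years, appraisal_cost, k):
    """Expected net population health (QALYs) from appraising one technology.

    sens, spec     -- accuracy of the appraisal process as a 'test'
    p_ce           -- share of submitted technologies that are truly
                      cost-effective
    inb_true/false -- per-patient incremental net health benefit of a truly
                      cost-effective (positive) / non-cost-effective
                      (negative) technology
    incidence      -- patients per year; years -- time the approval matters
    appraisal_cost -- cost of running the process, converted to QALYs at
                      the threshold k (cost per QALY)
    """
    patients = incidence * years
    gain_tp = p_ce * sens * inb_true * patients               # rightly approved
    loss_fp = (1 - p_ce) * (1 - spec) * inb_false * patients  # wrongly approved
    return gain_tp + loss_fp - appraisal_cost / k

# A quick-and-dirty process versus a slower, more accurate one:
quick = expected_health_payoff(0.80, 0.95, 0.5, 0.05, -0.05,
                               10_000, 10, 1e6, 15_000)
thorough = expected_health_payoff(0.95, 0.99, 0.5, 0.05, -0.05,
                                  10_000, 10, 5e6, 15_000)
print(f"quick process:    {quick:,.0f} QALYs")
print(f"thorough process: {thorough:,.0f} QALYs")
```

Playing with the inputs reproduces the paper’s intuition: raising the share of truly cost-effective submissions, the incidence, or the time of relevance increases the value of sensitivity relative to specificity, so the optimal process differs by context.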
