Thesis Thursday: Lidia Engel

On the third Thursday of every month, we speak to a recent graduate about their thesis and their studies. This month’s guest is Dr Lidia Engel who graduated with a PhD from Simon Fraser University. If you would like to suggest a candidate for an upcoming Thesis Thursday, get in touch.

Title
Going beyond health-related quality of life for outcome measurement in economic evaluation
Supervisors
David Whitehurst, Scott Lear, Stirling Bryan
Repository link
https://theses.lib.sfu.ca/thesis/etd10264

Your thesis explores the potential for expanding the ‘evaluative space’ in economic evaluation. Why is this important?

I think there are two answers to this question. Firstly, methods for economic evaluation of health care interventions have existed for a number of years but these evaluations have mainly been applied to more narrowly defined ‘clinical’ interventions, such as drugs. Interventions nowadays are more complex, where benefits cannot be simply measured in terms of health. You can think of areas such as public health, mental health, social care, and end-of-life care, where interventions may result in broader benefits, such as increased control over daily life, independence, or aspects related to the process of health care delivery. Therefore, I believe there is a need to re-think the way we measure and value outcomes when we conduct an economic evaluation. Secondly, ignoring broader outcomes of health care interventions that go beyond the narrow focus of health-related quality of life can potentially lead to misallocation of scarce health care resources. Evidence has shown that the choice of outcome measure (such as a health outcome or a broader measure of wellbeing) can have a significant influence on the conclusions drawn from an economic evaluation.

You use both qualitative and quantitative approaches. Was this key to answering your research questions?

I mainly applied quantitative methods in my thesis research. However, Chapter 3 draws upon some qualitative methodology. To gain a better understanding of ‘benefits beyond health’, I came across a novel approach, called Critical Interpretive Synthesis. It is similar to meta-ethnography (i.e. a synthesis of qualitative research), with the difference that the synthesis is not of qualitative literature but of methodologically diverse literature. It involves an iterative approach, where searching, sampling, and synthesis go hand in hand. It doesn’t only produce a summary of existing literature but enables the development of new interpretations that go beyond those originally offered in the literature. I really liked this approach because it enabled me to synthesise the evidence in a more effective way compared with a conventional systematic review. Defining and applying codes and themes, as it is traditionally done in qualitative research, allowed me to organize the general idea of non-health benefits into a coherent thematic framework, which in the end provided me with a better understanding of the topic overall.

What data did you analyse and what quantitative methods did you use?

I conducted three empirical analyses in my thesis research, which all made use of data from the ICECAP measures (ICECAP-O and ICECAP-A). In my first paper, I used data from the ‘Walk the Talk (WTT)‘ project to investigate the complementarity of the ICECAP-O and the EQ-5D-5L in a public health context using regression analyses. My second paper used exploratory factor analysis to investigate the extent of overlap between the ICECAP-A and five preference-based health-related quality of life measures, using data from the Multi Instrument Comparison (MIC) project. I am currently finalizing submission of my third empirical analysis, which reports findings from a path analysis using cross-sectional data from a web-based survey. The path analysis explores three outcome measurement approaches (health-related quality of life, subjective wellbeing, and capability wellbeing) through direct and mediated pathways in individuals living with spinal cord injury. Each of the three studies addressed different components of the overall research question, which, collectively, demonstrated the added value of broader outcome measures in economic evaluation when compared with existing preference-based health-related quality of life measures.

Thinking about the different measures that you considered in your analyses, were any of your findings surprising or unexpected?

In my first paper, I found that the ICECAP-O is more sensitive to environmental features (i.e. social cohesion and street connectivity) when compared with the EQ-5D-5L. As my second paper has shown, this was not surprising, as the ICECAP-A (a measure for adults rather than older adults) and the EQ-5D-5L measure different constructs and had only limited overlap in their descriptive classification systems. While a similar observation was made when comparing the ICECAP-A with three other preference-based health-related quality of life measures (15D, HUI-3, and SF-6D), a substantial overlap was observed between the ICECAP-A and the AQoL-8D, which suggests that it is possible for broader benefits to be captured by preference-based health-related measures (although some may not consider the AQoL-8D to be exclusively ‘health-related’, despite the label). The findings from the path analysis confirmed the similarities between the ICECAP-A and the AQoL-8D. However, the findings do not imply that the AQoL-8D and ICECAP-A are interchangeable instruments, as a mediation effect was found that requires further research.

How would you like to see your research inform current practice in economic evaluation? Is the QALY still in good health?

I am aware of the limitations of the QALY and although there are increasing concerns that the QALY framework does not capture all benefits of health care interventions, it is important to understand that the evaluative space of the QALY is determined by the dimensions included in preference-based measures. From a theoretical point of view, the QALY can embrace any characteristics that are important for the allocation of health care resources. However, in practice, it seems that QALYs are currently defined by what is measured (e.g. the dimensions and response options of EQ-5D instruments) rather than the conceptual origin. Therefore, although non-health benefits have been largely ignored when estimating QALYs, one should not dismiss the QALY framework but rather develop appropriate instruments that capture such broader benefits. I believe the findings of my thesis have particular relevance for national HTA bodies that set guidelines for the conduct of economic evaluation. While the need to maintain methodological consistency is important, the assessment of the real benefits of some health care interventions would be more accurate if we were less prescriptive in terms of which outcome measure to use when conducting an economic evaluation. As my thesis has shown, some preference-based measures already adopt a broad evaluative space but are less frequently used.

Alastair Canaway’s journal round-up for 18th September 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Selection of key health domains from PROMIS® for a generic preference-based scoring system. Quality of Life Research [PubMedPublished 19th August 2017

The US Panel on Cost-Effectiveness recommends the use of QALYs. It doesn’t, however, instruct (unlike the UK) as to what measure should be used. This leaves the door ajar for both new and established measures. This paper sets about developing a new preference-based measure from the Patient-Reported Outcomes Measurement System (PROMIS). PROMIS is a US National Institutes of Health funded suite of person-centred measures of physical, mental, and social health. Across all the PROMIS measures there exist over 70 domains of health relevant to adult health. For all its promise, the PROMIS system does not produce a summary score amenable to the calculation of QALYs, nor for general descriptive purposes such as measuring HRQL over time. This study aimed to reduce the 70 items down to a number suitable for valuation. To do this, Delphi methods were used. The Delphi approach is something that seems to be increasing in popularity in the health economics world. For those unfamiliar, it essentially involves obtaining the opinions of experts independently and iteratively conducting rounds of questioning to reach a consensus (over two or more rounds). In this case nine health outcomes experts were recruited, they were presented with ‘all 37 domains’ (no mention is made of how they got from 70 to 37!) and asked to remove any domains that were not appropriate for inclusion in a general health utility measure or were redundant due to another PROMIS domain. If more than seven experts agreed, then the domain was removed. Responses were combined and presented until consensus was reached. This left 10 domains. They then used a community sample of 50 participants to test for independence of domains using a pairwise independence evaluation test. They were given the option of removing a domain they felt was not important to overall HRQL and asked to rate the importance of remaining domains using a VAS. These findings were used by the research team to whittle down from nine domains to seven. The final domains were: Cognitive function- abilities; Depression; Fatigue; Pain Interference; Physical Function; Ability to participate in social roles and activities; and Sleep disturbance. Many of these are common to existing measures but I did rather like the inclusion of cognitive function and fatigue – something that is missing in many, and to me appear important. The next step is valuation. Upon valuation, this is a promising candidate for use in economic evaluation – particularly in the US where the PROMIS measurement suite is already established.

Predictive validation and the re-analysis of cost-effectiveness: do we dare to tread? PharmacoEconomics [PubMedPublished 22nd August 2017

PharmacoEconomics treated us to a provocative editorial regarding predictive validation and re-analysis of cost-effectiveness models – a call to arms of sorts. For those (like me) who are not modelling experts, predictive validation (aka 4th order validation) refers to the comparison of model outputs with data that are collected after the initial analysis of the model. So essentially you’re comparing what you modelled would happen with what actually happened. The literature suggests that predictive validation is widely ignored. The importance of predictive validity is highlighted with a case study where predictive-validity was examined three years after the end of a trial – upon reanalysis the model was poor. This was then revised, which led to a much better fit of the prospective data. Predictive validation can, therefore, be used to identify sources of inaccuracies in models. If predictive validity was examined more commonly, improvements in model quality more generally are possible. Furthermore, it might be possible to identify specific contexts where poor predictive validity is prevalent and thus require further research. The authors highlight the field of advanced cancers as a particularly relevant context where uncertainty around survival curves is prevalent. By actively scheduling further data collection and updating the survival curves we can reduce the uncertainty surrounding the value of high-cost drugs. Predictive validation can also inform other aspects of the modelling process, such as the best choice of time point from which to extrapolate, or credible rates of change in predicted hazards. The authors suggest using expected value of information analysis to identify technologies with the largest costs of uncertainty to prioritise where predictive validity could be assessed. NICE and other reimbursement bodies require continued data collection for ‘some’ new technologies, the processes are therefore in place for future studies to be designed and implemented in a way to capture such data which allows later re-analysis. Assessing predictive validity seems eminently sensible, there are however barriers. Money is the obvious issue, extended prospective data collection and re-analysis of models requires resources. It does, however, have the potential to save money and improve health in the long run. The authors note how in a recent study they demonstrated that a drug for osteoporosis that had been recommended by Australia’s Pharmaceutical Benefits Advisory Committee was not actually cost-effective when further data were examined. There is clearly value to be achieved in predictive validation and re-analysis – it’s hard to disagree with the authors and we should probably be campaigning for longer term follow-ups, re-analysis and increased acknowledgement of the desirability of predictive validity.

How should cost-of-illness studies be interpreted? The Lancet Psychiatry [PubMed] Published 7th September 2017

It’s a good question – cost of illness studies are commonplace, but are they useful from a health economics perspective? A comment piece in The Lancet Psychiatry examines this issue using the case study of self-harm and suicide. It focuses on a recent publication by Tsiachristas et al, which examines the hospital resource use and care costs for all presentations of self-harm in a UK hospital. Each episode of self-harm cost £809, and when extrapolated to the UK cost £162 million. Over 30% of these costs were psychological assessments which despite being recommended by NICE only 75% of self-harming patients received. If all self-harming patients received assessments as recommended by NICE then another £51 million would be added to the bill. The author raises the question of how much use is this information for health economists. Nearly all cost of illness studies end up concluding that i) they cost a lot, and ii) money could be saved by reducing or ameliorating the underlying factors that cause the illness. Is this helpful? Well, not particularly, by focusing only on one illness there is no consideration of the opportunity cost: if you spend money preventing one condition then that money will be displacing resources elsewhere, likewise, resources spent reducing one illness will likely be balanced by increased spending on another illness. The author highlights this with a thought experiment: “imagine a world where a cost of illness study has been done for every possible diseases and that the total cost of illness was aggregated. The counterfactual from such an exercise is a world where nobody gets sick and everybody dies suddenly at some pre-determined age”. Another issue is that more often than not, cost of illness studies identify that more, not less should be spent on a problem, in the self-harm example it was that an extra £51 million should be spent on psychological assessments. Similarly, it highlights the extra cost of psychological assessments, rather than the glaring issue that 25% who attend hospital for self-harm are not getting the required psychological assessments. This very much links into the final point that cost of illness studies neglect the benefits being achieved. Now all the negatives are out the way, there are at least a couple of positives I can think of off the top of my head i) identification of key cost drivers, and ii) information for use in economic models. The take home message is that although there is some use to cost of illness studies, from a health economics perspective we (as a field) would be better off spending our time steering clear.

Credits

Chris Sampson’s journal round-up for 11th September 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Core items for a standardized resource use measure (ISRUM): expert Delphi consensus survey. Value in Health Published 1st September 2017

Trial-based collection of resource use data, for the purpose of economic evaluation, is wild. Lots of studies use bespoke questionnaires. Some use off-the-shelf measures, but many of these are altered to suit the context. Validity rarely gets a mention. Some of you may already be aware of this research; I’m sure I’m not the only one here who participated. The aim of the study is to establish a core set of resource use items that should be included in all studies to aid comparability, consistency and validity. The researchers identified a long list of 60 candidate items for inclusion, through a review of 59 resource use instruments. An NHS and personal social services perspective was adopted, and any similar items were merged. This list was constructed into a Delphi survey. Members of the HESG mailing list – as well as 111 other identified experts – were invited to complete the survey, for which there were two rounds. The first round asked participants to rate the importance of including each item in the core set, using a scale from 1 (not important) to 9 (very important). Participants were then asked to select their ‘top 10’. Items survived round 1 if they scored at least 7 with more than 50% of respondents, and less than 3 by no more than 15%, either overall or within two or more participant subgroups. In round 2, participants were presented with the results of round 1 and asked to re-rate 34 remaining items. There was a sample of 45 usable responses in round 1 and 42 in round 2. Comments could also be provided, which were subsequently subject to content analysis. After all was said and done, a meeting was held for final item selection based on the findings, to which some survey participants were invited but only one attended (sorry I couldn’t make it). The final 10 items were: i) hospital admissions, ii) length of stay, iii) outpatient appointments, iv) A&E visits, v) A&E admissions, vi) number of appointments in the community, vii) type of appointments in the community, viii) number of home visits, ix) type of home visits and x) name of medication. The measure isn’t ready to use just yet. There is still research to be conducted to identify the ideal wording for each item. But it looks promising. Hopefully, this work will trigger a whole stream of research to develop bolt-ons in specific contexts for a modular system of resource use measurement. I also think that this work should form the basis of alignment between costing and resource use measurement. Resource use is often collected in a way that is very difficult to ‘map’ onto costs or prices. I’m sure the good folk at the PSSRU are paying attention to this work, and I hope they might help us all out by estimating unit costs for each of the core items (as well as any bolt-ons, once they’re developed). There’s some interesting discussion in the paper about the parallels between this work and the development of core outcome sets. Maybe analysis of resource use can be as interesting as the analysis of quality of life outcomes.

A call for open-source cost-effectiveness analysis. Annals of Internal Medicine [PubMed] Published 29th August 2017

Yes, this paper is behind a paywall. Yes, it is worth pointing out this irony over and over again until we all start practising what we preach. We’re all guilty; we all need to keep on keeping on at each other. Now, on to the content. The authors argue in favour of making cost-effectiveness analysis (and model-based economic evaluation in particular) open to scrutiny. The key argument is that there is value in transparency, and analogies are drawn with clinical trial reporting and epidemiological studies. This potential additional value is thought to derive from i) easy updating of models with new data and ii) less duplication of efforts. The main challenges are thought to be the need for new infrastructure – technical and regulatory – and preservation of intellectual property. Recently, I discussed similar issues in a call for a model registry. I’m clearly in favour of cost-effectiveness analyses being ‘open source’. My only gripe is that the authors aren’t the first to suggest this, and should have done some homework before publishing this call. Nevertheless, it is good to see this issue being raised in a journal such as Annals of Internal Medicine, which could be an indication that the tide is turning.

Differential item functioning in quality of life measurement: an analysis using anchoring vignettes. Social Science & Medicine [PubMed] [RePEc] Published 26th August 2017

Differential item functioning (DIF) occurs when different groups of people have different interpretations of response categories. For example, in response to an EQ-5D questionnaire, the way that two groups of people understand ‘slight problems in walking about’ might not be the same. If that were the case, the groups wouldn’t be truly comparable. That’s a big problem for resource allocation decisions, which rely on trade-offs between different groups of people. This study uses anchoring vignettes to test for DIF, whereby respondents are asked to rate their own health alongside some health descriptions for hypothetical individuals. The researchers conducted 2 online surveys, which together recruited a representative sample of 4,300 Australians. Respondents completed the EQ-5D-5L, some vignettes, some other health outcome measures and a bunch of sociodemographic questions. The analysis uses an ordered probit model to predict responses to the EQ-5D dimensions, with the vignettes used to identify the model’s thresholds. This is estimated for each dimension of the EQ-5D-5L, in the hope that the model can produce coefficients that facilitate ‘correction’ for DIF. But this isn’t a guaranteed approach to identifying the effect of DIF. Two important assumptions are inherent; first, that individuals rate the hypothetical vignette states on the same latent scale as they rate their own health (AKA response consistency) and, second, that everyone values the vignettes on an equivalent latent scale (AKA vignette equivalence). Only if these assumptions hold can anchoring vignettes be used to adjust for DIF and make different groups comparable. The researchers dedicate a lot of effort to testing these assumptions. To test response consistency, separate (condition-specific) measures are used to assess each domain of the EQ-5D. The findings suggest that responses are consistent. Vignette equivalence is assessed by the significance of individual characteristics in determining vignette values. In this study, the vignette equivalence assumption didn’t hold, which prevents the authors from making generalisable conclusions. However, the researchers looked at whether the assumptions were satisfied in particular age groups. For 55-65 year olds (n=914), they did, for all dimensions except anxiety/depression. That might be because older people are better at understanding health problems, having had more experience of them. So the authors can tell us about DIF in this older group. Having corrected for DIF, the mean health state value in this group increases from 0.729 to 0.806. Various characteristics explain the heterogeneous response behaviour. After correcting for DIF, the difference in EQ-5D index values between high and low education groups increased from 0.049 to 0.095. The difference between employed and unemployed respondents increased from 0.077 to 0.256. In some cases, the rankings changed. The difference between those divorced or widowed and those never married increased from -0.028 to 0.060. The findings hint at a trade-off between giving personalised vignettes to facilitate response consistency and generalisable vignettes to facilitate vignette equivalence. It may be that DIF can only be assessed within particular groups (such as the older sample in this study). But then, if that’s the case, what hope is there for correcting DIF in high-level resource allocation decisions? Clearly, DIF in the EQ-5D could be a big problem. Accounting for it could flip resource allocation decisions. But this study shows that there isn’t an easy answer.

How to design the cost-effectiveness appraisal process of new healthcare technologies to maximise population health: a conceptual framework. Health Economics [PubMed] Published 22nd August 2017

The starting point for this paper is that, when it comes to reimbursement decisions, the more time and money spent on the appraisal process, the more precise the cost-effectiveness estimates are likely to be. So the question is, how much should be committed to the appraisal process in the way of resources? The authors set up a framework in which to consider a variety of alternatively defined appraisal processes, how these might maximise population health and which factors are key drivers in this. The appraisal process is conceptualised as a diagnostic tool to identify which technologies are cost-effective (true positives) and which aren’t (true negatives). The framework builds on the fact that manufacturers can present a claimed ICER that makes their technology more attractive, but that the true ICER can never be known with certainty. As a diagnostic test, there are four possible outcomes: true positive, false positive, true negative, or false negative. Each outcome is associated with an expected payoff in terms of population health and producer surplus. Payoffs depend on the accuracy of the appraisal process (sensitivity and specificity), incremental net benefit per patient, disease incidence, time of relevance for an approval, the cost of the process and the price of the technology. The accuracy of the process can be affected by altering the time and resources dedicated to it or by adjusting the definition of cost-effectiveness in terms of the acceptable level of uncertainty around the ICER. So, what determines an optimal level of accuracy in the appraisal process, assuming that producers’ price setting is exogenous? Generally, the process should have greater sensitivity (at the expense of specificity) when there is more to gain: when a greater proportion of technologies are cost-effective or when the population or time of relevance is greater. There is no fixed optimum for all situations. If we relax the assumption of exogenous pricing decisions, and allow pricing to be partly determined by the appraisal process, we can see that a more accurate process incentivises cost-effective price setting. The authors also consider the possibility of there being multiple stages of appraisal, with appeals, re-submissions and price agreements. The take-home message is that the appraisal process should be re-defined over time and with respect to the range of technologies being assessed, or even an individualised process for each technology in each setting. At least, it seems clear that technologies with exceptional characteristics (with respect to their potential impact on population health), should be given a bespoke appraisal. NICE is already onto these ideas – they recently introduced a fast track process for technologies with a claimed ICER below £10,000 and now give extra attention to technologies with major budget impact.

Credits