My quality-adjusted life year

Why did I do it?

I have evaluated lots of services and been involved in trials where I have asked people to collect EQ-5D data. During this time several people have complained to me about having to collect EQ-5D data so I thought I would have a ‘taste of my own medicine’. I measured my health-related quality of life (HRQoL) using EQ-5D-3L, EQ-5D-VAS, and EQ-5D-5L, every day for a year (N=1). I had the EQ-5D on a spreadsheet on my smartphone and prompted myself to do it at 9 p.m. every night. I set a target of never being more than three days late in doing it, which I missed twice through the year. I also recorded health-related notes for some days, for instance, 21st January said “tired, dropped a keytar on toe (very 1980s injury)”.

By doing this I wanted to illuminate issues around anchoring, ceiling effects and ideas of health and wellness. With a big increase in wearable tech and smartphone health apps this type of big data collection might become a lot more commonplace. I have not kept a diary since I was about 13 so it was an interesting way of keeping track of what was happening, with a focus on health. Starting the year I knew I had one big life event coming up: a new baby due in early March. I am generally quite healthy, though a bit overweight, and I don’t get enough sleep. I have been called a hypochondriac by people before, typically complaining of headaches, colds and sore throats around six months of the year. I usually go running once or twice a week.

From the start I was very conscious that I felt I shouldn’t grumble too much, that EQ-5D was mainly used to measure functional health in people with disease, not in well people (and ceiling effects were a feature of the EQ-5D). I immediately felt a ‘freedom’ from the greater sensitivity of the EQ-5D-5L when compared to the 3L: I could score myself as having slight problems on the 5L when they were not bad enough to count as ‘some problems’ on the 3L.

There were days when I felt a bit achey or tired because I had been for a run, but unless I had an actual injury I did not score myself as having problems with pain or mobility because of this; generally if I feel achey from running I think of that as a good thing as having pushed myself hard, ‘no pain no gain’. I also started doing yoga this year which made me feel great but also a bit achey sometimes. But in general I noticed that one of the main problems I had was fatigue which is not explicitly covered in the EQ-5D but was reflected sometimes as being slightly impaired on usual activities. I also thought that usual activities could be impaired if you are working and travelling a lot, as you don’t get to do any of the things you enjoy doing like hobbies or spending time with family, but this is more of a capability question whereas the EQ-5D is more functional.

How did my HRQoL compare?

I matched up my levels on the individual domains to EQ-5D-3L and 5L index scores based on UK preference scores. The final 5L value set may still change; I used the most recent published scores. I also matched my levels to a personal 5L value set, which I derived by completing a survey that uses discrete choice experiments and involves comparing a set of pairs of EQ-5D-5L health states. I found doing this fascinating and it made me think about how mutually exclusive the EQ-5D dimensions are, and whether some health states are actually implausible: for instance, is it possible to be in extreme pain but not have any impairment on usual activities?

Surprisingly, my average EQ-5D-3L index score (0.982) was higher than the population average for my age group (for England age 35-44 it is 0.888 based on Szende et al. 2014); I expected it to be lower. In fact my average index score was higher than the average for 18-24 year olds (0.922). I thought that measuring EQ-5D more often and having more granularity would lead to lower average scores but it actually led to higher average scores.

My average score from the personal 5L value set was slightly higher than that from the England population value set (0.983 vs 0.975). Digging into the data, the main differences were that I thought that usual activities were slightly more important, and pain slightly less important, than the general population. The 5L (England tariff) correlated more closely with the VAS than the 3L (r² = 0.746 vs. r² = 0.586) but the 5L (personal tariff) correlated most closely with the VAS (r² = 0.792). So based on my N=1 sample, this suggests that the 5L is a better predictor of overall health than the 3L, and that the personal value set has validity in predicting VAS scores.
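
For anyone wanting to run the same comparison on their own diary data, here is a minimal sketch of how an r² between a daily index score and the daily VAS could be computed. It assumes r² means the squared Pearson correlation; the function name and the example numbers are hypothetical and are not my actual data.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x, mean_y = mean(x), mean(y)
    # Sums of squares and cross-products around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined when a series is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical week of diary entries (illustrative only)
index_5l = [1.00, 0.95, 0.95, 0.88, 1.00, 0.72, 0.95]
vas_over_100 = [0.95, 0.85, 0.90, 0.80, 0.90, 0.60, 0.85]

print(round(r_squared(index_5l, vas_over_100), 3))
```

With a year of daily scores, the two lists would simply have 365 entries each; days with a missing score would need to be dropped from both series first.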

Figure 1. My EQ-5D-3L index score [3L], EQ-5D-5L index score (England value set) [5L], EQ-5D-5L index score (personal value set) [5LP], and visual analogue scale (VAS) score divided by 100 [VAS/100].

Reflection

I definitely regretted doing the EQ-5D every day and was glad when the year was over! I would have preferred to have done it every week but I think that would have missed a lot of subtleties in how I felt from day to day. On reflection, the way I approached it was that at the end of each day I would try to recall if I was stressed, or if anything hurt, and adjust the level on the relevant dimension. But I wonder: if I had been prompted at any moment during the day and asked whether I was stressed, had some mobility issues, or was in pain, would I have said I was? It makes me think about Kahneman and Riis’s ‘remembering brain’ and ‘experiencing brain’. Was my EQ-5D profile a slave to my ‘remembering brain’ rather than my ‘experiencing brain’?

My score was low for a few days when I had a really painful abscess on my tooth. At the time I felt like the pain was unbearable so had a high pain score, but looking back I wonder if it was that bad. Still, I didn’t want to retrospectively change my score. Strangely, I had the flu twice in this year which gave me some health decrements, which I don’t think has ever happened to me before (I don’t think it was just ‘man flu’!).

I knew that I was going to have a baby this year but I didn’t know that I would spend 18 days in hospital, despite not being ill myself. This has led me to think a lot more about ‘caregiver effects’ – the impact of close relatives being ill; it is unnerving spending night after night in hospital, in this case because my wife was very ill after giving birth, and then when my baby son was two months old, he got very ill (both are doing a lot better now). Being in hospital with a sick relative is a strange feeling, stressful and boring at the same time. I spent a long time staring out of the window or scrolling through Twitter. When my baby son was really ill he would not sleep and did not want to be put down, so my arms were aching after holding him all night. I was lucky that I had understanding managers in work and I was not significantly financially disadvantaged by caring for sick relatives. I was also glad of the NHS and of not getting a huge bill when family members are discharged from hospital.

Health, wellbeing & exercise

Doing this made me think more about the difference between health and wellbeing; there might be days where I was really happy but it wasn’t reflected in my EQ-5D index score. I noticed that doing exercise always led to a higher VAS score – maybe subconsciously I was thinking exercise was increasing my ‘health stock’. I probably used the VAS score more like an overall wellbeing score rather than just health which is not correct – but I wonder if other people do this as well, and that is why there are less pronounced ceiling effects with the VAS score.

Could trials measure EQ-5D every day?

One advantage of EQ-5D and QALYs over other health outcomes is that they are meant to be measured repeatedly over a schedule, with QALYs calculated as the area under the curve. Completing an EQ-5D every day has shown me that health does vary every day, but I still think it might be impractical for trial participants to complete an EQ-5D questionnaire every day. Perhaps EQ-5D data could be combined with a simple daily VAS score, possibly out of ten rather than 100 for simplicity.
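
To make the area-under-the-curve idea concrete, here is a minimal sketch that turns a series of index scores into QALYs using the trapezoidal rule, assuming straight-line interpolation between measurements. The function name, the measurement schedules and the scores are all hypothetical and only loosely echo my tooth abscess episode.

```python
def qalys_auc(days, utilities, days_per_year=365.0):
    """QALYs as the trapezoidal area under a utility-over-time curve.

    days      -- measurement times in days, strictly increasing
    utilities -- index score recorded at each measurement time
    """
    if len(days) != len(utilities) or len(days) < 2:
        raise ValueError("need at least two matched measurements")
    area = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        if width <= 0:
            raise ValueError("days must be strictly increasing")
        # Area of one trapezoid: width times the mean of its two heights
        area += width * (utilities[i - 1] + utilities[i]) / 2.0
    return area / days_per_year


# Hypothetical 30-day window with a short dip in the middle.
# A sparse schedule (baseline and day 30) never sees the dip.
sparse = qalys_auc([0, 30], [1.0, 1.0])

# More frequent measurement around days 14 to 16 picks it up.
frequent = qalys_auc([0, 13, 14, 15, 16, 30],
                     [1.0, 1.0, 0.7, 0.7, 1.0, 1.0])

print(sparse, frequent)
```

The gap between the two totals is small here because the dip is short, but it illustrates what a sparse measurement schedule can miss.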

Joint worst day: 6th and 7th October: EQ-5D-3L index 0.264, EQ-5D-5L index 0.724; personal EQ-5D-5L index 0.824; VAS score 60 – ‘abscess on tooth, couldn’t sleep, face swollen’.

Joint best day: 27th January, 7th September, 11th September, 18th November, 4th December, 30th December: EQ-5D-3L index 1.00; both EQ-5D-5L index scores 1.00; VAS score 95 – notes include ‘lovely day with family’, ‘went for a run’, ‘holiday’, ‘met up with friends’.

Chris Sampson’s journal round-up for 4th February 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Patient choice and provider competition – quality enhancing drivers in primary care? Social Science & Medicine Published 29th January 2019

There’s no shortage of studies in economics claiming to identify the impact (or lack of impact) of competition in the market for health care. The evidence has brought us close to a consensus that greater competition might improve quality, so long as providers don’t compete on price. However, many of these studies aren’t able to demonstrate the mechanism through which competition might improve quality, and the causality is therefore speculative. The research reported in this article was an attempt to see whether the supposed mechanisms for quality improvement actually exist. The authors distinguish between the demand-side mechanisms of competition-increasing quality-improving reforms (i.e. changes in patient behaviour) and the supply-side mechanisms (i.e. changes in provider behaviour), asserting that the supply-side has been neglected in the research.

The study is based on primary care in Sweden’s two largest cities, where patients can choose their primary care practice, which could be a private provider. Key is the fact that patients can switch between providers as often as they like, and with fewer barriers to doing so than in the UK. Prospective patients have access to some published quality indicators. With the goal of maximum variation, the researchers recruited 13 primary health care providers for semi-structured interviews with the practice manager and (in most cases) one or more of the practice GPs. The interview protocol included questions about the organisation of patient visits, information received about patients’ choices, market situation, reimbursement, and working conditions. Interview transcripts were coded and a framework established. Two overarching themes were ‘local market conditions’ and ‘feedback from patient choice’.

Most interviewees did not see competitors in the local market as a threat – conversely, providers are encouraged to cooperate on matters such as public health. Where providers did talk about competing, it was in terms of (speed of) access for patients, or in competition to recruit and keep staff. None of the interviewees were automatically informed of patients being removed from their list, and some managers reported difficulties in actually knowing which patients on their list were still genuinely on it. Even where these data were more readily available, nobody had access to information on reasons for patients leaving. Managers saw greater availability of this information as useful for quality improvement, while GPs tended to think it could be useful in ensuring continuity of care. Still, most expressed no desire to expand their market share. Managers reported using marketing efforts in response to greater competition generally, rather than as a response to observed changes within their practice. But most relied on reputation. Some reported becoming more service-minded as a result of choice reforms.

It seems that practices need more information to be able to act on competitive pressures. But, most practices don’t care about it because they don’t want to expand and they face no risk of there being a shortage of patients (in cities, at least). And, even if they did want to act on the information, chances are it would just create an opportunity for them to improve access as a way of cherry-picking younger and healthier people who demand convenience. Primary care providers (in this study, at least) are not income maximisers, but satisficers (they want to break-even), so there isn’t much scope for reforms to encourage providers to compete for new patients. Patient choice reforms may improve quality, but it isn’t clear that this has anything to do with competitive pressure.

Maximising the impact of patient reported outcome assessment for patients and society. BMJ [PubMed] Published 24th January 2019

Patient-reported outcome measures (PROMs) have been touted as a way of improving patient care. Yet, their use around the world is fragmented. In this paper, the authors make some recommendations about how we might use PROMs to improve patient care. The authors summarise some of the benefits of using PROMs and discuss some of the ways that they’ve been used in the UK.

Five key challenges in the use of PROMs are specified: i) appropriate and consistent selection of the best measures; ii) ethical collection and reporting of PROM data; iii) data collection, analysis, reporting, and interpretation; iv) data logistics; and v) a lack of coordination and efficiency. To address these challenges, the authors recommend an ‘integrated’ approach. To achieve this, stakeholder engagement is important and a governance framework needs to be developed. A handy table of current uses is provided.

I can’t argue with what the paper proposes, but it outlines an idealised scenario rather than any firm and actionable recommendations. What the authors don’t discuss is the fact that the use of PROMs in the UK is flailing. The NHS PROMs programme has been scaled back, measures have been dropped from the QOF, and the EQ-5D has been dropped from the GP Patient Survey. Perhaps we need bolder recommendations and new ideas to turn the tide.

Check your checklist: the danger of over- and underestimating the quality of economic evaluations. PharmacoEconomics – Open [PubMed] Published 24th January 2019

This paper outlines the problems associated with misusing methodological and reporting checklists. The author argues that the current number of checklists available in the context of economic evaluation and HTA (13, apparently) is ‘overwhelming’. Three key issues are discussed. First, researchers choose the wrong checklist. A previous review found that the Drummond, CHEC, and Philips checklists were regularly used in the wrong context. Second, checklists can be overinterpreted, resulting in incorrect conclusions. A complete checklist does not mean that a study is perfect, and different features are of varying importance in different studies. Third, checklists are misused, with researchers deciding which items are or aren’t relevant to their study, without guidance.

The author suggests that more guidance is needed and that a checklist for selecting the correct checklist could be the way to go. The issue of updating checklists over time – and who ought to be responsible for this – is also raised.

In general, the tendency seems to be to broaden the scope of general checklists and to develop new checklists for specific methodologies, requiring the application of multiple checklists. As methods develop, they become increasingly specialised and heterogeneous. I think there’s little hope for checklists in this context unless they’re pared down and used as a reminder of the more complex guidance that’s needed to specify suitable methods and achieve adequate reporting. ‘Check your checklist’ is a useful refrain, though I reckon ‘chuck your checklist’ can sometimes be a better strategy.

A systematic review of dimensions evaluating patient experience in chronic illness. Health and Quality of Life Outcomes [PubMed] Published 21st January 2019

Back to PROMs and PRE(xperience)Ms. This study sets out to understand what it is that patient-reported measures are being used to capture in the context of chronic illness. The authors conducted a systematic review, screening 2,375 articles and ultimately including 107 articles that investigated the measurement properties of chronic (physical) illness PROMs and PREMs.

29 questionnaires were about (health-related) quality of life, 19 about functional status or symptoms, 20 on feelings and attitudes about illness, 19 assessing attitudes towards health care, and 20 on patient experience. The authors provide some nice radar charts showing the percentage of questionnaires that included each of 12 dimensions: i) physical, ii) functional, iii) social, iv) psychological, v) illness perceptions, vi) behaviours and coping, vii) effects of treatment, viii) expectations and satisfaction, ix) experience of health care, x) beliefs and adherence to treatment, xi) involvement in health care, and xii) patient’s knowledge.

The study supports the idea that a patient’s lived experience of illness and treatment, and adaptation to that, has been judged to be important in addition to quality of life indicators. The authors recommend that no measure should try to capture everything because there are simply too many concepts that could be included. Rather, researchers should specify the domains of interest and clearly define them for instrument development.

Chris Sampson’s journal round-up for 7th January 2019

Overview, update, and lessons learned from the international EQ-5D-5L valuation work: version 2 of the EQ-5D-5L valuation protocol. Value in Health Published 2nd January 2019

Insofar as there is any drama in health economics, the fallout from the EQ-5D-5L value set for England was pretty dramatic. If you ask me, the criticisms are entirely ill-conceived. Regardless of that, one of the main sticking points was that the version of the EQ-5D-5L valuation protocol that was used was flawed. England was one of the first countries to get a valuation, so it used version 1.0 of the EuroQol Valuation Technique (EQ-VT). We’re now up to version 2.1. This article outlines the issues that arose in using the first version, what EuroQol did to try and solve them, and describes the current challenges in valuation.

EQ-VT 1.0 includes the composite time trade-off (cTTO) task to elicit values for health states better and worse than dead. Early valuation studies showed some unusual patterns. Research into the causes of this showed that in many cases there was very little time spent on the task. Some interviewers had a tendency to skip parts of the explanation for completing the worse-than-dead bit of the cTTO, resulting in no values worse than dead. EQ-VT 1.1 added three practice valuations along with greater monitoring of interviewer performance and a quality control procedure. This dramatically reduced interviewer effects and the likelihood of inconsistent responses. Yet further improvements could be envisioned. And so EQ-VT 2.0 added a feedback module. The feedback module shows respondents the ranking of states implied by their valuations, with which respondents can then agree or disagree. 2.0 was tested against 1.1 and showed further reductions in inconsistencies thanks to the feedback module. Other modifications were not supported by the evaluation. EQ-VT 2.1 added a dynamic question to further improve the warm-up tasks.

There are ongoing challenges with the cTTO, mostly to do with how to model the data. The authors provide a table setting out causes, consequences, and possible solutions for various issues that might arise in the modelling of cTTO data. And then there’s the discrete choice experiment (DCE), which is included in addition to the cTTO, but which different valuation studies used (or did not use) differently in modelling values. Research is ongoing that will probably lead to developments beyond EQ-VT 2.1. This might involve abandoning the cTTO altogether. Or, at least, there might be a reduction in cTTO tasks and a greater reliance on DCE. But more research is needed before duration can be adequately incorporated into DCEs.

Helpfully, the paper includes a table with a list of countries and specification of the EQ-VT versions used. This demonstrates the vast amount of knowledge that has been accrued about EQ-5D-5L valuation and the lack of wisdom in continuing to support the (relatively under-interrogated) EQ-5D-3L MVH valuation.

Do time trade-off values fully capture attitudes that are relevant to health-related choices? The European Journal of Health Economics [PubMed] Published 31st December 2018

Different people have different preferences, so values for health states elicited using TTO should vary from person to person. This study is concerned with how personal circumstances and beliefs influence TTO values and whether TTO entirely captures the impact of these on preferences for health states.

The authors analysed data from an online survey with a UK-representative sample of 1,339. Participants were asked about their attitudes towards quality and quantity of life, before completing some TTO tasks based on the EQ-5D-5L. Based on their response, they were shown two ‘lives’ that – given their TTO response – they should have considered to be of equivalent value. The researchers constructed generalised estimating equations to model the TTO values and logit models for the subsequent choices between states. Age, marital status, education, and attitudes towards trading quality and quantity of life all determined TTO values in addition to the state that was being valued. In the modelling of the decisions about the two lives, attitudes influenced decisions through the difference between the two lives in the number of life years available. That is, an interaction term between the attitudes variable and years variables showed that people who prefer quantity of life over quality of life were more likely to choose the state with a greater number of years.

The authors’ interpretation from this is that TTO reflects people’s attitudes towards quality and quantity of life, but only partially. My interpretation would be that the TTO exercise would have benefitted from the kind of refinement described above. The choice between the two lives is similar to the feedback module of the EQ-VT 2.0. People often do not understand the implications of their TTO valuations. The study could also be interpreted as supportive of ‘head-to-head’ choice methods (such as DCE) rather than making choices involving full health and death. But the design of the TTO task used in this study was quite dissimilar to others, which makes it difficult to say anything generally about TTO as a valuation method.

Exploring the item sets of the Recovering Quality of Life (ReQoL) measures using factor analysis. Quality of Life Research [PubMed] Published 21st December 2018

The ReQoL is a patient-reported outcome measure for use with people experiencing mental health difficulties. The ReQoL-10 and ReQoL-20 both ask questions relating to seven domains: six mental, one physical. There’s been a steady stream of ReQoL research published in recent years and the measures have been shown to have acceptable psychometric properties. This study concerns the factorial structure of the ReQoL item sets, testing internal construct validity and informing scoring procedures. There’s also a more general methodological contribution relating to the use of positive and negative factors in mental health outcome questionnaires.

At the outset of this study, the ReQoL was based on 61 items. These were reduced to 40 on the basis of qualitative and quantitative analysis reported in other papers. This paper reports on two studies – the first group (n=2,262) completed the 61 items and the second group (n=4,266) completed 40 items. Confirmatory factor analysis and exploratory factor analysis were conducted. Six-factor (according to ReQoL domains), two-factor (negative/positive) and bi-factor (global/negative/positive) models were tested. In the second study, participants were either presented with a version that jumbled up the positively and negatively worded questions or a version that showed a block of negatives followed by a block of positives. The idea here is that if a two-factor structure is simply a product of the presentation of questions, it should be more pronounced in the jumbled version.

The results were much the same from the two study samples. The bi-factor model demonstrated acceptable fit, with much higher factor loadings on the general quality of life factor that loaded on all items. The results indicated sufficient unidimensionality to go ahead with reducing the number of items and the two ordering formats didn’t differ, suggesting that the negative and positive loadings weren’t just an artefact of the presentation. The findings show that the six dimensions of the ReQoL don’t stand as separate factors. The justification for maintaining items from each of the six dimensions, therefore, seems to be a qualitative one.

Some outcome measurement developers have argued that items should all be phrased in the same direction – as either positive or negative – to obtain high-quality data. But there’s good reason to think that features of mental health can’t reliably be translated from negative to positive, and this study supports the inclusion (and intermingling) of both within a measure.
