My quality-adjusted life year

Why did I do it?

I have evaluated lots of services and been involved in trials where I have asked people to collect EQ-5D data. During this time several people have complained to me about having to collect EQ-5D data so I thought I would have a ‘taste of my own medicine’. I measured my health-related quality of life (HRQoL) using EQ-5D-3L, EQ-5D-VAS, and EQ-5D-5L, every day for a year (N=1). I had the EQ-5D on a spreadsheet on my smartphone and prompted myself to do it at 9 p.m. every night. I set a target of never being more than three days late in doing it, which I missed twice through the year. I also recorded health-related notes for some days, for instance, 21st January said “tired, dropped a keytar on toe (very 1980s injury)”.

By doing this I wanted to illuminate issues around anchoring, ceiling effects and ideas of health and wellness. With a big increase in wearable tech and smartphone health apps, this type of big data collection might become a lot more commonplace. I have not kept a diary since I was about 13 so it was an interesting way of keeping track of what was happening, with a focus on health. Starting the year I knew I had one big life event coming up: a new baby due in early March. I am generally quite healthy, though a bit overweight and short of sleep. I have been called a hypochondriac before, as I typically complain of headaches, colds and sore throats for around six months of the year. I usually go running once or twice a week.

From the start I was very conscious that I felt I shouldn’t grumble too much, that EQ-5D was mainly used to measure functional health in people with disease, not in well people (and ceiling effects were a feature of the EQ-5D). I immediately felt a ‘freedom’ in the greater sensitivity of the EQ-5D-5L compared with the 3L: I could score myself as having slight problems on the 5L without judging them bad enough to count as ‘some problems’ on the 3L.

There were days when I felt a bit achey or tired because I had been for a run, but unless I had an actual injury I did not score myself as having problems with pain or mobility because of this; generally if I feel achey from running I think of that as a good thing as having pushed myself hard, ‘no pain no gain’. I also started doing yoga this year which made me feel great but also a bit achey sometimes. But in general I noticed that one of the main problems I had was fatigue which is not explicitly covered in the EQ-5D but was reflected sometimes as being slightly impaired on usual activities. I also thought that usual activities could be impaired if you are working and travelling a lot, as you don’t get to do any of the things you enjoy doing like hobbies or spending time with family, but this is more of a capability question whereas the EQ-5D is more functional.

How did my HRQoL compare?

I matched up my levels on the individual domains to EQ-5D-3L and 5L index scores based on UK preference scores. The final 5L value set may still change; I used the most recent published scores. I also matched my levels to a personal 5L value set, which I derived using this survey, which uses discrete choice experiments and involves comparing pairs of EQ-5D-5L health states. I found doing this fascinating and it made me think about how mutually exclusive the EQ-5D dimensions are, and whether some health states are actually implausible: for instance, is it possible to be in extreme pain but not have any impairment on usual activities?
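To make the step of matching levels to index scores concrete, here is a minimal sketch of how a five-digit EQ-5D-3L profile can be turned into an index score with an additive value set. The decrements are the ones commonly reported for the UK time trade-off tariff (Dolan 1997); they are an assumption here and should be checked against the published source before any real use. The function name `eq5d3l_index` and the dimension labels are hypothetical, and the post does not say which software or lookup table was actually used.

```python
# Sketch: EQ-5D-3L profile -> index score under an additive tariff.
# Coefficients are the commonly reported UK TTO decrements (Dolan 1997);
# treat them as an assumption and verify before relying on them.
CONSTANT = 0.081  # subtracted once if any dimension is above level 1
N3 = 0.269        # subtracted once if any dimension is at level 3

DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)

DECREMENTS = {
    "mobility":           {1: 0.0, 2: 0.069, 3: 0.314},
    "self_care":          {1: 0.0, 2: 0.104, 3: 0.214},
    "usual_activities":   {1: 0.0, 2: 0.036, 3: 0.094},
    "pain_discomfort":    {1: 0.0, 2: 0.123, 3: 0.386},
    "anxiety_depression": {1: 0.0, 2: 0.071, 3: 0.236},
}


def eq5d3l_index(profile: str) -> float:
    """Return the index score for a profile such as '11121'."""
    levels = [int(ch) for ch in profile]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("profile must be five digits, each 1, 2 or 3")
    if all(level == 1 for level in levels):
        return 1.0  # full health: no constant, no decrements
    score = 1.0 - CONSTANT
    for dimension, level in zip(DIMENSIONS, levels):
        score -= DECREMENTS[dimension][level]
    if 3 in levels:
        score -= N3
    return round(score, 3)


print(eq5d3l_index("11111"))  # no problems on any dimension
print(eq5d3l_index("11121"))  # some pain or discomfort only
print(eq5d3l_index("11131"))  # extreme pain or discomfort only
```

Under these coefficients a single level 3 response costs far more than a single level 2 response, because the N3 term is triggered as well as the larger dimension decrement. That is one reason a daily 3L series can show sharp drops on a bad day.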

Surprisingly, my average EQ-5D-3L index score (0.982) was higher than the population average for my age group (for England age 35-44 it is 0.888 based on Szende et al 2014); I expected it to be lower. In fact my average index score was higher than the average for 18-24 year olds (0.922). I thought that measuring EQ-5D more often and having more granularity would lead to lower average scores but it actually led to higher average scores.

My average score from the personal 5L value set was slightly higher than the England population value set (0.983 vs 0.975). Digging into the data, the main differences were that I thought that usual activities were slightly more important, and pain slightly less important, than the general population. The 5L (England tariff) correlated more closely with the VAS than the 3L (r² = 0.746 vs. r² = 0.586) but the 5L (personal tariff) correlated most closely with the VAS (r² = 0.792). So based on my N=1 sample, this suggests that the 5L is a better predictor of overall health than the 3L, and that the personal value set has validity in predicting VAS scores.
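For readers who want to run the same kind of comparison on their own diary data, the sketch below computes r² as the squared Pearson correlation between a daily index series and the VAS divided by 100. This assumes that the r² figures quoted above are squared Pearson correlations, which the post does not state. The function names and the five days of data are invented purely for illustration and are not the author's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A series stuck at the ceiling (e.g. 1.0 every day) has no variance.
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Squared Pearson correlation: share of variance linearly explained."""
    return pearson_r(x, y) ** 2


# Hypothetical five-day diary (NOT the author's data).
index_5l = [1.0, 0.95, 0.90, 1.0, 0.72]
vas = [95, 85, 80, 90, 60]
vas_scaled = [v / 100 for v in vas]  # same scaling as in Figure 1

print(f"r2 (5L index vs VAS/100): {r_squared(index_5l, vas_scaled):.3f}")
```

The explicit check for a constant series matters for EQ-5D data in healthy people: with strong ceiling effects a short run of days can score 1.0 throughout, and the correlation is then undefined rather than zero.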

Figure 1. My EQ-5D-3L index score [3L], EQ-5D-5L index score (England value set) [5L], EQ-5D-5L index score (personal value set) [5LP], and visual analogue scale (VAS) score divided by 100 [VAS/100].


I definitely regretted doing the EQ-5D every day and was glad when the year was over! I would have preferred to have done it every week but I think that would have missed a lot of subtleties in how I felt from day to day. On reflection, my approach was that at the end of each day I would try to recall if I had been stressed, or if anything had hurt, and adjust the level on the relevant dimension. But I wonder: if I had been prompted at any moment during the day about whether I was stressed, had some mobility issues, or was in pain, would I have said I did? It makes me think about Kahneman and Riis’s ‘remembering brain’ and ‘experiencing brain’. Was my EQ-5D profile a slave to my ‘remembering brain’ rather than my ‘experiencing brain’?

One period when my score was low for a few days was when I had a really painful abscess on my tooth. At the time I felt like the pain was unbearable so I gave a high pain score. Looking back I wonder if it was really that bad, but I didn’t want to retrospectively change my score. Strangely, I had the flu twice in this year, which I don’t think has ever happened to me before, and it gave me some health decrements (I don’t think it was just ‘man flu’!).

I knew that I was going to have a baby this year but I didn’t know that I would spend 18 days in hospital, despite not being ill myself. This has led me to think a lot more about ‘caregiver effects’ – the impact of close relatives being ill. It is unnerving spending night after night in hospital, in this case because my wife was very ill after giving birth, and then because my baby son got very ill when he was two months old (both are doing a lot better now). Being in hospital with a sick relative is a strange feeling, stressful and boring at the same time. I spent a long time staring out of the window or scrolling through Twitter. When my baby son was really ill he would not sleep and did not want to be put down, so my arms were aching after holding him all night. I was lucky that I had understanding managers at work and I was not significantly financially disadvantaged by caring for sick relatives. I was also glad of the NHS, and of not getting a huge bill when family members were discharged from hospital.

Health, wellbeing & exercise

Doing this made me think more about the difference between health and wellbeing; there might be days where I was really happy but it wasn’t reflected in my EQ-5D index score. I noticed that doing exercise always led to a higher VAS score – maybe subconsciously I was thinking exercise was increasing my ‘health stock’. I probably used the VAS score more like an overall wellbeing score rather than just health which is not correct – but I wonder if other people do this as well, and that is why there are less pronounced ceiling effects with the VAS score.

Could trials measure EQ-5D every day?

One advantage of EQ-5D and QALYs over other health outcomes is that they are meant to be measured at scheduled time points and summarised using the area under the curve. Completing an EQ-5D every day has shown me that health does vary every day, but I still think it might be impractical for trial participants to complete an EQ-5D questionnaire every day. Perhaps EQ-5D data could be combined with a simple daily VAS score, possibly out of ten rather than 100 for simplicity.
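The area-under-the-curve idea can be sketched in a few lines. The example below uses the trapezoidal rule, which assumes utility changes linearly between measurements. The function name `qalys_auc`, the 365.25-day year, and the ten-day series are all assumptions made for illustration, and the utilities are invented rather than taken from the diary.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length


def qalys_auc(days, utilities):
    """QALYs as the trapezoidal area under a utility curve.

    days      -- measurement times in days, strictly increasing
    utilities -- index score observed at each measurement time
    """
    if len(days) != len(utilities) or len(days) < 2:
        raise ValueError("need at least two (day, utility) pairs")
    total = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        if width <= 0:
            raise ValueError("days must be strictly increasing")
        # Mean of adjacent utilities times the gap between them.
        total += width * (utilities[i] + utilities[i - 1]) / 2.0
    return total / DAYS_PER_YEAR


# Hypothetical ten-day window with a two-day dip (invented values).
daily_days = list(range(10))
daily_utils = [1.0, 1.0, 1.0, 0.7, 0.7, 1.0, 1.0, 1.0, 1.0, 1.0]

# The same window measured only at the start and the end, as a trial might.
sparse = qalys_auc([0, 9], [1.0, 1.0])
dense = qalys_auc(daily_days, daily_utils)

print(f"start and end only: {sparse:.5f} QALYs")
print(f"daily measurement:  {dense:.5f} QALYs")
```

In this invented series the two-point schedule misses the dip entirely and so credits slightly more QALYs than the daily schedule. The gap is small over ten days, which is broadly in line with the suspicion above that daily completion may not be worth the burden on trial participants.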

Joint worst days: 6th and 7th October: EQ-5D-3L index 0.264; EQ-5D-5L index 0.724; personal EQ-5D-5L index 0.824; VAS score 60 – ‘abscess on tooth, couldn’t sleep, face swollen’.

Joint best days: 27th January, 7th September, 11th September, 18th November, 4th December, 30th December: EQ-5D-3L index 1.00; both EQ-5D-5L index scores 1.00; VAS score 95 – notes include ‘lovely day with family’, ‘went for a run’, ‘holiday’, ‘met up with friends’.

Chris Sampson’s journal round-up for 27th August 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Ethically acceptable compensation for living donations of organs, tissues, and cells: an unexploited potential? Applied Health Economics and Health Policy [PubMed] Published 25th August 2018

Around the world, there are shortages of organs for transplantation. In economics, the debate around the need to increase organ donation can be frustratingly ignorant of ethical and distributional concerns. So it’s refreshing to see this article attempting to square concerns about efficiency and equity. The authors do so by using a ‘spheres of justice’ framework. This is the idea that different social goods should be distributed according to different principles. So, while we might be happy for broccoli and iPhones to be distributed on the basis of free exchange, we might want health to be distributed on the basis of need. The argument can be extended to state that – for a just situation to prevail – certain exchanges between these spheres of justice (e.g. health for iPhones) should never take place. This idea might explain why – as the authors demonstrate with a review of European countries – policy tends not to allow monetary compensation for organ donation.

The paper cleverly sets out to taxonomise monetary and non-monetary reimbursement and compensation with reference to individuals’ incentives and the spheres of justice principles. From this, the authors reach two key conclusions. Firstly, that (monetary) reimbursement of donors’ expenses (e.g. travel costs or lost earnings) is ethically sound as this does not constitute an incentive to donate but rather removes existing disincentives. Secondly, that non-monetary compensation could be deemed ethical.

Three possible forms of non-monetary compensation are discussed: i) prioritisation, ii) free access, and iii) non-health care-related benefits. The first could involve being given priority for receiving organs, or it could extend to the jumping of other health care waiting lists. I think this is more problematic than the authors let on because it asserts that health care should – at least in part – be distributed according to desert rather than need. The second option – free access – could mean access to health care that people would otherwise have to pay for. The third option could involve access to other social goods such as education or housing.

This is an interesting article and an enjoyable read, but I don’t think it provides a complete solution. Maybe I’m just too much of a Marxist, but I think that this – as all other proposals – fails to distribute from each according to ability. That is, we’d still expect non-monetary compensation to incentivise poorer (and on average less healthy) people to donate organs, thus exacerbating health inequality. This is because i) poorer people are more likely to need the non-monetary benefits and ii) we live in a capitalist society in which there is almost nothing that money can’t buy and that is strictly non-monetary. Show me a proposal that increases donation rates from those who can most afford to donate them (i.e. the rich and healthy).

Selecting bolt-on dimensions for the EQ-5D: examining their contribution to health-related quality of life. Value in Health Published 18th August 2018

Measures such as the EQ-5D are used to describe health-related quality of life as completely and generically as possible. But there is a trade-off between completeness and the length of the questionnaire. Necessarily, there are parts of the evaluative space that measures will not capture because they are a simplification. If the bit they’re missing is important to your patient group, that’s a problem. You might fancy a bolt-on. But how do we decide which areas of the evaluative space should be more completely included in the measure? Which bolt-ons should be used? This paper seeks to provide means of answering these questions.

The article builds on an earlier piece of work that was included in an earlier journal round-up. In the previous paper, the authors used factor analysis to identify candidate bolt-ons. The goal of this paper is to outline an approach for specifying which of these candidates ought to be used. Using data from the Multi-Instrument Comparison study, the authors fit linear regressions to see how well 37 candidate bolt-on items explain differences in health-related quality of life. The 37 items correspond to six different domains: energy/vitality, satisfaction, relationships, hearing, vision, and speech. In a second test, the authors explored whether the bolt-on candidates could explain differences in health-related quality of life associated with six chronic conditions. Health-related quality of life is defined according to a visual analogue scale, which notably does not correspond to that used in the EQ-5D but rather uses a broader measure of physical, mental, and social health.

The results suggest that items related to energy/vitality, relationships, and satisfaction explained a significant part of health-related quality of life on top of the existing EQ-5D dimensions. The implication is that these could be good candidates for bolt-ons. The analysis of the different conditions was less clear.

For me, there’s a fundamental problem with this study. It moves the goalposts. Bolt-ons are about improving the extent to which a measure can accurately represent the evaluative space that it is designed to characterise. In this study, the authors use a broader definition of health-related quality of life that – as far as I can tell – the EQ-5D is not designed to capture. We’re not dealing with bolt-ons, we’re dealing with extensions to facilitate expansions to the evaluative space. Nevertheless, the method could prove useful if combined with a more thorough consideration of the evaluative space.

Sources of health financing and health outcomes: a panel data analysis. Health Economics [PubMed] [RePEc] Published 15th August 2018

There is a growing body of research looking at the impact that health (care) spending has on health outcomes. Usually, these studies don’t explicitly look at who is doing the spending. In this study, the author distinguishes between public and private spending and attempts to identify which type of spending (if either) results in greater health improvements.

The author uses data from the World Bank’s World Development Indicators for 1995-2014. Life expectancy at birth is adopted as the primary health outcome and the key expenditure variables are health expenditure as a share of GDP and private health expenditure as a share of total health expenditure. Controlling for a variety of other variables, including some determinants of health such as income and access to an improved water source, a triple difference analysis is described. The triple difference estimator corresponds to the difference in health outcomes arising from i) differences in the private expenditure level, given ii) differences in total expenditure, over iii) time.

The key finding from the study is that, on average, private expenditure is more effective in increasing life expectancy at birth than public expenditure. The author also looks at government effectiveness, which proves crucial. The finding in favour of private expenditure entirely disappears when only countries with effective government are considered. There is some evidence that public expenditure is more effective in these countries, and this is something that future research should investigate further. For countries with ineffective governments, the implication is that policy should be directed towards increasing overall health care expenditure by increasing private expenditure.


Thesis Thursday: Lidia Engel

On the third Thursday of every month, we speak to a recent graduate about their thesis and their studies. This month’s guest is Dr Lidia Engel who graduated with a PhD from Simon Fraser University. If you would like to suggest a candidate for an upcoming Thesis Thursday, get in touch.

Going beyond health-related quality of life for outcome measurement in economic evaluation
David Whitehurst, Scott Lear, Stirling Bryan
Repository link

Your thesis explores the potential for expanding the ‘evaluative space’ in economic evaluation. Why is this important?

I think there are two answers to this question. Firstly, methods for economic evaluation of health care interventions have existed for a number of years but these evaluations have mainly been applied to more narrowly defined ‘clinical’ interventions, such as drugs. Interventions nowadays are more complex, where benefits cannot be simply measured in terms of health. You can think of areas such as public health, mental health, social care, and end-of-life care, where interventions may result in broader benefits, such as increased control over daily life, independence, or aspects related to the process of health care delivery. Therefore, I believe there is a need to re-think the way we measure and value outcomes when we conduct an economic evaluation. Secondly, ignoring broader outcomes of health care interventions that go beyond the narrow focus of health-related quality of life can potentially lead to misallocation of scarce health care resources. Evidence has shown that the choice of outcome measure (such as a health outcome or a broader measure of wellbeing) can have a significant influence on the conclusions drawn from an economic evaluation.

You use both qualitative and quantitative approaches. Was this key to answering your research questions?

I mainly applied quantitative methods in my thesis research. However, Chapter 3 draws upon some qualitative methodology. To gain a better understanding of ‘benefits beyond health’, I came across a novel approach, called Critical Interpretive Synthesis. It is similar to meta-ethnography (i.e. a synthesis of qualitative research), with the difference that the synthesis is not of qualitative literature but of methodologically diverse literature. It involves an iterative approach, where searching, sampling, and synthesis go hand in hand. It doesn’t only produce a summary of existing literature but enables the development of new interpretations that go beyond those originally offered in the literature. I really liked this approach because it enabled me to synthesise the evidence in a more effective way compared with a conventional systematic review. Defining and applying codes and themes, as it is traditionally done in qualitative research, allowed me to organize the general idea of non-health benefits into a coherent thematic framework, which in the end provided me with a better understanding of the topic overall.

What data did you analyse and what quantitative methods did you use?

I conducted three empirical analyses in my thesis research, which all made use of data from the ICECAP measures (ICECAP-O and ICECAP-A). In my first paper, I used data from the ‘Walk the Talk (WTT)’ project to investigate the complementarity of the ICECAP-O and the EQ-5D-5L in a public health context using regression analyses. My second paper used exploratory factor analysis to investigate the extent of overlap between the ICECAP-A and five preference-based health-related quality of life measures, using data from the Multi Instrument Comparison (MIC) project. I am currently finalizing submission of my third empirical analysis, which reports findings from a path analysis using cross-sectional data from a web-based survey. The path analysis explores three outcome measurement approaches (health-related quality of life, subjective wellbeing, and capability wellbeing) through direct and mediated pathways in individuals living with spinal cord injury. Each of the three studies addressed different components of the overall research question, which, collectively, demonstrated the added value of broader outcome measures in economic evaluation when compared with existing preference-based health-related quality of life measures.

Thinking about the different measures that you considered in your analyses, were any of your findings surprising or unexpected?

In my first paper, I found that the ICECAP-O is more sensitive to environmental features (i.e. social cohesion and street connectivity) when compared with the EQ-5D-5L. As my second paper has shown, this was not surprising, as the ICECAP-A (a measure for adults rather than older adults) and the EQ-5D-5L measure different constructs and had only limited overlap in their descriptive classification systems. While a similar observation was made when comparing the ICECAP-A with three other preference-based health-related quality of life measures (15D, HUI-3, and SF-6D), a substantial overlap was observed between the ICECAP-A and the AQoL-8D, which suggests that it is possible for broader benefits to be captured by preference-based health-related measures (although some may not consider the AQoL-8D to be exclusively ‘health-related’, despite the label). The findings from the path analysis confirmed the similarities between the ICECAP-A and the AQoL-8D. However, the findings do not imply that the AQoL-8D and ICECAP-A are interchangeable instruments, as a mediation effect was found that requires further research.

How would you like to see your research inform current practice in economic evaluation? Is the QALY still in good health?

I am aware of the limitations of the QALY and although there are increasing concerns that the QALY framework does not capture all benefits of health care interventions, it is important to understand that the evaluative space of the QALY is determined by the dimensions included in preference-based measures. From a theoretical point of view, the QALY can embrace any characteristics that are important for the allocation of health care resources. However, in practice, it seems that QALYs are currently defined by what is measured (e.g. the dimensions and response options of EQ-5D instruments) rather than the conceptual origin. Therefore, although non-health benefits have been largely ignored when estimating QALYs, one should not dismiss the QALY framework but rather develop appropriate instruments that capture such broader benefits. I believe the findings of my thesis have particular relevance for national HTA bodies that set guidelines for the conduct of economic evaluation. While the need to maintain methodological consistency is important, the assessment of the real benefits of some health care interventions would be more accurate if we were less prescriptive in terms of which outcome measure to use when conducting an economic evaluation. As my thesis has shown, some preference-based measures already adopt a broad evaluative space but are less frequently used.