My quality-adjusted life year

Why did I do it?

I have evaluated lots of services and been involved in trials where I have asked people to collect EQ-5D data. During this time several people have complained to me about having to collect EQ-5D data so I thought I would have a ‘taste of my own medicine’. I measured my health-related quality of life (HRQoL) using EQ-5D-3L, EQ-5D-VAS, and EQ-5D-5L, every day for a year (N=1). I had the EQ-5D on a spreadsheet on my smartphone and prompted myself to do it at 9 p.m. every night. I set a target of never being more than three days late in doing it, which I missed twice through the year. I also recorded health-related notes for some days, for instance, 21st January said “tired, dropped a keytar on toe (very 1980s injury)”.

By doing this I wanted to illuminate issues around anchoring, ceiling effects and ideas of health and wellness. With a big increase in wearable tech and smartphone health apps this type of big data collection might become a lot more commonplace. I have not kept a diary since I was about 13 so it was an interesting way of keeping track of what was happening, with a focus on health. Starting the year I knew I had one big life event coming up: a new baby due in early March. I am generally quite healthy, though a bit overweight, and I don’t get enough sleep. I have been called a hypochondriac by people before, typically complaining of headaches, colds and sore throats around six months of the year. I usually go running once or twice a week.

From the start I was very conscious that I felt I shouldn’t grumble too much, that EQ-5D was mainly used to measure functional health in people with disease, not in well people (and ceiling effects were a feature of the EQ-5D). I immediately felt the ‘freedom’ of the greater sensitivity of the EQ-5D-5L compared to the 3L: I could score myself as having slight problems on the 5L even when they were not bad enough to count as ‘some problems’ on the 3L.

There were days when I felt a bit achey or tired because I had been for a run, but unless I had an actual injury I did not score myself as having problems with pain or mobility because of this; generally if I feel achey from running I think of that as a good thing as having pushed myself hard, ‘no pain no gain’. I also started doing yoga this year which made me feel great but also a bit achey sometimes. But in general I noticed that one of the main problems I had was fatigue which is not explicitly covered in the EQ-5D but was reflected sometimes as being slightly impaired on usual activities. I also thought that usual activities could be impaired if you are working and travelling a lot, as you don’t get to do any of the things you enjoy doing like hobbies or spending time with family, but this is more of a capability question whereas the EQ-5D is more functional.

How did my HRQoL compare?

I matched up my levels on the individual domains to EQ-5D-3L and 5L index scores based on UK preference scores. The final 5L value set may still change; I used the most recent published scores. I also matched my levels to a personal 5L value set which I did using this survey which uses discrete choice experiments and involves comparing a set of pairs of EQ-5D-5L health states. I found doing this fascinating and it made me think about how mutually exclusive the EQ-5D dimensions are, and whether some health states are actually implausible: for instance, is it possible to be in extreme pain but not have any impairment on usual activities?

Surprisingly, my average EQ-5D-3L index score (0.982) was higher than the population average for my age group (for England age 35-44 it is 0.888 based on Szende et al 2014); I expected mine to be lower. In fact my average index scores were higher than the average for 18-24 year olds (0.922). I thought that measuring EQ-5D more often and having more granularity would lead to lower average scores but it actually led to high average scores.

My average score from the personal 5L value set was slightly higher than the England population value set (0.983 vs 0.975). Digging into the data, the main differences were that I thought that usual activities were slightly more important, and pain slightly less important, than the general population. The 5L (England tariff) correlated more closely with the VAS than the 3L (r² = 0.746 vs. r² = 0.586) but the 5L (personal tariff) correlated most closely with the VAS (r² = 0.792). So based on my N=1 sample, this suggests that the 5L is a better predictor of overall health than the 3L, and that the personal value set has validity in predicting VAS scores.
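For readers who want to run the same comparison on their own diary data, here is a minimal sketch of the r² calculation. It assumes that r² means the squared Pearson correlation between a daily index score and the daily VAS/100; the post does not say how the figures were computed, and the numbers below are made up for illustration, not my real data.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical week of scores (illustrative only, not taken from the post).
index_5l = [1.00, 0.95, 0.90, 1.00, 0.85, 0.95, 1.00]
vas_over_100 = [0.95, 0.90, 0.80, 0.95, 0.75, 0.85, 0.90]

print(round(r_squared(index_5l, vas_over_100), 3))
```

A ceiling effect shows up here as lots of days stuck at 1.00 on the index while the VAS still moves, which pulls r² down.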

Figure 1. My EQ-5D-3L index score [3L], EQ-5D-5L index score (England value set) [5L], EQ-5D-5L index score (personal value set) [5LP], and visual analogue scale (VAS) score divided by 100 [VAS/100].

Reflection

I definitely regretted doing the EQ-5D every day and was glad when the year was over! I would have preferred to have done it every week but I think that would have missed a lot of subtleties in how I felt from day to day. On reflection the way I was approaching it was that at the end of each day I would try to recall if I was stressed, or if anything hurt, and adjust the level on the relevant dimension. But I wonder if I was prompted at any moment during the day as to whether I was stressed, had some mobility issues, or pain, would I say I did? It makes me think about Kahneman and Riis’s ‘remembering brain’ and ‘experiencing brain’. Was my EQ-5D profile a slave to my ‘remembering brain’ rather than my ‘experiencing brain’?

One time my score was low for a few days was when I had a really painful abscess on my tooth. At the time I felt like the pain was unbearable so had a high pain score. Looking back I wonder if it was that bad, but I didn’t want to retrospectively change my score. Strangely, I had the flu twice in this year which gave me some health decrements, which I don’t think has ever happened to me before (I don’t think it was just ‘man flu’!).

I knew that I was going to have a baby this year but I didn’t know that I would spend 18 days in hospital, despite not being ill myself. This has led me to think a lot more about ‘caregiver effects’ – the impact of close relatives being ill; it is unnerving spending night after night in hospital, in this case because my wife was very ill after giving birth, and then when my baby son was two months old, he got very ill (both are doing a lot better now). Being in hospital with a sick relative is a strange feeling, stressful and boring at the same time. I spent a long time staring out of the window or scrolling through Twitter. When my baby son was really ill he would not sleep and did not want to be put down, so my arms were aching after holding him all night. I was lucky that I had understanding managers in work and I was not significantly financially disadvantaged by caring for sick relatives. And glad of the NHS and not getting a huge bill when family members are discharged from hospital.

Health, wellbeing & exercise

Doing this made me think more about the difference between health and wellbeing; there might be days where I was really happy but it wasn’t reflected in my EQ-5D index score. I noticed that doing exercise always led to a higher VAS score – maybe subconsciously I was thinking exercise was increasing my ‘health stock’. I probably used the VAS score more like an overall wellbeing score rather than just health which is not correct – but I wonder if other people do this as well, and that is why there are less pronounced ceiling effects with the VAS score.

Could trials measure EQ-5D every day?

One advantage of EQ-5D and QALYs over other health outcomes is that they should be measured over a schedule and use the area under the curve. Completing an EQ-5D every day has shown me that health does vary every day, but I still think it might be impractical for trial participants to complete an EQ-5D questionnaire every day. Perhaps EQ-5D data could be combined with a simple daily VAS score, possibly out of ten rather than 100 for simplicity.
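To make the area-under-the-curve idea concrete, here is a minimal sketch using the trapezium rule, which is one common way of joining up scheduled measurements (it assumes utility changes linearly between time points). The function name and the four follow-up points are hypothetical and are not taken from any trial or from my own data.

```python
def qalys_area_under_curve(times_in_years, utilities):
    """QALYs as the trapezoidal area under a utility-by-time curve."""
    if len(times_in_years) != len(utilities):
        raise ValueError("times and utilities must have the same length")
    if len(times_in_years) < 2:
        raise ValueError("need at least two measurement points")
    total = 0.0
    for i in range(1, len(times_in_years)):
        dt = times_in_years[i] - times_in_years[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Average of the two neighbouring utilities, weighted by the time between them.
        total += dt * (utilities[i] + utilities[i - 1]) / 2
    return total


# Hypothetical schedule: baseline, 3 months, 6 months, 12 months.
times = [0.0, 0.25, 0.5, 1.0]
utilities = [0.80, 0.70, 0.90, 1.00]

print(round(qalys_area_under_curve(times, utilities), 4))  # 0.8625
```

With daily measurements the same function works with times spaced 1/365 apart; a dip between two scheduled visits, like a few days with an abscess, only counts if a measurement lands on it.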

Joint worst day: 6th and 7th October: EQ-5D-3L index 0.264, EQ-5D-5L index 0.724; personal EQ-5D-5L index 0.824; VAS score 60 – ‘abscess on tooth, couldn’t sleep, face swollen’.

Joint best day: 27th January, 7th September, 11th September, 18th November, 4th December, 30th December: EQ-5D-3L index 1.00; both EQ-5D-5L index scores 1.00; VAS score 95 – notes include ‘lovely day with family’, ‘went for a run’, ‘holiday’, ‘met up with friends’.

Sam Watson’s journal round-up for 10th September 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Probabilistic sensitivity analysis in cost-effectiveness models: determining model convergence in cohort models. PharmacoEconomics [PubMed] Published 27th July 2018

Probabilistic sensitivity analysis (PSA) is rightfully a required component of economic evaluations. Deterministic sensitivity analyses are generally biased; averaging the outputs of a model based on a choice of values from a complex joint distribution is not likely to be a good reflection of the true model mean. PSA involves repeatedly sampling parameters from their respective distributions and analysing the resulting model outputs. But how many times should you do this? Most times, an arbitrary number is selected that seems “big enough”, say 1,000 or 10,000. But these simulations themselves exhibit variance; so-called Monte Carlo error. This paper discusses making the choice of the number of simulations more formal by assessing the “convergence” of simulation output.

In the same way as sample sizes are chosen for trials, the number of simulations should provide an adequate level of precision; anything more wastes resources without improving inferences. For example, if the statistic of interest is the net monetary benefit, then we would want the confidence interval (CI) to exclude zero as this should be a sufficient level of certainty for an investment decision. The paper, therefore, proposed conducting a number of simulations, examining the CI for when it is ‘narrow enough’, and conducting further simulations if it is not. However, I see a problem with this proposal: the variance of a statistic from a sequence of simulations itself has variance. The stopping points at which we might check the CI are themselves arbitrary: additional simulations can increase the width of the CI as well as reduce it. Consider the following set of simulations from a simple ratio of random variables ICER = gamma(1,0.01)/normal(0.01,0.01):

[Figure: CI width]

The “stopping rule” therefore proposed doesn’t necessarily indicate “convergence” as a few more simulations could lead to a wider, as well as narrower, CI. The heuristic approach is undoubtedly an improvement on the current way things are usually done, but I think there is scope here for a more rigorous method of assessing convergence in PSA.
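The ratio example above can be sketched roughly as follows. This is not the code behind the original figure: it assumes gamma(1,0.01) means shape 1 and rate 0.01 (so scale 100), that normal(0.01,0.01) means mean 0.01 and standard deviation 0.01, and that the CI is a normal-approximation 95% interval for the mean ICER. The function names and checkpoints are made up for illustration.

```python
import math
import random
import statistics


def simulate_icer(rng):
    """One draw of cost/effect under the assumed parameterisation."""
    cost = rng.gammavariate(1.0, 1 / 0.01)  # shape 1; scale = 1/rate (assumed rate 0.01)
    effect = rng.gauss(0.01, 0.01)          # mean 0.01, sd 0.01 (assumed)
    while effect == 0.0:                    # guard against division by exactly zero
        effect = rng.gauss(0.01, 0.01)
    return cost / effect


def ci_widths(checkpoints, seed=1):
    """Width of the approximate 95% CI for the mean ICER at each checkpoint."""
    rng = random.Random(seed)
    draws = []
    widths = {}
    for n in sorted(checkpoints):
        if n < 2:
            raise ValueError("each checkpoint needs at least 2 simulations")
        while len(draws) < n:
            draws.append(simulate_icer(rng))
        sd = statistics.stdev(draws)
        widths[n] = 2 * 1.96 * sd / math.sqrt(n)
    return widths


for n, width in ci_widths([100, 500, 1000, 5000]).items():
    print(n, round(width, 1))
```

Because the denominator can land very close to zero, a single extreme draw can inflate the sample standard deviation, so the width at a later checkpoint need not be smaller than at an earlier one. Changing the seed changes which checkpoints look ‘converged’.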

Mortality due to low-quality health systems in the universal health coverage era: a systematic analysis of amenable deaths in 137 countries. The Lancet [PubMed] Published 5th September 2018

Richard Horton, the oracular editor-in-chief of the Lancet, tweeted last week:

There is certainly an argument that academic journals are good forums to make advocacy arguments. Who better to interpret the analyses presented in these journals than the authors and audiences themselves? But, without a strict editorial bulkhead between analysis and opinion, we run the risk that the articles and their content are influenced or dictated by the political whims of editors rather than scientific merit. Unfortunately, I think this article is evidence of that.

No-one debates that improving health care quality will improve patient outcomes and experience. It is in the very definition of ‘quality’. This paper aims to estimate the numbers of deaths each year due to ‘poor quality’ in low- and middle-income countries (LMICs). The trouble with this is two-fold: given the number of unknown quantities required to get a handle on this figure, the definition of quality notwithstanding, the uncertainty around this figure should be incredibly high (see below); and, attributing these deaths in a causal way to a nebulous definition of ‘quality’ is tenuous at best. The approach of the article is, in essence, to assume that the differences in fatality rates of treatable conditions between LMICs and the best performing health systems on Earth, among people who attend health services, are entirely caused by ‘poor quality’. This definition of quality would therefore seem to encompass low resourcing, poor supply of human resources, a lack of access to medicines, as well as everything else that’s different in health systems. Then, to get to this figure, the authors have multiple sources of uncertainty including:

  • Using a range of proxies for health care utilisation;
  • Using global burden of disease epidemiology estimates, which have associated uncertainty;
  • A number of data slicing decisions, such as truncating case fatality rates;
  • Estimating utilisation rates based on a predictive model;
  • Estimating the case-fatality rate for non-users of health services based on other estimated statistics.

Despite this, the authors claim to estimate a 95% uncertainty interval with a width of only 300,000 people, with a mean estimate of 5.0 million, due to ‘poor quality’. This seems highly implausible, and yet it is claimed to be a causal effect of an undefined ‘poor quality’. The timing of this article coincides with the Lancet Commission on care quality in LMICs and, one suspects, had it not been for the advocacy angle on care quality, it would not have been published in this journal.

Embedding as a pitfall for survey‐based welfare indicators: evidence from an experiment. Journal of the Royal Statistical Society: Series A Published 4th September 2018

Health economists will be well aware of the various measures used to evaluate welfare and well-being. The surveys typically used are composed of questions relating to a number of different dimensions. These could include emotional and social well-being or physical functioning. Similar types of surveys are also used to collect population preferences over states of the world or policy options, for example, Kahneman and Knetsch conducted a survey of WTP for different environmental policies. These surveys can exhibit what is called an ‘embedding effect’, which Kahneman and Knetsch described as when the value of a good varies “depending on whether the good is assessed on its own or embedded as part of a more inclusive package.” That is to say that the way people value single dimensional attributes or qualities can be distorted when they’re embedded as part of a multi-dimensional choice. This article reports the results of an experiment involving students who were asked to weight the relative importance of different dimensions of the Better Life Index, including jobs, housing, and income. The randomised treatment was whether they rated ‘jobs’ as a single category, or were presented with individual dimensions, such as the unemployment rate and job security. The experiment shows strong evidence of embedding – the overall weighting substantially differed by treatment. This, the authors conclude, means that the Better Life Index fails to accurately capture preferences and is subject to manipulation should a researcher be so inclined – if you want evidence to say your policy is the most important, just change the way the dimensions are presented.

Alastair Canaway’s journal round-up for 28th May 2018

Information, education, and health behaviours: evidence from the MMR vaccine autism controversy. Health Economics [PubMed] Published 2nd May 2018

In 1998, Andrew Wakefield published (in the Lancet) his infamous and later retracted research purportedly linking the measles-mumps-rubella (MMR) vaccine and autism. Despite the thorough debunking and exposure of academic skulduggery, a noxious cloud of misinformation remained in the public mind, particularly in the US. This study examined several facets of the MMR fake news including: what impact did this have on vaccine uptake in the US (both MMR and other vaccines); how did state level variation in media coverage impact uptake; and what role did education play in subsequent decisions about whether to vaccinate or not. This study harnessed the National Immunization Survey from 1995 to 2006 to answer these questions. This is a yearly dataset of over 200,000 children aged between 19 and 35 months with detailed information on not just immunisation, but also maternal education, income and other sociodemographics. The NewsLibrary database was used to identify stories published in national and state media relating to vaccines and autism. Various regression methods were implemented to examine these data. The paper found that, unsurprisingly, for the year following the Wakefield publication the MMR vaccine take-up declined by between 1.1% and 1.5% (notably less than 3% in the UK); likewise, this fall in take-up spilled over into take-up of other vaccines. The most interesting finding related to education: MMR take-up for children of college-educated mothers declined significantly compared to those without a degree. This can be explained by the education gradient where more-educated individuals absorb and respond to health information more quickly. However, in the US, this continued for many years beyond 2003 despite proliferation of research refuting the autism-MMR link. This contrasts with the UK, where the educational gap closed soon after the findings were refuted; that is, in the UK, the educated responded to the new information refuting the MMR-autism link.
In the US, despite the research being debunked, MMR uptake was lower in the children of those with higher levels of education for many more years. The author speculates that this contrast with the UK may be a result of the media influencing parents’ decisions. Whilst the media buzz in the UK peaked in 2002, it had largely subsided by 2003. In the US however, the media attention was constant, if not increasing, till 2006, and so this may have been the reason the link remained within the US. So, we have Andrew Wakefield and arguably fearmongering media to blame for causing a long-term reduction in MMR take-up in the US. Overall, an interesting study leaning on multiple datasets that could be of interest for those working with big data.

Can social care needs and well-being be explained by the EQ-5D? Analysis of the Health Survey for England. Value in Health Published 23rd May 2018

There is increasing discussion about integrating health and social care to provide a more integrated approach to fulfilling health and social care needs. This creates challenges for health economists and decision makers when allocating resources, particularly when comparing benefits from different sectors. NICE itself recognises that the EQ-5D may be inappropriate in some situations. With the likes of ASCOT, ICECAP and WEMWBS frequenting the health economics world this isn’t an unknown issue. To better understand the relationship between health and social care measures, this EuroQol Foundation funded study examined the relationship between social care needs as measured by the Barthel Index, well-being measured using WEMWBS and also the GHQ-12, and the EQ-5D as the measure of health. Data was obtained through the Health Survey for England (HSE) and contained 3354 individuals aged over 65 years. Unsurprisingly the authors found that higher health and wellbeing scores were associated with an increased probability of no social care needs. Those who are healthier or at higher levels of wellbeing are less likely to need social care. Of all the instruments, it was the self-care and the pain/discomfort dimensions of the EQ-5D that were most strongly associated with the need for social care. No GHQ-12 dimensions were statistically significant, and for the WEMWBS only the ‘been feeling useful’ and ‘had energy to spare’ were statistically significantly associated with social care need. The authors also investigated various other associations between the measures with many unsurprising findings e.g. EQ-5D anxiety/depression dimension was negatively associated with wellbeing as measured using the GHQ-12. Although the findings are favourable for the EQ-5D in terms of it capturing to some extent social care needs, there is clearly still a gap whereby some outcomes are not necessarily captured.
Considering this, the authors suggest that it might be appropriate to strap on an extra dimension to the EQ-5D (known as a ‘bolt on’) to better capture important ‘other’ dimensions, for example, to capture dignity or any other important social care outcomes. Of course, a significant limitation with this paper relates to the measures available in the data. Measures such as ASCOT and ICECAP have been developed and operationalised for economic evaluation with social care in mind, and a comparison against these would have been more informative.

The health benefits of a targeted cash transfer: the UK Winter Fuel Payment. Health Economics [PubMed] [RePEc] Published 9th May 2018

In the UK, each winter is accompanied by an increase in mortality, often known as ‘excess winter mortality’ (EWM). To combat this, the UK introduced the Winter Fuel Payment (WFP). The WFP is an unconditional cash transfer to households containing an older person (those most vulnerable to EWM) above the female state pension age, with the intent for this to be used to help the elderly deal with the cost of keeping their dwelling warm. The purpose of this paper was to examine whether the WFP policy has improved the health of elderly people. The authors use the Health Surveys for England (HSE), the Scottish Health Survey (SHeS) and the English Longitudinal Study of Ageing (ELSA) and employ a regression discontinuity design to estimate causal effects of the WFP. To measure impact (benefit) they focus on circulatory and respiratory illness as measured by: self-reports of chest infection, nurse measured hypertension, and two blood biomarkers for infection and inflammation. The authors found that for those living in a household receiving the payment there was a 6 percentage point reduction (p<0.01) in the incidence of high levels of serum fibrinogen (biomarker) which are considered to be a marker of current infection and are associated with chronic pulmonary disease. For the other health outcomes, although positive, the estimated effects were less robust and not statistically significant. The authors investigated the impact of increasing the age of eligibility for the WFP (in line with the increase of women’s pension age). Their findings suggest there may be some health cost associated with the increase in age of eligibility for WFP. To summarise, the paper highlights that there may be some health benefits from the receipt of the WFP. What it doesn’t however consider is opportunity cost. With WFP costing about £2 billion per year, as a health economist, I can’t help but wonder if the money could have been better spent through other avenues.
