Chris Sampson’s journal round-up for 23rd December 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The Internet and children’s psychological wellbeing. Journal of Health Economics Published 13th December 2019

Here at the blog, we like the Internet. We couldn’t exist without it. We vie for your attention along with all of the other content factories (or “friends”). But there’s a well-established sense that people – especially children – should moderate their consumption of Internet content. The Internet is pervasive and is now a fundamental part of our day-to-day lives, not simply an information source to which we turn when we need it. Almost all 12-15 year olds in the UK use the Internet. The ubiquity of the Internet makes it difficult to test its effects. But this paper has a good go at it.

This study is based on the idea that broadband speeds are a good proxy for Internet use. In England, a variety of public and private sector initiatives have resulted in a distorted market with quasi-random assignment of broadband speeds. The authors provide a very thorough explanation of children’s wellbeing in relation to the Internet, outlining a range of potential mechanisms.

The analysis combines data from the UK’s pre-eminent household panel survey (Understanding Society) with broadband speed data published by the UK regulator Ofcom. Six wellbeing outcomes are analysed from children’s self-reported responses. The questions ask children how they feel about their lives – measured on a seven-point scale – in relation to school work, appearance, family, friends, school attended, and life as a whole. An unbalanced panel of 6,310 children from 2012-2017 provides 13,938 observations from 3,765 different Lower Layer Super Output Areas (LSOA), with average broadband speeds for each LSOA for each year. Each of the six wellbeing outcomes is modelled with child-, neighbourhood- and time-specific fixed effects. The models’ covariates include a variety of indicators relating to the child, their parents, their household, and their local area.
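Putting that description into symbols, a plausible specification – my sketch, not the paper’s exact notation – would be something like

$$ W_{int} = \beta \ln(\mathit{speed}_{nt}) + X_{int}'\gamma + \alpha_i + \mu_n + \tau_t + \varepsilon_{int} $$

where $W_{int}$ is one of the six wellbeing outcomes for child $i$ in LSOA $n$ in year $t$, $X_{int}$ collects the child, parent, household, and area covariates, and $\alpha_i$, $\mu_n$, and $\tau_t$ are the child, neighbourhood, and time fixed effects. The log of broadband speed is my assumption, consistent with the elasticity-style interpretation of the results below.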

A variety of models are tested, and the overall finding is that higher broadband speeds are negatively associated with all of the six wellbeing indicators. Wellbeing in relation to appearance shows the strongest effect; a 1% increase in broadband speed reduces happiness with appearance by around 0.6%. The authors explore a variety of potential mechanisms by running pairs of models between broadband speeds and the mechanism and between the mechanism and the outcomes. A key finding is that the data seem to support the ‘crowding out’ hypothesis. Higher broadband speeds are associated with children spending less time on activities such as sports, clubs, and real world social interactions, and these activities are in turn positively associated with wellbeing. The authors also consider different subgroups, finding that the effects are more detrimental for girls.

Where the paper falls down is that it doesn’t do anything to convince us that broadband speeds represent a good proxy for Internet use. It’s also not clear exactly what the proxy is meant to be for – use (e.g. time spent on the Internet) or access (i.e. having the option to use the Internet) – though the authors seem to be interested in the former. If that’s the case, the logic of the proxy is not obvious. If I want to do X on the Internet then higher speeds will enable me to do it in less time, in which case the proxy would capture the inverse of the desired indicator. The other problem I think we have is in the use of self-reported measures in this context. A key supposed mechanism for the effect is through ‘social comparison theory’, which we might reasonably expect to influence the way children respond to questions as well as – or instead of – their underlying wellbeing.

One-way sensitivity analysis for probabilistic cost-effectiveness analysis: conditional expected incremental net benefit. PharmacoEconomics [PubMed] Published 16th December 2019

Here we have one of those very citable papers that clearly specifies a part of cost-effectiveness analysis methodology. A better title for this paper could be Make one-way sensitivity analysis great again. The authors start out by – quite rightly – bashing the tornado diagram, mostly on the basis that it does not intuitively characterise the information that a decision-maker needs. Instead, the authors propose an approach to probabilistic one-way sensitivity analysis (POSA) that is a kind of simplified version of EVPPI (expected value of partial perfect information) analysis. Crucially, this approach does not assume that the various parameters of the analysis are independent.

The key quantity created by this analysis is the conditional expected incremental net monetary benefit (cINMB), conditional, that is, on the value of the parameter of interest. There are three steps to creating a plot of the POSA results: 1) rank the costs and outcomes for the sampled values of the parameter – say from the first to the last centile; 2) plug in a cost-effectiveness threshold value to calculate the cINMB at each sampled value; and 3) record the probability of observing each value of the parameter. You could use this information to present a tornado-style diagram, plotting the credible range of the cINMB. But it’s more useful to plot a line graph showing the cINMB at the different values of the parameter of interest, taking into account the probability that the values will actually be observed.
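As a rough illustration of those three steps, here’s a sketch in Python. Note that this is a simplified single-loop version that conditions within ordinary PSA output, rather than the nested Monte Carlo simulation the authors run, and all of the names and numbers are made up:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical PSA output: one row per Monte Carlo iteration.
rng = np.random.default_rng(1)
n = 100_000
theta = rng.normal(0.7, 0.1, n)                # parameter of interest
d_qaly = 0.5 * theta + rng.normal(0, 0.05, n)  # incremental QALYs
d_cost = rng.normal(10_000, 2_000, n)          # incremental costs

wtp = 20_000  # cost-effectiveness threshold (£ per QALY)
inmb = wtp * d_qaly - d_cost

# Step 1: rank iterations by the sampled parameter and bin into centiles.
cutpoints = np.quantile(theta, np.linspace(0.01, 0.99, 99))
centile = np.searchsorted(cutpoints, theta)

# Step 2: conditional expected INMB within each centile of theta.
cinmb = np.array([inmb[centile == c].mean() for c in range(100)])

# Step 3: by construction each centile is observed with probability ~1%,
# so plotting cINMB against the centile midpoints reflects how likely
# each parameter value actually is.
mids = np.quantile(theta, (np.arange(100) + 0.5) / 100)
plt.plot(mids, cinmb)
plt.xlabel("Parameter of interest")
plt.ylabel("Conditional expected INMB (£)")
plt.show()
```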

The authors illustrate their method using three different parameters from a previously published cost-effectiveness analysis, in each case simulating 15,000 Monte Carlo ‘inner loops’ for each of the 99 centiles. It took me a little while to get my head around the results that are presented, so there’s still some work to do around explaining the visuals to decision-makers. Nevertheless, this approach has the potential to become standard practice.

A head-on ordinal comparison of the composite time trade-off and the better-than-dead method. Value in Health Published 19th December 2019

For years now, methodologists have been trying to find a reliable way to value health states ‘worse than dead’. The EQ-VT protocol, used to value the EQ-5D-5L, includes the composite time trade-off (cTTO). The cTTO task gives people the opportunity to trade away life years in good health to avoid having to subsequently live in a state that they have identified as being ‘worse than dead’ (i.e. they would prefer to die immediately than to live in it). An alternative approach to this is the better-than-dead method, whereby people simply compare given durations in a health state to being dead. But are these two approaches measuring the same thing? This study sought to find out.
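For context, the arithmetic of the two parts of the cTTO – a standard feature of the EQ-VT protocol rather than anything specific to this paper – is worth spelling out. In the conventional part, indifference between $x$ years in full health and 10 years in the state implies a utility of $u = x/10$. In the worse-than-dead part, a ‘lead time’ of 10 years in full health is added, and indifference between $x$ years in full health and 10 years of full health followed by 10 years in the state implies $u = (x - 10)/10$, which bounds values at −1. The better-than-dead method dispenses with this arithmetic and simply asks for comparisons against being dead.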

The authors recruited a convenience sample of 200 students and asked them to value seven different EQ-5D-5L health states that were close to zero in the Dutch tariff. Each respondent completed both a cTTO task and a better-than-dead task (the order varied) for each of the seven states. The analysis then looked at the extent to which there was agreement between the two methods in terms of whether states were identified as being better or worse than dead. Agreement was measured using counts and using polychoric correlations. Unsurprisingly, agreement was higher for those states that lay further from zero in the Dutch tariff. Around zero, there was quite a bit of disagreement – only 65% agreed for state 44343. Both approaches performed similarly with respect to consistency and test-retest reliability. Overall, the authors interpret these findings as meaning that the two methods are measuring the same underlying preferences.
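The counts-based part of that analysis amounts to cross-tabulating the two methods’ verdicts. A minimal sketch with invented data (the polychoric correlations would need a specialised package):

```python
import pandas as pd

# Hypothetical respondent-level verdicts for one health state:
# True = the respondent rated the state worse than dead under that method.
df = pd.DataFrame({
    "ctto_wtd": [True, False, True, True, False, False, True, False],
    "btd_wtd":  [True, True,  True, True, False, True,  True, False],
})

# The diagonal of the cross-tab is agreement; the off-diagonal cells show
# which method classifies more states as worse than dead.
print(pd.crosstab(df["ctto_wtd"], df["btd_wtd"]))
print("Agreement:", (df["ctto_wtd"] == df["btd_wtd"]).mean())
```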

I don’t find that very convincing. States were more often identified as worse than dead in the better-than-dead task, with 55% valued as such, compared with 37% in the cTTO. That seems like a big difference. The authors provide a variety of possible explanations for the differences, mostly relating to the way the tasks are framed. Or it might be that the complexity of the worse-than-dead task in the cTTO is so confusing and counterintuitive that respondents (intentionally or otherwise) avoid having to do it. For me, the findings reinforce the futility of trying to value health states in relation to being dead. If a slight change in methodology prevents a group of biomedical students from giving consistent assessments of whether or not a state is worse than being dead, what hope do we have?


Chris Sampson’s journal round-up for 23rd September 2019


Can you repeat that? Exploring the definition of a successful model replication in health economics. PharmacoEconomics [PubMed] Published 18th September 2019

People talk a lot about replication and its role in demonstrating the validity and reliability of analyses. But what does a successful replication in the context of cost-effectiveness modelling actually mean? Does it mean coming up with precisely the same estimates of incremental costs and effects? Does it mean coming up with a model that recommends the same decision? The authors of this study sought to bring us closer to an operational definition of replication success.

There is potentially much to learn from other disciplines that have a more established history of replication. The authors reviewed literature on the definition of ‘successful replication’ across all disciplines, and used their findings to construct a variety of candidate definitions for use in the context of cost-effectiveness modelling in health. Ten definitions of a successful replication were pulled out of the cross-disciplinary review, which could be grouped into ‘data driven’ replications and ‘experimental’ replications – the former relating to the replication of analyses and the latter relating to the replication of specific observed effects. The ten definitions were from economics, biostatistics, cognitive science, psychology, and experimental philosophy. The definitions varied greatly, with many involving subjective judgments about the proximity of findings. A few studies were found that reported on replications of cost-effectiveness models and which provided some judgment on the level of success. Again, these were inconsistent and subjective.

Quite reasonably, the authors judge that the lack of a fixed definition of successful replication in any scientific field is not just an oversight. The threshold for ‘success’ depends on the context of the replication and on how the evidence will be used. This paper provides six possible definitions of replication success for use in cost-effectiveness modelling, ranging from an identical replication of the results, through partial success in replicating specific pathways within a given margin of error, to simply replicating the same implied decision.

Ultimately, ‘data driven’ replications are a solution to a problem that shouldn’t exist, namely, poor reporting. This paper mostly convinced me that overall ‘success’ isn’t a useful thing to judge in the context of replicating decision models. Replication of certain aspects of a model is useful to evaluate. Whether the replication implied the same decision is a key thing to consider. Beyond this, it is probably worth considering partial success in replicating specific parts of a model.

Differential associations between interpersonal variables and quality-of-life in a sample of college students. Quality of Life Research [PubMed] Published 18th September 2019

There is growing interest in the well-being of students and the distinct challenges involved in achieving good mental health and addressing high levels of demand for services in this group. Students go through many changes that might influence their mental health; prominent among these is the change to their social situation.

This study set out to identify the role of key interpersonal variables in students’ quality of life. The study recruited 1,456 undergraduate students from four universities in the US. The WHOQOL measure was used for quality of life, and a barrage of measures were used to collect information on loneliness, social connectedness, social support, emotional intelligence, intimacy, empathic concern, and more. Three sets of analyses of increasing sophistication were conducted, from zero-order correlations between each measure and the WHOQOL, to a network analysis using a Gaussian Graphical Model to identify both direct and indirect relationships while accounting for shared variance.
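For readers unfamiliar with Gaussian Graphical Models, here’s a minimal sketch of the idea using scikit-learn’s graphical lasso on made-up data – the variable names are mine, not the study’s:

```python
import numpy as np
import pandas as pd
from sklearn.covariance import GraphicalLassoCV
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one row per student, one column per measure's total score.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(1456, 4)),
    columns=["whoqol", "loneliness", "social_support", "emotional_intelligence"],
)

# Estimate a sparse inverse covariance (precision) matrix - the basis of a
# Gaussian Graphical Model, in which zero entries imply conditional independence.
model = GraphicalLassoCV().fit(StandardScaler().fit_transform(X))

# Convert the precision matrix to partial correlations: the network's edge
# weights, i.e. direct associations net of all other variables.
P = model.precision_
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
print(pd.DataFrame(partial_corr, index=X.columns, columns=X.columns).round(2))
```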

In all analyses, loneliness stuck out as the strongest driver of quality of life. Social support, social connectedness, emotional intelligence, intimacy with one’s romantic partner, and empathic concern were also significantly associated with quality of life. But the impact of loneliness was greatest, with other interpersonal variables influencing quality of life through their impact on loneliness.

This is a well-researched and reported study. The findings are informative to student support and other services that seek to improve the well-being of students. There is reason to believe that such services should recognise the importance of interpersonal determinants of well-being and in particular address loneliness. But it’s important to remember that this study is only as good as the measures it uses. If you don’t think WHOQOL is adequately measuring student well-being, or you don’t think the UCLA Loneliness Scale tells us what we need to know, you might not want these findings to influence practice. And, of course, the findings may not be generalisable, as the extent to which different interpersonal variables affect quality of life is very likely dependent on the level of service provision, which varies greatly between different universities, let alone countries.

Affordability and non-perfectionism in moral action. Ethical Theory and Moral Practice [PhilPapers] Published 14th September 2019

The ‘cost-effective but unaffordable’ challenge has been bubbling for a while now, at least since sofosbuvir came on the scene. This study explores whether “we can’t afford it” is a justifiable position to take. The punchline is that, no, affordability is not a sound ethical basis on which to support or reject the provision of a health technology. I was extremely sceptical when I first read the claim. If we can’t afford it, it’s impossible, and how can there be a moral imperative in an impossibility? But the authors proceeded to convince me otherwise.

The authors don’t go into great detail on this point, but it all hinges on divisibility. The reason that a drug like sofosbuvir might be considered unaffordable is that loads of people would be eligible to receive it. If sofosbuvir was only provided to a subset of this population, it could be affordable. On this basis, the authors propose the ‘principle of non-perfectionism’. This states that not being able to do all the good we can do (e.g. provide everyone who needs it with sofosbuvir) is not a reason for not doing some of the good we can do. Thus, if we cannot support provision of a technology to everyone who could benefit from it, it does not follow (ethically) to provide it to nobody, but rather to provide it to some people. The basis for selecting people is not of consequence to this argument but could be based on a lottery, for example.

Building on this, the authors introduce the notion of ‘numerical discrimination’ to explain what is wrong with refusing provision on grounds of affordability. They argue that it is not OK to prioritise one group over another simply because we can meet the needs of everyone within that group as opposed to only some members of the other group. This is exactly what’s happening when we are presented with notions of (un)affordability. If the population of people who could benefit from sofosbuvir were much smaller, there wouldn’t be an issue. But the simple fact that the group is large does not make it morally permissible to deny cost-effective treatment to any individual member within that group. You can’t discriminate against somebody because they are from a large population.

I think there are some tenuous definitions in the paper and some questionable analogies. Nevertheless, the authors succeeded in convincing me that total cost has no moral weight. It is irrelevant to moral reasoning. We should not refuse any health technology to an entire population on the grounds that it is ‘unaffordable’. The authors frame it as a ‘mistake in moral mathematics’. For this argument to apply in the HTA context, it relies wholly on the divisibility of health technologies. To some extent, NICE and their counterparts are in the business of defining models of provision, which might result in limited use criteria to get around the affordability issue, though in practice these issues are often handled by payers such as NHS England.

The authors of this paper don’t consider the implications for cost-effectiveness thresholds, but this is where my thoughts turned. Does the principle of non-perfectionism undermine the morality of differentiating cost-effectiveness thresholds according to budget impact? I think it probably does. Reducing the threshold because the budget impact is great will result in discrimination (‘numerical discrimination’) against individuals simply because they are part of a large population that could benefit from treatment. This seems to be the direction in which we’re moving. Maybe the efficiency cart is before the ethical horse.


My quality-adjusted life year

Why did I do it?

I have evaluated lots of services and been involved in trials where I have asked people to collect EQ-5D data. During this time, several people have complained to me about having to collect EQ-5D data, so I thought I would have a ‘taste of my own medicine’. I measured my health-related quality of life (HRQoL) using the EQ-5D-3L, EQ-5D-VAS, and EQ-5D-5L, every day for a year (N=1). I had the EQ-5D on a spreadsheet on my smartphone and prompted myself to complete it at 9 p.m. every night. I set a target of never being more than three days late in doing it, which I missed twice during the year. I also recorded health-related notes for some days; for instance, 21st January said “tired, dropped a keytar on toe (very 1980s injury)”.

By doing this I wanted to illuminate issues around anchoring, ceiling effects, and ideas of health and wellness. With a big increase in wearable tech and smartphone health apps, this type of big data collection might become a lot more commonplace. I have not kept a diary since I was about 13, so it was an interesting way of keeping track of what was happening, with a focus on health. Starting the year, I knew I had one big life event coming up: a new baby due in early March. I am generally quite healthy, a bit overweight, and don’t get enough sleep. I have been called a hypochondriac before, typically complaining of headaches, colds, and sore throats around six months of the year. I usually go running once or twice a week.

From the start I was very conscious that I shouldn’t grumble too much: the EQ-5D is mainly used to measure functional health in people with disease, not in well people (and ceiling effects are a known feature of the EQ-5D). I immediately felt a ‘freedom’ in the greater sensitivity of the EQ-5D-5L compared with the 3L: I could score myself as having slight problems on the 5L that were not bad enough to count as ‘some problems’ on the 3L.

There were days when I felt a bit achey or tired because I had been for a run, but unless I had an actual injury I did not score myself as having problems with pain or mobility; if I feel achey from running, I generally take that as a good sign that I have pushed myself hard: ‘no pain no gain’. I also started doing yoga this year, which made me feel great but also a bit achey sometimes. In general, though, I noticed that one of my main problems was fatigue, which is not explicitly covered by the EQ-5D but was sometimes reflected as being slightly impaired on usual activities. I also thought that usual activities could be impaired if you are working and travelling a lot, as you don’t get to do any of the things you enjoy, like hobbies or spending time with family – but this is more of a capability question, whereas the EQ-5D is more functional.

How did my HRQoL compare?

I matched up my levels on the individual domains to EQ-5D-3L and 5L index scores based on UK preference scores. The final 5L value set may still change; I used the most recent published scores. I also matched my levels to a personal 5L value set, which I derived using this survey, which uses discrete choice experiments and involves comparing pairs of EQ-5D-5L health states. I found doing this fascinating, and it made me think about how mutually exclusive the EQ-5D dimensions are, and whether some health states are actually implausible: for instance, is it possible to be in extreme pain but not have any impairment on usual activities?

Surprisingly, my average EQ-5D-3L index score (0.982) was higher than the population average for my age group (for England, ages 35-44, it is 0.888 based on Szende et al 2014); I expected my scores to be lower. In fact, my average index scores were higher than the average for 18-24 year olds (0.922). I thought that measuring the EQ-5D more often, with more granularity, would lead to lower average scores, but it actually led to higher ones.

My average score from the personal 5L value set was slightly higher than from the England population value set (0.983 vs 0.975). Digging into the data, the main differences were that I thought usual activities were slightly more important, and pain slightly less important, than the general population did. The 5L (England tariff) correlated more closely with the VAS than the 3L did (r² = 0.746 vs r² = 0.586), but the 5L (personal tariff) correlated most closely with the VAS (r² = 0.792). So, based on my N=1 sample, this suggests that the 5L is a better predictor of overall health than the 3L, and that the personal value set has validity in predicting VAS scores.
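For what it’s worth, the r² comparison is straightforward to reproduce: square the Pearson correlation between each index score and the rescaled VAS. A minimal sketch with made-up numbers (the column names are mine):

```python
import pandas as pd

# Hypothetical daily scores: each index alongside the VAS rescaled to 0-1.
df = pd.DataFrame({
    "eq5d_3l": [1.000, 0.848, 1.000, 0.796, 0.954],
    "eq5d_5l": [1.000, 0.950, 1.000, 0.879, 0.969],
    "vas":     [0.95, 0.80, 0.90, 0.75, 0.85],
})

# r-squared of each index against the VAS.
for col in ["eq5d_3l", "eq5d_5l"]:
    r = df[col].corr(df["vas"])
    print(col, round(r ** 2, 3))
```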

Figure 1. My EQ-5D-3L index score [3L], EQ-5D-5L index score (England value set) [5L], EQ-5D-5L index score (personal value set) [5LP], and visual analogue scale (VAS) score divided by 100 [VAS/100].

Reflection

I definitely regretted doing the EQ-5D every day and was glad when the year was over! I would have preferred to do it every week, but I think that would have missed a lot of the subtleties in how I felt from day to day. On reflection, my approach was that at the end of each day I would try to recall whether I had been stressed, or whether anything had hurt, and adjust the level on the relevant dimension. But I wonder: if I had been prompted at any moment during the day as to whether I was stressed, had mobility issues, or was in pain, would I have said I did? It makes me think of Kahneman and Riis’s distinction between the ‘remembering self’ and the ‘experiencing self’. Was my EQ-5D profile a slave to my remembering self rather than my experiencing self?

My score was low for a few days when I had a really painful abscess on a tooth. At the time the pain felt unbearable, so I gave a high pain score, but looking back I wonder whether it was really that bad – though I didn’t want to retrospectively change my score. Strangely, I had the flu twice this year, each time with some health decrements, which I don’t think has ever happened to me before (and I don’t think it was just ‘man flu’!).

I knew that I was going to have a baby this year, but I didn’t know that I would spend 18 days in hospital, despite not being ill myself. This has led me to think a lot more about ‘caregiver effects’ – the impact on wellbeing of close relatives being ill. It is unnerving spending night after night in hospital, in this case because my wife was very ill after giving birth, and then, when my baby son was two months old, he got very ill (both are doing a lot better now). Being in hospital with a sick relative is a strange feeling, stressful and boring at the same time. I spent a long time staring out of the window or scrolling through Twitter. When my baby son was really ill he would not sleep and did not want to be put down, so my arms were aching after holding him all night. I was lucky that I had understanding managers at work and was not significantly financially disadvantaged by caring for sick relatives. And I was glad of the NHS, and of not receiving a huge bill when family members were discharged from hospital.

Health, wellbeing & exercise

Doing this made me think more about the difference between health and wellbeing; there were days when I was really happy but it wasn’t reflected in my EQ-5D index score. I noticed that doing exercise always led to a higher VAS score – maybe subconsciously I was thinking that exercise was increasing my ‘health stock’. I probably used the VAS more like an overall wellbeing score than a pure health score, which is not correct – but I wonder whether other people do this as well, and whether that is why ceiling effects are less pronounced with the VAS.

Could trials measure EQ-5D every day?

One advantage of the EQ-5D and QALYs over other health outcomes is that they are measured over a schedule, with QALYs calculated as the area under the curve. Completing an EQ-5D every day has shown me that health does vary from day to day, but I still think it would be impractical for trial participants to complete an EQ-5D questionnaire daily. Perhaps EQ-5D data could be combined with a simple daily VAS score, possibly out of ten rather than 100 for simplicity.
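As an aside, the area-under-the-curve calculation itself is simple. A minimal sketch, assuming one index score per day (the numbers are made up):

```python
import numpy as np

# Daily EQ-5D index scores (illustrative values, one per day).
utilities = np.array([1.0, 0.95, 0.88, 0.95, 1.0])

# Trapezium rule with one-day spacing: average adjacent utilities and sum,
# then convert the area from days to years to express it in QALYs.
auc_days = ((utilities[:-1] + utilities[1:]) / 2).sum()
qalys = auc_days / 365.25
print(round(qalys, 4))
```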

Joint worst day: 6th and 7th October: EQ-5D-3L index 0.264, EQ-5D-5L index 0.724; personal EQ-5D-5L index 0.824; VAS score 60 – ‘abscess on tooth, couldn’t sleep, face swollen’.

Joint best day: 27th January, 7th September, 11th September, 18th November, 4th December, 30th December: EQ-5D-3L index 1.00; both EQ-5D-5L index scores 1.00; VAS score 95 – notes include ‘lovely day with family’, ‘went for a run’, ‘holiday’, ‘met up with friends’.