Chris Sampson’s journal round-up for 23rd September 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Can you repeat that? Exploring the definition of a successful model replication in health economics. PharmacoEconomics [PubMed] Published 18th September 2019

People talk a lot about replication and its role in demonstrating the validity and reliability of analyses. But what does a successful replication in the context of cost-effectiveness modelling actually mean? Does it mean coming up with precisely the same estimates of incremental costs and effects? Does it mean coming up with a model that recommends the same decision? The authors of this study sought to bring us closer to an operational definition of replication success.

There is potentially much to learn from other disciplines that have a more established history of replication. The authors reviewed literature on the definition of ‘successful replication’ across all disciplines, and used their findings to construct a variety of candidate definitions for use in the context of cost-effectiveness modelling in health. Ten definitions of a successful replication were pulled out of the cross-disciplinary review, which could be grouped into ‘data driven’ replications and ‘experimental’ replications – the former relating to the replication of analyses and the latter relating to the replication of specific observed effects. The ten definitions were from economics, biostatistics, cognitive science, psychology, and experimental philosophy. The definitions varied greatly, with many involving subjective judgments about the proximity of findings. A few studies were found that reported on replications of cost-effectiveness models and which provided some judgment on the level of success. Again, these were inconsistent and subjective.

Quite reasonably, the authors judge that the lack of a fixed definition of successful replication in any scientific field is not just an oversight. The threshold for ‘success’ depends on the context of the replication and on how the evidence will be used. This paper provides six possible definitions of replication success for use in cost-effectiveness modelling, ranging from an identical replication of the results, through partial success in replicating specific pathways within a given margin of error, to simply replicating the same implied decision.
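To make the range of definitions concrete, here is a minimal sketch of three checks at different levels of strictness. It is a simplified reading of the range described above, not the paper's six definitions. The `ModelResult` class, the 5% margin and the £20,000 per QALY threshold are all illustrative assumptions, and the margin is applied to headline results rather than to specific model pathways.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Headline outputs of a cost-effectiveness comparison (hypothetical structure)."""
    incremental_cost: float   # e.g. in GBP
    incremental_qalys: float


def identical(original: ModelResult, replica: ModelResult) -> bool:
    """Strictest check: the replica reproduces the results exactly."""
    return original == replica


def within_margin(original: ModelResult, replica: ModelResult, rel_tol: float = 0.05) -> bool:
    """Partial success: each result lies within rel_tol of the original value."""
    def close(orig: float, rep: float) -> bool:
        return abs(rep - orig) <= rel_tol * abs(orig)
    return (close(original.incremental_cost, replica.incremental_cost)
            and close(original.incremental_qalys, replica.incremental_qalys))


def same_decision(original: ModelResult, replica: ModelResult, threshold: float = 20_000.0) -> bool:
    """Loosest check: both models imply the same adopt/reject decision.

    The decision rule is positive net monetary benefit at an assumed threshold.
    """
    def adopt(r: ModelResult) -> bool:
        return threshold * r.incremental_qalys - r.incremental_cost > 0
    return adopt(original) == adopt(replica)


# Made-up numbers, purely for illustration
original = ModelResult(incremental_cost=5_000.0, incremental_qalys=0.40)
replica = ModelResult(incremental_cost=6_200.0, incremental_qalys=0.38)

print(identical(original, replica))
print(within_margin(original, replica))
print(same_decision(original, replica))
```

With these made-up numbers the replica fails the first two checks but still implies the same decision, which is why the choice of definition matters.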

Ultimately, ‘data driven’ replications are a solution to a problem that shouldn’t exist, namely, poor reporting. This paper mostly convinced me that overall ‘success’ isn’t a useful thing to judge in the context of replicating decision models. Replication of certain aspects of a model is useful to evaluate. Whether the replication implied the same decision is a key thing to consider. Beyond this, it is probably worth considering partial success in replicating specific parts of a model.

Differential associations between interpersonal variables and quality-of-life in a sample of college students. Quality of Life Research [PubMed] Published 18th September 2019

There is growing interest in the well-being of students and the distinct challenges involved in achieving good mental health and addressing high levels of demand for services in this group. Students go through many changes that might influence their mental health; prominent among these is the change to their social situation.

This study set out to identify the role of key interpersonal variables in students’ quality of life. The study recruited 1,456 undergraduate students from four universities in the US. The WHOQOL measure was used for quality of life and a battery of measures was used to collect information on loneliness, social connectedness, social support, emotional intelligence, intimacy, empathic concern, and more. Three sets of analyses of increasing sophistication were conducted, from zero-order correlations between each measure and the WHOQOL, to a network analysis using a Gaussian Graphical Model to identify both direct and indirect relationships while accounting for shared variance.

In all analyses, loneliness stuck out as the strongest driver of quality of life. Social support, social connectedness, emotional intelligence, intimacy with one’s romantic partner, and empathic concern were also significantly associated with quality of life. But the impact of loneliness was greatest, with other interpersonal variables influencing quality of life through their impact on loneliness.
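The difference between a zero-order correlation and a direct relationship can be sketched with three variables. In a Gaussian graphical model, an edge corresponds to a non-zero partial correlation between two variables given all the others. With only three variables this reduces to the textbook formula below. The correlations are hypothetical and chosen only to illustrate the pattern described; they are not taken from the study.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing what both share with z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical zero-order correlations (NOT results from the study)
r_support_qol = 0.40       # social support vs quality of life
r_support_lonely = -0.60   # social support vs loneliness
r_lonely_qol = -0.65       # loneliness vs quality of life

# Direct link between support and quality of life, holding loneliness fixed
support_direct = partial_corr(r_support_qol, r_support_lonely, r_lonely_qol)

# Direct link between loneliness and quality of life, holding support fixed
lonely_direct = partial_corr(r_lonely_qol, r_support_lonely, r_support_qol)

print(round(support_direct, 3), round(lonely_direct, 3))
```

In this made-up example, the support edge almost vanishes once loneliness is accounted for, while the loneliness edge stays strong. That is the kind of pattern that would be read as support acting through loneliness.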

This is a well-researched and reported study. The findings are informative to student support and other services that seek to improve the well-being of students. There is reason to believe that such services should recognise the importance of interpersonal determinants of well-being and in particular address loneliness. But it’s important to remember that this study is only as good as the measures it uses. If you don’t think WHOQOL is adequately measuring student well-being, or you don’t think the UCLA Loneliness Scale tells us what we need to know, you might not want these findings to influence practice. And, of course, the findings may not be generalisable, as the extent to which different interpersonal variables affect quality of life is very likely dependent on the level of service provision, which varies greatly between different universities, let alone countries.

Affordability and non-perfectionism in moral action. Ethical Theory and Moral Practice [PhilPapers] Published 14th September 2019

The ‘cost-effective but unaffordable’ challenge has been bubbling for a while now, at least since sofosbuvir came on the scene. This study explores whether “we can’t afford it” is a justifiable position to take. The punchline is that, no, affordability is not a sound ethical basis on which to support or reject the provision of a health technology. I was extremely sceptical when I first read the claim. If we can’t afford it, it’s impossible, and how can there be a moral imperative in an impossibility? But the authors proceeded to convince me otherwise.

The authors don’t go into great detail on this point, but it all hinges on divisibility. The reason that a drug like sofosbuvir might be considered unaffordable is that loads of people would be eligible to receive it. If sofosbuvir was only provided to a subset of this population, it could be affordable. On this basis, the authors propose the ‘principle of non-perfectionism’. This states that not being able to do all the good we can do (e.g. provide everyone who needs it with sofosbuvir) is not a reason for not doing some of the good we can do. Thus, if we cannot support provision of a technology to everyone who could benefit from it, it does not follow (ethically) to provide it to nobody, but rather to provide it to some people. The basis for selecting people is not of consequence to this argument but could be based on a lottery, for example.

Building on this, the authors explain why denying the technology to everyone is wrong, using the notion of ‘numerical discrimination’. They argue that it is not OK to prioritise one group over another simply because we can meet the needs of everyone within that group as opposed to only some members of the other group. This is exactly what’s happening when we are presented with notions of (un)affordability. If the population of people who could benefit from sofosbuvir was much smaller, there wouldn’t be an issue. But the simple fact that the group is large does not make it morally permissible to deny cost-effective treatment to any individual member within that group. You can’t discriminate against somebody because they are from a large population.

I think there are some tenuous definitions in the paper and some questionable analogies. Nevertheless, the authors succeeded in convincing me that total cost has no moral weight. It is irrelevant to moral reasoning. We should not refuse any health technology to an entire population on the grounds that it is ‘unaffordable’. The authors frame it as a ‘mistake in moral mathematics’. For this argument to apply in the HTA context, it relies wholly on the divisibility of health technologies. To some extent, NICE and their counterparts are in the business of defining models of provision, which might result in limited use criteria to get around the affordability issue, though these issues are often handled by payers such as NHS England.

The authors of this paper don’t consider the implications for cost-effectiveness thresholds, but this is where my thoughts turned. Does the principle of non-perfectionism undermine the morality of differentiating cost-effectiveness thresholds according to budget impact? I think it probably does. Reducing the threshold because the budget impact is great will result in discrimination (‘numerical discrimination’) against individuals simply because they are part of a large population that could benefit from treatment. This seems to be the direction in which we’re moving. Maybe the efficiency cart is before the ethical horse.

Credits

Chris Sampson’s journal round-up for 5th August 2019

The barriers and facilitators to model replication within health economics. Value in Health Published 16th July 2019

Replication is a valuable part of the scientific process, especially if there are uncertainties about the validity of research methods. When it comes to cost-effectiveness modelling, there are endless opportunities for researchers to do things badly, even with the best intentions. Attempting to replicate modelling studies can therefore support health care decision-making. But replication studies are rarely conducted, or, at least, rarely reported. The authors of this study sought to understand the factors that can make replication easy or difficult, with a view to informing reporting standards.

The authors attempted to replicate five published cost-effectiveness modelling studies, with the aim of recreating the key results. Each replication attempt was conducted by a different author and we’re even given a rating of the replicator’s experience level. The characteristics of the models were recorded and each replicator detailed – anecdotally – the things that helped or hindered their attempt. Some replications were a resounding failure. In one case, the replicated cost per patient was more than double the original, at more than £1,000 wide of the mark. Replicators reported that having a clear diagram of the model structure was a big help, as was the provision of example calculations and explicit listing of the key assumptions. Various shortcomings made replication difficult, all relating to a lack of clarity or completeness in reporting. The impact of this on the replication attempt was exacerbated if the model either involved lots of scenarios that weren’t clearly described or had a long time horizon.

The quality of each study was assessed using the Philips checklist, and all did pretty well, suggesting that the checklist is not sufficient for ensuring replicability. If you develop and report cost-effectiveness models, this paper could help you better understand how end-users will interpret your reporting and make your work more replicable. This study focusses on Markov models. They’re definitely the most common approach, so perhaps that’s OK. It might be useful to produce prescriptive guidance specific to Markov models, informed by the findings of this study.

US integrated delivery networks perspective on economic burden of patients with treatment-resistant depression: a retrospective matched-cohort study. PharmacoEconomics – Open [PubMed] Published 28th June 2019

Treatment-resistant depression can be associated with high health care costs, as multiple lines of treatment are tried, with patients experiencing little or no benefit. New treatments and models of care can go some way to addressing these challenges. In the US, there’s some reason to believe that integrated delivery networks (IDNs) could be associated with lower care costs, because IDNs are based on collaborative care models and constitute a single point of accountability for patient costs. They might be particularly useful in the case of treatment-resistant depression, but evidence is lacking. The authors of this study investigated the difference in health care resource use and costs for patients with and without treatment-resistant depression, in the context of IDNs.

The researchers conducted a retrospective cohort study using claims data for people receiving care from IDNs, with up to two years’ follow-up from first antidepressant use. A total of 1,582 people with treatment-resistant depression were propensity score matched to two other groups – patients without depression and patients with depression that was not classified as treatment-resistant. Various regression models were used to compare the key outcomes of all-cause and specific categories of resource use and costs. Unfortunately, there is no assessment of whether the selected models are actually any good at estimating differences in costs.

The average costs and resource use levels in the three groups ranked as you would expect: $25,807 per person per year for the treatment-resistant group versus $13,701 in the non-resistant group and $8,500 in the non-depression group. People with treatment-resistant depression used a wider range of antidepressants and for a longer duration. They also had twice as many inpatient visits as people with depression that wasn’t treatment-resistant, which seems to have been the main driver of the adjusted differences in costs.

We don’t know (from this study) whether or not IDNs provide a higher quality of care. And the study isn’t able to compare IDN and non-IDN models of care. But it does show that IDNs probably aren’t a full solution to the high costs of treatment-resistant depression.

Rabin’s paradox for health outcomes. Health Economics [PubMed] [RePEc] Published 19th June 2019

Rabin’s paradox arises from the theoretical demonstration that a risk-averse individual who turns down a 50:50 gamble of gaining £110 or losing £100 would, if expected utility theory is correct, turn down a 50:50 gamble of losing £1,000 or gaining millions. This follows from the concave utility function over wealth that is assumed in order to model risk aversion, and the implication is probably not realistic. But we don’t know about the relevance of this paradox in the health domain… until now.
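The logic can be illustrated numerically. The sketch below assumes a constant absolute risk aversion (CARA) utility function, u(w) = -exp(-a·w). This is my assumption for illustration: Rabin’s result is more general and does not depend on this functional form. CARA is convenient because choices do not depend on current wealth. The function names and the search bounds are hypothetical.

```python
import math


def rejects(a: float, loss: float, gain: float) -> bool:
    """True if a CARA agent with risk aversion `a` turns down a 50:50 gamble.

    With u(w) = -exp(-a * w), utility from a wealth change of zero is -1, so the
    gamble is rejected when its expected utility falls below -1.
    """
    expected_utility = 0.5 * -math.exp(a * loss) + 0.5 * -math.exp(-a * gain)
    return expected_utility < -1.0


def minimal_rejecting_risk_aversion(loss: float, gain: float,
                                    lo: float = 1e-6, hi: float = 1e-2,
                                    steps: int = 100) -> float:
    """Bisect for (approximately) the smallest `a` at which the gamble is rejected.

    Assumes the agent accepts at `lo` and rejects at `hi`.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if rejects(mid, loss, gain):
            hi = mid
        else:
            lo = mid
    return hi


# Least risk-averse CARA agent who still turns down lose 100 / gain 110
a_star = minimal_rejecting_risk_aversion(loss=100, gain=110)

# The same agent facing lose 1,000 / gain one billion
print(a_star, rejects(a_star, loss=1_000, gain=1e9))
```

Under this assumption, the disutility of losing £1,000 is on its own enough to push expected utility below the status quo, so no gain of any size can rescue the large gamble.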

A key contribution of this paper is that it considers both decision-making about one’s own health and decision-making from a societal perspective. Three different scenarios are set up in each case, relating to gains and losses in life expectancy with different levels of health functioning. A total of 201 students were recruited as part of a larger study on preferences and each completed all six gamble-pairs (three individual, three societal). To test for Rabin’s paradox, the participants were asked whether they would accept each gamble involving a moderate stake and a large stake.

In short, the authors observe Rabin’s proposed failure of expected utility theory. Many participants rejected small gambles but did not reject the larger gambles. The effect was more pronounced for societal preferences, though there was a large minority for whom expected utility theory was not violated. The upshot of all this is that our models of health preferences that are based on expected utility may be flawed where uncertain outcomes are involved – as they often are in health. This study adds to a growing body of literature supporting the relevance of alternative utility theories, such as prospect theory, to health and health care.

My only problem here is that life expectancy is not health. Life expectancy is everything. It incorporates the monetary domain, which this study did not want to consider, as well as every other domain of life. When you die, your stock of cash is as useful to you as your stock of health. I think it would have been more useful if the study focussed only on health status and outcomes and excluded all considerations of death.

Chris Sampson’s journal round-up for 23rd April 2018

What should we know about the person behind a TTO? The European Journal of Health Economics [PubMed] Published 18th April 2018

The time trade-off (TTO) is a staple of health state valuation. Ask someone to value a health state with respect to time and – hey presto! – you have QALYs. This editorial suggests that completing a TTO can be a difficult task for respondents and that, more importantly, individuals’ characteristics may determine the way that they respond and therefore the nature of the results. One of the most commonly demonstrated differences, in this respect, is the fact that valuations of people’s own health states tend to be higher than health states valued hypothetically. But this paper focuses on indirect (hypothetical) valuations. The authors highlight mixed evidence for the influence of age, gender, marital status, having children, education, income, expectations about the future, and of one’s own health state. But why should we try and find out more about respondents when conducting TTOs? The authors offer 3 reasons: i) to inform sampling, ii) to inform the design and standardisation of TTO exercises, and iii) to inform the analysis. I agree – we need to better understand these sources of heterogeneity. Not to over-engineer responses, but to aid our interpretation, even if we want societally-representative valuations that include all of these variations in response behaviour. TTO valuation studies should collect data relating to the individual respondents. Unfortunately, what those data should be isn’t listed in this study, so the research question in the title isn’t really answered. But maybe that’s something the authors have in hand.

Computer modeling of diabetes and its transparency: a report on the eighth Mount Hood Challenge. Value in Health Published 9th April 2018

The Mount Hood Challenge is a get-together for people working on the (economic) modelling of diabetes. The subject of the 2016 meeting was transparency, with two specific goals: i) to evaluate the transparency of two published studies, and ii) to develop a diabetes-specific checklist for transparent reporting of modelling studies. Participants were tasked (in advance of the meeting) with replicating the two published studies and using the replicated models to evaluate some pre-specified scenarios. Both of the studies had some serious shortcomings in the reporting of the necessary data for replication, including the baseline characteristics of the population. Five modelling groups replicated the first model and seven groups replicated the second model. Naturally, the different groups made different assumptions about what should be used in place of missing data. For the first paper, none of the models provided results that matched the original. Not even close. And the differences between the results of the replications – in terms of costs incurred and complications avoided – were huge. The performance was a bit better on the second paper, but hardly worth celebrating. In general, the findings were fear-confirming. Informed by these findings, the Diabetes Modeling Input Checklist was created, designed to complement existing checklists with more general applications. It includes specific data requirements for the reporting of modelling studies, relating to the simulation cohort, treatments, costs, utilities, and model characteristics. If you’re doing some modelling in diabetes, you should have this paper to hand.

Setting dead at zero: applying scale properties to the QALY model. Medical Decision Making [PubMed] Published 9th April 2018

In health state valuation, whether or not a state is considered ‘worse than dead’ is heavily dependent on methodological choices. This paper reviews the literature to answer two questions: i) what are the reasons for anchoring at dead=0, and ii) how does the position of ‘dead’ on the utility scale impact on decision making? The authors took a standard systematic approach to identify literature from databases, with 7 papers included. Then the authors discuss scale properties and the idea that there are interval scales (such as temperature) and ratio scales (such as distance). The difference between these is the meaningfulness of the reference point (or origin). This means that you can talk about distance doubling, but you can’t talk about temperature doubling, because 0 metres is not arbitrary, whereas 0 degrees Celsius is. The paper summarises some of the arguments put forward for using dead=0. They aren’t compelling. The authors argue that the duration part of the QALY (i.e. time) needs to have ratio properties for the QALY model to function. Time obviously holds this property and it’s clear that duration can be anchored at zero. The authors then demonstrate that, for the QALY model to work, the health-utility scale must also exhibit ratio scale properties. The basis for this is the assumption that zero duration nullifies health states and that ‘dead’ nullifies duration. But the paper doesn’t challenge the conceptual basis for using dead in health state valuation exercises. Rather, it considers the mathematical properties that must hold to allow for dead=0, and asserts them. The authors’ conclusion that dead “needs to have the value of 0 in a QALY model” is correct, but only within the existing restrictions and assumptions underlying current practice. Nevertheless, this is a very useful study for understanding the challenge of anchoring and explicating the assumptions underlying the QALY model.
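The scale argument can be made concrete with a small worked example. This is a sketch under stated assumptions, not the authors’ derivation: the symbols Q, U, T, a and b, the two profiles A and B, and all the numbers are introduced here purely for illustration.

```latex
% QALY model: value of spending duration T in health state h
\[ Q(h, T) = U(h)\, T \]
% Zero duration nullifies any state; 'dead' nullifies any duration
\[ Q(h, 0) = 0 \ \text{for all } h, \qquad
   U(\text{dead}) = 0 \;\Rightarrow\; Q(\text{dead}, T) = 0 \ \text{for all } T \]
% An interval scale would permit U'(h) = a U(h) + b with a > 0, giving
\[ Q'(h, T) = a\, U(h)\, T + b\, T \]
% Illustrative profiles: A = (U = 0.5, T = 10), B = (U = 0.8, T = 6)
\[ Q(A) = 0.5 \times 10 = 5.0 \;>\; Q(B) = 0.8 \times 6 = 4.8 \]
% Rescaling only (a = 2, b = 0) preserves the ranking
\[ Q'(A) = 10.0 \;>\; Q'(B) = 9.6 \]
% Shifting the origin (a = 1, b = -0.3) reverses the ranking
\[ Q'(A) = 0.2 \times 10 = 2.0 \;<\; Q'(B) = 0.5 \times 6 = 3.0 \]
```

The extra term bT depends on duration, so the ranking of profiles with different durations is only guaranteed to survive when b = 0, that is, when the utility scale has a fixed origin.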
