Chris Sampson’s journal round-up for 2nd December 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

The treatment decision under uncertainty: the effects of health, wealth and the probability of death. Journal of Health Economics Published 16th November 2019

It’s important to understand how people make decisions about treatment. At the end of life, the question can become a matter of whether to have treatment or to let things take their course such that you end up dead. In order to consider this scenario, the author of this paper introduces the probability of death to some existing theoretical models of decision-making under uncertainty.

The diagnostic risk model and the therapeutic risk model can be used to identify risk thresholds that determine decisions about treatment. The diagnostic model relates to the probability that disease is present and the therapeutic model relates to the probability that treatment is successful. The new model described in this paper builds on these models to consider the impact on the decision thresholds of i) initial health state, ii) probability of death, and iii) wealth. The model includes wealth after death, in the form of a bequest. Limited versions of the model are also considered, excluding the bequest and excluding wealth (described as a ‘QALY model’). Both an individual perspective and an aggregate perspective are considered by excluding and including the monetary cost of diagnosis and treatment, to allow for a social insurance type setting.

The comparative statics show a lot of ambiguity, but there are a few things that the model can tell us. The author identifies treatment as having an ‘insurance effect’, by reducing diagnostic risk, a ‘protective effect’, by lowering the probability of death, and a risk-increasing effect associated with therapeutic risk. A higher probability of death increases the propensity for treatment in both the no-bequest model and the QALY model, because of the protective effect of treatment. In the bequest model, the impact is ambiguous, because treatment costs reduce the bequest. In the full model, wealthier individuals will choose to undergo treatment at a lower probability of success because of a higher marginal utility for survival, but the effect becomes ambiguous if the marginal utility of wealth depends on health (which it obviously does).
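To get a feel for these effects, here is a toy numerical sketch of a one-period treat-or-not decision. Everything in it – the functional forms, the parameter values, the `treatment_threshold` helper – is my own illustrative assumption, not the paper's actual model.

```python
import math

def u(health, wealth):
    """Utility while alive; note the marginal utility of wealth
    depends on health here, one of the sources of ambiguity."""
    return health * math.log(1 + wealth)

def bequest(wealth):
    """Utility from wealth left behind at death."""
    return 0.5 * math.log(1 + wealth)

def expected_utility(treat, p_death, p_success, health, wealth, cost):
    """Expected utility with or without treatment. Treatment lowers
    the probability of death (the 'protective effect'), but its cost
    reduces both consumption and the bequest."""
    if treat:
        p_death = p_death * (1 - p_success)  # success averts death
        wealth -= cost
    return (1 - p_death) * u(health, wealth) + p_death * bequest(wealth)

def treatment_threshold(p_death, health, wealth, cost, step=0.001):
    """Smallest success probability at which treatment is preferred."""
    no_treat = expected_utility(False, p_death, 0.0, health, wealth, cost)
    p = 0.0
    while p <= 1.0:
        if expected_utility(True, p_death, p, health, wealth, cost) > no_treat:
            return p
        p += step
    return None  # treatment never preferred

rich = treatment_threshold(p_death=0.5, health=0.6, wealth=100, cost=10)
poor = treatment_threshold(p_death=0.5, health=0.6, wealth=20, cost=10)
```

With these made-up numbers, the wealthier individual accepts treatment at a lower success probability, while the poorer one (for whom the cost eats a larger share of consumption and bequest) never treats; tweak the bequest weight or the way health enters the utility function and the effect can flip, which is exactly the ambiguity the author reports.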

I am no theoretician, so it can take me a long time to figure these things out in my head. For now, I’m not convinced that it is meaningful to consider death in this way using a one-period life model. In my view, the very definition of death is a loss of time, which plays little or no part in this model. But I think my main bugbear is the idea that anybody’s decision about life-saving treatment is partly determined by the amount of money they will leave behind. I find this hard to believe. The author links the finding that a higher probability of death increases treatment propensity to NICE’s end of life premium, though I’m not convinced that the model has anything to do with NICE’s reasoning on this matter.

Moving toward evidence-based policy: the value of randomization for program and policy implementation. JAMA [PubMed] Published 15th November 2019

Evidence-based policy is a nice idea. We should figure out whether something works before rolling it out. But decision-makers (especially politicians) tend not to think in this way, because doing something is usually seen to be better than doing nothing. The authors of this paper argue that randomisation is the key to understanding whether a particular policy creates value.

Without evidence based on random allocation, it’s difficult to know whether a policy works. This, the authors argue, can undermine the success of effective interventions and allow harmful policies to persist. A variety of positive examples are provided from US healthcare, including trials of Medicare bundled payments. Apparently, such trials increased confidence in the programmes’ effects in a way that post hoc evaluations cannot, though no evidence of this increased confidence is actually provided. Policy evaluation is not always easy, so the authors describe four preconditions for the success of such studies: i) early engagement with policymakers, ii) willingness from policy leaders to support randomisation, iii) timing the evaluation in line with policymakers’ objectives, and iv) designing the evaluation in line with the realities of policy implementation.

These are sensible suggestions, but it is not clear why the authors focus on randomisation. The paper doesn’t do what it says on the tin, i.e. describe the value of randomisation. Rather, it explains the value of pre-specified policy evaluations. Randomisation may or may not deserve special treatment compared with other analytical tools, but this paper provides no explanation for why it should. The authors also suggest that people are becoming more comfortable with randomisation, as large companies employ experimental methods, particularly on the Internet with A/B testing. I think this perception is way off and that most people feel creeped out knowing that the likes of Facebook are experimenting on them without any informed consent. In the authors’ view, it being possible to randomise is a sufficient basis on which to randomise. But, considering the ethics, as well as possible methodological contraindications, it isn’t clear that randomisation should become the default.

A new tool for creating personal and social EQ-5D-5L value sets, including valuing ‘dead’. Social Science & Medicine Published 30th November 2019

Nobody can agree on the best methods for health state valuation. Or, at least, some people have disagreed loud enough to make it seem that way. Novel approaches to health state valuation are therefore welcome. Even more welcome is the development and testing of methods that you can try at home.

This paper describes the PAPRIKA method (Potentially All Pairwise RanKings of all possible Alternatives) of discrete choice experiment, implemented using 1000Minds software. Participants are presented with two health states that are defined in terms of just two dimensions, each lasting for 10 years, and asked to choose between them. Using the magical power of computers, an adaptive process identifies further choices, automatically ranking states using transitivity so that people don’t need to complete unnecessary tasks. In order to identify where ‘dead’ sits on the scale, a binary search procedure asks participants to compare EQ-5D states with being dead. What’s especially cool about this process is that everybody who completes it is able to view their own personal value set. These personal value sets can then be averaged to identify a social value set.
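The mechanics can be sketched with a small simulation. To be clear, this is a generic illustration of the two ingredients described above (transitivity-pruned pairwise ranking, plus a binary search against ‘dead’), not the actual 1000Minds/PAPRIKA algorithm; the two-dimension states and the simulated respondent’s utility function are invented for the example.

```python
from functools import cmp_to_key
from itertools import product

# Toy state space: two dimensions, five levels each (level 1 = best).
states = list(product(range(1, 6), repeat=2))

def utility(s):
    """Latent preferences of a simulated respondent."""
    return -(2.0 * s[0] + 1.0 * s[1])

questions = 0

def choice_task(s, t):
    """One pairwise choice task; count how often we have to ask."""
    global questions
    questions += 1
    if utility(s) > utility(t):
        return -1  # s preferred, so it ranks earlier
    if utility(s) < utility(t):
        return 1
    return 0

# Sorting only asks for the comparisons it needs; transitivity fills
# in the rest, so far fewer questions are asked than there are pairs.
ranking = sorted(states, key=cmp_to_key(choice_task))
total_pairs = len(states) * (len(states) - 1) // 2  # 300 for 25 states

# Binary search to locate 'dead' on the personal ranking.
dead_utility = -12.0  # assumed: some states are worse than dead
lo, hi = 0, len(ranking)
while lo < hi:
    mid = (lo + hi) // 2
    questions += 1
    if utility(ranking[mid]) > dead_utility:  # state preferred to dead
        lo = mid + 1
    else:
        hi = mid
# States ranked before index `lo` are better than dead; the rest would
# take negative values in the resulting personal value set.
```

Averaging the resulting personal value sets across respondents then gives the social value set, as the authors describe.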

The authors used their tool to develop an EQ-5D-5L value set for New Zealand (which is where the researchers are based). They recruited 5,112 people in an online panel, such that the sample was representative of the general public. Participants answered 20 DCE questions each, on average, and almost half of them said that they found the questions difficult to answer. The NZ value set showed that anxiety/depression was associated with the greatest disutility, though the dimensions had notably similar impacts at each level. The value set correlates well with numerous existing value sets.

The main limitation of this research seems to be that only levels 1, 3, and 5 of each EQ-5D-5L domain were included. Including levels 2 and 4 would more than double the number of questions that would need to be answered. It is also concerning that more than half of the sample was excluded due to low data quality. But the authors do a pretty good job of convincing us that this is for the best. Adaptive designs of this kind could be the future of health state valuation, especially if they can be implemented online, at low cost. I expect we’ll be seeing plenty more from PAPRIKA.
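A back-of-the-envelope count illustrates why. The function below is my own rough approximation: it counts, for one pair of dimensions with a given number of levels, the candidate choice tasks in which one dimension is strictly better and the other strictly worse (the informative trade-offs, before the adaptive algorithm prunes any via transitivity).

```python
from math import comb

def tradeoff_pairs(levels):
    """Candidate trade-off questions for one pair of dimensions:
    choose two distinct levels on each dimension, with one dimension
    improving while the other worsens, i.e. C(levels, 2) ** 2."""
    return comb(levels, 2) ** 2

three_levels = tradeoff_pairs(3)  # using levels 1, 3, and 5 only
five_levels = tradeoff_pairs(5)   # using all five EQ-5D-5L levels
# 9 vs 100 candidate trade-offs per dimension pair: including levels
# 2 and 4 much more than doubles the potential question burden.
```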

Don Husereau’s journal round-up for 25th November 2019

Development and validation of the TRansparent Uncertainty ASsessmenT (TRUST) tool for assessing uncertainties in health economic decision models. PharmacoEconomics [PubMed] Published 11th November 2019

You’re going to quickly see that all three papers in today’s round-up align with some strong personal pet peeves that I harbour toward the nebulous world of market access and health technology assessment – most prominent is how loose we seem to be with language and form, without overarching standards. This may come as no surprise when discussing a field that lacks a standard definition and for which international standards of good practice have never been established.

This first paper deals with both issues and provides a useful tool for characterizing uncertainty. The authors state the purpose of the tool is “for systematically identifying, assessing, and reporting uncertainty in health economic models.” They suggest, to the best of their knowledge, no such tool exists. They also support the need for the tool by asserting that uncertainty in health economic modelling is often not fully characterized. The reasons, they suggest, are twofold: (1) there has been too much emphasis on imprecision; and (2) it is difficult to express all uncertainty.

I couldn’t agree more. What I sometimes deeply believe about those planning and conducting economic evaluation is that they obsess too often about uncertainty that is less relevant (but more amenable to statistical adjustment) and don’t address uncertainty that payers actually care about. To wit, while it may be important to explore and adopt methods that deal with imprecision (dealing with known unknowns), such as improving utility variance estimates (from an SE of 0.003 to 0.011, yes sorry Kelvin and Feng for the callout), not getting this right is unlikely to lead to truly bad decisions. (Kelvin and Feng both know this.)

What is much more important for decision makers is uncertainty that stems from a lack of knowledge. These are unknown unknowns. In my experience this typically has to do with generalizability (how well will it work in different patients or against a different comparator?) and durability (how do I translate 16 weeks of data into a lifetime?); not things resolved by better variance estimates and probabilistic analysis. In Canada, our HTA body has even gone so far as to respond to the egregious act of not providing different parametric forms for extrapolation with the equally egregious act of using unrealistic time horizon adjustments to deal with this. Two wrongs don’t make a right.
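The durability problem is easy to demonstrate numerically: two parametric survival curves calibrated to agree with the same short-term data can imply wildly different long-term survival. The numbers below (80% survival at 16 weeks, a Weibull shape of 0.5) are invented purely for illustration.

```python
import math

s16 = 0.80     # observed survival at 16 weeks (assumed trial result)
horizon = 260  # 5 years, in weeks

# Exponential: S(t) = exp(-rate * t), calibrated to hit s16 at week 16.
rate = -math.log(s16) / 16
def s_exponential(t):
    return math.exp(-rate * t)

# Weibull: S(t) = exp(-(scale * t) ** shape), same calibration point.
shape = 0.5
scale = (-math.log(s16)) ** (1 / shape) / 16
def s_weibull(t):
    return math.exp(-((scale * t) ** shape))

# Both curves match the trial data exactly at week 16, yet they
# diverge hugely by 5 years (roughly 3% vs 41% survival here).
within_trial = (s_exponential(16), s_weibull(16))
extrapolated = (s_exponential(horizon), s_weibull(horizon))
```

Both curves are equally consistent with the 16 weeks of data; choosing between them is a judgement about unknown unknowns, which no amount of within-trial variance estimation or probabilistic analysis resolves.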

To develop the tool, the authors first conducted a (presumably narrative) review of uncertainty frameworks and then ran the identified concepts past a bunch of HTA expert committee types. They also used a previously developed framework as a basis for identifying all the places where uncertainty in HTA could occur. Using the concepts and the HTA areas, they developed a tool, which was presented a few times and then validated through semi-structured interviews with different international stakeholders (N = 11); these interviews also provided insights into barriers to its use, user-friendliness, and feasibility.

Once the tool was developed, six case studies were worked up, with one of them (pembrolizumab for Hodgkin’s lymphoma) illustrated in the manuscript. The tool does not provide a score or coefficient to adjust estimates or deal with uncertainty, but it is not supposed to. What it is trying to do is make sure you are aware of all the uncertainties, so that you can make some determination as to whether they have been dealt with. One of the challenges of developing the tool is the lack of standardized terminology regarding uncertainty itself. While a short primer exists in the manuscript, for those who have looked into it, uncertainty terminology is far more uncertain than even the authors let on.

While I appreciate the tool and the attempt to standardize things, I do suspect the approach could have been strengthened (a systematic review and possibly a nominal group technique as is done for reporting guidelines). However, I’m not sure this would have gotten us much closer to the truth. Uncertainty needs to be sorted first and I am happy at their attempt. I hope it raises some awareness of how we can’t simply say we are “uncertain” as if that means something.

Unmet medical need: an introduction to definitions and stakeholder perceptions. Value in Health [PubMed] Published November 2019

The second, and also often-abused, term without an obvious definition is unmet medical need (UMN). My theory is that some confusion has arisen due to a confluence of marketing and clinical development teams and regulators. UMN has come to mean patients with rare diseases, drugs with ‘novel’ mechanisms of action, patients with highly prevalent disease, drugs with a more convenient formulation, or drugs with fewer side effects. And yet payers (in my experience) usually recognize none of these. Payers tend to characterize UMN in different ways: no drugs available to treat the condition, available drugs do not provide consistent or durable responses, and there have been no new medical developments in the area for > 10 years.

The purpose of this research, then, was to unpack the term UMN further. The authors conducted a comprehensive (gray) literature review to identify definitions of UMN in use by different stakeholders, and then unpacked their meaning through discussions with stakeholders from across Europe, focusing on the key elements of unmet medical need through a regulatory and reimbursement lens. This consisted of six one-hour teleconference calls and two workshops held in 2018. One open workshop involved 69 people from regulatory agencies, industry, payers, HTA bodies, patient organizations, healthcare, and academia.

A key finding of this work was that, yes indeed, UMN means different things to different people. A key dimension is whether unmet need is being defined in terms of individuals or populations. Population size (whether prevalent or rare) was not felt to be an element of the definition while there was general consensus that disease severity was. This means UMN should really only consider the UMNs of individual patients, not whether very few or very many patients are at need. It also means we see people who have higher rates of premature mortality and severe morbidity as having more of an unmet need, regardless of how many people are affected by the condition.

And last but not least was the final dimension: how many treatments are actually available. This, the authors point out, is the current legal definition in Europe (as laid down in Article 4, paragraph 2 of Commission Regulation [EC] No. 507/2006). And while this seems the most obvious definition of ‘need’ (we usually need things that are lacking), stakeholders acknowledged that simply counting existing therapies is not adequate, and that there may be existing therapies available and still an UMN. Certainly this reflects my experience on the pan-Canadian Oncology Drug Review expert review committee, where unmet medical need was an explicit subdomain in the value framework, and where on more than one occasion it was felt, to my surprise, that there was an unmet need despite the availability of two or more treatments.

Like the previous paper, the authors did not conduct a systematic review and could have consulted more broadly (no clinician stakeholders were consulted) or used more objective methods – a limitation they acknowledge, though addressing it would have been unlikely to get them much further ahead in understanding. So what to do with this information? Well, the authors do propose an HTA approach that would triage reimbursement decisions based on UMN. However, stakeholders commented that the method you use really depends on the HTA context. As such, the authors conclude that “the application of the definition within a broader framework depends on the scope of the stakeholder.” In other words, HTA must be fit for purpose (something we knew already). However, as with uncertainty, I’m happy someone is actually trying to create reasonable, coherent definitions of such an important concept.

On value frameworks and opportunity costs in health technology assessment. International Journal of Technology Assessment in Health Care [PubMed] Published 18th September 2019

The final, and most-abused, term is that of ‘value’. While value seems an obvious prerequisite for those making investments in healthcare, and while we (some of us, at least) are willing to acknowledge that value is what we are willing to give up to get something, what is less clear is what we want to get and what we want to give up.

The author of this paper, then, hopes to remind us of the various schools of thought on defining value in health that speak to these trade-offs. The first is broadly consistent with the welfarist school of economics and proposes that the value of health care used by decision makers should reflect individuals’ willingness to pay for it. An alternative approach – sometimes referred to as the extra-welfarist framework – argues that the value of a health technology should be consistent with the policy objectives of the health care system, typically health (the author states it is ‘health’, but I’m not sure it has to be). The final school of thought (which I was not familiar with, and neither might you be, which is the point of the paper) is what he terms ‘classical’, where the point is not to maximize a maximand or be held up to notions of efficiency, but rather to discuss how consumers will be affected. The reference cited to support this framework is this interesting piece, although I couldn’t find any allusion to the framework within it.

What follows is a relatively fair treatment of extra-welfarist and welfarist applications to decision-making, with a larger critical swipe at the former and much downplay of the latter. The arguments against extra-welfarism are legitimate and previously published: yes, extra-welfarists assume resources are divisible; yes, extra-welfarists don’t identify the health-producing resources that will actually be displaced; and yes, using thresholds doesn’t always maximize health. Meanwhile, how we might measure trade-offs reliably under a welfarist framework appears to be a mere detail, until this concession is finally mentioned: “On account of the measurement issues surrounding [willingness to pay], there may be many situations in which no valid and reliable methods of operationalizing [welfarist economic value frameworks] exist.” Given that the premise of this commentary is that a recent commentary by Culyer seemed to overlook concepts of value beyond extra-welfarist ones, the swipe at extra-welfarist views is understandable. Hence, this paper can be seen as a kind of rebuttal and a reminder that other views should not be ignored.

I like the central premise of the paper as summarized here:

“Although the concise term “value for money” may be much easier to sell to HTA decision makers than, for example, “estimated mean valuation of estimated change in mean health status divided by the estimated change in mean health-care costs,” the former loses too much in precision; it seems much less honest. Because loose language could result in dire consequences of economic evaluation being oversold to the HTA community, it should be avoided at all costs.”

However, while I am really sympathetic to warning against conceptual shortcuts and loose language, I wonder if this paper misses the bigger point. Firstly, I’m not convinced we are making such bad decisions as those who wish the lambda to be silenced tend to want us to believe. But more importantly, while it is easy to be critical about economics applied loosely or misapplied, this paper (like others) offers no real practical solutions other than the need to acknowledge other frameworks. It is silent on the real reason extra-welfarist approaches and thresholds seem to have stuck around, namely, they have provided a practical and meaningful way forward for difficult decision-making and the HTA processes that support them. They make sense to decision-makers who are willing to overlook some of the conceptual wrinkles. And I’m a firm believer that conceptual models are a starting point for pragmatism. We shouldn’t be slaves to them.

David Mott’s journal round-up for 16th September 2019

Opening the ‘black box’: an overview of methods to investigate the decision‑making process in choice‑based surveys. The Patient [PubMed] Published 5th September 2019

Choice-based surveys using methods such as discrete choice experiments (DCEs) and best-worst scaling (BWS) exercises are increasingly being used in health to understand people’s preferences. A lot of time and energy is spent on analysing the data that come out from these surveys but increasingly there is an interest in better understanding respondents’ decision-making processes. Whilst many will be aware of ‘think aloud’ interviews (often used for piloting), other methods may be less familiar as they’re not applied frequently in health. That’s where this fascinating paper by Dan Rigby and colleagues comes in. It provides an overview of five different methods of what they call ‘pre-choice process analysis’ of decision-making, describing the application, state of knowledge, and future research opportunities.

Eye-tracking has been used in health recently. It’s intuitive and provides an insight into where the participants’ focus is (or isn’t). The authors explained that one of the ways it has been used is to explore attribute non-attendance (ANA), which essentially occurs when people are ignoring attributes either because they’re irrelevant to them, or simply because it makes the task easier. However, surprisingly, it has been suggested that ‘visual ANA’ (not looking at the attribute) doesn’t always align with ‘stated ANA’ (participants stating that they ignored the attribute) – which raises some interesting questions!

However, the real highlight for me was the overview of the use of brain imaging techniques to explore choices being made in DCEs. One study highlighted by the authors – which was a DCE about eggs, and is now at least #2 on my list of bizarre preference study topics, after this oddly specific one on Iberian ham – predicted choices from an initial ‘passive viewing’ using functional magnetic resonance imaging (fMRI). They found that incorporating changes in blood flow (prompted by changes in attribute levels during ‘passive viewing’) into a random utility model accounted for a lot of the variation in willingness to pay for eggs – pretty amazing stuff.

Whilst I’ve highlighted the more unusual methods here, after reading this overview I have to admit that I’m an even bigger advocate for the ‘think aloud’ technique now. Although it may have some limitations, the amount of insight offered combined with its practicality is hard to beat. Though maybe I’m biased because I know that I won’t get my hands on any eye-tracking or brain imaging devices any time soon. In any case, I highly recommend that any researchers conducting preference studies give this paper a read as it’s really well written and will surely be of interest.

Disentangling public preferences for health gains at end-of-life: further evidence of no support of an end-of-life premium. Social Science & Medicine [PubMed] Published 21st June 2019

The end of life (EOL) policy introduced by NICE in 2009 [PDF] has proven controversial. The policy allows treatments that are not cost-effective within the usual range to be considered for approval, provided that certain criteria are met. Specifically, the treatment must target patients with a short life expectancy (≤24 months), offer a life extension (of ≥3 months), and be for a ‘small patient population’. One of the biggest issues with this policy is that it is unclear whether the general population actually supports the idea of valuing health gains (specifically life extension) at EOL more than other health gains.
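Stated as code, the gatekeeping logic is simple. The function below is a hypothetical illustration of the criteria as summarised here, not anything NICE publishes; in practice the criteria are applied through committee judgement rather than a mechanical test.

```python
def meets_eol_criteria(life_expectancy_months, life_extension_months,
                       small_patient_population):
    """NICE's 2009 end-of-life criteria, as summarised above: short
    life expectancy, a meaningful life extension, and a small
    patient population."""
    return (life_expectancy_months <= 24
            and life_extension_months >= 3
            and small_patient_population)
```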

Numerous academic studies, usually involving some form of stated preference exercise, have been conducted to test whether the public might support this EOL premium. A recent review by Koonal Shah and colleagues summarised the existing published studies (up to October 2017), highlighting that the evidence is extremely mixed. This recently published Danish study, by Lise Desireé Hansen and Trine Kjær, adds to this literature. The authors conducted an incredibly thorough stated preference exercise to test whether quality of life (QOL) gains and life extension (LE) at EOL are valued differently from other similarly sized health gains. Not only that, but the study also explored the effect of perspective (social vs. individual), the effect of age (18–35 vs. 65+), and the impact of initial severity (25% vs. 40% initial QOL) on results.

Overall, they did not find evidence of support for an EOL premium for QOL gains or for LEs (regardless of perspective) but their results do suggest that QOL gains are preferred over LE. In some scenarios, there was slightly more support for EOL in the social perspective variant, relative to the individual perspective – which seems quite intuitive. Both age and initial severity had an impact on results, with respondents preferring to treat the young and those with worse QOL at baseline. One of the most interesting results for me was within their subgroup analyses, which suggested that women and those with a relation to a terminally ill patient had a significantly positive preference for EOL – but only in the social perspective scenarios.

This is a really well-designed study, which covers a lot of different concepts. This probably doesn’t end the debate on NICE’s use of the EOL criteria – not least because the study wasn’t conducted in England and Wales – but it contributes a lot. I’d consider it a must-read for anyone interested in this area.

How should we capture health state utility in dementia? Comparisons of DEMQOL-Proxy-U and of self- and proxy-completed EQ-5D-5L. Value in Health Published 26th August 2019

Capturing quality of life (QOL) in dementia and obtaining health state utilities is incredibly challenging, which is something that I’ve started to really appreciate recently upon getting involved in a EuroQol-funded ‘bolt-ons’ project. The EQ-5D is not always able to detect meaningful changes in cognitive function, and condition-specific preference-based measures (PBMs), such as the DEMQOL, may be preferred as a result. However, this isn’t the only challenge, because in many cases patients are not in a position to complete the surveys themselves. This means that proxy-reporting is often required, which could be done by either a professional (formal) carer or a friend or family member (informal carer). Researchers that want to use a PBM in this population therefore have a lot to consider.

This paper compares the performance of the EQ-5D-5L and the DEMQOL-Proxy when completed by care home residents (EQ-5D-5L only), formal carers and informal carers. The impressive dataset that the authors use contains 1,004 care home residents, across up to three waves, and includes a battery of different cognitive and QOL measures. The overall objective was to compare the performance of the EQ-5D-5L and DEMQOL-Proxy, across the three respondent groups, based on 1) construct validity, 2) criterion validity, and 3) responsiveness.

The authors found that self-reported EQ-5D-5L scores were higher and less responsive to changes in the cognitive measures, but better at capturing residents’ self-reported QOL (based on a non-PBM), relative to proxy-reported scores. It is unclear whether this is a case of adaptation, as seen in many other patient groups, or if the residents’ cognitive impairments prevent them from reliably assessing their current status. The proxy-reported EQ-5D-5L scores were generally more responsive to changes in the cognitive measures relative to the DEMQOL-Proxy (irrespective of the type of proxy), which the authors note is probably because the DEMQOL-Proxy focuses more on the emotional impact of dementia than on functional impairment.

Overall, this is a really interesting paper, which highlights the challenges well and illustrates that there is value in collecting these data from both patients and proxies. In terms of the PBM comparison, whilst the authors do not explicitly state it, it does seem that the EQ-5D-5L may have a slight upper hand due to its responsiveness, as well as for pragmatic reasons (the DEMQOL-Proxy has >30 questions). Perhaps a cognition ‘bolt-on’ to the EQ-5D-5L might help to improve the situation in future?
