Chris Sampson’s journal round-up for 13th January 2020

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A vision ‘bolt-on’ increases the responsiveness of EQ-5D: preliminary evidence from a study of cataract surgery. The European Journal of Health Economics [PubMed] Published 4th January 2020

The EQ-5D is insensitive to differences in how well people can see, despite this seeming to be an important aspect of health. In contexts where the impact of visual impairment may be important, we could potentially use a ‘bolt-on’ item that asks about a person’s vision. I’m working on the development of a vision bolt-on at the moment. But ours won’t be the first. A previously-developed bolt-on has undergone some testing and has been shown to be sensitive to differences between people with different levels of visual function. However, there is little or no evidence to support its responsiveness to changes in visual function, which might arise from treatment.

For this study, 63 individuals were recruited prior to receiving cataract surgery in Singapore. Participants completed the EQ-5D-3L and EQ-5D-5L, both with and without a vision bolt-on worded to match the other EQ-5D dimensions. Additionally, the SF-6D, HUI3, and VF-12 were completed, along with a LogMAR assessment of visual acuity. The authors sought to compare the responsiveness of the EQ-5D with a vision bolt-on against that of the standard EQ-5D and the other measures, so all measures were completed before and after cataract surgery. Preference weights can be generated for the EQ-5D-3L with a vision bolt-on, but not for the EQ-5D-5L, so the authors looked at rescaled sum scores to compare across all measures. Responsiveness was assessed using indicators such as the standardised effect size and the standardised response mean.
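For anyone unfamiliar with these statistics, they’re simple to compute from paired before-and-after scores: the standardised effect size divides the mean change by the standard deviation of the baseline scores, while the standardised response mean divides it by the standard deviation of the change scores. Here’s a minimal sketch (my own illustration with made-up numbers, not data from the study):

```python
import numpy as np

def responsiveness_stats(before, after):
    """Compute two common responsiveness indices for paired before/after scores.

    standardised effect size (SES)  = mean change / SD of baseline scores
    standardised response mean (SRM) = mean change / SD of change scores
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    change = after - before
    ses = change.mean() / before.std(ddof=1)
    srm = change.mean() / change.std(ddof=1)
    return ses, srm

# Hypothetical utility index scores before and after cataract surgery
before = [0.71, 0.80, 0.65, 0.77, 0.73]
after = [0.74, 0.82, 0.70, 0.79, 0.78]
print(responsiveness_stats(before, after))
```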

Visual acuity changed dramatically between the pre- and post-surgery assessments for almost everybody. The authors found that the vision bolt-on does seem to be considerably more responsive to this change than the EQ-5D without the bolt-on. For instance, the mean change in the EQ-5D-3L index score was 0.018 without the vision bolt-on and 0.031 with it. The HUI3 came out with a mean change of 0.105 and showed the highest responsiveness across all analyses.

Does this mean that we should all be using a vision bolt-on, or perhaps the HUI3? Not exactly. Something I see a lot in papers of this sort – including in this one – is the framing of ‘superior responsiveness’ as an indication that the measure is doing a better job. That isn’t true if the measure is responding to things to which we don’t want it to respond. As the authors point out, the HUI3 has quite different foundations to the EQ-5D. We also don’t want a situation where analysts can pick and choose measures according to whichever is most responsive to the thing to which they want it to be most responsive. In EuroQol parlance, what goes into the descriptive system is very important.

The causal effect of social activities on cognition: evidence from 20 European countries. Social Science & Medicine Published 9th January 2020

Plenty of studies have shown that cognitive abilities are correlated with social engagement, but few have attempted to demonstrate causality in a large sample. The challenge, of course, is that people who engage in more social activities are likely to have greater cognitive abilities for other reasons, and people’s decision to engage in social activities might depend on their cognitive abilities. This study tackles the question of causality using a novel (to me, at least) methodology.

The analysis uses data from five waves of SHARE (the Survey of Health, Ageing and Retirement in Europe). Survey respondents are asked about whether they engage in a variety of social activities, such as voluntary work, training, sports, or community-related organisations. From this, the authors generate an indicator for people participating in zero, one, or two or more of these activities. The survey also uses a set of tests to measure people’s cognitive abilities in terms of immediate recall capacity, delayed recall capacity, fluency, and numeracy. The authors look at each of these four outcomes, with 231,407 observations for the first three and 124,381 for numeracy (for which the questions were missing from some waves). Confirming previous findings, a strong positive correlation is found between engagement in social activities and each of the cognition indicators.

The empirical strategy, which I had never heard of, is partial identification. This is a non-parametric method that identifies bounds for the average treatment effect; it is ‘partial’ because it doesn’t identify a point estimate. Fewer assumptions mean wider and less informative bounds. The authors start with a model with no assumptions, for which the lower bound for the treatment effect goes below zero. They then incrementally add assumptions: i) monotone treatment response, assuming that social participation does not reduce cognitive abilities on average; ii) monotone treatment selection, assuming that people who choose to be socially active tend to have higher cognitive capacities; and iii) a monotone instrumental variable assumption that body mass index is negatively associated with cognitive abilities. The authors argue that their methodology is not likely to be undermined by unobservables in the way that previous studies might be.
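To give a flavour of the approach, here’s a minimal sketch of the simplest case: the no-assumption (worst-case) bounds for a binary ‘treatment’ and a bounded outcome. This is my own illustration with simulated data, not the authors’ implementation; their analysis layers the monotonicity assumptions described above on top of bounds like these in order to narrow them.

```python
import numpy as np

def worst_case_ate_bounds(y, d, y_min, y_max):
    """No-assumption (worst-case) bounds on the average treatment effect
    for a binary treatment d and an outcome known to lie in [y_min, y_max]."""
    y, d = np.asarray(y, float), np.asarray(d, int)
    p = d.mean()                      # share of 'treated' (socially active) respondents
    m1 = y[d == 1].mean()             # observed mean outcome among the treated
    m0 = y[d == 0].mean()             # observed mean outcome among the untreated
    # Bounds on E[Y(1)]: unobserved potential outcomes replaced by y_min / y_max
    lb1, ub1 = p * m1 + (1 - p) * y_min, p * m1 + (1 - p) * y_max
    # Bounds on E[Y(0)]
    lb0, ub0 = (1 - p) * m0 + p * y_min, (1 - p) * m0 + p * y_max
    return lb1 - ub0, ub1 - lb0

# Hypothetical illustration: recall scores bounded in [0, 10], d = any social activity
rng = np.random.default_rng(0)
d = rng.integers(0, 2, 1000)
y = np.clip(rng.normal(5 + 0.5 * d, 2, 1000), 0, 10)
print(worst_case_ate_bounds(y, d, 0, 10))
```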

The various models show that engaging in social activities has a positive impact on all four of the cognitive indicators. The assumption of monotone treatment response had the highest identifying power. For all models that included this, the 95% confidence intervals around the estimates showed a statistically significant positive impact of social activities on cognition. What is perhaps most interesting about this approach is the huge amount of uncertainty in the estimates. Social activities might have a huge effect on cognition or they might have a tiny effect. A basic OLS-type model, assuming exogenous selection, provides very narrow confidence intervals, whereas the confidence intervals on the partial identification models are almost as wide as the bounds themselves.

One shortcoming of this study for me is that it doesn’t seek to identify the causal channels that have been proposed in previous literature (e.g. loneliness, physical activity, self-care). So it’s difficult to paint a clear picture of what’s going on. But then, maybe that’s the point.

Do research groups align on an intervention’s value? Concordance of cost-effectiveness findings between the Institute for Clinical and Economic Review and other health system stakeholders. Applied Health Economics and Health Policy [PubMed] Published 10th January 2020

Aside from having the most inconvenient name imaginable, ICER has been a welcome addition to the US health policy scene, appraising health technologies in order to provide guidance on coverage. ICER has become influential, with some pharmacy benefit managers using its assessments as a basis for denying coverage for low-value medicines. ICER identifies technologies as falling into one of three categories – high, low, or intermediate long-term value – according to whether the ICER (grr) falls below, above, or within the threshold range of $50,000–$175,000 per QALY. ICER conducts its own evaluations, but so do plenty of other people. This study sought to find out whether other analyses in the literature agree with ICER’s categorisations.
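As I read it, the categorisation boils down to a simple threshold rule along these lines (a toy sketch of my own, setting aside dominance and other special cases):

```python
def icer_value_category(icer_per_qaly, low=50_000, high=175_000):
    """Map an incremental cost-effectiveness ratio (in $/QALY) to a long-term
    value category, using the threshold range described above."""
    if icer_per_qaly < low:
        return "high long-term value"
    if icer_per_qaly > high:
        return "low long-term value"
    return "intermediate long-term value"

print(icer_value_category(120_000))  # -> 'intermediate long-term value'
```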

The authors consider 18 assessments by ICER, including 76 interventions, between 2015 and 2017. For each of these, the authors searched the literature for other comparative studies. Specifically, they went looking for cost-effectiveness analyses that employed the same perspectives and outcomes. Unfortunately, they were only able to identify studies for six disease areas and 14 interventions (of the 76), across 25 studies. It isn’t clear whether this is because there is a lack of literature out there – which would be an interesting finding in itself – or because their search strategy or selection criteria weren’t up to scratch. Of the 14 interventions compared, 10 get a more favourable assessment in the published studies than in their corresponding ICER evaluations, with most being categorised as intermediate value instead of low value. The authors go on to conduct one case study, comparing an ICER evaluation in the context of migraine with a published study by some of the authors of this paper. There were methodological differences. In some respects, it seems as if ICER did a more thorough job, while in other respects the published study seemed to use more defensible assumptions.

I agree with the authors that these kinds of comparisons are important. Not least, we need to be sure that ICER’s approach to appraisal is valid. The findings of this study suggest that maybe ICER should be looking at multiple studies and combining all available data in a more meaningful way. But the authors excluded too many studies. Some imperfect comparisons would have been more useful than exclusion – 14 of 76 is kind of pitiful and probably not representative. And I’m not sure why the authors set out to identify studies that are ‘more favourable’, rather than just different. That perspective seems to reveal an assumption that ICER are unduly harsh in their assessments.


Don Husereau’s journal round-up for 25th November 2019


Development and validation of the TRansparent Uncertainty ASsessmenT (TRUST) tool for assessing uncertainties in health economic decision models. PharmacoEconomics [PubMed] Published 11th November 2019

You’re going to quickly see that all three papers in today’s round-up align with some strong personal pet peeves that I harbour toward the nebulous world of market access and health technology assessment – the most prominent being how loose we seem to be with language and form in the absence of overarching standards. This may come as no surprise to some when discussing a field that lacks a standard definition and for which many international standards of what constitutes good practice have never been defined.

This first paper deals with both issues and provides a useful tool for characterizing uncertainty. The authors state the purpose of the tool is “for systematically identifying, assessing, and reporting uncertainty in health economic models.” They suggest, to the best of their knowledge, no such tool exists. They also support the need for the tool by asserting that uncertainty in health economic modelling is often not fully characterized. The reasons, they suggest, are twofold: (1) there has been too much emphasis on imprecision; and (2) it is difficult to express all uncertainty.

I couldn’t agree more. What I sometimes deeply believe about those planning and conducting economic evaluation is that they obsess too often about uncertainty that is less relevant (but more amenable to statistical adjustment) and don’t address uncertainty that payers actually care about. To wit, while it may be important to explore and adopt methods that deal with imprecision (dealing with known unknowns), such as improving utility variance estimates (from an SE of 0.003 to 0.011 – yes, sorry Kelvin and Feng for the callout), not getting this right is unlikely to lead to truly bad decisions. (Kelvin and Feng both know this.)

What is much more important for decision makers is uncertainty that stems from a lack of knowledge. These are unknown unknowns. In my experience this typically has to do with generalizability (how well will it work in different patients or against a different comparator?) and durability (how do I translate 16 weeks of data into a lifetime?); not things resolved by better variance estimates and probabilistic analysis. In Canada, our HTA body has even gone so far as to respond to the egregious act of not providing different parametric forms for extrapolation with the equally egregious act of using unrealistic time horizon adjustments to deal with this. Two wrongs don’t make a right.

To develop the tool, the authors first conducted a (presumably narrative) review of uncertainty frameworks and then ran the identified concepts past a bunch of HTA expert committee types. They also used a previously developed framework as a basis for identifying all the places where uncertainty in HTA could occur. Using the concepts and the HTA areas, they developed a tool, which was presented a few times and then validated through semi-structured interviews with different international stakeholders (N = 11); these interviews also provided insights into barriers to its use, user-friendliness, and feasibility.

Once the tool was developed, six case studies were worked up, with one of them (pembrolizumab for Hodgkin’s lymphoma) illustrated in the manuscript. The tool does not provide a score or coefficient to adjust estimates or deal with uncertainty, but it isn’t supposed to. What it is trying to do is make sure you are aware of all the uncertainties so that you can make some determination as to whether they have been dealt with. One of the challenges of developing the tool is the lack of standardized terminology regarding uncertainty itself. While a short primer exists in the manuscript, for those who have looked into it, uncertainty terminology is far more uncertain than even the authors let on.

While I appreciate the tool and the attempt to standardize things, I do suspect the approach could have been strengthened (a systematic review and possibly a nominal group technique, as is done for reporting guidelines). However, I’m not sure this would have gotten us much closer to the truth. Uncertainty needs to be sorted first, and I am happy with their attempt. I hope it raises some awareness that we can’t simply say we are “uncertain” as if that means something.

Unmet medical need: an introduction to definitions and stakeholder perceptions. Value in Health [PubMed] Published November 2019

The second, and also often-abused, term without an obvious definition is unmet medical need (UMN). My theory is that some confusion has arisen due to a confluence of marketing and clinical development teams and regulators. UMN has come to mean patients with rare diseases, drugs with ‘novel’ mechanisms of action, patients with highly prevalent disease, drugs with a more convenient formulation, or drugs with fewer side effects. And yet payers (in my experience) usually recognize none of these. Payers tend to characterize UMN in different ways: no drugs available to treat the condition, available drugs do not provide consistent or durable responses, and there have been no new medical developments in the area for > 10 years.

The purpose of this research, then, was to unpack the term UMN further. The authors conducted a comprehensive (gray) literature review to identify definitions of UMN in use by different stakeholders and then unpacked their meaning through discussions with multiple European stakeholders, trying to focus on the key elements of unmet medical need through a regulatory and reimbursement lens. This consisted of six one-hour teleconference calls and two workshops held in 2018. One open workshop involved 69 people from regulatory agencies, industry, payers, HTA bodies, patient organizations, healthcare, and academia.

A key finding of this work was that, yes indeed, UMN means different things to different people. A key dimension is whether unmet need is being defined in terms of individuals or populations. Population size (whether the condition is prevalent or rare) was not felt to be an element of the definition, while there was general consensus that disease severity was. This means UMN should really only consider the UMNs of individual patients, not whether very few or very many patients are in need. It also means we see people who have higher rates of premature mortality and severe morbidity as having more of an unmet need, regardless of how many people are affected by the condition.

And last but not least was the final dimension of how many treatments are actually available. This, the authors point out, is the current legal definition in Europe (as laid down in Article 4, paragraph 2 of Commission Regulation [EC] No. 507/2006). And while this seems the most obvious definition of ‘need’ (we usually need things that are lacking) there was some acknowledgement by stakeholders that simply counting existing therapies is not adequate. There was also acknowledgement that there may be existing therapies available and still an UMN. Certainly this reflects my experience on the pan-Canadian Oncology Drug Review expert review committee, where unmet medical need was an explicit subdomain in their value framework, and where on more than one occasion it was felt, to my surprise, there was an unmet need despite the availability of two or more treatments.

Like the previous paper, the authors did not conduct a systematic review and could have consulted more broadly (no clinician stakeholders were consulted) or used more objective methods – a limitation they acknowledge, although addressing it would have been unlikely to get them much further ahead in understanding. So what to do with this information? Well, the authors do propose an HTA approach that would triage reimbursement decisions based on UMN. However, stakeholders commented that the method you use really depends on the HTA context. As such, the authors conclude that “the application of the definition within a broader framework depends on the scope of the stakeholder.” In other words, HTA must be fit for purpose (something we knew already). However, as with uncertainty, I’m happy someone is actually trying to create reasonable, coherent definitions of such an important concept.

On value frameworks and opportunity costs in health technology assessment. International Journal of Technology Assessment in Health Care [PubMed] Published 18th September 2019

The final, and most-abused, term is ‘value’. While value seems an obvious prerequisite to those making investments in healthcare, and we (some of us) are willing to acknowledge that value is about what we are willing to give up to get something, what is less clear is what we want to get and what we want to give up.

The author of this paper, then, hopes to remind us of the various schools of thought on defining value in health that speak to these trade-offs. The first is broadly consistent with the welfarist school of economics and proposes that the value of health care used by decision makers should reflect individuals’ willingness to pay for it. An alternative approach – sometimes referred to as the extra-welfarist framework – argues that the value of a health technology should be consistent with the policy objectives of the health care system, typically health (the author states it is ‘health’, but I’m not sure it has to be). The final school of thought (which I was not familiar with, and neither might you be, which is the point of the paper) is what he terms ‘classical’, where the point is not to maximize a maximand or be held up to notions of efficiency, but rather to discuss how consumers will be affected. The reference cited to support this framework is this interesting piece, although I couldn’t find any allusion to the framework within it.

What follows is a relatively fair treatment of extra-welfarist and welfarist applications to decision-making, with a larger critical swipe at the former (using legitimate arguments that have been previously published – yes, extra-welfarists assume resources are divisible; yes, extra-welfarists don’t identify the health-producing resources that will actually be displaced; and yes, using thresholds doesn’t always maximize health) and a downplaying of the problems with the latter (how we might measure trade-offs reliably under a welfarist framework appears to be a mere detail until this concession is finally mentioned: “On account of the measurement issues surrounding [willingness to pay], there may be many situations in which no valid and reliable methods of operationalizing [welfarist economic value frameworks] exist.”). Given that the premise of this commentary is that a recent commentary by Culyer seemed to overlook concepts of value beyond extra-welfarist ones, the swipe at extra-welfarist views is understandable. Hence, this paper can be seen as a kind of rebuttal and a reminder that other views should not be ignored.

I like the central premise of the paper as summarized here:

“Although the concise term “value for money” may be much easier to sell to HTA decision makers than, for example, “estimated mean valuation of estimated change in mean health status divided by the estimated change in mean health-care costs,” the former loses too much in precision; it seems much less honest. Because loose language could result in dire consequences of economic evaluation being oversold to the HTA community, it should be avoided at all costs.”

However, while I am really sympathetic to warning against conceptual shortcuts and loose language, I wonder if this paper misses the bigger point. Firstly, I’m not convinced we are making such bad decisions as those who wish the lambda to be silenced tend to want us to believe. But more importantly, while it is easy to be critical about economics applied loosely or misapplied, this paper (like others) offers no real practical solutions other than the need to acknowledge other frameworks. It is silent on the real reason extra-welfarist approaches and thresholds seem to have stuck around, namely, they have provided a practical and meaningful way forward for difficult decision-making and the HTA processes that support them. They make sense to decision-makers who are willing to overlook some of the conceptual wrinkles. And I’m a firm believer that conceptual models are a starting point for pragmatism. We shouldn’t be slaves to them.


Rachel Houten’s journal round-up for 11th November 2019


A comparison of national guidelines for network meta-analysis. Value in Health [PubMed] Published October 2019

The evolving treatment landscape results in a greater dependence on indirect treatment comparisons to generate estimates of clinical effectiveness, where the current practice has not been compared to the proposed new intervention in a head-to-head trial. This paper is a review of the guidelines of reimbursement bodies for conducting network meta-analyses. Reassuringly, the authors find that it is possible to meet the needs of multiple agencies with one analysis.

The authors assign the criteria to three categories: “assessment and analysis to test assumptions required for a network meta-analysis, presentation and reporting of results, and justification of modelling choices”, with heterogeneity of the included studies highlighted as one of the key elements to include if prioritisation of the criteria is necessary. I think this is a simple way of thinking about what needs to be presented, but the ‘justification’ category, in my experience, is often given less weight than the other two.

This paper is a useful resource for companies submitting to multiple HTA agencies, with the requirements of each national body displayed in tables that are easy to navigate. It meets a practical need but doesn’t really go far enough for me. They do signpost to the PRISMA criteria, but I think it would have been really good to think about the purpose of the submission guidelines: to encourage a logical and coherent summary of the approaches taken so the evidence can be evaluated by decision-makers.

Variation in responsiveness to warranted behaviour change among NHS clinicians: novel implementation of change detection methods in longitudinal prescribing data. BMJ [PubMed] Published 2nd October 2019

I really like this paper. Such a lot of work, from all sectors, is devoted to the production of relevant and timely evidence to inform practice, but if the guidance does not become embedded into the real world then its usefulness is limited.

The authors have managed to utilize a HUGE amount of data to identify the real reaction to two pieces of guidance recommending a change in practice in England. They used “trend indicator saturation”, which I’m not ashamed to admit I knew nothing about beforehand, but it is explained nicely. Their thoughtful use of the information available to them results in three indicators of response (in this case, the deprescribing of two drugs): when the change occurs, how quickly it occurs, and how much change occurs.
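As a rough illustration of the general idea (and emphatically not the authors’ implementation, which saturates the model with many candidate indicators and formally selects among them), you could imagine searching over candidate post-guidance trend breaks in a prescribing series and summarising the best-fitting one in terms of those three indicators. Everything below is made up for illustration:

```python
import numpy as np

def single_trend_break(y):
    """Crude illustration: find the single post-break trend that best explains
    a prescribing series, then report the timing, speed, and size of the response.
    (The paper uses trend indicator saturation with many candidate breaks and
    formal selection; this is a much-simplified stand-in.)"""
    n = len(y)
    t = np.arange(n)
    best = None
    for b in range(1, n - 1):
        trend = np.maximum(t - b, 0)          # trend indicator starting at time b
        X = np.column_stack([np.ones(n), trend])
        beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = res[0] if res.size else np.sum((y - X @ beta) ** 2)
        if best is None or rss < best[0]:
            best = (rss, b, beta[1])
    _, b, slope = best
    return {"timing": b, "speed": slope, "magnitude": slope * (n - 1 - b)}

# Hypothetical monthly prescribing rate that starts falling at month 12
rng = np.random.default_rng(1)
y = np.concatenate([np.full(12, 100.0), 100 - 2 * np.arange(1, 13)]) + rng.normal(0, 1, 24)
print(single_trend_break(y))
```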

The authors discover variation in response to the recommendations but suggest an application of their methods could be used to generate feedback to clinicians and therefore drive further response. As some primary care practices took a while to embed the guidance change into their prescribing, the paper raises interesting questions as to where the barriers to the adoption of guidance have occurred.

What is next for patient preferences in health technology assessment? A systematic review of the challenges. Value in Health Published November 2019

It may be that patient preferences have a role to play in the uptake of guideline recommendations, as proposed by the authors of my final paper this week. This systematic review, of the literature around embedding patient preferences into HTA decision-making, groups the discussion in the academic literature into five broad areas: conceptual, normative, procedural, methodological, and practical. The authors state that their purpose was not to formulate their own views, merely to present the available literature, but they do a good job of indicating where to find more opinionated literature on this topic.

Methodological issues were the biggest group, covering aspects such as sample selection, the internal and external validity of the preferences generated, and the generalisability of preferences collected from a sample to the entire population. In general, though, the range of topics covered in the literature is vast and varied.

It’s a great summary of the challenges that are faced, and a ranking based on how frequently each topic is mentioned in the literature drives the authors’ proposed next steps. They recommend further research into the incorporation of preferences within or beyond the QALY and into the use of multiple-criteria decision analysis as a method of integrating patient preferences into decision-making. I support the need for “a scientifically and valid manner” of integrating patient preferences into HTA decision-making, but wonder if we can first learn what has worked well and what hasn’t from the attempts of HTA agencies thus far.
