Simon McNamara’s journal round-up for 1st October 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A review of NICE appraisals of pharmaceuticals 2000-2016 found variation in establishing comparative clinical effectiveness. Journal of Clinical Epidemiology [PubMed] Published 17th September 2018

The first paper in this week’s round-up is on the topic of single-arm studies; specifically, the way in which the comparative effectiveness of medicines granted a marketing authorisation on the basis of single-arm studies has been evaluated in NICE appraisals. If you are interested in comparative effectiveness, single-arm studies are difficult to deal with. If you don’t have a control arm to refer to, how do you know what the impact of the intervention is? If you don’t know how effective the intervention is, how can you say whether it is cost-effective?

In this paper, the authors conduct a review into the way this problem has been dealt with during NICE appraisals. They do this by searching through the 489 NICE technology appraisals conducted between 2010 and 2016. The search identified 22 relevant appraisals (4% of the total). The most commonly used way of estimating comparative effectiveness (19 of 22 appraisals) was simulation of a control arm using external data – be that from an observational study or a randomised trial. Of these, 14 of the appraisals featured naïve comparisons across studies, with no attempt made to adjust for potential differences between population groups. The three appraisals that didn’t use external data relied upon expert opinion, or the assumption that non-responders in the intervention single-arm study could be used as a proxy for those who would receive the comparator intervention.
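To make the "naïve comparison" idea concrete, here is a minimal sketch (all numbers invented for illustration, not taken from any appraisal): a response rate from a single-arm study is compared directly against an external control cohort, with no adjustment for differences between the two study populations.

```python
# Hypothetical naive (unanchored) indirect comparison: outcomes from two
# separate studies are compared as if they came from one randomised trial.
# All numbers are invented for illustration.
intervention_responders, intervention_n = 45, 100   # single-arm study
control_responders, control_n = 30, 120             # external cohort

p_int = intervention_responders / intervention_n    # 0.45
p_ctrl = control_responders / control_n             # 0.25

# Naive estimate of comparative effectiveness: a simple risk difference.
# Any imbalance in age, disease severity, etc. between the two study
# populations flows straight into this estimate, unadjusted.
risk_difference = p_int - p_ctrl
print(f"Naive risk difference: {risk_difference:.3f}")
```

The fragility is plain: if the external cohort was, say, sicker at baseline, the risk difference confounds treatment effect with population differences, which is exactly why the authors flag these comparisons as less robust.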

Interestingly, the authors find little difference between the proportion of medicines reliant on non-RCT data that were approved by NICE (83%) and those with RCT data (86%); however, the likelihood of receiving an “optimised” (i.e. subgroup) approval was substantially higher for medicines with solely non-RCT data (41% vs 19%). These findings demonstrate that NICE do accept models based on single-arm studies – even though more than 75% of the comparative effectiveness estimates underpinning these models relied upon naïve indirect comparisons, or other less robust methods.

The paper concludes by noting that single-arm studies are becoming more common (50% of the appraisals identified were conducted in 2015-2016) and by suggesting that HTA and regulatory bodies should work together to develop guidance on how to evaluate comparative effectiveness based on single-arm studies.

I thought this paper was great, and it made me reflect on a couple of things. Firstly, the fact that NICE completed such a high volume of appraisals (489) between 2010 and 2016 is extremely impressive – well done NICE. Secondly, should the EMA, or EUnetHTA, play a larger role in providing estimates of comparative effectiveness for single-arm studies? Whilst different countries may reasonably make different value judgements about different health outcomes, comparative effectiveness is – at least in theory – a matter of fact, rather than values, so can’t we assess it centrally?

A QALY loss is a QALY loss is a QALY loss: a note on independence of loss aversion from health states. The European Journal of Health Economics [PubMed] Published 18th September 2018

If I told you that you would receive £10 in return for doing some work for me, and then I only paid you £5, how annoyed would you be? What about if I told you I would give you £10 but then gave you £15? How delighted would you be? If you are economically rational then these two impacts (annoyance vs delight) should be symmetrical; but, if you are a human, your annoyance in the first scenario would likely outweigh the delight you would experience in the second. This is the basic idea behind Kahneman and Tversky’s seminal work on “loss aversion” – we dislike changes we perceive as losses more than we like equivalent changes we perceive as gains. The second paper in this week’s roundup explores loss aversion in the context of health. Application of loss aversion in health is a really interesting idea, because it calls into question the idea that people value all QALYs equally – perhaps QALYs perceived as losses are valued more highly than QALYs perceived as gains.
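The asymmetry can be sketched with the value function from Tversky and Kahneman’s (1992) formulation of prospect theory, using their commonly cited parameter estimates (the exact parameters vary across studies; these are illustrative):

```python
# Sketch of the Tversky & Kahneman (1992) prospect-theory value function,
# which formalises loss aversion: losses loom larger than equivalent gains.
ALPHA = 0.88   # diminishing sensitivity for gains
BETA = 0.88    # diminishing sensitivity for losses
LAMBDA = 2.25  # loss-aversion coefficient: losses weighted ~2x gains

def value(x: float) -> float:
    """Subjective value of a change x relative to the reference point."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** BETA)

# Being short-changed by £5 hurts more than an unexpected extra £5 pleases:
print(value(5))    # ≈ 4.12
print(value(-5))   # ≈ -9.27
```

The £10/£5/£15 example above is exactly this: the displeasure of the £5 loss is roughly twice the pleasure of the £5 gain, which is also the magnitude the paper reports for health states.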

In the introduction of this paper, the authors note that existing evidence suggests loss aversion is present for duration of life, and for quality of life, but note that nobody has explored whether loss aversion remains constant if the two elements change together – simply put, when it comes to loss aversion is “a QALY loss a QALY loss a QALY loss”? The authors test this idea via a choice experiment fielded in a sample of 111 Dutch students. In this experiment, the loss aversion of each participant was independently elicited for four EQ-5D-5L health states – ranging from perfect health down to a health state utility value of 0.46.

As you might have guessed from the title of the paper, the authors found that, at the aggregate level, loss aversion was not significantly different between the four health states – albeit with some variation at the individual level. For each health state, perceived losses were weighted around two times as highly as perceived gains.

I enjoyed this paper, and it prompted me to think about the consequences of loss-aversion for health economics more generally. Do health related decision makers treat the outcomes associated with a new technology as a reference-point, and so feel loss aversion when considering not funding it? From a normative perspective, should we accept asymmetry in the valuation of health? Is this simply a behavioural quirk that we should over-ride in our analyses, or should we be conforming to it and granting differential weight to outcomes depending upon whether the recipient perceives it as a gain or a loss?

Advanced therapy medicinal products and health technology assessment principles and practices for value-based and sustainable healthcare. The European Journal of Health Economics [PubMed] Published 18th September 2018

The final paper in this week’s roundup is on “Advanced Therapy Medicinal Products” (ATMPs). According to the European Union Regulation 1394/2007, an ATMP is a medicine which is either (1) a gene therapy, (2) a somatic-cell therapy, (3) a tissue-engineered therapy, or (4) a combination of these approaches. I don’t pretend to understand the nuances of how these medicines work, but in simple terms ATMPs aim to replace, or regenerate, human cells, tissues and organs in order to treat ill health. Whilst ATMPs are thought to have great potential in improving health and providing long-term survival gains, they present a number of challenges for Health Technology Assessment (HTA) bodies.

This paper details a meeting of a panel of experts from the UK, Germany, France and Sweden, who were tasked with identifying and discussing these challenges. The experts identified three key challenges: (1) uncertainty of long-term benefit, and subsequently cost-effectiveness, (2) discount rates, and (3) capturing the broader “value” of these therapies – including the incremental value associated with potentially curative therapies. These three challenges stem from the fact that at the point of HTA, ATMPs are likely to have immature data and the uncertain prospect of long-term benefits. The experts suggest a range of solutions to these problems, including the use of outcomes-based reimbursement schemes, initiating a multi-disciplinary forum to consider different approaches to discounting, and further research into elements of “value” not captured by current HTA processes.

Whilst there is undoubtedly merit to some of these suggestions, I couldn’t help but feel a bit uneasy about this paper due to its funder – an ATMP manufacturer. Would the authors have written this paper if they hadn’t been paid to by a company with a vested interest in changing HTA systems to suit their agenda? Whilst I don’t doubt the paper was written independently of the company, and don’t mean to cast aspersions on the authors, this does make me question how industry shapes the areas of discourse in our field – even if it doesn’t shape the specific details of that discourse.

Many of the problems raised in this paper are not unique to ATMPs; they apply equally to all interventions with the uncertain prospect of potential cure or long-term benefit (e.g. therapies for the treatment of early-stage cancer, public health interventions, or immunotherapies). Science aside, funder aside, what makes ATMPs any different to these prior interventions?


Chris Sampson’s journal round-up for 17th September 2018


Does competition from private surgical centres improve public hospitals’ performance? Evidence from the English National Health Service. Journal of Public Economics Published 11th September 2018

This study looks at proper (supply-side) privatisation in the NHS. The subject is the government-backed introduction of Independent Sector Treatment Centres (ISTCs), which, in the name of profit, provide routine elective surgical procedures to NHS patients. ISTCs were directed to areas with high waiting times and began rolling out from 2003.

The authors take pre-surgery length of stay as a proxy for efficiency and hypothesise that the entry of ISTCs would improve efficiency in nearby NHS hospitals. They also hypothesise that the ISTCs would cream-skim healthier patients, leaving NHS hospitals to foot the bill for a more challenging casemix. Difference-in-difference regressions are used to test these hypotheses, the treatment group being those NHS hospitals close to ISTCs and the control being those not likely to be affected. The authors use patient-level Hospital Episode Statistics from 2002-2008 for elective hip and knee replacements.
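The difference-in-differences logic can be reduced to a two-by-two table of group means. A minimal sketch with invented numbers (not the paper’s estimates): mean pre-surgery length of stay for hospitals exposed to ISTC entry versus unexposed hospitals, before and after entry.

```python
# Minimal difference-in-differences sketch (all numbers invented): mean
# pre-surgery length of stay (days) by exposure group and period.
mean_los = {
    ("treated", "pre"): 1.50, ("treated", "post"): 1.20,
    ("control", "pre"): 1.45, ("control", "post"): 1.40,
}

change_treated = mean_los[("treated", "post")] - mean_los[("treated", "pre")]
change_control = mean_los[("control", "post")] - mean_los[("control", "pre")]

# The DiD estimate nets out the common time trend; its validity rests on
# the parallel-trends assumption discussed below.
did = change_treated - change_control
print(f"DiD estimate: {did:+.2f} days")  # -0.30 - (-0.05) = -0.25
```

In practice the paper estimates this with a regression on patient-level data, which allows covariate adjustment, but the identifying comparison is the same as this difference of differences.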

The key difficulty here is that the trend in length of stay changed dramatically at the time ISTCs began to be introduced, regardless of whether a hospital was affected by their introduction. This is because there was a whole suite of policy and structural changes being implemented around this period, many targeting hospital efficiency. So we’re looking at comparing new trends, not comparing changes in existing levels or trends.

The authors’ hypotheses prove right. Pre-surgery length of stay fell in exposed hospitals by around 16%. The ISTCs engaged in risk selection, meaning that NHS hospitals were left with sicker patients. What’s more, the savings for NHS hospitals (from shorter pre-surgery length of stay) were more than undermined by an increase in post-surgery length of stay, which may have been due to the change in casemix.

I’m not sure how useful difference-in-difference is in this case. We don’t know what the trend would have been without the intervention because the pre-intervention trend provides no clues about it and, while the outcome is shown to be unrelated to selection into the intervention, we don’t know whether selection into the ISTC intervention was correlated with exposure to other policy changes. The authors do their best to quell these concerns about parallel trends and correlated policy shocks, and the results appear robust.

Broadly speaking, the study satisfies my prior view of for-profit providers as leeches on the NHS. Still, I’m left a bit unsure of the findings. The problem is, I don’t see the causal mechanism. Hospitals had the financial incentive to be efficient and achieve a budget surplus without competition from ISTCs. It’s hard (for me, at least) to see how reduced length of stay has anything to do with competition unless hospitals used it as a basis for getting more patients through the door, which, given that ISTCs were introduced in areas with high waiting times, the hospitals could have done anyway.

While the paper describes a smart and thorough analysis, the findings don’t tell us whether ISTCs are good or bad. Both the length of stay effect and the casemix effect are ambiguous with respect to patient outcomes. If only we had some PROMs to work with…

One method, many methodological choices: a structured review of discrete-choice experiments for health state valuation. PharmacoEconomics [PubMed] Published 8th September 2018

Discrete choice experiments (DCEs) are in vogue when it comes to health state valuation. But there is disagreement about how they should be conducted. Studies can differ in terms of the design of the choice task, the design of the experiment, and the analysis methods. The purpose of this study is to review what has been going on; how have studies differed and what could that mean for our use of the value sets that are estimated?

A search of PubMed for valuation studies using DCEs – including generic and condition-specific measures – turned up 1132 citations, of which 63 were ultimately included in the review. Data were extracted and quality assessed.

The ways in which the studies differed, and the ways in which they were similar, hint at what’s needed from future research. The majority of recent studies were conducted online. This could be problematic if we think self-selecting online panels aren’t representative. Most studies used five or six attributes to describe options and many included duration as an attribute. The methodological tweaks necessary to anchor at 0=dead were a key source of variation. Those using duration varied in terms of the number of levels presented and the range of duration (from 2 months to 50 years). Other studies adopted alternative strategies. In DCE design, there is a necessary trade-off between statistical efficiency and the difficulty of the task for respondents. A variety of methods have been employed to try and ease this difficulty, but there remains a lack of consensus on the best approach. An agreed criterion for this trade-off could facilitate consistency. Some of the consistency that does appear in the literature is due to conformity with EuroQol’s EQ-VT protocol.
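One way duration is used for anchoring (sometimes called the "DCE with duration" approach) is to model the utility of living t years in a state as proportional to t, so that anchored values fall out of the coefficient ratios. The sketch below is a hypothetical illustration with invented coefficients, not a reconstruction of any study in the review:

```python
# Hypothetical sketch of anchoring in a "DCE with duration" model: utility
# of t years in state h is modelled as U = beta_time * t * v(h), so v(h)
# on the 0 (dead) to 1 (full health) scale can be recovered from estimated
# coefficients. All coefficients are invented for illustration.
beta_time = 0.20  # marginal utility of a year in full health

# Per-year utility decrements for the attribute levels present in state h
# (e.g. moderate problems on some EQ-5D dimensions); invented values.
state_decrements = {"mobility": -0.03, "pain": -0.05, "anxiety": -0.02}

# Anchored value: full health = 1, dead = 0; decrements are scaled by the
# marginal utility of time spent in the state.
value_h = 1 + sum(state_decrements.values()) / beta_time
print(f"Anchored value of state h: {value_h:.2f}")  # 1 + (-0.10/0.20) = 0.50
```

The methodological variation the review documents – number of duration levels, range of durations, alternative anchoring strategies – all feeds into how stable ratios like this one turn out to be.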

Unfortunately, for casual users of DCE valuations, all of this means that we can’t just assume that a DCE is a DCE is a DCE. Understanding the methodological choices involved is important in the application of resultant value sets.

Trusting the results of model-based economic analyses: is there a pragmatic validation solution? PharmacoEconomics [PubMed] Published 6th September 2018

Decision models are almost never validated. This means that – save for a superficial assessment of their outputs – they are taken at good faith. That should be a worry. This article builds on the experience of the authors to outline why validation doesn’t take place and to try to identify solutions. This experience includes a pilot study in France, NICE Evidence Review Groups, and the perspective of a consulting company modeller.

There are a variety of reasons why validation is not conducted, but resource constraints are a big part of it. Neither HTA agencies, nor modellers themselves, have the time to conduct validation and verification exercises. The core of the authors’ proposed solution is to end the routine development of bespoke models. Models – or, at least, parts of models – need to be taken off the shelf. Thus, open source or otherwise transparent modelling standards are a prerequisite for this. The key idea is to create ‘standard’ or ‘reference’ models, which can be extensively validated and tweaked. The most radical aspect of this proposal is that they should be ‘freely available’.

But rather than offering a path to open source modelling, the authors offer recommendations for how we should conduct ourselves until open source modelling is realised. These include the adoption of a modular and incremental approach to modelling, combined with more transparent reporting. I agree; we need a shift in mindset. Yet, the barriers to open source models are – I believe – the same barriers that would prevent these recommendations from being realised. Modellers don’t have the time or the inclination to provide full and transparent reporting. There is no incentive for modellers to do so. The intellectual property value of models means that public release of incremental developments is not seen as a sensible thing to do. Thus, the authors’ recommendations appear to me to be dependent on open source modelling, rather than an interim solution while we wait for it. Nevertheless, this is the kind of innovative thinking that we need.


Chris Sampson’s journal round-up for 23rd July 2018


Quantifying life: understanding the history of quality-adjusted life-years (QALYs). Social Science & Medicine [PubMed] Published 3rd July 2018

We’ve had some fun talking about the history of the QALY here on this blog. The story of how the QALY came to be important in health policy has been obscured. This paper seeks to address that. The research adopts a method called ‘multiple streams analysis’ (MSA) in order to explain how QALYs caught on. The MSA framework identifies three streams – policy, politics, and problems – and considers the ‘policy entrepreneurs’ involved. For this study, archival material was collected from the National Archives, Department of Health files, and the University of York. The researchers also conducted 44 semi-structured interviews with academics and civil servants.

The problem stream highlights shocks to the UK economy in the late 1960s, coupled with growth in health care costs due to innovations and changing expectations. Cost-effectiveness began to be studied and, increasingly, policymaking was meant to be research-based and accountable. By the 80s, the likes of Williams and Maynard were drawing attention to apparent inequities and inefficiencies in the health service. The policy stream gets going in the 40s and 50s when health researchers started measuring quality of life. By the early 60s, the idea of standardising these measures to try and rank health states was on the table. Through the late 60s and early 70s, government economists proliferated and proved themselves useful in health policy. The meeting of Rachel Rosser and Alan Williams in the mid-70s led to the creation of QALYs as we know them, combining quantity and quality of life on a 0-1 scale. Having acknowledged inefficiencies and inequities in the health service, UK politicians and medics were open to new ideas, but remained unconvinced by the QALY. Yet it was a willingness to consider the need for rationing that put the wheels in motion for NICE, and the politics stream – like the problem and policy streams – characterises favourable conditions for the use of the QALY.

The MSA framework also considers ‘policy entrepreneurs’ who broker the transition from idea to implementation. The authors focus on the role of Alan Williams and of the Economic Advisers’ Office. Williams was key in translating economic ideas into forms that policymakers could understand. Meanwhile, the Economic Advisers’ Office encouraged government economists to engage with academics at HESG and later the QoL Measurement Group (which led to the creation of EuroQol).

The main takeaway from the paper is that good ideas only prevail in the right conditions and with the right people. It’s important to maintain multi-disciplinary and multi-stakeholder networks. In the case of the QALY, the two-way movement of economists between government and academia was crucial.

I don’t completely understand or appreciate the MSA framework, but this paper is an enjoyable read. My only reservation is with the way the authors describe the QALY as being a dominant aspect of health policy in the UK. I don’t think that’s right. It’s dominant within a niche of a niche of a niche – that is, health technology assessment for new pharmaceuticals. An alternative view is that the QALY has in fact languished in a quiet corner of British policymaking, and been completely excluded in some other countries.

Accuracy of patient recall for self‐reported doctor visits: is shorter recall better? Health Economics [PubMed] Published 2nd July 2018

In designing prospective studies, such as clinical trials, I have always recommended that self-reported resource use be collected no less frequently than every 3 months. This is partly based on something I once read somewhere that I can’t remember, but partly also on the logic that the accuracy of people’s recall decays over time. This paper has come to tell me how wrong I’ve been.

The authors start by highlighting that recall can be subject to omission, whereby respondents forget relevant information, or commission, whereby respondents include events that did not occur. A key manifestation of the latter is ‘telescoping’, whereby events are included from outside the recall period. We might expect commission to be more likely in short recalls and omission to be more common for long recalls. But there’s very little research on this regarding health service use.

This study uses data from a large trial in diabetes care in Australia, in which 5,305 participants were randomised to receive either 2-week, 3-month, or 12-month recall for how many times they had seen a doctor. Then, the trial data were matched with Medicare data to identify the true levels of resource use.

Over 92% of 12-month recall participants made an error, as did 76% of the 3-month group and 46% of the 2-week group. The patterns of errors were different. There was very little under-reporting in the 2-week recall sample, with 3-month recall giving the most over-reporting and 12-month recall giving the most under-reporting. 12-month recall was associated with the largest number of days reported in error. However, when the authors account for the longer period being considered, and estimate a relative error, the impact of misreporting is smallest for the 12-month recall and greatest for the 2-week recall. This translates into a smaller overall bias for the longest recall period. The authors also find that older, less educated, unemployed, and low-income patients exhibit higher measurement errors.
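The absolute-versus-relative distinction is easy to see with a toy calculation (all numbers invented, not the paper’s data): a longer recall window can contain more misreported visits in total yet produce a smaller bias once errors are scaled to the period they cover.

```python
# Toy illustration (invented numbers) of absolute vs relative recall error.
windows = {
    # recall window: (days covered, true visits, reported visits)
    "2-week":   (14,  1,  2),
    "3-month":  (91,  5,  7),
    "12-month": (365, 20, 23),
}

for name, (days, true, reported) in windows.items():
    absolute_error = reported - true
    # Scale to an annual rate so the three windows are comparable: a study
    # extrapolating from short recall multiplies the error up with it.
    annualised_bias = absolute_error * (365 / days)
    print(f"{name}: absolute error {absolute_error:+d}, "
          f"annualised bias {annualised_bias:+.1f} visits/year")
```

Here the 12-month window has the largest absolute error (3 visits) but the smallest annualised bias, mirroring the paper’s finding that relative misreporting is worst for the 2-week recall.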

Health surveys and comparative studies that estimate resource use over a long period of time should use 12-month recall unless they can find a reason to do otherwise. The authors provide some examples from economic evaluations to demonstrate how selecting shorter recall periods could result in recommending the wrong decisions. It’s worth trying to understand the reasons why people can more accurately recall service use over 12 months. That way, data collection methods could be designed to optimise recall accuracy.

Who should receive treatment? An empirical enquiry into the relationship between societal views and preferences concerning healthcare priority setting. PLoS One [PubMed] Published 27th June 2018

Part of the reason the QALY faces opposition is that it has been used in a way that might not reflect societal preferences for resource allocation. In particular, the idea that ‘a QALY is a QALY is a QALY’ may conflict with notions of desert, severity, or process. We’re starting to see more evidence for groups of people holding different views, which makes it difficult to come up with decision rules to maximise welfare. This study considers some of the perspectives that people adopt, which have been identified in previous research – ‘equal right to healthcare’, ‘limits to healthcare’, and ‘effective and efficient healthcare’ – and looks at how they are distributed in the Netherlands. Using four willingness to trade-off (WTT) exercises, the authors explore the relationship between these views and people’s preferences about resource allocation. Trade-offs are between quality vs quantity of life, health maximisation vs equality, children vs the elderly, and lifestyle-related risk vs adversity. The authors sought to test several hypotheses: i) that ‘equal right’ respondents have a lower WTT; ii) ‘limits to healthcare’ people express a preference for health gains, health maximisation, and treating people with adversity; and iii) ‘effective and efficient’ people support health maximisation, treating children, and treating people with adversity.

A representative online sample of adults in the Netherlands (n=261) was recruited. The first part of the questionnaire collected socio-demographic information. The second part asked questions necessary to allocate people to one of the three perspectives using Likert scales based on a previous study. The third part of the questionnaire consisted of the four reimbursement scenarios. Participants were asked to identify the point (in terms of the relevant quantities) at which they would be indifferent between two options.

The distribution of the viewpoints was 65% ‘equal right’, 23% ‘limits to healthcare’, and 7% ‘effective and efficient’. 6% couldn’t be matched to one of the three viewpoints. In each scenario, people had the option to opt out of trading. 24% of respondents were non-traders for all scenarios and, of these, 78% were of the ‘equal right’ viewpoint. Unfortunately, a lot of people opted out of at least one of the trades, and for a wide variety of reasons. Decision-makers can’t opt out, so I’m not sure how useful this is.

The authors describe many associations between individual characteristics, viewpoints, and WTT results. But the tested hypotheses were broadly supported. While the findings showed that different groups were more or less willing to trade, the points of indifference for traders within the groups did not vary. So while you can’t please everyone in health care priority setting, this study shows how policies might be designed to satisfy the preferences of people with different perspectives.
