Chris Sampson’s journal round-up for 7th January 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Overview, update, and lessons learned from the international EQ-5D-5L valuation work: version 2 of the EQ-5D-5L valuation protocol. Value in Health Published 2nd January 2019

Insofar as there is any drama in health economics, the fallout from the EQ-5D-5L value set for England was pretty dramatic. If you ask me, the criticisms are entirely ill-conceived. Regardless, one of the main sticking points was that the version of the EQ-5D-5L valuation protocol used was flawed. England was one of the first countries to get a valuation, so it used version 1.0 of the EuroQol Valuation Technique (EQ-VT). We’re now up to version 2.1. This article outlines the issues that arose in using the first version, explains what EuroQol did to solve them, and describes the current challenges in valuation.

EQ-VT 1.0 includes the composite time trade-off (cTTO) task to elicit values for health states better and worse than dead. Early valuation studies showed some unusual patterns. Research into the causes showed that, in many cases, very little time was spent on the task. Some interviewers had a tendency to skip parts of the explanation for completing the worse-than-dead bit of the cTTO, resulting in no values worse than dead. EQ-VT 1.1 added three practice valuations, along with greater monitoring of interviewer performance and a quality control procedure. This dramatically reduced interviewer effects and the likelihood of inconsistent responses. Yet further improvements could be envisioned, and so EQ-VT 2.0 added a feedback module, which shows respondents the ranking of states implied by their valuations, with which they can then agree or disagree. Version 2.0 was tested against 1.1 and showed further reductions in inconsistencies thanks to the feedback module; other modifications were not supported by the evaluation. EQ-VT 2.1 added a dynamic question to further improve the warm-up tasks.
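For readers unfamiliar with the cTTO, the arithmetic behind it can be sketched as follows. This assumes the usual EQ-VT set-up (a 10-year time frame, with a 10-year lead time for worse-than-dead states) – standard in the literature, but an assumption on my part rather than a detail spelled out in the paper:

```python
def ctto_value(years_full_health: float, worse_than_dead: bool) -> float:
    """Utility implied by a cTTO indifference point.

    Better than dead: x years in full health ~ 10 years in the state,
    so u = x / 10. Worse than dead (lead-time TTO): x years of the
    20-year span (10 lead years + 10 in the state) in full health,
    so u = (x - 10) / 10, floored at -1.
    """
    if worse_than_dead:
        return max((years_full_health - 10) / 10, -1.0)
    return years_full_health / 10

# A respondent indifferent at 7 years implies u = 0.7; indifference at
# 6 years in the lead-time task implies u = -0.4.
print(ctto_value(7, worse_than_dead=False))  # 0.7
print(ctto_value(6, worse_than_dead=True))   # -0.4
```

The hard lower bound of −1 is one reason the modelling of cTTO data (discussed below) is not straightforward.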

There are ongoing challenges with the cTTO, mostly to do with how to model the data. The authors provide a table setting out causes, consequences, and possible solutions for various issues that might arise in the modelling of cTTO data. And then there’s the discrete choice experiment (DCE), which is included in addition to the cTTO, but which different valuation studies have used (or ignored) in different ways when modelling values. Research is ongoing that will probably lead to developments beyond EQ-VT 2.1. This might involve abandoning the cTTO altogether, or at least reducing the number of cTTO tasks and relying more heavily on the DCE. But more research is needed before duration can be adequately incorporated into DCEs.

Helpfully, the paper includes a table with a list of countries and specification of the EQ-VT versions used. This demonstrates the vast amount of knowledge that has been accrued about EQ-5D-5L valuation and the lack of wisdom in continuing to support the (relatively under-interrogated) EQ-5D-3L MVH valuation.

Do time trade-off values fully capture attitudes that are relevant to health-related choices? The European Journal of Health Economics [PubMed] Published 31st December 2018

Different people have different preferences, so values for health states elicited using TTO should vary from person to person. This study is concerned with how personal circumstances and beliefs influence TTO values and whether TTO entirely captures the impact of these on preferences for health states.

The authors analysed data from an online survey with a UK-representative sample of 1,339. Participants were asked about their attitudes towards quality and quantity of life, before completing some TTO tasks based on the EQ-5D-5L. They were then shown two ‘lives’ that – given their TTO responses – they should have considered to be of equivalent value. The researchers constructed generalised estimating equations to model the TTO values and logit models for the subsequent choices between states. Age, marital status, education, and attitudes towards trading quality and quantity of life all determined TTO values, in addition to the state that was being valued. In the modelling of the decisions about the two lives, attitudes influenced decisions through the difference between the two lives in the number of life years available. That is, an interaction term between the attitude and years variables showed that people who prefer quantity of life over quality of life were more likely to choose the state offering more years.
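To see what that kind of interaction means in practice, here is a toy version: simulated choices in which a ‘quantity over quality’ attitude amplifies the effect of extra life-years, recovered with a hand-rolled logit. All variable names and numbers below are mine, not the paper’s:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical variables: attitude = 1 if the respondent prefers
# quantity over quality of life; years_diff = extra life-years that
# option B offers over option A.
attitude = rng.integers(0, 2, n)
years_diff = rng.uniform(-5, 5, n)

# True model: quantity-preferrers respond more strongly to extra years.
logit = 0.2 * years_diff + 0.6 * attitude * years_diff
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)  # 1 = chose B

# Design matrix including the attitude x years interaction term.
X = np.column_stack([np.ones(n), years_diff, attitude, attitude * years_diff])

# Fit the logit by gradient ascent on the (convex) log-likelihood.
w = np.zeros(4)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.5 * X.T @ (y - p) / n

print(f"interaction coefficient: {w[3]:.2f}")  # close to the true 0.6
```

A positive interaction coefficient is exactly the pattern the authors report: the attitude shifts how heavily extra years weigh in the choice.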

The authors’ interpretation from this is that TTO reflects people’s attitudes towards quality and quantity of life, but only partially. My interpretation would be that the TTO exercise would have benefitted from the kind of refinement described above. The choice between the two lives is similar to the feedback module of the EQ-VT 2.0. People often do not understand the implications of their TTO valuations. The study could also be interpreted as supportive of ‘head-to-head’ choice methods (such as DCE) rather than making choices involving full health and death. But the design of the TTO task used in this study was quite dissimilar to others, which makes it difficult to say anything generally about TTO as a valuation method.

Exploring the item sets of the Recovering Quality of Life (ReQoL) measures using factor analysis. Quality of Life Research [PubMed] Published 21st December 2018

The ReQoL is a patient-reported outcome measure for use with people experiencing mental health difficulties. The ReQoL-10 and ReQoL-20 both ask questions relating to seven domains: six mental, one physical. There’s been a steady stream of ReQoL research published in recent years and the measures have been shown to have acceptable psychometric properties. This study concerns the factorial structure of the ReQoL item sets, testing internal construct validity and informing scoring procedures. There’s also a more general methodological contribution relating to the use of positive and negative factors in mental health outcome questionnaires.

At the outset of this study, the ReQoL was based on 61 items. These were reduced to 40 on the basis of qualitative and quantitative analysis reported in other papers. This paper reports on two studies – the first group (n=2,262) completed the 61 items and the second group (n=4,266) completed 40 items. Confirmatory factor analysis and exploratory factor analysis were conducted. Six-factor (according to ReQoL domains), two-factor (negative/positive) and bi-factor (global/negative/positive) models were tested. In the second study, participants were either presented with a version that jumbled up the positively and negatively worded questions or a version that showed a block of negatives followed by a block of positives. The idea here is that if a two-factor structure is simply a product of the presentation of questions, it should be more pronounced in the jumbled version.

The results were much the same in the two study samples. The bi-factor model demonstrated acceptable fit, with much higher factor loadings on the general quality-of-life factor, which loaded on all items. The results indicated sufficient unidimensionality to go ahead with reducing the number of items, and the two ordering formats didn’t differ, suggesting that the negative and positive loadings weren’t just an artefact of the presentation. The findings show that the six dimensions of the ReQoL don’t stand as separate factors. The justification for maintaining items from each of the six dimensions, therefore, seems to be a qualitative one.
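The logic of the bi-factor result can be illustrated with a toy simulation: when items are driven by one strong general factor plus separate positive- and negative-wording factors, the item correlation matrix shows one dominant eigenvalue and a much smaller wording contrast. The loadings and sample size here are invented; this is not the ReQoL data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3000  # simulated respondents

# One general quality-of-life factor plus a wording factor for each of
# the positively and negatively phrased item blocks (toy set-up).
g = rng.normal(size=(n, 1))          # general factor
w_pos = rng.normal(size=(n, 1))      # positive-wording factor
w_neg = rng.normal(size=(n, 1))      # negative-wording factor
e = 0.5 * rng.normal(size=(n, 12))   # item-specific noise

pos_items = 0.7 * g + 0.5 * w_pos + e[:, :6]
neg_items = 0.7 * g + 0.5 * w_neg + e[:, 6:]
items = np.hstack([pos_items, neg_items])

# Eigenvalues of the item correlation matrix: a bi-factor-like structure
# shows one dominant general eigenvalue plus a smaller wording contrast.
eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(items, rowvar=False)))[::-1]
print(np.round(eigs[:3], 2))
```

The first eigenvalue dwarfs the rest – the simulated analogue of the much higher loadings on the general factor, and of sufficient unidimensionality despite the wording effects.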

Some outcome measurement developers have argued that items should all be phrased in the same direction – as either positive or negative – to obtain high-quality data. But there’s good reason to think that features of mental health can’t reliably be translated from negative to positive, and this study supports the inclusion (and intermingling) of both within a measure.

Credits

Chris Sampson’s journal round-up for 17th September 2018

Does competition from private surgical centres improve public hospitals’ performance? Evidence from the English National Health Service. Journal of Public Economics Published 11th September 2018

This study looks at proper (supply-side) privatisation in the NHS. The subject is the government-backed introduction of Independent Sector Treatment Centres (ISTCs), which, in the name of profit, provide routine elective surgical procedures to NHS patients. ISTCs were directed to areas with high waiting times and began rolling out from 2003.

The authors take pre-surgery length of stay as a proxy for efficiency and hypothesise that the entry of ISTCs would improve efficiency in nearby NHS hospitals. They also hypothesise that the ISTCs would cream-skim healthier patients, leaving NHS hospitals to foot the bill for a more challenging casemix. Difference-in-difference regressions are used to test these hypotheses, the treatment group being those NHS hospitals close to ISTCs and the control being those not likely to be affected. The authors use patient-level Hospital Episode Statistics from 2002-2008 for elective hip and knee replacements.
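The basic difference-in-differences logic can be sketched with simulated data. The effect size, sample, and coding below are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Hypothetical panel: NHS hospitals either near an ISTC (treated) or
# not, observed before and after ISTC entry. Numbers are illustrative.
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
true_effect = -0.8  # fall in pre-surgery length of stay (days)

los = (5.0 + 0.5 * treated - 0.3 * post
       + true_effect * treated * post + rng.normal(0, 1, n))

# OLS: los ~ treated + post + treated:post; the interaction term is
# the difference-in-differences estimate of the ISTC entry effect.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, los, rcond=None)
print(f"DiD estimate: {beta[3]:.2f}")  # close to the true -0.8
```

The whole design stands or falls on the interaction term isolating the treatment effect, which is why the parallel-trends worry discussed below matters so much.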

The key difficulty here is that the trend in length of stay changed dramatically at the time ISTCs began to be introduced, regardless of whether a hospital was affected by their introduction. This is because there was a whole suite of policy and structural changes being implemented around this period, many targeting hospital efficiency. So we’re looking at comparing new trends, not comparing changes in existing levels or trends.

The authors’ hypotheses prove right. Pre-surgery length of stay fell in exposed hospitals by around 16%. The ISTCs engaged in risk selection, meaning that NHS hospitals were left with sicker patients. What’s more, the savings for NHS hospitals (from shorter pre-surgery length of stay) were more than offset by an increase in post-surgery length of stay, which may have been due to the change in casemix.

I’m not sure how useful difference-in-difference is in this case. We don’t know what the trend would have been without the intervention because the pre-intervention trend provides no clues about it and, while the outcome is shown to be unrelated to selection into the intervention, we don’t know whether selection into the ISTC intervention was correlated with exposure to other policy changes. The authors do their best to quell these concerns about parallel trends and correlated policy shocks, and the results appear robust.

Broadly speaking, the study satisfies my prior view of for-profit providers as leeches on the NHS. Still, I’m left a bit unsure of the findings. The problem is, I don’t see the causal mechanism. Hospitals had the financial incentive to be efficient and achieve a budget surplus without competition from ISTCs. It’s hard (for me, at least) to see how reduced length of stay has anything to do with competition unless hospitals used it as a basis for getting more patients through the door, which, given that ISTCs were introduced in areas with high waiting times, the hospitals could have done anyway.

While the paper describes a smart and thorough analysis, the findings don’t tell us whether ISTCs are good or bad. Both the length of stay effect and the casemix effect are ambiguous with respect to patient outcomes. If only we had some PROMs to work with…

One method, many methodological choices: a structured review of discrete-choice experiments for health state valuation. PharmacoEconomics [PubMed] Published 8th September 2018

Discrete choice experiments (DCEs) are in vogue when it comes to health state valuation. But there is disagreement about how they should be conducted. Studies can differ in terms of the design of the choice task, the design of the experiment, and the analysis methods. The purpose of this study is to review what has been going on; how have studies differed and what could that mean for our use of the value sets that are estimated?

A search of PubMed for valuation studies using DCEs – including generic and condition-specific measures – turned up 1,132 citations, of which 63 were ultimately included in the review. Data were extracted and quality assessed.

The ways in which the studies differed, and the ways in which they were similar, hint at what’s needed from future research. The majority of recent studies were conducted online. This could be problematic if we think self-selecting online panels aren’t representative. Most studies used five or six attributes to describe options and many included duration as an attribute. The methodological tweaks necessary to anchor at 0=dead were a key source of variation. Those using duration varied in terms of the number of levels presented and the range of duration (from 2 months to 50 years). Other studies adopted alternative strategies. In DCE design, there is a necessary trade-off between statistical efficiency and the difficulty of the task for respondents. A variety of methods have been employed to try and ease this difficulty, but there remains a lack of consensus on the best approach. An agreed criterion for this trade-off could facilitate consistency. Some of the consistency that does appear in the literature is due to conformity with EuroQol’s EQ-VT protocol.
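‘Statistical efficiency’ in DCE design is typically summarised by a D-efficiency-type criterion. The review doesn’t fix a specific metric, so the following is just one common formulation, applied to two invented designs (rows are choice pairs; columns are attribute-level differences between the two options, as in a conditional logit):

```python
import numpy as np

def d_efficiency(X: np.ndarray) -> float:
    """D-efficiency of a design matrix: det(X'X / n) ** (1 / k)."""
    n, k = X.shape
    det = max(float(np.linalg.det(X.T @ X / n)), 0.0)  # clip round-off
    return det ** (1 / k)

# Design A: every attribute differs between the two options in each
# pair -- statistically efficient, but a harder task for respondents.
A = np.array([[ 1,  1,  1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [-1, -1,  1]], dtype=float)

# Design B: attribute 3 overlaps (same level in both options, so a
# zero difference) -- an easier task, but that attribute's effect can
# no longer be estimated.
B = np.array([[ 1,  1,  0],
              [ 1, -1,  0],
              [-1,  1,  0],
              [-1, -1,  0]], dtype=float)

print(d_efficiency(A))  # 1.0
print(d_efficiency(B))  # 0.0
```

The extremes make the trade-off concrete: easing the task (here, via attribute overlap) costs information, and an agreed criterion would at least make that cost comparable across studies.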

Unfortunately, for casual users of DCE valuations, all of this means that we can’t just assume that a DCE is a DCE is a DCE. Understanding the methodological choices involved is important in the application of resultant value sets.

Trusting the results of model-based economic analyses: is there a pragmatic validation solution? PharmacoEconomics [PubMed] Published 6th September 2018

Decision models are almost never validated. This means that – save for a superficial assessment of their outputs – they are taken on faith. That should be a worry. This article builds on the experience of the authors to outline why validation doesn’t take place and to try to identify solutions. This experience includes a pilot study in France, NICE Evidence Review Groups, and the perspective of a consulting company modeller.

There are a variety of reasons why validation is not conducted, but resource constraints are a big part of it. Neither HTA agencies, nor modellers themselves, have the time to conduct validation and verification exercises. The core of the authors’ proposed solution is to end the routine development of bespoke models. Models – or, at least, parts of models – need to be taken off the shelf. Open source, or otherwise transparent, modelling standards are therefore a prerequisite. The key idea is to create ‘standard’ or ‘reference’ models, which can be extensively validated and tweaked. The most radical aspect of this proposal is that they should be ‘freely available’.

But rather than offering a path to open source modelling, the authors offer recommendations for how we should conduct ourselves until open source modelling is realised. These include the adoption of a modular and incremental approach to modelling, combined with more transparent reporting. I agree; we need a shift in mindset. Yet, the barriers to open source models are – I believe – the same barriers that would prevent these recommendations from being realised. Modellers don’t have the time or the inclination to provide full and transparent reporting. There is no incentive for modellers to do so. The intellectual property value of models means that public release of incremental developments is not seen as a sensible thing to do. Thus, the authors’ recommendations appear to me to be dependent on open source modelling, rather than an interim solution while we wait for it. Nevertheless, this is the kind of innovative thinking that we need.

Credits

Chris Sampson’s journal round-up for 11th June 2018

End-of-life healthcare expenditure: testing economic explanations using a discrete choice experiment. Journal of Health Economics Published 7th June 2018

People incur a lot of health care costs at the end of life, despite the fact that – by definition – they aren’t going to get much value from it (so long as we’re using QALYs, anyway). In a 2007 paper, Gary Becker and colleagues put forward a theory for the high value of life and high expenditure on health care at the end of life. This article sets out to test a set of hypotheses derived from this theory, namely: i) higher willingness-to-pay (WTP) for health care with proximity to death, ii) higher WTP with greater chance of survival, iii) societal WTP exceeds individual WTP due to altruism, and iv) societal WTP may exceed individual WTP due to an aversion to restricting access to new end-of-life care. A further set of hypotheses relating to the ‘pain of risk-bearing’ is also tested. The authors conducted an online discrete choice experiment (DCE) with 1,529 Swiss residents, which asked respondents to suppose that they had terminal cancer and was designed to elicit WTP for a life-prolonging novel cancer drug. Attributes in the DCE included survival, quality of life, and ‘hope’ (chance of being cured). Individual WTP – using out-of-pocket costs – and societal WTP – based on social health insurance – were both estimated. The overall finding is that the hypotheses are, at least in part, borne out. But the fact is that different people have different preferences – the authors note that “preferences with regard to end-of-life treatment are very heterogeneous”. The findings provide evidence to explain the prevailing high level of expenditure on end-of-life (cancer) care. But the questions remain of what we can or should do about it, if anything.

Valuation of preference-based measures: can existing preference data be used to generate better estimates? Health and Quality of Life Outcomes [PubMed] Published 5th June 2018

The EuroQol website lists EQ-5D-3L valuation studies for 27 countries. As the EQ-5D-5L comes into use, we’re going to see a lot of new valuation studies in the pipeline. But what if we could use data from one country’s valuation to inform another’s? The idea is that a valuation study in one country may be able to ‘borrow strength’ from another country’s valuation data. The author of this article has developed a Bayesian non-parametric model to achieve this and has previously applied it to UK and US EQ-5D valuations. But what about situations in which few data are available in the country of interest, and where the country’s cultural characteristics are substantially different? This study reports on an analysis to generate an SF-6D value set for Hong Kong, firstly using the Hong Kong values only, and secondly using the UK value set as a prior. As expected, the model which uses the UK data provided better predictions. And some of the differences in the valuation of health states are quite substantial (i.e. more than 0.1). Clearly, this could be a useful methodology, especially for small countries. But more research is needed into the implications of adopting the approach more widely.
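The ‘borrowing strength’ idea is easiest to see in a much simpler setting than the paper’s Bayesian non-parametric model: a conjugate normal-normal update, where a UK-based prior pulls a small local sample towards it. All numbers below are invented:

```python
import numpy as np

# Intuition only: a UK-based prior for the value of some health state,
# updated with a handful of local valuations (normal-normal conjugacy).
prior_mean, prior_var = 0.55, 0.02   # hypothetical UK-derived prior
obs = np.array([0.41, 0.47, 0.39, 0.50, 0.44])  # few local valuations
obs_var = 0.04                        # assumed known sampling variance

n = obs.size
post_precision = 1 / prior_var + n / obs_var
post_mean = (prior_mean / prior_var + obs.sum() / obs_var) / post_precision

print(f"local sample mean: {obs.mean():.3f}")
print(f"posterior mean (pulled toward the UK prior): {post_mean:.3f}")
```

With few local observations the prior carries real weight; as local data accumulate, the posterior converges on the local sample mean. That is precisely why the approach is most attractive for small countries.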

Can a smoking ban save your heart? Health Economics [PubMed] Published 4th June 2018

Here we have another Swiss study, relating to the country’s public-place smoking bans. Exposure to tobacco smoke can have an acute and rapid impact on health to the extent that we would expect an immediate reduction in the risk of acute myocardial infarction (AMI) if a smoking ban reduces the number of people exposed. Studies have already looked at this effect, and found it to be large, but mostly with simple pre-/post- designs that don’t consider important confounding factors or prevailing trends. This study tests the hypothesis in a quasi-experimental setting, taking advantage of the fact that the 26 Swiss cantons implemented smoking bans at different times between 2007 and 2010. The authors analyse individual-level data from Swiss hospitals, estimating the impact of the smoking ban on AMI incidence, with area and time fixed effects, area-specific time trends, and unemployment. The findings show a large and robust effect of the smoking ban(s) for men, with a reduction in AMI incidence of about 11%. For women, the effect is weaker, with an average reduction of around 2%. The evidence also shows that men in low-education regions experienced the greatest benefit. What makes this an especially nice paper is that the authors bring in other data sources to help explain their findings. Panel survey data are used to demonstrate that non-smokers are likely to be the group benefitting most from smoking bans and that people working in public places and people with less education are most exposed to environmental tobacco smoke. These findings might not be generalisable to other settings. Other countries implemented more gradual policy changes and Switzerland had a particularly high baseline smoking rate. But the findings suggest that smoking bans are associated with population health benefits (and the associated cost savings) and could also help tackle health inequalities.
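The staggered-adoption design can be sketched as a two-way fixed effects regression on simulated data. Apart from the canton count, every number here is invented, and the paper’s actual specification also includes area-specific trends and unemployment:

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_periods = 26, 8               # 26 cantons, 8 periods
ban_start = rng.integers(2, 7, n_units)  # staggered ban dates
true_effect = -0.11                      # ~11% fall in log AMI incidence

unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
ban = (time >= ban_start[unit]).astype(float)

unit_fe = rng.normal(0.0, 0.3, n_units)[unit]
log_ami = (3.0 + unit_fe - 0.03 * time
           + true_effect * ban + rng.normal(0.0, 0.05, unit.size))

# Two-way fixed effects: a dummy for every canton and every period
# (one period dropped to avoid collinearity), plus the ban indicator.
D_unit = np.eye(n_units)[unit]
D_time = np.eye(n_periods)[time][:, 1:]
X = np.column_stack([ban, D_unit, D_time])
beta, *_ = np.linalg.lstsq(X, log_ami, rcond=None)
print(f"estimated ban effect: {beta[0]:.3f}")  # close to the true -0.11
```

Because the cantons adopt at different times, the ban indicator isn’t collinear with the time dummies, which is what lets the fixed effects soak up common trends while still identifying the ban’s effect.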

Credits