Chris Sampson’s journal round-up for 25th March 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

How prevalent are implausible EQ-5D-5L health states and how do they affect valuation? A study combining quantitative and qualitative evidence. Value in Health Published 15th March 2019

The EQ-5D-5L is able to describe a lot of different health states (3,125, to be precise), including some that don’t seem likely to ever be observed. For example, it’s difficult to conceive of somebody having extreme problems in pain/discomfort and anxiety/depression while also having no problems with usual activities. Valuation studies exclude these kinds of states because it’s thought that their inclusion could negatively affect the quality of the data. But there isn’t much evidence to help us understand how ‘implausibility’ might affect valuations, or which health states are seen as implausible.
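As an aside, the 3,125 figure and the scale of the 'divergent' profiles are easy to reproduce. Here's an illustrative sketch — the flag for divergence (a state containing both a 1 and a 5) is my own crude heuristic, not the paper's definition of implausibility:

```python
from itertools import product

# Each EQ-5D-5L state is a 5-digit profile: one level (1-5) on each of
# five dimensions (mobility, self-care, usual activities, pain/discomfort,
# anxiety/depression).
states = list(product(range(1, 6), repeat=5))
print(len(states))  # 3125 = 5**5

# Crude flag for the kind of 'divergent' profile discussed above: a state
# with both extreme problems (a 5) and no problems (a 1) somewhere.
divergent = [s for s in states if 5 in s and 1 in s]
print(len(divergent))
```

Nearly half of all definable states mix a 5 with a 1 on this crude flag, which gives a sense of how much of the descriptive space is potentially contentious.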

This study is based on an EQ-5D-5L valuation exercise with 890 students in China. The valuation was conducted using the EQ VAS, rather than the standard EuroQol valuation protocol, with up to 197 states being valued by each student. Two weeks after conducting the valuation, participants were asked to indicate (yes or no) whether or not the states were implausible. After that, a small group were invited to participate in a focus group or interview.

No health state was unanimously identified as implausible. Only four states were unanimously rated as not being implausible. 910 of the 3,125 states defined by the EQ-5D-5L were rated implausible by at least half of the people who rated them. States more commonly rated as implausible were of moderate severity overall, but with divergent severities across dimensions (i.e. 5s and 1s together). Overall, implausibility was associated with lower valuations.

Four broad themes arose from the qualitative work, namely i) reasons for implausibility, ii) difficulties in valuing implausible states, iii) strategies for valuing implausible states, and iv) values of implausible states. Some states were considered to have logical conflicts, with some dimensions being seen as mutually inclusive (e.g. walking around is a usual activity). The authors outline the themes and sub-themes, which are a valuable contribution to our understanding of what people think when they complete a valuation study.

This study makes plain the fact that there is a lot of heterogeneity in perceptions of implausibility. But the paper doesn’t fully address the issue of what plausibility actually means. The authors describe it as subjective. I’m not sure about that. For me, it’s an empirical question. If states are observed in practice, they are plausible. We need meaningful valuations of states that are observed, so perhaps the probability of a state being included in a valuation exercise should correspond to the probability of it being observed in reality. The difficulty of valuing a state may relate to plausibility – as this work shows – but that difficulty is a separate issue. Future research on implausible health states should be aligned with research on respondents’ experience of health states. Individuals’ judgments about the plausibility of health states (and the accuracy of those judgments) will depend on individuals’ experience.

An EU-wide approach to HTA: an irrelevant development or an opportunity not to be missed? The European Journal of Health Economics [PubMed] Published 14th March 2019

The use of health technology assessment is now widespread across the EU. The European Commission recently saw an opportunity to rationalise disparate processes and proposed new regulation for cooperation in HTA across EU countries. In particular, the proposal targets cooperation in the assessment of the relative effectiveness of pharmaceuticals and medical devices. A key purpose is to reduce duplication of efforts, but it should also make the basis for national decision-making more consistent.

The authors of this editorial argue that the regulation needs to provide more clarity in the definition of clinical value and in the quality of evidence that is acceptable, both of which vary across EU Member States. There is also a need for the EU to support early dialogue and scientific advice, and scope to support the generation and use of real-world evidence. The authors also argue that the challenges for medical device assessment are particularly difficult because many medical device companies cannot – or are not incentivised to – generate sufficient evidence for assessment.

As the final paragraph argues, EU cooperation in HTA isn’t likely to be associated with much in the way of savings. This is because appraisals will still need to be conducted in each country, as well as an assessment of country-specific epidemiology and other features of the population. The main value of cooperation could be in establishing a stronger position for the EU in negotiating in matters of drug design and evidence requirements. Not that we needed any more reasons to stop Brexit.

Patient-centered item selection for a new preference-based generic health status instrument: CS-Base. Value in Health Published 14th March 2019

I do not believe that we need a new generic measure of health. This paper was always going to have a hard time convincing me otherwise…

The premise for this work is that generic preference-based measures of health (such as the EQ-5D) were not developed with patients. True. So the authors set out to create one that is. A key feature of this study is the adoption of a framework that aligns with the multiattribute preference response model, whereby respondents rate their own health state relative to another. This is run through a mobile phone app.

The authors start by extracting candidate items from existing health frameworks and generic measures (which doesn’t seem to be a particularly patient-centred approach), and some domains were excluded for reasons that are not at all clear. 47 domains were included after overlapping candidates were removed. The 47 were classified as physical, mental, social, or ‘meta’. An online survey was conducted by a market research company. 2,256 ‘patients’ (people with diseases or serious complaints) were asked which 9 domains they thought were most important. Why 9? Because the authors figured it was the maximum that could fit on the screen of a mobile phone.

Of the candidate items, 5 were regularly selected in the survey: pain, personal relationships, fatigue, memory, and vision. Mobility and daily activities were also judged important enough to be included. Independence and self-esteem were added as paired domains and hearing was paired with the vision domain. The authors also added anxiety/depression as a pair of domains because they thought it was important. Thus, 12 items were included altogether, of which 6 were parts of pairs. Items were rephrased according to the researchers’ preferences. Each item was given 4 response levels.

It is true to say (as the authors do) that most generic preference-based measures (most notably the EQ-5D) were not developed with direct patient input. The argument goes that this somehow undermines the measure. But there are a) plenty of patient-centred measures for which preference-based values could be created and b) plenty of ways in which existing measures can be made patient-centred post hoc (n.b. our bolt-on study).

Setting aside my scepticism about the need for a new measure, I have a lot of problems with this study and with the resulting CS-Base instrument. The defining feature of its development seems to be arbitrariness. The underlying framework (as far as it is defined) does not seem well-grounded. The selection of items was largely driven by researchers. The wording was entirely driven by the researchers. The measure cannot justifiably be called ‘patient-centred’. It is researcher-centred, even if the researchers were able to refer to a survey of patients. And the whole thing has nothing whatsoever to do with preferences. The measure may prove fantastic at capturing health outcomes, but if it does it will be in spite of the methods used for its development, not because of them. Ironically, that would be a good advert for researcher-centred outcome development.

Proximity to death and health care expenditure increase revisited: a 15-year panel analysis of elderly persons. Health Economics Review [PubMed] [RePEc] Published 11th March 2019

It is widely acknowledged that – on average – people incur a large proportion of their lifetime health care costs in the last few years of their life. But there’s still a question mark over whether it is proximity to death that drives costs or age-related morbidity. The two have very different implications – we want people to be living for longer, but we probably don’t want them to be dying for longer. There’s growing evidence that proximity to death is very important, but it isn’t clear how important – if at all – ageing is. It’s important to understand this, particularly in predicting the impacts of demographic changes.

This study uses Swiss health insurance claims data for around 104,000 people over the age of 60 between 1996 and 2011. Two-part regression models were used, estimating first the probability of incurring any health care expenditure and then the level of expenditure conditional on it being greater than zero. The author analysed both birth cohorts and age classes to look at age-associated drivers of health care expenditure.
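For readers unfamiliar with two-part models, the core idea is just a decomposition of expected expenditure into the probability of incurring any cost and the mean cost among those who do. A minimal sketch with simulated data (the distributions and parameters are invented, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated annual expenditures: ~30% of people incur zero cost;
# positive costs are lognormal. (Illustrative numbers only.)
any_cost = rng.random(n) < 0.7
positive_cost = rng.lognormal(mean=7.0, sigma=1.0, size=n)
y = np.where(any_cost, positive_cost, 0.0)

# Two-part logic: model P(y > 0) and E[y | y > 0] separately, then
# combine. Here we just use the empirical estimate of each part.
p_positive = (y > 0).mean()
mean_given_positive = y[y > 0].mean()
two_part_estimate = p_positive * mean_given_positive

print(np.isclose(two_part_estimate, y.mean()))  # the decomposition is exact
```

In practice each part would be a regression – for example a logit for the first part and a GLM for the second – with covariates for age, cohort, and time to death.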

As expected, health care expenditures increased with age. The models imply that proximity to death has grown in importance over time. For the 1931-35 birth cohort, for example, the proportion of expenditures explained by proximity to death rose from 19% to 31%. Expenditures were partly explained by morbidity, and this effect appeared to be relatively constant over time. Thus, proximity to death is not the only determinant of rising expenditures (even if it is an important one). Looking at different age classes over time, there was no clear picture in the trajectory of health care expenditures. For the oldest age groups (76-85), health care expenditures were growing, but for some of the younger groups, costs appeared to be decreasing over time. This study paints a complex picture of health care expenditures, calling for complex policy responses. Part of this could be supporting people to commence palliative care earlier, but there is also a need for more efficient management of chronic illness over the long term.

Credits

Chris Sampson’s journal round-up for 5th November 2018


Stratified treatment recommendation or one-size-fits-all? A health economic insight based on graphical exploration. The European Journal of Health Economics [PubMed] Published 29th October 2018

Health care is increasingly personalised. This creates the need to evaluate interventions for smaller and smaller subgroups as patient heterogeneity is taken into account. And this usually means we lack the statistical power to have confidence in our findings. The purpose of this paper is to consider the usefulness of a tool that hasn’t previously been employed in economic evaluation – the subpopulation treatment effect pattern plot (STEPP). STEPP works by estimating the treatment effect in a series of overlapping subpopulations defined by a covariate, which can then be presented graphically. Imagine your X-axis with the values defining the subgroups and your Y-axis showing the treatment outcome. This information can then be used to determine which subgroups exhibit positive outcomes.

This study uses data from a trial-based economic evaluation in heart failure, where patients’ 18-month all-cause mortality risk was estimated at baseline before allocation to one of three treatment strategies. For the STEPP procedure, the authors use baseline risk to define subgroups and adopt net monetary benefit at the patient level as the outcome. The study makes two comparisons (between three alternative strategies) and therefore presents two STEPP figures. The STEPP figures are used to identify subgroups, which the authors apply in a stratified cost-effectiveness analysis, estimating net benefit in each defined risk subgroup.
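To make the procedure concrete, here’s a rough numerical sketch of a STEPP-style analysis of patient-level net monetary benefit. The data, effect sizes, and willingness-to-pay threshold are all invented for illustration; nothing here comes from the paper itself:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
wtp = 20_000  # assumed willingness-to-pay per QALY

baseline_risk = rng.uniform(0, 0.5, n)
# Invented data-generating process: the incremental benefit of the
# intervention shrinks as baseline mortality risk rises.
qaly_gain = 0.05 - 0.12 * baseline_risk + rng.normal(0, 0.02, n)
extra_cost = rng.normal(400, 100, n)
nmb = wtp * qaly_gain - extra_cost  # patient-level net monetary benefit

# STEPP-style sliding window: overlapping subgroups ordered by baseline risk.
order = np.argsort(baseline_risk)
window, step = 150, 50
for start in range(0, n - window + 1, step):
    idx = order[start:start + window]
    print(f"risk {baseline_risk[idx].mean():.2f}: mean NMB {nmb[idx].mean():8.1f}")
```

Each overlapping window gives a local estimate of mean net benefit, and plotting these against baseline risk produces the kind of figure the authors present.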

Interpretation of the STEPPs is a bit loose, with no hard decision rules. The authors suggest that one of the STEPPs shows no clear relationship between net benefit and baseline risk in terms of the cost-effectiveness of the intervention (care as usual vs basic support). The other STEPP shows that, on average, people with baseline risk below 0.16 have a positive net benefit from the intervention (intensive support vs basic support), while those with higher risk do not. The authors evaluate this stratification strategy against an alternative stratification strategy (based on the patient’s New York Heart Association class) and find that the STEPP-based approach is expected to be more cost-effective. So the key message seems to be that STEPP can be used as a basis for defining subgroups as cost-effectively as possible.

I’m unsure about the extent to which this is a method that deserves to have its own name, insofar as it is used in this study. I’ve seen plenty of studies present a graph with net benefit on the Y-axis and some patient characteristic on the X-axis. But my main concern is about defining subgroups on the basis of net monetary benefit rather than some patient characteristic. Is it OK to deny treatment to subgroup A because treatment costs are higher than in subgroup B, even if treatment is cost-effective for the entire population of A+B? Maybe, but I think that creates more challenges than stratification on the basis of treatment outcome.

Using post-market utilisation analysis to support medicines pricing policy: an Australian case study of aflibercept and ranibizumab use. Applied Health Economics and Health Policy [PubMed] Published 25th October 2018

The use of ranibizumab and aflibercept has been a hot topic in the UK, where NHS providers feel that they’ve been bureaucratically strong-armed into using an incredibly expensive drug to treat certain eye conditions when a cheaper and just-as-effective alternative is available. Seeing how other countries have managed prices in this context could, therefore, be valuable to the NHS and other health services internationally. This study uses data from Australia, where decisions about subsidising medicines are informed by research into how drugs are used after they come to market. Both ranibizumab (in 2007) and aflibercept (in 2012) were supported for the treatment of age-related macular degeneration. These decisions were based on clinical trials and modelling studies, which also showed that the benefit of ~6 aflibercept prescriptions equated to the benefit of ~12 ranibizumab prescriptions, justifying a higher price-per-injection for aflibercept.

In the UK and US, aflibercept attracts a higher price. The authors assume that this is because of the aforementioned trial data relating to the number of doses. However, in Australia, the same price is paid for aflibercept and ranibizumab. This is because a post-market analysis showed that, in practice, ranibizumab and aflibercept had a similar dose frequency. The purpose of this study is to see whether this is because different groups of patients are being prescribed the two drugs. If they are, then we might anticipate heterogeneous treatment outcomes and thus a justification for differential pricing. Data were drawn from an administrative claims database for 208,000 Australian veterans for 2007-2017. The monthly number of aflibercept and ranibizumab prescriptions was estimated for each person, showing that total prescriptions increased steadily over the period, with aflibercept taking around half the market within a year of its approval. Ranibizumab initiators were slightly older in the post-aflibercept era but, aside from that, there were no real differences identified. When it comes to the prescription of ranibizumab or aflibercept, gender, being in residential care, remoteness of location, and co-morbidities don’t seem to be important. Dispensing rates were similar, at around 3 prescriptions during the first 90 days and around 9 prescriptions during the following 12 months.
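The pricing logic, and why the post-market data undermine it, comes down to simple arithmetic. With hypothetical prices (the figures below are invented, not actual reimbursement prices):

```python
# If one course is ~6 aflibercept or ~12 ranibizumab injections, then equal
# cost per course implies aflibercept can be priced at ~2x per injection.
ranibizumab_price = 1000.0   # hypothetical price per injection
trial_dose_ratio = 12 / 6    # injections per course: ranibizumab / aflibercept
aflibercept_price = ranibizumab_price * trial_dose_ratio
print(aflibercept_price)     # 2000.0: same cost per trial-defined course

# But the Australian post-market data showed similar real-world dosing for
# both drugs (~9 scripts over 12 months). At equal observed dosing, the
# 2x price produces a large cost gap per patient-year:
scripts_per_year = 9
gap = scripts_per_year * aflibercept_price - scripts_per_year * ranibizumab_price
print(gap)
```

Which is why equal observed dose frequency led Australia to pay the same price per injection for both drugs.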

The findings seem to support Australia’s decision to treat ranibizumab and aflibercept as substitutes at the same price. More generally, they support the idea that post-market utilisation assessments can (and perhaps should) be used as part of the health technology assessment and reimbursement process.

Do political factors influence public health expenditures? Evidence pre- and post-great recession. The European Journal of Health Economics [PubMed] Published 24th October 2018

There is mixed evidence about the importance of partisanship in public spending, and very little relating specifically to health care. I’d be worried if political factors didn’t influence public spending on health, given that that’s a definitively political issue. How the situation might be different before and after a recession is an interesting question.

The authors combined OECD data for 34 countries from 1970-2016 with the Database of Political Institutions. This allowed for the creation of variables relating to the ideology of the government and the proximity of elections. Stationary panel data models were identified as the most appropriate method for analysis of these data. A variety of political factors were included in the models, for which the authors present marginal effects. The more left-wing a government, the higher its public spending on health care, but this is only statistically significant in the period before the crisis of 2007. Before the crisis, coalition governments tended to spend more, while governments with more years in office tended to spend less. These effects also seem to disappear after 2007. Throughout the whole period, governing parties with a stronger majority tended to spend less on health care. Several of the non-political factors included in the models show the results that we would expect. GDP per capita is positively associated with health care expenditures, for example. The findings relating to the importance of political factors appear to be robust to the inclusion of other (non-political) variables and there are similar findings when the authors look at public health expenditure as a percentage of total health expenditure. In contradiction with some previous studies, proximity to elections does not appear to be important.
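For readers unfamiliar with panel methods of this kind, a common building block is the within (fixed-effects) transformation, which sweeps out stable country characteristics before estimating the coefficients. A toy sketch with a made-up data-generating process – not the authors' specification, which also deals with stationarity testing and election timing:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy panel: 30 countries x 40 years. Health spending depends on GDP per
# capita, a left-right ideology score, and a country fixed effect.
n_countries, n_years = 30, 40
country_effect = rng.normal(0, 2, n_countries)[:, None]
gdp = rng.normal(10, 1, (n_countries, n_years))
left_ideology = rng.normal(0, 1, (n_countries, n_years))
spend = (0.5 * gdp + 0.3 * left_ideology + country_effect
         + rng.normal(0, 0.5, (n_countries, n_years)))

# Within transformation: demean each variable by country, removing the
# country effects, then run OLS on the demeaned data.
def demean(x):
    return x - x.mean(axis=1, keepdims=True)

X = np.column_stack([demean(gdp).ravel(), demean(left_ideology).ravel()])
y = demean(spend).ravel()
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)  # close to the true coefficients (0.5, 0.3)
```

The same logic extends to the richer set of political variables in the paper, with the caveat that panel models of spending also need to handle trends and persistence in the series.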

The most interesting finding here is that the effect of partisanship seems to have mostly disappeared – or, at least, reduced – since the crisis of 2007. Why did left-wing parties and right-wing parties converge? The authors suggest that it’s because adverse economic circumstances restrict the extent to which governments can make decisions on the basis of ideology. Though I dare say readers of this blog could come up with plenty of other (perhaps non-economic) explanations.


Sam Watson’s journal round-up for 8th October 2018


A cost‐effectiveness threshold based on the marginal returns of cardiovascular hospital spending. Health Economics [PubMed] Published 1st October 2018

There are two types of cost-effectiveness threshold of interest to researchers. First, there’s the societal willingness-to-pay for a given gain in health or quality of life. This is what many regulatory bodies, such as NICE, use. Second, there is the actual return on medical spending achieved by the health service. Reimbursement of technologies with a lesser return for every pound or dollar would reduce the overall efficiency of the health service. Some refer to this as the opportunity cost, although in a technical sense I would disagree that it is the opportunity cost per se. Nevertheless, this latter definition has seen a growth in empirical work; with some data on health spending and outcomes, we can start to estimate this threshold.

This article looks at spending on cardiovascular disease (CVD) and survival among elderly age groups, by gender, in the Netherlands. Estimating the causal effect of spending is tricky with these data: spending may go up because survival is worsening, external factors like smoking may have a confounding role, and using five-year age bands (as the authors do) over time can lead to bias as the average age in these bands is increasing as demographics shift. The authors do a pretty good job in specifying a Bayesian hierarchical model with enough flexibility to accommodate these potential issues. For example, linear time trends are allowed to vary by age-gender groups and dynamic effects of spending are included. However, there’s no examination of whether the model is actually a good fit to the data, something which I’m growing to believe is an area where we, in health and health services research, need to improve.

Most interestingly (for me at least), the authors look at a range of priors based on previous studies and a meta-analysis of similar studies. The estimated elasticity using information from prior studies is more ‘optimistic’ about the effect of health spending than with a ‘vague’ prior. This could be because CVD or the Netherlands differs in a particular way from other areas. I might argue that the modelling here is better than some previous efforts as well, which could explain the difference. Extrapolating using life tables, the authors estimate a base case cost per QALY of €40,000.
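The mechanics of why an informative prior pulls the estimate are easy to illustrate in the conjugate normal case, where the posterior mean is a precision-weighted average of prior and data. The numbers below are invented for illustration and are not those of the paper:

```python
# Normal likelihood + normal prior: the posterior is normal with a
# precision-weighted mean. Suppose (made-up numbers) the data alone
# suggest an elasticity of 0.10 with standard error 0.06, while prior
# studies suggest 0.25 with sd 0.05.
def posterior_normal(prior_mean, prior_sd, data_mean, data_se):
    prior_prec = 1 / prior_sd**2
    data_prec = 1 / data_se**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var**0.5

vague = posterior_normal(0.0, 10.0, 0.10, 0.06)        # ~= the data alone
informative = posterior_normal(0.25, 0.05, 0.10, 0.06)  # pulled towards 0.25
print(f"vague prior:       {vague[0]:.3f}")
print(f"informative prior: {informative[0]:.3f}")
```

With a vague prior the posterior mean sits at the data estimate; with the informative prior it is dragged towards the ‘optimistic’ literature, which is exactly the pattern the authors report.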

Early illicit drug use and the age of onset of homelessness. Journal of the Royal Statistical Society: Series A Published 11th September 2018

How the consumption of different things, like food, drugs, or alcohol, affects life and health outcomes is a difficult question to answer empirically. Consider a recent widely-criticised study on alcohol published in The Lancet. Among a number of issues, despite including a huge amount of data, the paper was unable to address the problem that different kinds of people drink different amounts. The kind of person who is teetotal may be so for a number of reasons including alcoholism, interaction with medication, or other health issues. Similarly, studies on the effect of cannabis consumption have shown, among other things, an association with lower IQ and poorer mental health. But are those who consume cannabis already those with lower IQs or at higher risk of psychoses? This article considers the relationship between cannabis and homelessness. While homelessness may lead to an increase in drug use, drug use may also be a cause of homelessness.

The paper is a neat application of bivariate hazard models. We recently looked at shared parameter models on the blog, which factorise the joint distribution of two variables into their marginal distributions by assuming that their relationship is due to some unobserved variable. The bivariate hazard models work here in a similar way: the bivariate model is specified as the product of the marginal densities and the individual unobserved heterogeneity. This specification allows (i) people to have different unobserved risks for both homelessness and cannabis use and (ii) cannabis to have a causal effect on homelessness and vice versa.
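The identification problem these models tackle is easy to see in simulation: a shared unobserved trait can generate a strong association between two outcomes with no causal link between them at all. A toy sketch (the logit-style probabilities and parameters are invented):

```python
import math
import random

random.seed(0)

# A single unobserved 'frailty' raises the risk of BOTH cannabis use and
# homelessness; neither outcome causes the other in this simulation.
n = 50_000
uses = homeless = both = 0
for _ in range(n):
    frailty = random.gauss(0, 1)  # unobserved heterogeneity
    p_use = 1 / (1 + math.exp(-(-2 + frailty)))
    p_home = 1 / (1 + math.exp(-(-3 + frailty)))
    u = random.random() < p_use
    h = random.random() < p_home
    uses += u
    homeless += h
    both += u and h

p_u, p_h, p_uh = uses / n, homeless / n, both / n
print(p_uh > p_u * p_h)  # positive association despite no causal link
```

Separating this spurious association from genuine causal effects in both directions is exactly what the paper's joint specification is designed to do.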

Despite the careful set-up though, I’m not wholly convinced of the face validity of the results. The authors claim that daily cannabis use among men has a large effect on becoming homeless – as large an effect as having separated parents – which seems implausible to me. Cannabis use can cause psychological dependency but I can’t see people choosing it over having a home as they might with something like heroin. The authors also claim that homelessness doesn’t really have an effect on cannabis use among men because the estimated effect is “relatively small” (it is the same order of magnitude as the reverse causal effect) and only “marginally significant”. Interpreting these results in the context of cannabis use would then be difficult, though. The paper provides much additional material of interest. However, the conclusion that regular cannabis use, all else being equal, has a “strong effect” on male homelessness, seems both difficult to conceptualise and not in keeping with the messiness of the data and complexity of the empirical question.

How could health care be anything other than high quality? The Lancet: Global Health [PubMed] Published 5th September 2018

Tedros Adhanom Ghebreyesus, or Dr Tedros as he’s better known, is the head of the WHO. This editorial was penned in response to the recent Lancet Commission on Health Care Quality and related studies (see this round-up). However, I was critical of these studies for a number of reasons, in particular the conflation of ‘quality’ as we normally understand it with everything else that may impact on how a health system performs. This includes resourcing, which is obviously low in poor countries, availability of labour and medical supplies, and demand-side choices about health care access. The empirical evidence was fairly weak; even in countries like the UK, where we’re swimming in data, we struggle to quantify quality. Data are also often averaged at the national level, masking huge underlying variation within countries. This editorial is, therefore, a bit of an empty platitude: of course we should strive to improve ‘quality’ – its goodness is definitional. But without a solid understanding of how to do this or even what we mean when we say ‘quality’ in this context, we’re not really saying anything at all. Proposing that we need a ‘revolution’ without any real concrete proposals is fairly meaningless and ignores the massive strides that have been made in recent years. Delivering high-quality, timely, effective, equitable, and integrated health care in the poorest settings means more resources. Tinkering with what little services already exist for those most in need is not going to produce a revolutionary change. But this strays into political territory, which UN organisations often flounder in.

Editorial: Statistical flaws in the teaching excellence and student outcomes framework in UK higher education. Journal of the Royal Statistical Society: Series A Published 21st September 2018

As a final note for our academic audience, we give you a statement on the Teaching Excellence Framework (TEF). For our non-UK audience, the TEF is a new system being introduced by the government, which seeks to introduce more of a ‘market’ in higher education by trying to quantify teaching quality and then allowing the best-performing universities to charge more. No-one would disagree with the sentiment that improving higher education standards is better for students and teachers alike, but the TEF is fundamentally statistically flawed, as discussed in this editorial in the JRSS.

Some key points of contention are: (i) TEF doesn’t actually assess any teaching, such as through observation; (ii) there is no consideration of uncertainty about scores and rankings; (iii) “The benchmarking process appears to be a kind of poor person’s propensity analysis” – copied verbatim as I couldn’t have phrased it any better; (iv) there has been no consideration of gaming the metrics; and (v) the proposed models do not reflect the actual aims of TEF and are likely to be biased. Economists will also likely have strong views on how the TEF incentives will affect institutional behaviour. But, as Michael Gove, the former justice and education secretary, said, Britons have had enough of experts.
