Sam Watson’s journal round-up for 15th January 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Cost-effectiveness of publicly funded treatment of opioid use disorder in California. Annals of Internal Medicine [PubMed] Published 2nd January 2018

Deaths from opiate overdose have soared in the United States in recent years. In 2016, 64,000 people died this way, up from 16,000 in 2010 and 4,000 in 1999. The causes of public health crises like this are multifaceted, but we can identify two key issues that have contributed more than any other. Firstly, medical practitioners have been prescribing opiates irresponsibly for years. For the last ten years, well over 200,000,000 opiate prescriptions were issued per year in the US – enough for seven in every ten people. Once prescribed, opiate use is often not well managed. Prescriptions can be stopped abruptly, for example, leaving people with unexpected withdrawal syndromes and rebound pain. It is estimated that 75% of heroin users in the US began by using legal, prescription opiates. Secondly, drug suppliers have started cutting heroin with its far stronger but cheaper cousin, fentanyl. Given fentanyl’s strength, only a tiny amount is required to achieve the same effects as heroin, but the lack of pharmaceutical knowledge and equipment means it is often not measured or mixed appropriately into what is sold as ‘heroin’. There are two clear routes to alleviating the epidemic of opiate overdose: prevention, by ensuring responsible medical use of opiates, and ‘cure’, either by ensuring the quality and strength of heroin, or providing a means to stop opiate use. The former ‘cure’ is politically infeasible so it falls on the latter to help those already habitually using opiates. However, the availability of opiate treatment programs, such as opiate agonist treatment (OAT), is lacklustre in the US. OAT provides non-narcotic opiates, such as methadone or buprenorphine, to prevent withdrawal syndromes in users, from which they can slowly be weaned. 
This article looks at the cost-effectiveness of providing OAT for all persons seeking treatment for opiate use in California for an unlimited period versus standard care, which only provides OAT to those who have failed supervised withdrawal twice, and only for 21 days. The paper adopts a previously developed semi-Markov cohort model that includes states for treatment, relapse, incarceration, and abstinence. Transition probabilities for the new OAT treatment were determined from treatment data for current OAT patients (as far as I understand it). This does raise a question about the generalisability of this population to the whole population of opiate users: given the need to have already been through two supervised withdrawals, this population may have a greater motivation to quit, for example. In any case, the article estimates that the OAT program would be cost-saving, through reductions in crime and incarceration, and improve population health, by reducing the risk of death. Taken at face value these results seem highly plausible. But, as we’ve discussed before, drug policy rarely seems to be evidence-based.

The impact of aid on health outcomes in Uganda. Health Economics [PubMed] Published 22nd December 2017

Examining the response of population health outcomes to changes in health care expenditure has been the subject of a large and growing number of studies. One reason is to estimate a supply-side cost-effectiveness threshold: the health returns the health service achieves in response to budget expansions or contractions. Similarly, we might want to know the returns to particular types of health care expenditure. For example, there remains a debate about the effectiveness of aid spending in low and middle-income country (LMIC) settings. Aid spending may fail to be effective for reasons such as resource leakage, failure to target the right population, poor design and implementation, and crowding out of other public sector investment. Looking at these questions at an aggregate level can be tricky; the link between expenditure or expenditure decisions and health outcomes is long and causality flows in multiple directions. Effects are likely to therefore be small and noisy and require strong theoretical foundations to interpret. This article takes a different, and innovative, approach to looking at this question. In essence, the analysis boils down to a longitudinal comparison of those who live near large, aid funded health projects with those who don’t. The expectation is that the benefit of any aid spending will be felt most acutely by those who live nearest to actual health care facilities that come about as a result of it. Indeed, this is shown by the results – proximity to an aid project reduced disease prevalence and work days lost to ill health with greater effects observed closer to the project. However, one way of considering the ‘usefulness’ of this evidence is how it can be used to improve policymaking. One way is in understanding the returns to investment or over what area these projects have an impact. The latter is covered in the paper to some extent, but the former is hard to infer. 
A useful next step may be to try to quantify what kind of benefit aid dollars produce and how heterogeneous that benefit is.

The impact of social expenditure on health inequalities in Europe. Social Science & Medicine Published 11th January 2018

Let us consider for a moment how we might explore empirically whether social expenditure (e.g. unemployment support, child support, housing support, etc) affects health inequalities. First, we establish a measure of health inequality. We need a proxy measure of health – this study uses self-rated health and self-rated difficulty in daily living – and then compare these outcomes along some relevant measure of socioeconomic status (SES) – in this study they use level of education and a compound measure of occupation, income, and education (the ISEI). So far, so good. Data on levels of social expenditure are available in Europe and are used here, but oddly these data are converted to a percentage of GDP. The trouble with doing this is that this variable can change if social expenditure changes or if GDP changes. During the financial crisis, for example, social expenditure shot up as a proportion of GDP, which likely had very different effects on health and inequality than when social expenditure increased as a proportion of GDP due to a policy change under the Labour government. This variable also likely has little relationship to the level of support received per eligible person. Anyway, at the crudest level, we can then consider how the relationship between SES and health is affected by social spending. A more nuanced approach might consider who the recipients of social expenditure are and how they stand on our measure of SES, but I digress. In the article, the baseline category for education is those with only primary education or less, which seems like an odd category to compare to since in Europe I would imagine this is a very small proportion of people given compulsory schooling ages unless, of course, they are children. But including children in the sample would be an odd choice here since they don’t personally receive social assistance and are difficult to compare to adults. 
However, there are no descriptive statistics in the paper, so we don’t know, and no comparisons are made between other groups. Indeed, the estimates of the intercepts in the models are very noisy and variable for no obvious reason other than perhaps that the reference group is very small. Despite the problems outlined so far, though, there is a potentially more serious one. The article uses a logistic regression model, which is perfectly justifiable given the binary or ordinal nature of the outcomes. However, the authors justify the conclusion that “Results show that health inequalities measured by education are lower in countries where social expenditure is higher” by demonstrating that the odds ratio for reporting a poor health outcome in the groups with greater than primary education, compared to primary education or less, is smaller in magnitude when social expenditure as a proportion of GDP is higher. But the conclusion does not follow from the premise. It is entirely possible for these odds ratios to change without any change in the variance of the underlying distribution of health, the relative ordering of people, or the absolute difference in health between categories, simply by shifting the whole distribution up or down. For example, if the proportions of people in two groups reporting a negative outcome are 0.3 and 0.4, which then change to 0.2 and 0.3 respectively, then the odds ratio comparing the two groups changes from 0.64 to 0.58. The difference between them remains 0.1. No calculations are made regarding absolute effects in the paper though. GDP is also shown to have a positive effect on health outcomes. All that might have been shown is that the relative difference in health outcomes between those with primary education or less and others changes as GDP changes because everyone is getting healthier. The question of the article is interesting; it’s a shame about the execution.
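The arithmetic behind the odds ratio example can be checked with a few lines of Python. This is only a sketch of the worked example in the text: the function names are my own, and the proportions are the illustrative ones given above, not figures from the paper.

```python
def odds(p):
    """Convert a proportion reporting the outcome into odds."""
    return p / (1 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio comparing group A with group B."""
    return odds(p_a) / odds(p_b)


# Proportions reporting a negative outcome in two groups,
# before and after the whole distribution shifts down by 0.1.
before = (0.3, 0.4)
after = (0.2, 0.3)

or_before = odds_ratio(*before)
or_after = odds_ratio(*after)

# The absolute gap between the groups is the same in both scenarios.
gap_before = round(before[1] - before[0], 2)
gap_after = round(after[1] - after[0], 2)

print(round(or_before, 2), round(or_after, 2))  # 0.64 0.58
print(gap_before, gap_after)                    # 0.1 0.1
```

The odds ratio moves even though the absolute difference between the groups does not, which is why a change in the odds ratio alone cannot be read as a change in inequality.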

Paul Mitchell’s journal round-up for 6th November 2017

A longitudinal study to assess the frequency and cost of antivascular endothelial therapy, and inequalities in access, in England between 2005 and 2015. BMJ Open [PubMed] Published 22nd October 2017

I am breaking one of my unwritten rules in a journal paper round-up by talking about colleagues’ work, but I feel it is too important not to provide a summary, for a number of reasons. The study highlights the problems faced by regional healthcare purchasers in England when implementing national guideline recommendations on the cost-effectiveness of new treatments. The paper focuses on anti-vascular endothelial growth factor (anti-VEGF) medicines in particular, with two drugs, ranibizumab and aflibercept, offered to patients with a range of eye conditions, costing £550-800 per injection. Another drug, bevacizumab, that is closely related to ranibizumab and performs similarly in trials, could be provided at a fraction of the cost (£50-100 per injection), but it is currently unlicensed for eye conditions in the UK. Using administrative data from Hospital Episode Statistics in England between 2005 and 2015, this study investigates how regional areas in England have coped with trying to provide the recommended drugs, tracking their use since they were recommended for a number of different eye conditions over the past decade. In 2014/15 the cost of these two new drugs for treating eye conditions alone was estimated at £447 million nationally. The provision of these drugs is not evenly distributed, varying widely across regions after controlling for socio-demographics, suggesting an inequality of access associated with the introduction of these high-cost drugs over the past decade at a time of relatively low growth in national health spending. Although there are limitations associated with using data not intended for research purposes, the study shows how the most can be made from routinely collected data. On a public policy level, it raises questions over the provision of such high-cost drugs, for which the authors state the NHS is currently paying more than US insurers. 
Although it is important to be careful when comparing to unlicensed drugs, the authors point to clear evidence in the paper as to why their comparison is a reasonable one in this scenario, with a large opportunity cost associated with not including this option in national guidelines. If national recommendations continue to insist that such drugs be provided, clearer guidance is also required on how to disinvest from existing services at a regional level to reduce further examples of inequality in access in the future.

In search of a common currency: a comparison of seven EQ-5D-5L value sets. Health Economics [PubMed] Published 24th October 2017

For those of us out there who like a good valuation study, you will need to set aside a good piece of time to work your way through this one. The new EQ-5D-5L measure of health status, with a primary purpose of generating quality-adjusted life years (QALYs) for economic evaluations, is now starting to have valuation studies emerging from different countries, whereby the relative importance of each of the measure’s dimensions and levels is quantified based on general population preferences. This study offers the first comparison of value sets across seven countries: three Western European (England, Netherlands, Spain), one North American (Canada), one South American (Uruguay), and two East Asian (Japan and South Korea). The authors aim to describe methodological differences between the seven value sets, compare the relative importance of dimensions, level decrements and scale length (i.e. quality/quantity trade-offs for QALYs), and develop a common (Western) currency across four of the value sets. In brief summary, there do appear to be similar trends across the three Western European countries: level decrements from levels 3 to 4 have the largest value, followed by levels 1 to 2. There is also a pattern in these three countries’ dimensions, whereby the two “symptom” dimensions (i.e. pain/discomfort, anxiety/depression) have equal importance to the other three “functioning” dimensions (i.e. mobility, self-care and usual activities). There are also clear differences with the other four value sets. Canada, although it also has the highest level decrements between levels 3 and 4 (49%), unusually has equal decrements for the remainder (17% x 3). For the other three countries, greater weight is attached to the three functioning dimensions relative to the two symptom dimensions. Although South Korea also has the greatest level decrement between levels 3 and 4, the greatest decrement was between levels 4 and 5 in Uruguay and between levels 1 and 2 in Japan. 
Although the authors give a number of plausible reasons as to why these differences may occur, less justification is given for the choice of the four value sets they offer as a common currency, beyond the need to have a value set for countries that do not have one already. The value sets with the most in common were those of the three Western European countries, so a Western European value set may have been more appropriate if the criterion was to have comparable values across countries. If the aim was really for a more international common currency, there are issues with the exclusion of non-Western countries’ value sets from their common currency version. Surely differences across cultures and settings should be reflected in a common currency if they are apparent. A common currency should also have a better geographical spread of regions: no country from Africa, the Middle East, or Central and South Asia is represented in this study, nor are any lower- and middle-income countries. This final criticism, though, is out of the authors’ control given current data availability.

Quantifying the relationship between capability and health in older people: can’t map, won’t map. Medical Decision Making [PubMed] Published 23rd October 2017

The EQ-5D is one of many ways quality of life can be measured within economic evaluations. A more recent approach, based on Amartya Sen’s capability approach, has attempted to develop outcome measures that move beyond the health-related aspects of quality of life captured by EQ-5D and similar measures used in the generation of QALYs. This study examines the relationship between the EQ-5D and the ICECAP-O capability measure in three different patient populations included in the Medical Crises in Older People programme in England. The authors propose a reasonable hypothesis that health could be considered a conversion factor for a person’s broader capability set, and so it is plausible to test how well the EQ-5D-3L dimension values and overall score can map onto the ICECAP-O overall score. Across the numerous regressions performed, the strongest relationship between the two measures in this sample had an R-squared of 0.35. Interestingly, the dimensions on the EQ-5D that had a significant relationship with the ICECAP-O score were a mix of dimensions with a focus on functioning (i.e. self-care, usual activities) and symptoms (anxiety/depression), so overall capability on ICECAP-O appears to be related, at least to a small degree, to both health components of EQ-5D discussed in this round-up’s previous paper. The authors suggest it provides further evidence of the complementary data provided by EQ-5D and ICECAP-O, but the causal relationship between the two measures, as the authors suggest, remains under-researched. Longitudinal data analysis would provide a more definitive answer to the question of how much interaction there is between these two measures and their dimensions as health and capability change over time in response to different treatments and care provision.

Chris Sampson’s journal round-up for 11th September 2017

Core items for a standardized resource use measure (ISRUM): expert Delphi consensus survey. Value in Health Published 1st September 2017

Trial-based collection of resource use data, for the purpose of economic evaluation, is wild. Lots of studies use bespoke questionnaires. Some use off-the-shelf measures, but many of these are altered to suit the context. Validity rarely gets a mention. Some of you may already be aware of this research; I’m sure I’m not the only one here who participated. The aim of the study is to establish a core set of resource use items that should be included in all studies to aid comparability, consistency and validity. The researchers identified a long list of 60 candidate items for inclusion, through a review of 59 resource use instruments. An NHS and personal social services perspective was adopted, and any similar items were merged. This list was constructed into a Delphi survey. Members of the HESG mailing list – as well as 111 other identified experts – were invited to complete the survey, for which there were two rounds. The first round asked participants to rate the importance of including each item in the core set, using a scale from 1 (not important) to 9 (very important). Participants were then asked to select their ‘top 10’. Items survived round 1 if they scored at least 7 with more than 50% of respondents, and less than 3 by no more than 15%, either overall or within two or more participant subgroups. In round 2, participants were presented with the results of round 1 and asked to re-rate 34 remaining items. There was a sample of 45 usable responses in round 1 and 42 in round 2. Comments could also be provided, which were subsequently subject to content analysis. After all was said and done, a meeting was held for final item selection based on the findings, to which some survey participants were invited but only one attended (sorry I couldn’t make it). 
The final 10 items were: i) hospital admissions, ii) length of stay, iii) outpatient appointments, iv) A&E visits, v) A&E admissions, vi) number of appointments in the community, vii) type of appointments in the community, viii) number of home visits, ix) type of home visits and x) name of medication. The measure isn’t ready to use just yet. There is still research to be conducted to identify the ideal wording for each item. But it looks promising. Hopefully, this work will trigger a whole stream of research to develop bolt-ons in specific contexts for a modular system of resource use measurement. I also think that this work should form the basis of alignment between costing and resource use measurement. Resource use is often collected in a way that is very difficult to ‘map’ onto costs or prices. I’m sure the good folk at the PSSRU are paying attention to this work, and I hope they might help us all out by estimating unit costs for each of the core items (as well as any bolt-ons, once they’re developed). There’s some interesting discussion in the paper about the parallels between this work and the development of core outcome sets. Maybe analysis of resource use can be as interesting as the analysis of quality of life outcomes.
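The round 1 retention rule described above can be written down as a short function. This is a sketch of my reading of the rule, not the researchers’ code: I assume ‘scored less than 3’ means a rating of 1 or 2, that the ‘overall’ check pools all respondents, and the subgroup names and ratings below are made up for illustration.

```python
def meets_threshold(ratings):
    """More than 50% rate the item 7-9 and no more than 15% rate it below 3."""
    n = len(ratings)
    if n == 0:
        return False
    share_high = sum(r >= 7 for r in ratings) / n
    share_low = sum(r < 3 for r in ratings) / n
    return share_high > 0.5 and share_low <= 0.15


def survives_round_one(ratings_by_subgroup):
    """Keep an item if it passes overall or within two or more subgroups."""
    pooled = [r for group in ratings_by_subgroup.values() for r in group]
    if meets_threshold(pooled):
        return True
    passing_groups = sum(meets_threshold(g) for g in ratings_by_subgroup.values())
    return passing_groups >= 2


# Hypothetical ratings on the 1 (not important) to 9 (very important) scale.
item_kept = survives_round_one({
    "group_a": [8, 8, 7, 5],
    "group_b": [9, 7, 7, 6],
    "group_c": [2, 2, 4, 5],
})
item_dropped = survives_round_one({
    "group_a": [8, 8, 7, 5],
    "group_b": [3, 4, 5, 6],
    "group_c": [2, 2, 4, 5],
})
print(item_kept, item_dropped)  # True False
```

In the first hypothetical item the pooled ratings fall short (exactly half are 7 or above), but two subgroups pass on their own, so the item is retained through the subgroup route.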

A call for open-source cost-effectiveness analysis. Annals of Internal Medicine [PubMed] Published 29th August 2017

Yes, this paper is behind a paywall. Yes, it is worth pointing out this irony over and over again until we all start practising what we preach. We’re all guilty; we all need to keep on keeping on at each other. Now, on to the content. The authors argue in favour of making cost-effectiveness analysis (and model-based economic evaluation in particular) open to scrutiny. The key argument is that there is value in transparency, and analogies are drawn with clinical trial reporting and epidemiological studies. This potential additional value is thought to derive from i) easy updating of models with new data and ii) less duplication of efforts. The main challenges are thought to be the need for new infrastructure – technical and regulatory – and preservation of intellectual property. Recently, I discussed similar issues in a call for a model registry. I’m clearly in favour of cost-effectiveness analyses being ‘open source’. My only gripe is that the authors aren’t the first to suggest this, and should have done some homework before publishing this call. Nevertheless, it is good to see this issue being raised in a journal such as Annals of Internal Medicine, which could be an indication that the tide is turning.

Differential item functioning in quality of life measurement: an analysis using anchoring vignettes. Social Science & Medicine [PubMed] [RePEc] Published 26th August 2017

Differential item functioning (DIF) occurs when different groups of people have different interpretations of response categories. For example, in response to an EQ-5D questionnaire, the way that two groups of people understand ‘slight problems in walking about’ might not be the same. If that were the case, the groups wouldn’t be truly comparable. That’s a big problem for resource allocation decisions, which rely on trade-offs between different groups of people. This study uses anchoring vignettes to test for DIF, whereby respondents are asked to rate their own health alongside some health descriptions for hypothetical individuals. The researchers conducted 2 online surveys, which together recruited a representative sample of 4,300 Australians. Respondents completed the EQ-5D-5L, some vignettes, some other health outcome measures and a bunch of sociodemographic questions. The analysis uses an ordered probit model to predict responses to the EQ-5D dimensions, with the vignettes used to identify the model’s thresholds. This is estimated for each dimension of the EQ-5D-5L, in the hope that the model can produce coefficients that facilitate ‘correction’ for DIF. But this isn’t a guaranteed approach to identifying the effect of DIF. Two important assumptions are inherent; first, that individuals rate the hypothetical vignette states on the same latent scale as they rate their own health (AKA response consistency) and, second, that everyone values the vignettes on an equivalent latent scale (AKA vignette equivalence). Only if these assumptions hold can anchoring vignettes be used to adjust for DIF and make different groups comparable. The researchers dedicate a lot of effort to testing these assumptions. To test response consistency, separate (condition-specific) measures are used to assess each domain of the EQ-5D. The findings suggest that responses are consistent. 
Vignette equivalence is assessed by the significance of individual characteristics in determining vignette values. In this study, the vignette equivalence assumption didn’t hold, which prevents the authors from making generalisable conclusions. However, the researchers looked at whether the assumptions were satisfied in particular age groups. For 55-65 year olds (n=914), they did, for all dimensions except anxiety/depression. That might be because older people are better at understanding health problems, having had more experience of them. So the authors can tell us about DIF in this older group. Having corrected for DIF, the mean health state value in this group increases from 0.729 to 0.806. Various characteristics explain the heterogeneous response behaviour. After correcting for DIF, the difference in EQ-5D index values between high and low education groups increased from 0.049 to 0.095. The difference between employed and unemployed respondents increased from 0.077 to 0.256. In some cases, the rankings changed. The difference between those divorced or widowed and those never married increased from -0.028 to 0.060. The findings hint at a trade-off between giving personalised vignettes to facilitate response consistency and generalisable vignettes to facilitate vignette equivalence. It may be that DIF can only be assessed within particular groups (such as the older sample in this study). But then, if that’s the case, what hope is there for correcting DIF in high-level resource allocation decisions? Clearly, DIF in the EQ-5D could be a big problem. Accounting for it could flip resource allocation decisions. But this study shows that there isn’t an easy answer.

How to design the cost-effectiveness appraisal process of new healthcare technologies to maximise population health: a conceptual framework. Health Economics [PubMed] Published 22nd August 2017

The starting point for this paper is that, when it comes to reimbursement decisions, the more time and money spent on the appraisal process, the more precise the cost-effectiveness estimates are likely to be. So the question is, how much should be committed to the appraisal process in the way of resources? The authors set up a framework in which to consider a variety of alternatively defined appraisal processes, how these might maximise population health and which factors are key drivers in this. The appraisal process is conceptualised as a diagnostic tool to identify which technologies are cost-effective (true positives) and which aren’t (true negatives). The framework builds on the fact that manufacturers can present a claimed ICER that makes their technology more attractive, but that the true ICER can never be known with certainty. As a diagnostic test, there are four possible outcomes: true positive, false positive, true negative, or false negative. Each outcome is associated with an expected payoff in terms of population health and producer surplus. Payoffs depend on the accuracy of the appraisal process (sensitivity and specificity), incremental net benefit per patient, disease incidence, time of relevance for an approval, the cost of the process and the price of the technology. The accuracy of the process can be affected by altering the time and resources dedicated to it or by adjusting the definition of cost-effectiveness in terms of the acceptable level of uncertainty around the ICER. So, what determines an optimal level of accuracy in the appraisal process, assuming that producers’ price setting is exogenous? Generally, the process should have greater sensitivity (at the expense of specificity) when there is more to gain: when a greater proportion of technologies are cost-effective or when the population or time of relevance is greater. There is no fixed optimum for all situations. 
If we relax the assumption of exogenous pricing decisions, and allow pricing to be partly determined by the appraisal process, we can see that a more accurate process incentivises cost-effective price setting. The authors also consider the possibility of there being multiple stages of appraisal, with appeals, re-submissions and price agreements. The take-home message is that the appraisal process should be re-defined over time and with respect to the range of technologies being assessed, or even individualised for each technology in each setting. At least, it seems clear that technologies with exceptional characteristics (with respect to their potential impact on population health) should be given a bespoke appraisal. NICE is already onto these ideas – they recently introduced a fast track process for technologies with a claimed ICER below £10,000 and now give extra attention to technologies with major budget impact.
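To see why the optimal balance of sensitivity and specificity depends on the share of technologies that are truly cost-effective, here is a deliberately simplified sketch of the diagnostic-test idea. It is not the authors’ model: it ignores producer surplus, pricing and time of relevance, it assumes that rejections (true or false negatives) have a payoff of zero, and every name and number is hypothetical.

```python
def expected_net_health(prevalence, sensitivity, specificity,
                        gain_true_positive, loss_false_positive,
                        process_cost=0.0):
    """Expected population health payoff per technology appraised.

    prevalence: share of appraised technologies that are truly cost-effective.
    Rejected technologies are assumed to yield zero payoff (an assumption).
    """
    p_true_positive = prevalence * sensitivity
    p_false_positive = (1 - prevalence) * (1 - specificity)
    return (p_true_positive * gain_true_positive
            - p_false_positive * loss_false_positive
            - process_cost)


# Two hypothetical appraisal processes with mirrored accuracy.
lenient = {"sensitivity": 0.9, "specificity": 0.7}
strict = {"sensitivity": 0.7, "specificity": 0.9}
payoffs = {"gain_true_positive": 100.0, "loss_false_positive": 100.0}

# When most technologies are cost-effective, the more sensitive process wins.
lenient_high = expected_net_health(0.8, **lenient, **payoffs)
strict_high = expected_net_health(0.8, **strict, **payoffs)

# When few are cost-effective, the more specific process wins.
lenient_low = expected_net_health(0.2, **lenient, **payoffs)
strict_low = expected_net_health(0.2, **strict, **payoffs)

print(lenient_high > strict_high, lenient_low < strict_low)  # True True
```

Even in this stripped-down version, there is no single best level of accuracy: which process does better flips as the mix of technologies changes.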
