# Sam Watson’s journal round-up for 8th October 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A cost‐effectiveness threshold based on the marginal returns of cardiovascular hospital spending. Health Economics [PubMed] Published 1st October 2018

There are two types of cost-effectiveness threshold of interest to researchers. First, there’s the societal willingness-to-pay for a given gain in health or quality of life. This is what many regulatory bodies, such as NICE, use. Second, there is the actual return on medical spending achieved by the health service. Reimbursement of technologies with a lesser return for every pound or dollar would reduce the overall efficiency of the health service. Some refer to this as the opportunity cost, although in a technical sense I would disagree that it is the opportunity cost per se. Nevertheless, this latter definition has seen a growth in empirical work: with some data on health spending and outcomes, we can start to estimate this threshold.

This article looks at spending on cardiovascular disease (CVD) and survival among elderly age groups, by gender, in the Netherlands. Estimating the causal effect of spending is tricky with these data: spending may go up because survival is worsening, external factors like smoking may have a confounding role, and using five-year age bands (as the authors do) over time can lead to bias as the average age within these bands increases as demographics shift. The authors do a pretty good job in specifying a Bayesian hierarchical model with enough flexibility to accommodate these potential issues. For example, linear time trends are allowed to vary by age-gender group and dynamic effects of spending are included. However, there’s no examination of whether the model is actually a good fit to the data, something which I’m growing to believe is an area where we, in health and health services research, need to improve.
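The kind of model-fit examination I have in mind is something like a posterior predictive check. Here is a minimal sketch with entirely invented data and a deliberately simple conjugate model; nothing below reflects the authors’ actual specification.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented data: log mortality rates for one age-gender group over 20 years.
y = rng.normal(loc=-4.0, scale=0.1, size=20)

# Deliberately simple conjugate model: y ~ Normal(mu, sigma) with sigma known
# and a flat prior on mu, so the posterior for mu is Normal(ybar, sigma/sqrt(n)).
sigma = 0.1
n = len(y)
post_mu = rng.normal(y.mean(), sigma / np.sqrt(n), size=1000)

# Posterior predictive check: for each posterior draw, replicate a dataset
# and record a test statistic (here, the range of the data).
t_obs = np.ptp(y)
t_rep = np.array([np.ptp(rng.normal(m, sigma, size=n)) for m in post_mu])

# A Bayesian p-value near 0 or 1 flags poor fit on this statistic.
p_value = (t_rep >= t_obs).mean()
print(p_value)
```

If the replicated statistic rarely brackets the observed one, the model is missing something, and the check costs almost nothing once the posterior draws are in hand.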

Most interestingly (for me at least) the authors look at a range of priors based on previous studies and a meta-analysis of similar studies. The estimated elasticity using information from prior studies is more ‘optimistic’ about the effect of health spending than a ‘vague’ prior. This could be because CVD or the Netherlands differs in a particular way from other areas. I might argue that the modelling here is better than some previous efforts as well, which could explain the difference. Extrapolating using life tables, the authors estimate a base-case cost per QALY of €40,000.

Early illicit drug use and the age of onset of homelessness. Journal of the Royal Statistical Society: Series A Published 11th September 2018

How the consumption of different things, like food, drugs, or alcohol, affects life and health outcomes is a difficult question to answer empirically. Consider a recent widely-criticised study on alcohol published in The Lancet. Among a number of issues, despite including a huge amount of data, the paper was unable to address the problem that different kinds of people drink different amounts. The kind of person who is teetotal may be so for a number of reasons including alcoholism, interaction with medication, or other health issues. Similarly, studies on the effect of cannabis consumption have shown among other things an association with lower IQ and poorer mental health. But are those who consume cannabis already those with lower IQs or at higher risk of psychoses? This article considers the relationship between cannabis and homelessness. While homelessness may lead to an increase in drug use, drug use may also be a cause of homelessness.

The paper is a neat application of bivariate hazard models. We recently looked at shared parameter models on the blog, which factorise the joint distribution of two variables into the product of their marginals by assuming that their dependence arises through some unobserved shared variable. The bivariate hazard models work here in a similar way: the bivariate model is specified as the product of the marginal densities and the individual unobserved heterogeneity. This specification allows (i) people to have different unobserved risks for both homelessness and cannabis use and (ii) cannabis to have a causal effect on homelessness and vice versa.
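A minimal simulation of the shared-frailty idea (not the authors’ model; all rates and shapes are invented) shows how a common unobserved risk factor induces dependence between two event times that are independent conditional on it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# One shared, unobserved risk factor per person (distribution invented; mean 1).
frailty = rng.gamma(shape=5.0, scale=0.2, size=n)

# Exponential times to first cannabis use and to first homelessness, with
# invented baseline rates both scaled by the shared frailty. Conditional on
# the frailty, the two times are independent.
t_cannabis = rng.exponential(1.0 / (0.10 * frailty))
t_homeless = rng.exponential(1.0 / (0.05 * frailty))

# Marginally, the shared frailty induces positive dependence between them,
# which would look like causation if the heterogeneity were ignored.
rho = np.corrcoef(t_cannabis, t_homeless)[0, 1]
print(rho > 0)
```

The point of the bivariate specification is precisely to separate this induced correlation from any genuine causal effect running in either direction.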

Despite the careful set-up though, I’m not wholly convinced of the face validity of the results. The authors claim that daily cannabis use among men has a large effect on becoming homeless – as large an effect as having separated parents – which seems implausible to me. Cannabis use can cause psychological dependency but I can’t see people choosing it over having a home as they might with something like heroin. The authors also claim that homelessness doesn’t really have an effect on cannabis use among men because the estimated effect is “relatively small” (it is the same order of magnitude as the reverse causal effect) and only “marginally significant”. Interpreting these results in the context of cannabis use would then be difficult, though. The paper provides much additional material of interest. However, the conclusion that regular cannabis use, all else being equal, has a “strong effect” on male homelessness, seems both difficult to conceptualise and not in keeping with the messiness of the data and complexity of the empirical question.

How could health care be anything other than high quality? The Lancet: Global Health [PubMed] Published 5th September 2018

Tedros Adhanom Ghebreyesus, or Dr Tedros as he’s better known, is the head of the WHO. This editorial was penned in response to the recent Lancet Commission on Health Care Quality and related studies (see this round-up). I was critical of these studies for a number of reasons, in particular the conflation of ‘quality’ as we normally understand it with everything else that may impact on how a health system performs. This includes resourcing, which is obviously low in poor countries, availability of labour and medical supplies, and demand-side choices about health care access. The empirical evidence was fairly weak; even in countries like the UK, where we’re swimming in data, we struggle to quantify quality. Data are also often averaged at the national level, masking huge underlying variation within countries. This editorial is, therefore, a bit of an empty platitude: of course we should strive to improve ‘quality’ – its goodness is definitional. But without a solid understanding of how to do this, or even what we mean when we say ‘quality’ in this context, we’re not really saying anything at all. Proposing that we need a ‘revolution’ without any concrete proposals is fairly meaningless and ignores the massive strides that have been made in recent years. Delivering high-quality, timely, effective, equitable, and integrated health care in the poorest settings means more resources. Tinkering with what few services already exist for those most in need is not going to produce revolutionary change. But this strays into political territory, in which UN organisations often flounder.

Editorial: Statistical flaws in the teaching excellence and student outcomes framework in UK higher education. Journal of the Royal Statistical Society: Series A Published 21st September 2018

As a final note for our academic audience, we give you a statement on the Teaching Excellence Framework (TEF). For our non-UK audience, the TEF is a new system being introduced by the government, which seeks to introduce more of a ‘market’ in higher education by trying to quantify teaching quality and then allowing the best-performing universities to charge more. No-one would disagree with the sentiment that improving higher education standards is better for students and teachers alike, but the TEF is fundamentally statistically flawed, as discussed in this editorial in the JRSS.

Some key points of contention are: (i) TEF doesn’t actually assess any teaching, such as through observation; (ii) there is no consideration of uncertainty about scores and rankings; (iii) “The benchmarking process appears to be a kind of poor person’s propensity analysis” – copied verbatim as I couldn’t have phrased it any better; (iv) there has been no consideration of gaming the metrics; and (v) the proposed models do not reflect the actual aims of TEF and are likely to be biased. Economists will also likely have strong views on how the TEF incentives will affect institutional behaviour. But, as Michael Gove, the former justice and education secretary, said, Britons have had enough of experts.

Credits

# Sam Watson’s journal round-up for 10th September 2018


Probabilistic sensitivity analysis in cost-effectiveness models: determining model convergence in cohort models. PharmacoEconomics [PubMed] Published 27th July 2018

Probabilistic sensitivity analysis (PSA) is rightfully a required component of economic evaluations. Deterministic sensitivity analyses are generally biased; the output of a model evaluated at a particular choice of values from a complex joint distribution is not likely to be a good reflection of the true model mean. PSA involves repeatedly sampling parameters from their respective distributions and analysing the resulting model outputs. But how many times should you do this? Usually, an arbitrary number that seems “big enough” is selected, say 1,000 or 10,000. But these simulations themselves exhibit variance: so-called Monte Carlo error. This paper discusses making the choice of the number of simulations more formal by assessing the “convergence” of simulation output.

In the same way as sample sizes are chosen for trials, the number of simulations should provide an adequate level of precision; anything more wastes resources without improving inferences. For example, if the statistic of interest is the net monetary benefit, then we would want the confidence interval (CI) to exclude zero, as this should be a sufficient level of certainty for an investment decision. The paper therefore proposes conducting a number of simulations, checking whether the CI is ‘narrow enough’, and conducting further simulations if it is not. However, I see a problem with this proposal: the variance of a statistic from a sequence of simulations itself has variance. The stopping points at which we might check the CI are themselves arbitrary, and additional simulations can increase the width of the CI as well as reduce it. Consider the following set of simulations from a simple ratio of random variables, $ICER = \text{Gamma}(1, 0.01) / \text{Normal}(0.01, 0.01)$ [figure omitted]. The proposed “stopping rule” therefore doesn’t necessarily indicate “convergence”, since a few more simulations could lead to a wider, as well as a narrower, CI. The heuristic approach is undoubtedly an improvement on the current way things are usually done, but I think there is scope here for a more rigorous method of assessing convergence in PSA.
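The fluctuation in interval widths is easy to reproduce. A quick simulation of the ratio of random variables above (reading the gamma’s second parameter as a rate, which is an assumption on my part) shows how percentile-interval widths behave across arbitrary stopping points:

```python
import numpy as np

rng = np.random.default_rng(7)

# Draws from the ratio described above; taking 0.01 as a rate for the gamma
# (an assumption), the scale is 1/0.01. The denominator can sit near zero,
# which makes the ratio heavy-tailed.
n_max = 20_000
icer = rng.gamma(1.0, 1.0 / 0.01, size=n_max) / rng.normal(0.01, 0.01, size=n_max)

# Width of a 95% percentile interval at a series of arbitrary stopping points.
checkpoints = [1_000, 2_000, 5_000, 10_000, 20_000]
widths = [np.ptp(np.percentile(icer[:n], [2.5, 97.5])) for n in checkpoints]

# The widths need not shrink monotonically: a later checkpoint can produce
# a wider interval than an earlier one, so a width-based stopping rule can
# declare "convergence" prematurely.
print(widths)
```

Running this with different seeds makes the point vividly: whether the rule stops at 1,000 or 20,000 simulations depends as much on luck as on genuine precision.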

Mortality due to low-quality health systems in the universal health coverage era: a systematic analysis of amenable deaths in 137 countries. The Lancet [PubMed] Published 5th September 2018

Richard Horton, the oracular editor-in-chief of the Lancet, tweeted last week:

There is certainly an argument that academic journals are good forums to make advocacy arguments. Who better to interpret the analyses presented in these journals than the authors and audiences themselves? But, without a strict editorial bulkhead between analysis and opinion, we run the risk that the articles and their content are influenced or dictated by the political whims of editors rather than scientific merit. Unfortunately, I think this article is evidence of that.

No-one debates that improving health care quality will improve patient outcomes and experience. It is in the very definition of ‘quality’. This paper aims to estimate the number of deaths each year due to ‘poor quality’ in low- and middle-income countries (LMICs). The trouble with this is two-fold: given the number of unknown quantities required to get a handle on this figure (the definition of quality notwithstanding), the uncertainty around it should be incredibly high (see below); and attributing these deaths in a causal way to a nebulous definition of ‘quality’ is tenuous at best. The approach of the article is, in essence, to assume that the differences in fatality rates for treatable conditions between LMICs and the best-performing health systems on Earth, among people who attend health services, are entirely caused by ‘poor quality’. This definition of quality would therefore seem to encompass low resourcing, poor supply of human resources, a lack of access to medicines, as well as everything else that differs between health systems. Then, to get to this figure, the authors face multiple sources of uncertainty, including:

• Using a range of proxies for health care utilisation;
• Using global burden of disease epidemiology estimates, which have associated uncertainty;
• A number of data slicing decisions, such as truncating case fatality rates;
• Estimating utilisation rates based on a predictive model;
• Estimating the case-fatality rate for non-users of health services based on other estimated statistics.

Despite this, the authors claim to estimate a 95% uncertainty interval with a width of only 300,000 people around a mean estimate of 5.0 million deaths due to ‘poor quality’. This seems highly implausible, and yet it is claimed to be a causal effect of an undefined ‘poor quality’. The timing of this article coincides with the Lancet Commission on care quality in LMICs and, one suspects, had it not been for the advocacy angle on care quality, it would not have been published in this journal.
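To see why so narrow an interval is surprising, consider a toy propagation exercise. Every number below is invented purely for illustration, but even when each input carries only around 10% relative uncertainty, the product is far less precise than an interval that is 6% of the mean (300,000 of 5.0 million) would imply:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Entirely invented inputs, each with roughly 10% relative uncertainty:
utilisation = rng.normal(0.60, 0.06, n)   # share of cases reaching care
excess_cfr = rng.normal(0.020, 0.002, n)  # excess case fatality vs best systems
cases = rng.normal(2.5e8, 2.5e7, n)       # annual treatable cases

# Deaths attributed to 'poor quality' under these made-up assumptions.
deaths = utilisation * excess_cfr * cases

# Width of the 95% interval relative to the mean.
lo, hi = np.percentile(deaths, [2.5, 97.5])
rel_width = (hi - lo) / deaths.mean()

# Relative uncertainties compound: the interval is a large fraction of the
# mean even though each input was individually quite precise.
print(round(rel_width, 2))
```

With three modestly uncertain inputs the relative interval width already exceeds half the mean; the paper’s estimate stacks many more layers of estimation on top.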

Embedding as a pitfall for survey‐based welfare indicators: evidence from an experiment. Journal of the Royal Statistical Society: Series A Published 4th September 2018

Health economists will be well aware of the various measures used to evaluate welfare and well-being. Surveys are typically used that comprise questions relating to a number of different dimensions. These could include emotional and social well-being or physical functioning. Similar types of surveys are also used to collect population preferences over states of the world or policy options; for example, Kahneman and Knetsch conducted a survey of willingness to pay for different environmental policies. These surveys can exhibit what is called an ’embedding effect’, which Kahneman and Knetsch described as when the value of a good varies “depending on whether the good is assessed on its own or embedded as part of a more inclusive package.” That is to say, the way people value single-dimensional attributes or qualities can be distorted when they’re embedded as part of a multi-dimensional choice. This article reports the results of an experiment involving students who were asked to weight the relative importance of different dimensions of the Better Life Index, including jobs, housing, and income. The randomised treatment was whether they rated ‘jobs’ as a single category, or were presented with individual dimensions, such as the unemployment rate and job security. The experiment shows strong evidence of embedding – the overall weighting substantially differed by treatment. This, the authors conclude, means that the Better Life Index fails to accurately capture preferences and is subject to manipulation should a researcher be so inclined: if you want evidence that your policy is the most important, just change the way the dimensions are presented.


# Chris Sampson’s journal round-up for 27th August 2018


Ethically acceptable compensation for living donations of organs, tissues, and cells: an unexploited potential? Applied Health Economics and Health Policy [PubMed] Published 25th August 2018

Around the world, there are shortages of organs for transplantation. In economics, the debate around the need to increase organ donation can be frustratingly ignorant of ethical and distributional concerns. So it’s refreshing to see this article attempting to square concerns about efficiency and equity. The authors do so by using a ‘spheres of justice’ framework. This is the idea that different social goods should be distributed according to different principles. So, while we might be happy for broccoli and iPhones to be distributed on the basis of free exchange, we might want health to be distributed on the basis of need. The argument can be extended to state that – for a just situation to prevail – certain exchanges between these spheres of justice (e.g. health for iPhones) should never take place. This idea might explain why – as the authors demonstrate with a review of European countries – policy tends not to allow monetary compensation for organ donation.

The paper cleverly sets out to taxonomise monetary and non-monetary reimbursement and compensation with reference to individuals’ incentives and the spheres of justice principles. From this, the authors reach two key conclusions. Firstly, that (monetary) reimbursement of donors’ expenses (e.g. travel costs or lost earnings) is ethically sound as this does not constitute an incentive to donate but rather removes existing disincentives. Secondly, that non-monetary compensation could be deemed ethical.

Three possible forms of non-monetary compensation are discussed: i) prioritisation, ii) free access, and iii) non-health care-related benefits. The first could involve being given priority for receiving organs, or it could extend to the jumping of other health care waiting lists. I think this is more problematic than the authors let on because it asserts that health care should – at least in part – be distributed according to desert rather than need. The second option – free access – could mean access to health care that people would otherwise have to pay for. The third option could involve access to other social goods such as education or housing.

This is an interesting article and an enjoyable read, but I don’t think it provides a complete solution. Maybe I’m just too much of a Marxist, but I think that this – like all other proposals – fails to distribute from each according to ability. That is, we’d still expect non-monetary compensation to incentivise poorer (and, on average, less healthy) people to donate organs, thus exacerbating health inequality. This is because i) poorer people are more likely to need the non-monetary benefits and ii) we live in a capitalist society in which there is almost nothing that money can’t buy, so little compensation is strictly non-monetary. Show me a proposal that increases donation rates among those who can most afford to donate (i.e. the rich and healthy).

Selecting bolt-on dimensions for the EQ-5D: examining their contribution to health-related quality of life. Value in Health Published 18th August 2018

Measures such as the EQ-5D are used to describe health-related quality of life as completely and generically as possible. But there is a trade-off between completeness and the length of the questionnaire. Necessarily, there are parts of the evaluative space that measures will not capture because they are a simplification. If the bit they’re missing is important to your patient group, that’s a problem. You might fancy a bolt-on. But how do we decide which areas of the evaluative space should be more completely included in the measure? Which bolt-ons should be used? This paper seeks to provide means of answering these questions.

The article builds on a previous piece of work that was covered in an earlier journal round-up. In that paper, the authors used factor analysis to identify candidate bolt-ons. The goal of this paper is to outline an approach for specifying which of those candidates ought to be used. Using data from the Multi-Instrument Comparison study, the authors fit linear regressions to see how well 37 candidate bolt-on items explain differences in health-related quality of life. The 37 items correspond to six different domains: energy/vitality, satisfaction, relationships, hearing, vision, and speech. In a second test, the authors explored whether the bolt-on candidates could explain differences in health-related quality of life associated with six chronic conditions. Health-related quality of life is defined according to a visual analogue scale, which notably does not correspond to that used in the EQ-5D but rather reflects a broader measure of physical, mental, and social health.
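The incremental-variance logic can be sketched as follows, with simulated data standing in for the Multi-Instrument Comparison study (all coefficients and the ‘energy’ item are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

# Simulated stand-in data: five core dimension scores plus one candidate
# bolt-on item ('energy'); the VAS-style outcome partly depends on energy.
eq5d = rng.normal(size=(n, 5))
energy = rng.normal(size=n)
vas = (eq5d @ np.array([0.5, 0.4, 0.3, 0.3, 0.2])
       + 0.4 * energy
       + rng.normal(scale=0.5, size=n))

def r_squared(X, y):
    # Ordinary least squares with an intercept column.
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

r2_base = r_squared(eq5d, vas)
r2_full = r_squared(np.column_stack([eq5d, energy]), vas)

# The candidate item explains variance on top of the core dimensions,
# which is the criterion for flagging it as a promising bolt-on.
print(r2_full > r2_base)
```

The substantive question, of course, is whether the outcome being predicted is the right one, which is where my objection below comes in.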

The results suggest that items related to energy/vitality, relationships, and satisfaction explained a significant part of health-related quality of life on top of the existing EQ-5D dimensions. The implication is that these could be good candidates for bolt-ons. The analysis of the different conditions was less clear.

For me, there’s a fundamental problem with this study. It moves the goals posts. Bolt-ons are about improving the extent to which a measure can more accurately represent the evaluative space that it is designed to characterise. In this study, the authors use a broader definition of health-related quality of life that – as far as I can tell – the EQ-5D is not designed to capture. We’re not dealing with bolt-ons, we’re dealing with extensions to facilitate expansions to the evaluative space. Nevertheless, the method could prove useful if combined with a more thorough consideration of the evaluative space.

Sources of health financing and health outcomes: a panel data analysis. Health Economics [PubMed] [RePEc] Published 15th August 2018

There is a growing body of research looking at the impact that health (care) spending has on health outcomes. Usually, these studies don’t explicitly look at who is doing the spending. In this study, the author distinguishes between public and private spending and attempts to identify which type of spending (if either) results in greater health improvements.

The author uses data from the World Bank’s World Development Indicators for 1995-2014. Life expectancy at birth is adopted as the primary health outcome and the key expenditure variables are health expenditure as a share of GDP and private health expenditure as a share of total health expenditure. Controlling for a variety of other variables, including some determinants of health such as income and access to an improved water source, a triple difference analysis is described. The triple difference estimator corresponds to the difference in health outcomes arising from i) differences in the private expenditure level, given ii) differences in total expenditure, over iii) time.
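A stylised version of a triple-difference estimator (binary indicators and invented effect sizes, not the author’s actual specification) can be recovered as the coefficient on a three-way interaction:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

# Stylised panel cells: binary indicators for a high private share of
# spending, high total spending, and the later time period (all invented).
private = rng.integers(0, 2, n)
total = rng.integers(0, 2, n)
time = rng.integers(0, 2, n)

# Life expectancy with a true three-way interaction effect of 0.5 years.
y = (70 + 1.0 * private + 2.0 * total + 0.5 * time
     + 0.3 * private * total + 0.2 * private * time + 0.4 * total * time
     + 0.5 * private * total * time
     + rng.normal(scale=1.0, size=n))

# The triple-difference estimate is the coefficient on the three-way term,
# with all lower-order terms included in the design matrix.
X = np.column_stack([
    np.ones(n), private, total, time,
    private * total, private * time, total * time,
    private * total * time,
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(beta[-1], 2))
```

The paper’s version works with continuous expenditure shares rather than binary indicators, but the identifying logic is the same: the three-way term captures how the private-share gradient in outcomes changes with total spending over time.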

The key finding from the study is that, on average, private expenditure is more effective in increasing life expectancy at birth than public expenditure. The author also looks at government effectiveness, which proves crucial. The finding in favour of private expenditure entirely disappears when only countries with effective government are considered. There is some evidence that public expenditure is more effective in these countries, and this is something that future research should investigate further. For countries with ineffective governments, the implication is that policy should be directed towards increasing overall health care expenditure by increasing private expenditure.
