Sam Watson’s journal round-up for 8th October 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A cost‐effectiveness threshold based on the marginal returns of cardiovascular hospital spending. Health Economics [PubMed] Published 1st October 2018

There are two types of cost-effectiveness threshold of interest to researchers. First, there’s the societal willingness-to-pay for a given gain in health or quality of life. This is what many regulatory bodies, such as NICE, use. Second, there is the actual return on medical spending achieved by the health service. Reimbursement of technologies with a lesser return for every pound or dollar would reduce the overall efficiency of the health service. Some refer to this as the opportunity cost, although in a technical sense I would disagree that it is the opportunity cost per se. Nevertheless, this latter definition has seen a growth in empirical work; with some data on health spending and outcomes, we can start to estimate this threshold.

This article looks at spending on cardiovascular disease (CVD) among elderly age groups by gender in the Netherlands and survival. Estimating the causal effect of spending is tricky with these data: spending may go up because survival is worsening, external factors like smoking may have a confounding role, and using five year age bands (as the authors do) over time can lead to bias as the average age in these bands is increasing as demographics shift. The authors do a pretty good job in specifying a Bayesian hierarchical model with enough flexibility to accommodate these potential issues. For example, linear time trends are allowed to vary by age-gender groups and  dynamic effects of spending are included. However, there’s no examination of whether the model is actually a good fit to the data, something which I’m growing to believe is an area where we, in health and health services research, need to improve.

Most interestingly (for me at least) the authors look at a range of priors based on previous studies and a meta-analysis of similar studies. The estimated elasticity using information from prior studies is more ‘optimistic’ about the effect of health spending than a ‘vague’ prior. This could be because CVD or the Netherlands differs in a particular way from other areas. I might argue that the modelling here is better than some previous efforts as well, which could explain the difference. Extrapolating using life tables the authors estimate a base case cost per QALY of €40,000.

Early illicit drug use and the age of onset of homelessness. Journal of the Royal Statistical Society: Series A Published 11th September 2018

How the consumption of different things, like food, drugs, or alcohol, affects life and health outcomes is a difficult question to answer empirically. Consider a recent widely-criticised study on alcohol published in The Lancet. Among a number of issues, despite including a huge amount of data, the paper was unable to address the problem that different kinds of people drink different amounts. The kind of person who is teetotal may be so for a number of reasons including alcoholism, interaction with medication, or other health issues. Similarly, studies on the effect of cannabis consumption have shown among other things an association with lower IQ and poorer mental health. But are those who consume cannabis already those with lower IQs or at higher risk of psychoses? This article considers the relationship between cannabis and homelessness. While homelessness may lead to an increase in drug use, drug use may also be a cause of homelessness.

The paper is a neat application of bivariate hazard models. We recently looked at shared parameter models on the blog, which factorise the joint distribution of two variables into their marginal distribution by assuming their relationship is due to some unobserved variable. The bivariate hazard models work here in a similar way: the bivariate model is specified as the product of the marginal densities and the individual unobserved heterogeneity. This specification allows (i) people to have different unobserved risks for both homelessness and cannabis use and (ii) cannabis to have a causal effect on homelessness and vice versa.

Despite the careful set-up though, I’m not wholly convinced of the face validity of the results. The authors claim that daily cannabis use among men has a large effect on becoming homeless – as large an effect as having separated parents – which seems implausible to me. Cannabis use can cause psychological dependency but I can’t see people choosing it over having a home as they might with something like heroin. The authors also claim that homelessness doesn’t really have an effect on cannabis use among men because the estimated effect is “relatively small” (it is the same order of magnitude as the reverse causal effect) and only “marginally significant”. Interpreting these results in the context of cannabis use would then be difficult, though. The paper provides much additional material of interest. However, the conclusion that regular cannabis use, all else being equal, has a “strong effect” on male homelessness, seems both difficult to conceptualise and not in keeping with the messiness of the data and complexity of the empirical question.

How could health care be anything other than high quality? The Lancet: Global Health [PubMed] Published 5th September 2018

Tedros Adhanom Ghebreyesus, or Dr Tedros as he’s better known, is the head of the WHO. This editorial was penned in response to the recent Lancet Commission on Health Care Quality and related studies (see this round-up). However, I was critical of these studies for a number of reasons, in particular, the conflation of ‘quality’ as we normally understand it and everything else that may impact on how a health system performs. This includes resourcing, which is obviously low in poor countries, availability of labour and medical supplies, and demand side choices about health care access. The empirical evidence was fairly weak; even in countries like in the UK in which we’re swimming in data we struggle to quantify quality. Data are also often averaged at the national level, masking huge underlying variation within-country. This editorial is, therefore, a bit of an empty platitude: of course we should strive to improve ‘quality’ – its goodness is definitional. But without a solid understanding of how to do this or even what we mean when we say ‘quality’ in this context, we’re not really saying anything at all. Proposing that we need a ‘revolution’ without any real concrete proposals is fairly meaningless and ignores the massive strides that have been made in recent years. Delivering high-quality, timely, effective, equitable, and integrated health care in the poorest settings means more resources. Tinkering with what little services already exist for those most in need is not going to produce a revolutionary change. But this strays into political territory, which UN organisations often flounder in.

Editorial: Statistical flaws in the teaching excellence and student outcomes framework in UK higher education. Journal of the Royal Statistical Society: Series A Published 21st September 2018

As a final note for our academic audience, we give you a statement on the Teaching Excellence Framework (TEF). For our non-UK audience, the TEF is a new system being introduced by the government, which seeks to introduce more of a ‘market’ in higher education by trying to quantify teaching quality and then allowing the best-performing universities to charge more. No-one would disagree with the sentiment that improving higher education standards is better for students and teachers alike, but the TEF is fundamentally statistically flawed, as discussed in this editorial in the JRSS.

Some key points of contention are: (i) TEF doesn’t actually assess any teaching, such as through observation; (ii) there is no consideration of uncertainty about scores and rankings; (iii) “The benchmarking process appears to be a kind of poor person’s propensity analysis” – copied verbatim as I couldn’t have phrased it any better; (iv) there has been no consideration of gaming the metrics; and (v) the proposed models do not reflect the actual aims of TEF and are likely to be biased. Economists will also likely have strong views on how the TEF incentives will affect institutional behaviour. But, as Michael Gove, the former justice and education secretary said, Britons have had enough of experts.

Credits

Sam Watson’s journal round-up for 10th September 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Probabilistic sensitivity analysis in cost-effectiveness models: determining model convergence in cohort models. PharmacoEconomics [PubMed] Published 27th July 2018

Probabilistic sensitivity analysis (PSA) is rightfully a required component of economic evaluations. Deterministic sensitivity analyses are generally biased; averaging the outputs of a model based on a choice of values from a complex joint distribution is not likely to be a good reflection of the true model mean. PSA involves repeatedly sampling parameters from their respective distributions and analysing the resulting model outputs. But how many times should you do this? Most times, an arbitrary number is selected that seems “big enough”, say 1,000 or 10,000. But these simulations themselves exhibit variance; so-called Monte Carlo error. This paper discusses making the choice of the number of simulations more formal by assessing the “convergence” of simulation output.

In the same way as sample sizes are chosen for trials, the number of simulations should provide an adequate level of precision, anything more wastes resources without improving inferences. For example, if the statistic of interest is the net monetary benefit, then we would want the confidence interval (CI) to exclude zero as this should be a sufficient level of certainty for an investment decision. The paper, therefore, proposed conducting a number of simulations, examining the CI for when it is ‘narrow enough’, and conducting further simulations if it is not. However, I see a problem with this proposal: the variance of a statistic from a sequence of simulations itself has variance. The stopping points at which we might check CI are themselves arbitrary: additional simulations can increase the width of the CI as well as reduce them. Consider the following set of simulations from a simple ratio of random variables ICER = gamma(1,0.01)/normal(0.01,0.01):ciwidthThe “stopping rule” therefore proposed doesn’t necessarily indicate “convergence” as a few more simulations could lead to a wider, as well as narrower, CI. The heuristic approach is undoubtedly an improvement on the current way things are usually done, but I think there is scope here for a more rigorous method of assessing convergence in PSA.

Mortality due to low-quality health systems in the universal health coverage era: a systematic analysis of amenable deaths in 137 countries. The Lancet [PubMed] Published 5th September 2018

Richard Horton, the oracular editor-in-chief of the Lancet, tweeted last week:

There is certainly an argument that academic journals are good forums to make advocacy arguments. Who better to interpret the analyses presented in these journals than the authors and audiences themselves? But, without a strict editorial bulkhead between analysis and opinion, we run the risk that the articles and their content are influenced or dictated by the political whims of editors rather than scientific merit. Unfortunately, I think this article is evidence of that.

No-one debates that improving health care quality will improve patient outcomes and experience. It is in the very definition of ‘quality’. This paper aims to estimate the numbers of deaths each year due to ‘poor quality’ in low- and middle-income countries (LMICs). The trouble with this is two-fold: given the number of unknown quantities required to get a handle on this figure, the definition of quality notwithstanding, the uncertainty around this figure should be incredibly high (see below); and, attributing these deaths in a causal way to a nebulous definition of ‘quality’ is tenuous at best. The approach of the article is, in essence, to assume that the differences in fatality rates of treatable conditions between LMICs and the best performing health systems on Earth, among people who attend health services, are entirely caused by ‘poor quality’. This definition of quality would therefore seem to encompass low resourcing, poor supply of human resources, a lack of access to medicines, as well as everything else that’s different in health systems. Then, to get to this figure, the authors have multiple sources of uncertainty including:

  • Using a range of proxies for health care utilisation;
  • Using global burden of disease epidemiology estimates, which have associated uncertainty;
  • A number of data slicing decisions, such as truncating case fatality rates;
  • Estimating utilisation rates based on a predictive model;
  • Estimating the case-fatality rate for non-users of health services based on other estimated statistics.

Despite this, the authors claim to estimate a 95% uncertainty interval with a width of only 300,000 people, with a mean estimate of 5.0 million, due to ‘poor quality’. This seems highly implausible, and yet it is claimed to be a causal effect of an undefined ‘poor quality’. The timing of this article coincides with the Lancet Commission on care quality in LMICs and, one suspects, had it not been for the advocacy angle on care quality, it would not have been published in this journal.

Embedding as a pitfall for survey‐based welfare indicators: evidence from an experiment. Journal of the Royal Statistical Society: Series A Published 4th September 2018

Health economists will be well aware of the various measures used to evaluate welfare and well-being. Surveys are typically used that are comprised of questions relating to a number of different dimensions. These could include emotional and social well-being or physical functioning. Similar types of surveys are also used to collect population preferences over states of the world or policy options, for example, Kahneman and Knetsch conducted a survey of WTP for different environmental policies. These surveys can exhibit what is called an ’embedding effect’, which Kahneman and Knetsch described as when the value of a good varies “depending on whether the good is assessed on its own or embedded as part of a more inclusive package.” That is to say that the way people value single dimensional attributes or qualities can be distorted when they’re embedded as part of a multi-dimensional choice. This article reports the results of an experiment involving students who were asked to weight the relative importance of different dimensions of the Better Life Index, including jobs, housing, and income. The randomised treatment was whether they rated ‘jobs’ as a single category, or were presented with individual dimensions, such as the unemployment rate and job security. The experiment shows strong evidence of embedding – the overall weighting substantially differed by treatment. This, the authors conclude, means that the Better Life Index fails to accurately capture preferences and is subject to manipulation should a researcher be so inclined – if you want evidence to say your policy is the most important, just change the way the dimensions are presented.

Credits

Sam Watson’s journal round-up for Monday 20th August 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A Randomized Trial of Epinephrine in Out-of-Hospital Cardiac Arrest. New England Journal of Medicine. Published July 2018.

Adrenaline (epinephrine) is often administered to patients in cardiac arrest in order to increase blood flow and improve heart rhythm. However, there had been some concern about the potential adverse effects of using adrenaline, and a placebo controlled trial was called for. This article presents the findings of this trial. While there is little economics in this article, it is an interesting example of what I believe to be erroneous causal thinking, especially in the way it was reported in the media. For example, The Guardian‘s headline was,

Routine treatment for cardiac arrest doubles risk of brain damage – study

while The Telegraph went for the even more inflammatory

Cardiac arrest resuscitation drug has needlessly brain-damaged thousands

But what did the study itself say about their findings:

the use of epinephrine during resuscitation for out-of-hospital cardiac arrest resulted in a significantly higher rate of survival at 30 days than the use of placebo. […] although the rate of survival was slightly better, the trial did not show evidence of a between-group difference in the rate of survival with a favorable neurologic outcome. This result was explained by a higher proportion of patients who survived with severe neurologic disability in the epinephrine group.

Clearly, a slightly more nuanced view, but nevertheless it leaves room for the implication that the adrenaline is causing the neurological damage. Indeed the authors go on to say that “the use of epinephrine did not improve neurologic outcome.” But a counterfactual view of causation should lead us to ask what would have happened to those who survived with brain damage had they not been given adrenaline.

We have a competing risks set up: (A) survival with favourable neurologic outcome, (B) survival with neurologic impairment, and (C) death. The proportion of patients with outcome (A) was slightly higher in the adrenaline group (although not statistically significant so apparently no effect eyes roll), the proportion of patients with outcome (B) was a lot higher in the adrenaline group, and the proportion of patients with outcome (C) was lower in the adrenaline group. This all suggests to me that the adrenaline caused patients who would have otherwise died to mostly survive with brain damage, and a few to survive impairment free, not that adrenaline caused those who would have otherwise been fine to have brain damage. So the question in response to the above quotes is then, is death a preferable neurologic outcome to brain damage? As trite as this may sound, it is a key health economics question – how do we value these health states?

Incentivizing Safer Sexual Behavior: Evidence from a Lottery Experiment on HIV Prevention. American Economic Review: Applied Economics. [RePEcPublished July 2018. 

This article presents a randomised trial testing an interesting idea. People who are at high risk of HIV and other sexually transmitted infections (STIs) and often those who engage in riskier sexual behaviour. A basic decision theoretic conception would be that those individuals don’t consider the costs to be high enough relative to the benefits (although there is clearly some divide between this explanation and how people actually think in terms of risky sexual behaviour, much like any other seemingly irrational behaviour). Conditional cash transfers can change the balance of the decision to incentivise people to act differently, what this study looks at is using a conditional lottery with the chance of high winnings instead, since this should be more attractive still to risk-seeking individuals. While the trial was designed to reduce HIV prevalance, entry into the lottery in the treatment arm was conditional on being free of two curable STIs at each round – this enabled people who fail to be eligible again, and also allowed the entry of HIV-positive individuals whose sexual behaviour is perhaps the most important to reducing HIV transmission. The lottery arm of the trial was found to have 20% lower incidence over the study period compared to the control arm – quite impressive. However, the cost-effectiveness of the program was estimated to be $882 per HIV infection averted on the basis of lottery payments alone, and around $3,300 per case averted all in. This seems quite high to me. Despite a plethora of non-comparable outcomes in cost-effectiveness studies of HIV public health interventions other studies have reported costs per cases averted an order of magnitude lower than this. The conclusions seems to be then that the idea works well – it’s just too costly to be of much use.

Monitoring equity in universal health coverage with essential services for neglected tropical diseases: an analysis of data reported for five diseases in 123 countries over 9 years. The Lancet: Global Health. [PubMedPublished July 2018. 

Universal health coverage (UHC) is one the key parts of Sustainable Development Goal (SDG) 3, good health and well-being. The text of the SDG identifies UHC as being about access to services – but this word “access” in the context of health care is often vague and nebulous. Many people mistakenly treat access to health services as synonymous with use of health services, but having access to something is not dependent on whether you actually use it or not. Barriers to a person’s ability to use health care for a given complaint are numerous: financial cost, time cost, lack of education, language barrier, and so forth. It is therefore difficult to quantify and measure access. Hogan and co-authors proposed an index to quantify and monitor UHC across the world that was derived from a number of proxies such as women with four or more antenatal visits, children with vaccines, blood pressure, and health worker density. Their work is useful but of course flawed – these proxies all capture something different, either access, use, or health outcomes – and it is unclear that they are all sensitive to the same underlying construct. Needless to say, we should still be able to diagnose access issues from some combination of these data. This article extends the work of Hogan et al to look at neglected tropical diseases, which affect over 1.5 billion, yet which are, obviously, neglected. The paper uses ‘preventative chemotherapy coverage’ as its key measure, which is the proportion of those needed the chemotherapy who actually receive it. This is a measure of use and not access (although they should be related), for example, there may be near universal availability of the chemotherapy, but various factors on the demand side limiting use. Needless to say, the measure should still be a useful diagnostic tool and it is interesting to see how much worse countries perform on this metric for neglected tropical diseases than general health care.

 

Credits