# Sam Watson’s journal round-up for 11th February 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Contest models highlight inherent inefficiencies of scientific funding competitions. PLoS Biology [PubMed] Published 2nd January 2019

If you work in research you will have no doubt thought to yourself at one point that you spend more time applying to do research than actually doing it. You can spend weeks working on (what you believe to be) a strong proposal only for it to fail against other strong bids. That time could have been spent collecting and analysing data. Indeed, the opportunity cost of writing extensive proposals can be very high. The question arises as to whether there is another method of allocating research funding that reduces this waste and inefficiency. This paper compares the proposal competition to a partial lottery. In this lottery system, proposals are short, and among those that meet some qualifying standard those that are funded are selected at random. This system has the benefit of not taking up too much time but has the cost of reducing the average scientific value of the winning proposals. The authors compare the two approaches using an economic model of contests, which takes into account factors like proposal strength, public benefits, benefits to the scientist like reputation and prestige, and scientific value. Ultimately they conclude that, when the number of awards is smaller than the number of proposals worthy of funding, the proposal competition is inescapably inefficient. It means that researchers have to invest heavily to get a good project funded, and even if it is good enough it may still not get funded. The stiffer the competition the more researchers have to work to win the award. And what little evidence there is suggests that the format of the application makes little difference to the amount of time spent by researchers on writing it. The lottery mechanism only requires the researcher to propose something that is good enough to get into the lottery. Far less time would therefore be devoted to writing it and more time spent on actual science. I’m all for it!

Preventability of early versus late hospital readmissions in a national cohort of general medicine patients. Annals of Internal Medicine [PubMed] Published 5th June 2018

Hospital quality is hard to judge. We’ve discussed on this blog before the pitfalls of using measures such as adjusted mortality differences for this purpose. Just because a hospital has higher than expected mortality does not mean those death could have been prevented with higher quality care. More thorough methods assess errors and preventable harm in care. Case note review studies have suggested as little as 5% of deaths might be preventable in England and Wales. Another paper we have covered previously suggests then that the predictive value of standardised mortality ratios for preventable deaths may be less than 10%.

Visualisation in Bayesian workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society) [RePEc] Published 15th January 2019

This article stems from a broader programme of work from these authors on good “Bayesian workflow”. That is to say, if we’re taking a Bayesian approach to analysing data, what steps ought we to be taking to ensure our analyses are as robust and reliable as possible? I’ve been following this work for a while as this type of pragmatic advice is invaluable. I’ve often read empirical papers where the authors have chosen, say, a logistic regression model with covariates x, y, and z and reported the outcomes, but at no point ever justified why this particular model might be any good at all for these data or the research objective. The key steps of the workflow include, first, exploratory data analysis to help set up a model, and second, performing model checks before estimating model parameters. This latter step is important: one can generate data from a model and set of prior distributions, and if the data that this model generates looks nothing like what we would expect the real data to look like, then clearly the model is not very good. Following this, we should check whether our inference algorithm is doing its job, for example, are the MCMC chains converging? We can also conduct posterior predictive model checks. These have had their criticisms in the literature for using the same data to both estimate and check the model which could lead to the model generalising poorly to new data. Indeed in a recent paper of my own, posterior predictive checks showed poor fit of a model to my data and that a more complex alternative was better fitting. But other model fit statistics, which penalise numbers of parameters, led to the alternative conclusions. So the simpler model was preferred on the grounds that the more complex model was overfitting the data. So I would argue posterior predictive model checks are a sensible test to perform but must be interpreted carefully as one step among many. Finally, we can compare models using tools like cross-validation.

This article discusses the use of visualisation to aid in this workflow. They use the running example of building a model to estimate exposure to small particulate matter from air pollution across the world. Plots are produced for each of the steps and show just how bad some models can be and how we can refine our model step by step to arrive at a convincing analysis. I agree wholeheartedly with the authors when they write, “Visualization is probably the most important tool in an applied statistician’s toolbox and is an important complement to quantitative statistical procedures.”

Credits

# Sam Watson’s journal round-up for 10th September 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Probabilistic sensitivity analysis in cost-effectiveness models: determining model convergence in cohort models. PharmacoEconomics [PubMed] Published 27th July 2018

Probabilistic sensitivity analysis (PSA) is rightfully a required component of economic evaluations. Deterministic sensitivity analyses are generally biased; averaging the outputs of a model based on a choice of values from a complex joint distribution is not likely to be a good reflection of the true model mean. PSA involves repeatedly sampling parameters from their respective distributions and analysing the resulting model outputs. But how many times should you do this? Most times, an arbitrary number is selected that seems “big enough”, say 1,000 or 10,000. But these simulations themselves exhibit variance; so-called Monte Carlo error. This paper discusses making the choice of the number of simulations more formal by assessing the “convergence” of simulation output.

In the same way as sample sizes are chosen for trials, the number of simulations should provide an adequate level of precision, anything more wastes resources without improving inferences. For example, if the statistic of interest is the net monetary benefit, then we would want the confidence interval (CI) to exclude zero as this should be a sufficient level of certainty for an investment decision. The paper, therefore, proposed conducting a number of simulations, examining the CI for when it is ‘narrow enough’, and conducting further simulations if it is not. However, I see a problem with this proposal: the variance of a statistic from a sequence of simulations itself has variance. The stopping points at which we might check CI are themselves arbitrary: additional simulations can increase the width of the CI as well as reduce them. Consider the following set of simulations from a simple ratio of random variables $ICER = gamma(1,0.01)/normal(0.01,0.01)$:The “stopping rule” therefore proposed doesn’t necessarily indicate “convergence” as a few more simulations could lead to a wider, as well as narrower, CI. The heuristic approach is undoubtedly an improvement on the current way things are usually done, but I think there is scope here for a more rigorous method of assessing convergence in PSA.

Mortality due to low-quality health systems in the universal health coverage era: a systematic analysis of amenable deaths in 137 countries. The Lancet [PubMed] Published 5th September 2018

Richard Horton, the oracular editor-in-chief of the Lancet, tweeted last week:

There is certainly an argument that academic journals are good forums to make advocacy arguments. Who better to interpret the analyses presented in these journals than the authors and audiences themselves? But, without a strict editorial bulkhead between analysis and opinion, we run the risk that the articles and their content are influenced or dictated by the political whims of editors rather than scientific merit. Unfortunately, I think this article is evidence of that.

No-one debates that improving health care quality will improve patient outcomes and experience. It is in the very definition of ‘quality’. This paper aims to estimate the numbers of deaths each year due to ‘poor quality’ in low- and middle-income countries (LMICs). The trouble with this is two-fold: given the number of unknown quantities required to get a handle on this figure, the definition of quality notwithstanding, the uncertainty around this figure should be incredibly high (see below); and, attributing these deaths in a causal way to a nebulous definition of ‘quality’ is tenuous at best. The approach of the article is, in essence, to assume that the differences in fatality rates of treatable conditions between LMICs and the best performing health systems on Earth, among people who attend health services, are entirely caused by ‘poor quality’. This definition of quality would therefore seem to encompass low resourcing, poor supply of human resources, a lack of access to medicines, as well as everything else that’s different in health systems. Then, to get to this figure, the authors have multiple sources of uncertainty including:

• Using a range of proxies for health care utilisation;
• Using global burden of disease epidemiology estimates, which have associated uncertainty;
• A number of data slicing decisions, such as truncating case fatality rates;
• Estimating utilisation rates based on a predictive model;
• Estimating the case-fatality rate for non-users of health services based on other estimated statistics.

Despite this, the authors claim to estimate a 95% uncertainty interval with a width of only 300,000 people, with a mean estimate of 5.0 million, due to ‘poor quality’. This seems highly implausible, and yet it is claimed to be a causal effect of an undefined ‘poor quality’. The timing of this article coincides with the Lancet Commission on care quality in LMICs and, one suspects, had it not been for the advocacy angle on care quality, it would not have been published in this journal.

Embedding as a pitfall for survey‐based welfare indicators: evidence from an experiment. Journal of the Royal Statistical Society: Series A Published 4th September 2018

Health economists will be well aware of the various measures used to evaluate welfare and well-being. Surveys are typically used that are comprised of questions relating to a number of different dimensions. These could include emotional and social well-being or physical functioning. Similar types of surveys are also used to collect population preferences over states of the world or policy options, for example, Kahneman and Knetsch conducted a survey of WTP for different environmental policies. These surveys can exhibit what is called an ’embedding effect’, which Kahneman and Knetsch described as when the value of a good varies “depending on whether the good is assessed on its own or embedded as part of a more inclusive package.” That is to say that the way people value single dimensional attributes or qualities can be distorted when they’re embedded as part of a multi-dimensional choice. This article reports the results of an experiment involving students who were asked to weight the relative importance of different dimensions of the Better Life Index, including jobs, housing, and income. The randomised treatment was whether they rated ‘jobs’ as a single category, or were presented with individual dimensions, such as the unemployment rate and job security. The experiment shows strong evidence of embedding – the overall weighting substantially differed by treatment. This, the authors conclude, means that the Better Life Index fails to accurately capture preferences and is subject to manipulation should a researcher be so inclined – if you want evidence to say your policy is the most important, just change the way the dimensions are presented.

Credits

# Thesis Thursday: Thomas Hoe

On the third Thursday of every month, we speak to a recent graduate about their thesis and their studies. This month’s guest is Dr Thomas Hoe who has a PhD from University College London. If you would like to suggest a candidate for an upcoming Thesis Thursday, get in touch.

Title
Essays on the economics of health care provision
Supervisors
Richard Blundell, Orazio Attanasio
http://discovery.ucl.ac.uk/10048627/

What data do you use in your analyses and what are your main analytical methods?

I use data from the English National Health Service (NHS). One of the great features of the NHS is the centralized data it collects, with the Hospital Episodes Statistics (HES) containing information on every public hospital visit in England.

In my thesis, I primarily use two empirical approaches. In my work on trauma and orthopaedic departments, I exploit the fact that the number of emergency trauma admissions to hospital each day is random. This randomness allows me to conduct a quasi-experiment to assess how hospitals perform when they are more or less busy.

The second approach I use, in my work on emergency departments with Jonathan Gruber and George Stoye, is based on bunching techniques that originated in the tax literature (Chetty et al, 2013; Kleven and Waseem, 2013; Saez, 2010). These techniques use interpolation to infer how discontinuities in incentive schemes affect outcomes. We apply and extend these techniques to evaluate the impact of the ‘4-hour target’ in English emergency departments.

How did you characterise and measure quality in your research?

Measuring the quality of health care outcomes is always a challenge in empirical research. Since my research primarily relies on administrative data from HES, I use the patient outcomes that can be directly constructed from this data: in-hospital mortality, and unplanned readmission.

Mortality is, of course, an outcome that is widely used, and offers an unambiguous interpretation. Readmission, on the other hand, is an outcome that has gained more acceptance as a measure of quality in recent years, particularly following the implementation of readmission penalties in the UK and the US.

What is ‘crowding’, and how can it affect the quality of care?

I use the term crowding to refer, in a fairly general sense, to how busy a hospital is. This could mean that the hospital is physically very crowded, with lots of patients in close proximity to one another, or that the number of patients outstrips the available resources.

In practice, I evaluate how crowding affects quality of care by comparing hospital performance and patient outcomes on days when hospitals deal with different levels of admissions (due to random spikes in the number of trauma admissions). I find that hospitals respond by not only cancelling some planned admissions, such as elective hip and knee replacements, but also discharge existing patients sooner. For these discharged patients, the shorter-than-otherwise stay in the hospital is associated with poorer health outcomes for patients, most notably an increase in subsequent hospital visits (unplanned readmissions).

How might incentives faced by hospitals lead to negative consequences?

One of the strongest incentives faced by public hospitals in England is to meet the government-set waiting time target for elective care. This target has been very successful at reducing wait times. In doing so, however, it may have contributed to hospitals shortening patient stays and increasing patient admissions.

My research shows that shorter hospitals stays, in turn, can lead to increases in unplanned readmissions. Setting strong wait time targets, then, is in effect trading off shorter waits (from which patients benefit) with crowding effects (which may harm patients).

Your research highlights the importance of time in the hospital production process. How does this play out?

I look at this from three dimensions, each a separate part of a patient’s journey through hospital.

The first two relate to waiting for treatment. For elective patients, this means waiting for an appointment, and previous work has shown that patients attach significant value to reductions in these wait times. I show that trauma and orthopaedic patients would be better off with further wait time reductions, even if that leads to more crowding.

Emergency patients, in contrast, wait for treatment while physically in a hospital emergency department. I show that these waiting times can be very harmful and that by shortening these wait times we can actually save lives.

The third dimension relates to how long a patient spends in hospital recovering from surgery. I show that, at least on the margin of care for trauma and orthopaedic patients, an additional day in hospital has tangible benefits in terms of reducing the likelihood of experiencing an unplanned readmission.

How could your findings be practically employed in the NHS to improve productivity?

I would highlight two areas of my research that speak directly to the policy debate about NHS productivity.

First, while the wait time targets for elective care may have led to some crowding problems and subsequently more readmissions, the net benefit of these targets to trauma and orthopaedic patients is positive. Second, the wait time target for emergency departments also appears to have benefited patients: it saved lives at a reasonably cost-effective rate.

From the perspective of patients, therefore, I would argue these policies have been relatively successful and should be maintained.