Thesis Thursday: Andrea Gabrio

On the third Thursday of every month, we speak to a recent graduate about their thesis and their studies. This month’s guest is Dr Andrea Gabrio, who has a PhD from University College London. If you would like to suggest a candidate for an upcoming Thesis Thursday, get in touch.

Title
Full Bayesian methods to handle missing data in health economic evaluation
Supervisors
Gianluca Baio, Alexina Mason, Rachael Hunter
Repository link
http://discovery.ucl.ac.uk/10072087

What kind of assumptions about missing data are made in trial-based economic evaluations?

In any analysis, assumptions are inevitably made about the missing values, i.e. those values which are not observed. Since the final results may depend on these assumptions, it is important that they are as plausible as possible within the context considered. For example, in trial-based economic evaluations, missing values often occur when data are collected through self-reported patient questionnaires, and in many cases it is plausible that patients with unobserved responses differ from the others (e.g. they are in worse health). In general, it is very important that a range of plausible scenarios (defined according to the available information) is considered, and that the robustness of the conclusions across them is assessed in sensitivity analysis. Often, however, analysts prefer to ignore this uncertainty and rely on ‘default’ approaches (e.g. removing the missing data from the analysis) which implicitly make unrealistic assumptions and may lead to biased results. For a more in-depth overview of current practice, I refer to my published review.

Given that assumptions about the missing values cannot be checked from the data at hand, an ideal approach to handling missing data should combine a well-defined model for the observed data with explicit assumptions about missingness.

What do you mean by ‘full Bayesian’?

The term ‘full Bayesian’ is a technical one and typically indicates that, in the Bayesian analysis, the prior distributions are freely specified by the analyst, rather than being derived from the data (as in ‘empirical Bayes’ approaches). Being ‘fully’ Bayesian has some key advantages for handling missingness compared with other approaches, especially in small samples. First, a flexible choice of priors may help to stabilise inference and avoid giving too much weight to implausible parameter values. Second, external information about missingness (e.g. expert opinion) can easily be incorporated into the model through the priors. This is essential when performing sensitivity analysis to missingness, as it allows the robustness of the results to a range of assumptions to be assessed, with the uncertainty about any unobserved quantity (parameters or missing data) being fully propagated and quantified in the posterior distribution.
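
As a simple illustration of how expert opinion might be encoded as a prior (the numbers below are invented for illustration and are not taken from my thesis): suppose experts believe that non-responders’ outcomes are between 0.05 and 0.15 lower than responders’, and we treat that range as the central 95% interval of a normal prior on the difference.

    # Illustrative only: translate an elicited range into a normal prior for a
    # missingness parameter delta (difference between non-responders and responders).
    delta_mean <- -0.10                       # centre of the elicited range
    delta_sd   <- (0.15 - 0.05) / (2 * 1.96)  # half-width of the range treated as a 95% interval (~0.026)
    c(mean = delta_mean, sd = delta_sd)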

How did you use case studies to support the development of your methods?

In my PhD I had access to economic data from two small trials, both characterised by considerable amounts of missing outcome values, which I used as motivating examples for implementing my methods. In particular, individual-level economic data are characterised by a series of complexities that make it difficult to justify the use of more ‘standardised’ methods and which, if not taken into account, may lead to biased results.

Examples of these include the correlation between effectiveness and costs, the skewness in the empirical distributions of both outcomes, the presence of identical values for many individuals (e.g. excess zeros or ones), and, on top of that, missingness. In many cases, the implementation of methods to handle these issues is not straightforward, especially when multiple types of complexities affect the data.

The flexibility of the Bayesian framework allows the specification of a model whose complexity can be increased relatively easily to handle all these problems simultaneously, while also providing a natural way to perform probabilistic sensitivity analysis. I refer to my published work for an example of how Bayesian models can be implemented to handle trial-based economic data.
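
To give a flavour of what I mean, here is a simplified model specification in JAGS (written as an R string). It is a sketch for illustration rather than the exact model from my published work, and it omits, for brevity, the hurdle components needed for excess zeros or ones. Effectiveness is modelled with a normal distribution, costs with a gamma distribution to capture skewness, and the mean cost depends on effectiveness so that the correlation between the two outcomes is preserved; any missing e[i] or c[i] in the data are simply treated by JAGS as extra unknowns to be imputed.

    # Simplified sketch only; not the exact model from the thesis.
    model_string <- "
    model {
      for (i in 1:N) {
        e[i] ~ dnorm(mu.e, tau.e)                      # effectiveness (e.g. QALYs)
        c[i] ~ dgamma(shape.c, rate.c[i])              # costs: a gamma handles skewness
        rate.c[i] <- shape.c / mu.c[i]
        log(mu.c[i]) <- beta0 + beta1 * (e[i] - mu.e)  # links costs to effects (correlation)
      }
      mu.e ~ dnorm(0, 0.0001)
      tau.e <- pow(sd.e, -2)
      sd.e ~ dunif(0, 10)
      shape.c ~ dunif(0, 100)
      beta0 ~ dnorm(0, 0.0001)
      beta1 ~ dnorm(0, 0.0001)
    }
    "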

How does your framework account for longitudinal data?

Since the data collected within a trial have a longitudinal nature (i.e. they are collected at different time points), it is important that any missingness methods for trial-based economic evaluations take this feature into account. I therefore developed a Bayesian parametric model for a bivariate health economic longitudinal response which, as well as accounting for the typical complexities of the data (e.g. skewness), can be fitted to all the effectiveness and cost variables in a trial.

Time dependence between the responses is formally taken into account by means of a series of regressions, where each variable can be modelled conditionally on other variables collected at the same or at previous time points. This also offers an efficient way to handle missingness, as the available evidence at each time is included in the model, which may provide valuable information for imputing the missing data and therefore improve the confidence in the final results. In addition, sensitivity analysis to a range of missingness assumptions can be performed using a ‘pattern mixture’ approach. This allows the identification of certain parameters, known as sensitivity parameters, on which priors can be specified to incorporate external information and quantify its impact on the conclusions. A detailed description of the longitudinal model and the missing data analyses explored is also available online.
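
To give a flavour of the pattern mixture idea (again a simplified sketch rather than the exact model in the thesis), the mean for individuals with a missing response is shifted by a sensitivity parameter delta. The observed data carry no information about delta, so everything we learn about it comes from its prior, for example the elicited prior sketched earlier.

    # Sketch of a pattern mixture component (JAGS model as an R string); illustration only.
    model_string <- "
    model {
      for (i in 1:N) {
        e[i] ~ dnorm(mu[i], tau)
        mu[i] <- alpha + delta * m[i]      # m[i] = 1 if e[i] is missing, 0 otherwise
      }
      alpha ~ dnorm(0, 0.0001)
      tau <- pow(sd.e, -2)
      sd.e ~ dunif(0, 10)
      prec.delta <- pow(0.026, -2)
      delta ~ dnorm(-0.10, prec.delta)     # informative prior on the sensitivity parameter
    }
    "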

Are your proposed methods easy to implement?

Most of the methods that I developed in my project were implemented in JAGS, a program specifically designed for the analysis of Bayesian models using Markov chain Monte Carlo simulation. Like other Bayesian software (e.g. OpenBUGS and Stan), JAGS is freely available and can be interfaced with different statistical packages, such as R, SAS, Stata, etc. Therefore, I believe that, once people are willing to overcome the initial barrier of getting familiar with a new programming language, these programs provide extremely powerful tools for implementing Bayesian methods. Although in economic evaluations analysts are typically more familiar with frequentist methods (e.g. multiple imputation), it is clear that, as the complexity of the analysis increases, these methods require tailor-made routines for the optimisation of non-standard likelihood functions. A full Bayesian approach is then likely to be the preferable option, as it naturally allows uncertainty to be propagated to the wider economic model and sensitivity analysis to be performed.
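
As a minimal example of what calling JAGS from R looks like (using the rjags package; the toy model and data below are invented purely for illustration), note that the missing value in the data is treated as just another unknown and gets its own posterior distribution.

    library(rjags)

    model_string <- "
    model {
      for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
      mu ~ dnorm(0, 0.0001)
      tau ~ dgamma(0.01, 0.01)
    }
    "

    dat <- list(y = c(0.71, 0.66, NA, 0.80), N = 4)   # the NA is imputed by the model

    jm <- jags.model(textConnection(model_string), data = dat, n.chains = 2)
    update(jm, 1000)                                  # burn-in
    post <- coda.samples(jm, variable.names = c("mu", "y"), n.iter = 5000)
    summary(post)                                     # includes posterior draws for the missing y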

Chris Sampson’s journal round-up for 11th March 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Identification, review, and use of health state utilities in cost-effectiveness models: an ISPOR Good Practices for Outcomes Research Task Force report. Value in Health [PubMed] Published 1st March 2019

When modellers select health state utility values to plug into their models, they often do it in an ad hoc and unsystematic way. This ISPOR Task Force report seeks to address that.

The authors discuss the process of searching, reviewing, and synthesising utility values. Searches need to use iterative techniques because evidence requirements develop as a model develops. Due to the scope of models, it may be necessary to develop multiple search strategies (for example, for different aspects of disease pathways). Searches needn’t be exhaustive, but they should be systematic and transparent. The authors provide a list of factors that should be considered in defining search criteria. In reviewing utility values, both quality and appropriateness should be considered. Quality is indicated by the precision of the evidence, the response rate, and missing data. Appropriateness relates to the extent to which the evidence being reviewed conforms to the context of the model in which it is to be used. This includes factors such as the characteristics of the study population, the measure used, the value sets used, and the timing of data collection. When it comes to synthesis, the authors suggest it might not be meaningful in most cases, because of variation in methods. We can’t pool values if they aren’t (at least roughly) equivalent. Therefore, one approach is to employ strict inclusion criteria (e.g. only EQ-5D, only a particular value set), but this isn’t likely to leave you with much. Meta-regression can be used to analyse more dissimilar utility values and provide insight into the impact of methodological differences. But the extent to which this can provide pooled values for a model is questionable, and the authors concede that more research is needed.
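
For a sense of what such a meta-regression might look like in practice, here is a minimal sketch in R using the metafor package, with entirely made-up utility values and study-level covariates; the moderator coefficients give an indication of how much of the variation is attributable to methodological differences.

    library(metafor)

    # Hypothetical extracted values: mean utility (yi), its standard error (sei),
    # and study-level characteristics that might explain between-study differences.
    dat <- data.frame(
      yi  = c(0.71, 0.66, 0.74, 0.62, 0.69, 0.73),
      sei = c(0.02, 0.03, 0.02, 0.04, 0.03, 0.02),
      instrument = c("EQ-5D", "EQ-5D", "SF-6D", "EQ-5D", "SF-6D", "EQ-5D"),
      valueset   = c("UK", "US", "UK", "UK", "US", "UK")
    )

    fit <- rma(yi = yi, sei = sei, mods = ~ instrument + valueset, data = dat)
    summary(fit)   # moderator estimates quantify the impact of instrument and value set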

This paper can inform that future research. Not least in its attempt to specify minimum reporting standards. We have another checklist, with another acronym (SpRUCE). The idea isn’t so much that this will guide publications of systematic reviews of utility values, but rather that modellers (and model reviewers) can use it to assess whether the selection of utility values was adequate. The authors then go on to offer methodological recommendations for using utility values in cost-effectiveness models, considering issues such as modelling technique, comorbidities, adverse events, and sensitivity analysis. It’s early days, so the recommendations in this report ought to be changed as methods develop. Still, it’s a first step away from the ad hoc selection of utility values that (no doubt) drives the results of many cost-effectiveness models.

Estimating the marginal cost of a life year in Sweden’s public healthcare sector. The European Journal of Health Economics [PubMed] Published 22nd February 2019

It’s only recently that health economists have gained access to data that enable the estimation of the opportunity cost of health care expenditure at a national level, sometimes referred to as a supply-side threshold. We’ve seen studies in the UK, Spain, and Australia, and here we have one from Sweden.

The authors use data on health care expenditure at the national (1970-2016) and regional (2003-2016) level, alongside estimates of remaining life expectancy by age and gender (1970-2016). First, they try a time series analysis, testing the nature of causality. Finding an apparently causal relationship between longevity and expenditure, the authors don’t take it any further. Instead, the results are based on a panel data analysis, employing similar methods to estimates generated in other countries. The authors propose a conceptual model to support their analysis, which distinguishes it from other studies. In particular, the authors assert that the majority of the impact of expenditure on mortality operates through morbidity, which changes how the model should be specified. The number of newly graduated nurses is used as an instrument indicative of a supply-shift at the national rather than regional level. The models control for socioeconomic and demographic factors and morbidity not amenable to health care.

The authors estimate the marginal cost of a life year by dividing health care expenditure by the expenditure elasticity of life expectancy, finding an opportunity cost of €38,812 (with a massive 95% confidence interval). Using Swedish population norms for utility values, this would translate into around €45,000/QALY.
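
As a back-of-the-envelope check of that conversion (the utility norm below is an assumed round figure for illustration; the paper uses age- and sex-specific Swedish norms):

    marginal_cost_per_life_year <- 38812         # reported estimate, EUR
    utility_norm <- 0.86                         # assumed average population utility weight
    marginal_cost_per_life_year / utility_norm   # roughly EUR 45,000 per QALY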

The analysis is carefully considered and makes plain the difficulty of estimating the marginal productivity of health care expenditure. It looks like a nail in the coffin for the idea of estimating opportunity costs using time series. For now, at least, estimates of opportunity cost will be based on variation according to geography, rather than time. In their excellent discussion, the authors are candid about the limitations of their model. Their instrument wasn’t perfect and it looks like there may have been important confounding variables that they couldn’t control for.

Frequentist and Bayesian meta‐regression of health state utilities for multiple myeloma incorporating systematic review and analysis of individual patient data. Health Economics [PubMed] Published 20th February 2019

The first paper in this round-up was about improving practice in the systematic review of health state utility values, and it indicated the need for more research on the synthesis of values. Here, we have some. In this study, the authors conduct a meta-analysis of utility values alongside an analysis of registry and clinical study data for multiple myeloma patients.

A literature search identified 13 ‘methodologically appropriate’ papers, providing 27 health state utility values. The EMMOS registry included data for 2,445 patients in 22 countries and the APEX clinical study included 669 patients, all with EQ-5D-3L data. The authors implement both a frequentist meta-regression and a Bayesian model. In both cases, the models were run including all values and then with a limited set of only EQ-5D values. These models predicted utility values based on the number of treatment classes received and the rate of stem cell transplant in the sample. The priors used in the Bayesian model were based on studies that reported general utility values for the presence of disease (rather than according to treatment).
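
To give a sense of what the Bayesian variant might look like, here is a minimal sketch of a Bayesian meta-regression, written as a JAGS model in an R string; the structure, covariates, and priors are my own guesses for illustration and are not the authors’ code.

    # Sketch only: study-level utilities regressed on the number of treatment classes
    # and the stem cell transplant rate, with a weakly informative prior on the intercept.
    model_string <- "
    model {
      for (s in 1:S) {
        prec[s] <- pow(se[s], -2)
        y[s] ~ dnorm(theta[s], prec[s])
        theta[s] <- beta0 + beta1 * n_treat[s] + beta2 * sct[s] + u[s]
        u[s] ~ dnorm(0, tau.u)        # between-study heterogeneity
      }
      beta0 ~ dnorm(0.65, 25)         # prior centred on a general utility for the disease
      beta1 ~ dnorm(0, 0.01)
      beta2 ~ dnorm(0, 0.01)
      tau.u <- pow(sd.u, -2)
      sd.u ~ dunif(0, 1)
    }
    "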

The frequentist models showed that utility was low at diagnosis, higher at first treatment, and lower at each subsequent treatment. Stem cell transplant had a positive impact on utility values independent of the number of previous treatments. The results of the Bayesian analysis were very similar, which the authors suggest is due to weak priors. An additional Bayesian model was run with preferred data but vague priors, to assess the sensitivity of the model to the priors. At later stages of disease (for which data were more sparse), there was greater uncertainty. The authors provide predicted values from each of the five models, according to the number of treatment classes received. The models provide slightly different results, except in the case of newly diagnosed patients (where the difference was 0.001). For example, the ‘EQ-5D only’ frequentist model gave a value of 0.659 for one treatment, while the Bayesian model gave a value of 0.620.

I’m not sure that the study satisfies the recommendations outlined in the ISPOR Task Force report described above (though that would be an unfair challenge, given the timing of publication). We’re told very little about the nature of the studies that are included, so it’s difficult to judge whether they should have been combined in this way. However, the authors state that they have made their data extraction and source code available online, which means I could check that out (though, having had a look, I can’t find the material that the authors refer to, reinforcing my hatred for the shambolic ‘supplementary material’ ecosystem). The main purpose of this paper is to progress the methods used to synthesise health state utility values, and it does that well. Predictably, the future is Bayesian.


Sam Watson’s journal round-up for 11th February 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Contest models highlight inherent inefficiencies of scientific funding competitions. PLoS Biology [PubMed] Published 2nd January 2019

If you work in research, you will no doubt have thought to yourself at one point that you spend more time applying to do research than actually doing it. You can spend weeks working on (what you believe to be) a strong proposal only for it to fail against other strong bids. That time could have been spent collecting and analysing data. Indeed, the opportunity cost of writing extensive proposals can be very high. The question arises as to whether there is another method of allocating research funding that reduces this waste and inefficiency. This paper compares the proposal competition to a partial lottery. In this lottery system, proposals are short, and among those that meet some qualifying standard, the funded proposals are selected at random. This system has the benefit of not taking up too much time but has the cost of reducing the average scientific value of the winning proposals. The authors compare the two approaches using an economic model of contests, which takes into account factors like proposal strength, public benefits, benefits to the scientist like reputation and prestige, and scientific value. Ultimately they conclude that, when the number of awards is smaller than the number of proposals worthy of funding, the proposal competition is inescapably inefficient. It means that researchers have to invest heavily to get a good project funded, and even if it is good enough it may still not get funded. The stiffer the competition, the more researchers have to work to win the award. And what little evidence there is suggests that the format of the application makes little difference to the amount of time spent by researchers on writing it. The lottery mechanism only requires the researcher to propose something that is good enough to get into the lottery. Far less time would therefore be devoted to writing it and more time spent on actual science. I’m all for it!
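
To make the intuition concrete, here is a toy simulation of my own (not the authors’ contest model) comparing total applicant effort and the average merit of funded projects under a full proposal competition versus a partial lottery with short qualifying applications; all the numbers are invented.

    set.seed(42)
    n_applicants <- 200
    n_awards <- 20
    quality <- rnorm(n_applicants)                 # underlying merit of each idea

    effort_full <- 4     # weeks spent on a detailed proposal (assumed)
    effort_short <- 1    # weeks spent on a short qualifying application (assumed)

    # Proposal competition: fund the top-scoring proposals (scores are noisy signals of merit).
    scores <- quality + rnorm(n_applicants, sd = 0.5)
    winners_comp <- order(scores, decreasing = TRUE)[1:n_awards]

    # Partial lottery: fund at random among proposals judged 'good enough'.
    eligible <- which(quality > quantile(quality, 0.5))
    winners_lottery <- sample(eligible, n_awards)

    c(total_weeks_competition = effort_full * n_applicants,
      total_weeks_lottery     = effort_short * n_applicants,
      mean_merit_competition  = mean(quality[winners_comp]),
      mean_merit_lottery      = mean(quality[winners_lottery]))

Even in this crude set-up, the trade-off the paper formalises shows up: far less time spent applying, at the price of a somewhat lower average merit among the funded projects.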

Preventability of early versus late hospital readmissions in a national cohort of general medicine patients. Annals of Internal Medicine [PubMed] Published 5th June 2018

Hospital quality is hard to judge. We’ve discussed on this blog before the pitfalls of using measures such as adjusted mortality differences for this purpose. Just because a hospital has higher than expected mortality does not mean those deaths could have been prevented with higher quality care. More thorough methods assess errors and preventable harm in care. Case note review studies have suggested that as little as 5% of deaths might be preventable in England and Wales. Another paper we have covered previously suggests that the predictive value of standardised mortality ratios for preventable deaths may be less than 10%.

Another commonly used metric is the readmission rate. Poor care can mean patients have to return to the hospital. But again, the question remains as to how preventable these readmissions are. Indeed, there may also be substantial differences between those patients who are readmitted shortly after discharge and those for whom it takes longer. This article explores the preventability of early and late readmissions in ten hospitals in the US. It uses case note review and a number of reviewers to evaluate preventability. The headline figures are that 36% of early readmissions are considered preventable compared to 23% of late readmissions. Moreover, it was considered that the early readmissions were most likely to have been preventable at the hospital, whereas for late readmissions an outpatient clinic or the home would have had more impact. All in all, another paper which provides evidence to suggest that crude, or even adjusted, rates are not good indicators of hospital quality.

Visualisation in Bayesian workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society) [RePEc] Published 15th January 2019

This article stems from a broader programme of work by these authors on good “Bayesian workflow”. That is to say, if we’re taking a Bayesian approach to analysing data, what steps ought we to take to ensure our analyses are as robust and reliable as possible? I’ve been following this work for a while, as this type of pragmatic advice is invaluable. I’ve often read empirical papers where the authors have chosen, say, a logistic regression model with covariates x, y, and z and reported the outcomes, but at no point justified why this particular model might be any good at all for these data or the research objective. The key steps of the workflow include, first, exploratory data analysis to help set up a model, and second, performing model checks before estimating model parameters. This latter step is important: one can generate data from a model and set of prior distributions, and if the data that this model generates look nothing like what we would expect the real data to look like, then clearly the model is not very good. Following this, we should check whether our inference algorithm is doing its job: for example, are the MCMC chains converging? We can also conduct posterior predictive model checks. These have been criticised in the literature for using the same data to both estimate and check the model, which could lead to the model generalising poorly to new data. Indeed, in a recent paper of my own, posterior predictive checks showed that a model fitted my data poorly and that a more complex alternative fitted better; but other model fit statistics, which penalise the number of parameters, pointed the other way, so the simpler model was preferred on the grounds that the more complex one was overfitting the data. I would therefore argue that posterior predictive checks are a sensible test to perform, but one that must be interpreted carefully as one step among many. Finally, we can compare models using tools like cross-validation.
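
As a minimal illustration of the ‘check the model before estimating it’ step, here is a sketch of a prior predictive check in R (a toy normal model, not the paper’s air pollution example):

    set.seed(1)
    n_sims <- 100   # number of draws from the prior
    n_obs  <- 50    # size of each simulated data set

    prior_mu    <- rnorm(n_sims, 0, 10)      # vague prior on the mean
    prior_sigma <- abs(rnorm(n_sims, 0, 5))  # vague half-normal prior on the sd

    # Simulate the data sets implied by the priors alone.
    fake <- sapply(seq_len(n_sims), function(s) rnorm(n_obs, prior_mu[s], prior_sigma[s]))

    range(fake)   # if these values are wildly implausible for the real outcome, rethink the priors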

This article discusses the use of visualisation to aid this workflow. The authors use the running example of building a model to estimate exposure to small particulate matter from air pollution across the world. Plots are produced for each of the steps and show just how bad some models can be and how we can refine our model step by step to arrive at a convincing analysis. I agree wholeheartedly with the authors when they write, “Visualization is probably the most important tool in an applied statistician’s toolbox and is an important complement to quantitative statistical procedures.”
