The irrelevance of inference: (almost) 20 years on, is it still irrelevant?

The Irrelevance of Inference was a seminal paper published by Karl Claxton in 1999. In it he outlines a stochastic decision making approach to the evaluation of health technologies. A key point that he makes is that we need only to examine the posterior mean incremental net benefit of one technology compared to another to make a decision. Other aspects of the distribution of incremental net benefits are irrelevant – hence the title.

I hated this idea. From a Bayesian perspective, estimation and inference are decision problems. Surely uncertainty matters! But, within the extra-welfarist framework in which we generally conduct cost-effectiveness analysis, the argument is irrefutable. To see why, let’s consider a basic decision making framework.

There are three aspects to a decision problem. Firstly, there is a state of the world, \theta \in \Theta with density \pi(\theta). In this instance it is the net benefits in the population, but in other contexts it could be, for example, the state of the economy or the effectiveness of a medical intervention. Secondly, there are the possible actions, denoted by a \in \mathcal{A}. There might be a discrete set of actions or a continuum of possibilities. Finally, there is the loss function L(a,\theta), which describes the losses or costs associated with taking action a given that \theta is the state of nature. The action that should be taken is the one that minimises expected losses, \rho(\theta,a)=E_\theta(L(a,\theta)). Minimising losses can be seen as analogous to maximising utility. We also observe data x=[x_1,...,x_N]' that provide information on the parameter \theta. Our state of knowledge regarding this parameter is then captured by the posterior distribution \pi(\theta|x), and our expected losses should be calculated with respect to this distribution.

Given the data and the posterior distribution of incremental net benefits, we need to choose a value (a Bayes estimator) that minimises expected losses. The opportunity loss from making the wrong decision is “the difference in net benefit between the best choice and the choice actually made.” So the decision comes down to deciding whether the incremental net benefits are positive or negative (and hence whether to invest), \mathcal{A}=\{a^+,a^-\}. The losses are linear if we make the wrong decision:

L(a^+,\theta) = 0 if \theta > 0 and L(a^+,\theta) = -\theta if \theta < 0

L(a^-,\theta) = \theta if \theta > 0 and L(a^-,\theta) = 0 if \theta < 0

So we should decide that the incremental net benefits are positive if

E_\theta(L(a^-,\theta)) - E_\theta(L(a^+,\theta)) > 0

which is equivalent to

\int_0^\infty \theta dF^{\pi(\theta|x)}(\theta) - \int_{-\infty}^0 -\theta dF^{\pi(\theta|x)}(\theta) = \int_{-\infty}^\infty \theta dF^{\pi(\theta|x)}(\theta) > 0

which is obviously equivalent to E(\theta|x)>0 – the posterior mean!
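This rule is easy to check by simulation. Here is a minimal sketch in R, assuming a purely illustrative normal posterior for the incremental net benefit (all the numbers are made up):

```r
# A minimal sketch of the decision rule, assuming an illustrative normal
# posterior for the incremental net benefit (all numbers are made up)
set.seed(42)
theta <- rnorm(10000, mean = 500, sd = 2000) # draws from pi(theta|x)

# opportunity losses of each action given the state of the world
loss <- list(
  invest      = function(theta) ifelse(theta < 0, -theta, 0), # a+
  dont_invest = function(theta) ifelse(theta > 0,  theta, 0)  # a-
)

# choose the action with the smallest expected posterior loss
expected_loss <- sapply(loss, function(L) mean(L(theta)))
names(which.min(expected_loss)) # "invest", since the posterior mean is positive
```

Note that the difference in expected losses between the two actions is exactly the posterior mean of \theta, so the Bayes action is a^+ whenever E(\theta|x)>0, whatever the rest of the distribution looks like.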

If our aim is simply the estimation of net benefits (so \mathcal{A} \subseteq \mathbb{R}), different loss functions lead to different estimators. If we have a squared loss function L(a, \theta)=|\theta-a|^2 then again we should choose the posterior mean. However, other choices of loss function lead to other estimators. The linear loss function, L(a, \theta)=|\theta-a| leads to the posterior median. And a ‘0-1’ loss function: L(a, \theta)=0 if a=\theta and L(a, \theta)=1 if a \neq \theta, gives the posterior mode, which is also the maximum likelihood estimator (MLE) if we have a uniform prior. This latter point does suggest that MLEs will not give the ‘correct’ answer if the net benefit distribution is asymmetric. The loss function is therefore important. But for the purposes of the decision between technologies I see no good reason to reject our initial loss function.
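A quick simulation illustrates the point, using an arbitrary right-skewed gamma distribution to stand in for an asymmetric posterior of net benefits:

```r
# Different loss functions yield different Bayes estimators; with a skewed
# posterior (here an arbitrary gamma) the three estimators all differ
set.seed(1)
theta <- rgamma(1e5, shape = 2, rate = 1)

post_mean   <- mean(theta)       # squared loss
post_median <- median(theta)     # absolute (linear) loss
d <- density(theta)              # kernel density estimate
post_mode   <- d$x[which.max(d$y)] # approximate mode: '0-1' loss

# for a right-skewed distribution: mode < median < mean
c(post_mode, post_median, post_mean)
```

For a symmetric posterior the three coincide; it is only under asymmetry that the choice of loss function changes the answer.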

Claxton also noted that equity considerations could be incorporated through ‘adjustments to the measure of outcome’. This could be some kind of weighting scheme. However, this is where I might begin to depart from the claim of the irrelevance of inference. I prefer a social decision maker approach to evaluation in the vein of cost-benefit analysis, as discussed by the brilliant Alan Williams. This approach allows for non-market outcomes that extra-welfarism might include but classical welfarism would exclude; their valuations could be arrived at by a political, democratic process or by other means. It also permits inequality aversion and other features that are perhaps a more accurate reflection of a political decision making process. However, one must be aware of all the flaws and failures of this approach, which Williams so neatly describes.

In a social decision maker framework, the decision that should be made is the one that maximises a social welfare function. The utility function expresses social preferences over the distribution of utility in the population, while the social welfare function aggregates utilities and is usually assumed to be linear (utilitarian). If the utility function is inequality averse then the variance obviously does matter. But, in making this claim, I am moving away from the arguments of Claxton’s paper and towards a discussion of the relative merits of extra-welfarism and other approaches.

Perhaps the statement that inference was irrelevant was made just to capture our attention. After all, the process of updating our knowledge of the net benefits of alternatives from data is inference. But Claxton’s statement refers more to the process of hypothesis testing and p-values (or their Bayesian equivalents), the use of which has no place in decision making. On this point I wholeheartedly agree.

 


Are we estimating the effects of health care expenditure correctly?

It is a contentious issue in philosophy whether an omission can be the cause of an event. At the very least it seems we should consider causation by omission differently from ‘ordinary’ causation. Consider Sarah McGrath’s example. Billy promised Alice to water the plant while she was away, but he did not water it. Billy not watering the plant caused its death. But there are good reasons to suppose that Billy did not cause its death: if Billy’s lack of watering caused the death of the plant, it would be reasonable to conclude that Vladimir Putin, and indeed anyone else who did not water the plant, was also a cause. McGrath argues that there is a normative consideration here: Billy ought to have watered the plant, and that’s why we judge his omission to be a cause and not anyone else’s. Similarly, consider the example from L.A. Paul and Ned Hall’s excellent book Causation: A User’s Guide. Billy and Suzy are playing soccer on rival teams. One of Suzy’s teammates scores a goal. Both Billy and Suzy were nearby and could have easily prevented the goal. But our judgement is that the goal should only be credited to Billy’s failure to block it, as Suzy had no responsibility to do so.

These arguments may appear far removed from the world of health economics. But, they have practical implications. Consider the estimation of the effect that increasing health care expenditure has on public health outcomes. The government, or relevant health authority, makes a decision about how the budget is allocated. It is often the case that there are allocative inefficiencies: greater gains could be had by reallocating the budget to more effective programs of care. In this case there would seem to be a relevant omission; the budget has not been spent where it could have provided benefits. These omissions are often seen as causes of a loss of health. Karl Claxton wrote of the Cancer Drugs Fund, a pool of money diverted from the National Health Service to provide cancer drugs otherwise considered cost-ineffective, that it was associated with

a net loss of at least 14,400 quality adjusted life years in 2013/14.

Similarly, the authors of an analysis of the lack of spending on effective HIV treatment and prevention by the Mbeki administration in South Africa wrote that

More than 330,000 lives or approximately 2.2 million person-years were lost because a feasible and timely ARV treatment program was not implemented in South Africa.

But our analyses of the effects of health care expenditure typically do not take these omissions into account.

Causal inference methods are founded on a counterfactual theory of causation. The aim of a causal inference method is to estimate the potential outcomes that would have been observed under different treatment regimes. In our case this would be what would have happened under different levels of expenditure. This is typically estimated by examining the relationship between population health and levels of expenditure, perhaps using some exogenous determinant of expenditure to identify the causal effects of interest. But this only identifies those changes caused by expenditure and not those changes caused by not spending.

Consider the following toy example. There are two causes of death in the population, a and b, with associated programs of care and prevention A and B. The total health care expenditure is x, of which a proportion p, p\in P \subseteq [0,1], is spent on A and 1-p on B. The deaths due to each cause are y_a and y_b, so total deaths are y = y_a + y_b. Finally, the effects of a unit increase in expenditure in each program are \beta_a and \beta_b. The question is to determine the causal effect of expenditure. If Y_x is the potential outcome for level of expenditure x, then the average treatment effect is given by E(\frac{\partial Y_x}{\partial x}).

The country has chosen an allocation between the programmes of care of p_0. If causation by omission is not a concern then, given linear, additive models (and assuming all the model assumptions are met), y_a = \alpha_a + \beta_a p x + f_a(t) + u_a and y_b = \alpha_b + \beta_b (1-p) x + f_b(t) + u_b, the causal effect is E(\frac{\partial Y_x}{\partial x}) = \beta = \beta_a p_0 + \beta_b (1-p_0). But if causation by omission is relevant, then the net effect of expenditure is the lives gained, \beta_a p_0 + \beta_b (1-p_0), less the lives lost. The lives lost are those under all the possible things we did not do, so the net causal effect is \beta' = \beta_a p_0 + \beta_b (1-p_0) -  \int_{P \setminus \{p_0\}} [ \beta_a p + \beta_b(1-p) ] dG(p). Now, clearly \beta \neq \beta' unless P \setminus \{p_0\} is the empty set, i.e. there was no other option. Indeed, the choice of possible alternatives involves a normative judgement, as we’ve suggested: for an omission to count as a cause, there needs to be a judgement about what ought to have been done. For health care expenditure this may mean that the only viable alternative is the allocatively efficient distribution, in which case all allocations will result in a net loss of life unless they are allocatively efficient, which some may argue is reasonable. An alternative view is that the government simply has to do no worse than in the past, and perhaps it is also reasonable for the government not to make significant changes to the allocation, for whatever reason. In that case we might say that P = [p_0,1] and g(p) might be a density truncated below at p_0, with most of its mass around p_0 and a small variance.
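To make this concrete, here is a toy calculation in R with made-up values of \beta_a, \beta_b and p_0, and a discrete distribution g(p) standing in for the alternatives the government could reasonably have chosen:

```r
# Toy numerical version of the two estimands (all values are illustrative)
beta_a <- 5; beta_b <- 2 # lives saved per unit of expenditure in A and B
p0 <- 0.3                # the allocation actually chosen

# 'ordinary' causal effect of a unit of expenditure
beta <- beta_a * p0 + beta_b * (1 - p0)

# discrete g(p) over the allocations that could reasonably have been chosen
p <- c(0.35, 0.4, 0.5)
g <- c(0.6, 0.3, 0.1) # weights summing to one

# net effect: lives gained less the lives lost to the forgone allocations
beta_prime <- beta - sum((beta_a * p + beta_b * (1 - p)) * g)

c(beta, beta_prime) # beta_prime differs from beta, and here is negative
```

With these numbers the chosen allocation under-weights the more effective program A, so every forgone alternative would have saved more lives and the net effect is a loss, exactly the structure of the Cancer Drugs Fund argument.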

The problem is that we generally do not observe the effect of expenditure in each program of care nor do we know the distribution of possible budget allocations. The normative judgements are also a contentious issue. Claxton clearly believes the government ought not to have initiated the Cancer Drugs Fund, but he does not go so far as to say any allocative inefficiency results in a net loss of life. Some working out of the underlying normative principles is warranted. But if it’s not possible to estimate these net causal effects, why discuss it? Perhaps it’s due to the lack of consistency. We estimate the ‘ordinary’ causal effect in our empirical work, but we often discuss opportunity costs and losses due to inefficiencies as being due to or caused by the spending decisions that are made. As the examples at the beginning illustrate, the normative question of responsibility seeps into our judgments about whether an omission is the cause of an outcome. For health care expenditure the government or other health care body does have a relevant responsibility. I would argue then that causation by omission is important and perhaps we need to reconsider the inferences that we make.


Visualising PROMs data

The Patient Reported Outcome Measures (PROMs) programme is a large database of before and after health-related quality of life (HRQoL) measures for a large number of patients undergoing one of four procedures: hip replacement, knee replacement, varicose vein surgery, and groin hernia surgery. The outcome measures are the EQ-5D index and visual analogue scale (plus a disease-specific measure for three of the interventions). These data also record the provider of the operation. Being publicly available, these data allow us to look at a range of different questions: what’s the average effect of the surgery on HRQoL? What are the differences between providers in HRQoL gains or in patient casemix? Great!

The first thing we should always do with new data is look at it. This might be in an exploratory way, to determine the questions to ask of the data, or in an analytical way, to get an idea of the relationships between variables. Plotting the data communicates more about what’s going on than any table of statistics alone. However, the plots on the NHS Digital website might be accused of being a little uninspired, as they collapse a lot of the variation into simple charts that conceal much of what’s going on.

So let’s consider other ways of visualising these data. For all of these plots, a walk-through of the code is at the end of this post.

Now, I’m not a regular user of PROMs data, so what I think are the interesting features may not reflect what the data are generally used for. For me, the interesting features are:

  • The joint distribution of pre- and post-op scores
  • The marginal distributions of pre- and post-op scores
  • The relationship between pre- and post-op scores over time

We will pool all the data from five years’ worth of PROMs returns. This gives us over 200,000 observations. A scatter plot with this information is useless as the density of the points will be very high. A useful alternative is hexagonal binning, which is like a two-dimensional histogram: hexagonal tiles, which usefully tessellate and are more interesting to look at than squares, are shaded or coloured according to the number of observations in each bin across the support of the joint distribution of pre- and post-op scores (which is [-0.5,1] x [-0.5,1]). We can add the marginal distributions to the axes and then add smoothed trend lines for each year. Since the data are constrained between -0.5 and 1, the mean may not be a very good summary statistic, so we’ll plot a smoothed median trend line for each year. Finally, we’ll add a line on the diagonal: patients above this line have improved, and patients below it have deteriorated.

Hip replacement results


There’s a lot going on in the graph, but I think it reveals a number of key points about the data that we wouldn’t have seen from the standard plots on the website:

  • There appear to be four clusters of patients:
    • Those who were in close to full health prior to the operation and were in ‘perfect’ health (score = 1) after;
    • Those who were in close to full health pre-op and who didn’t really improve post-op;
    • Those who were in poor health (score close to zero) and made a full recovery;
    • Those who were in poor health and who made a partial recovery.
  • The median change is an improvement in health.
  • The median change improves modestly from year to year for a given pre-op score.
  • There are ceiling effects for the EQ-5D.

None of this is news to those who study these data. But this way of presenting the data certainly tells more of a story than the current plots on the website.

R code

We’re going to consider hip replacement, but the code is easily modified for the other procedures. First, we take the pre- and post-op scores and their difference for each year, and pool them into one data frame.

# load one year of record-level data and standardise the columns we need;
# the EQ-5D column names differ between the earlier and later file formats
load_year <- function(file, pre_col, post_col, year) {
  df <- read.csv(file)
  # the earlier files name the provider column differently; it is the first column
  if (!"Provider.Code" %in% names(df)) names(df)[1] <- "Provider.Code"
  # drop records with no pre-op score (now applied to all years for consistency)
  df <- df[!is.na(df[[pre_col]]), ]
  df$pre <- df[[pre_col]]
  df$post <- df[[post_col]]
  df$diff <- df$post - df$pre
  df$year <- year
  df[, c("Provider.Code", "pre", "post", "diff", "year")]
}

# combine
df <- rbind(
  load_year("C:/docs/proms/Record Level Hip Replacement 1415.csv",
            "Pre.Op.Q.EQ5D.Index", "Post.Op.Q.EQ5D.Index", "2014/15"),
  load_year("C:/docs/proms/Record Level Hip Replacement 1314.csv",
            "Pre.Op.Q.EQ5D.Index", "Post.Op.Q.EQ5D.Index", "2013/14"),
  load_year("C:/docs/proms/Record Level Hip Replacement 1213.csv",
            "Pre.Op.Q.EQ5D.Index", "Post.Op.Q.EQ5D.Index", "2012/13"),
  load_year("C:/docs/proms/Hip Replacement 1112.csv",
            "Q1_EQ5D_INDEX", "Q2_EQ5D_INDEX", "2011/12"),
  load_year("C:/docs/proms/Record Level Hip Replacement 1011.csv",
            "Q1_EQ5D_INDEX", "Q2_EQ5D_INDEX", "2010/11")
)

write.csv(df, "C:/docs/proms/eq5d.csv")

Now for the plot. We will need the packages ggplot2, ggExtra, and extrafont. The latter is just to change the plot fonts; it’s not essential, but it is aesthetically pleasing.

require(ggplot2)
require(ggExtra)
require(extrafont)
font_import() # only needs to be run once per machine; can take a few minutes
loadfonts(device = "win")

p <- ggplot(data = df, aes(x = pre, y = post)) +
  # hexagonal bins of the joint distribution of pre- and post-op scores
  stat_bin_hex(bins = 15, color = "white", alpha = 0.8) +
  # the line of no change: above it patients improved, below it they deteriorated
  geom_abline(intercept = 0, slope = 1, color = "black") +
  # smoothed median trend line for each year
  geom_quantile(aes(color = year), method = "rqss", lambda = 2,
                quantiles = 0.5, size = 1) +
  scale_fill_gradient2(name = "Count (000s)", low = "light grey", midpoint = 15000,
                       mid = "blue", high = "red",
                       breaks = c(5000, 10000, 15000, 20000), labels = c(5, 10, 15, 20)) +
  theme_bw() +
  labs(x = "Pre-op EQ-5D index score", y = "Post-op EQ-5D index score") +
  scale_color_discrete(name = "Year") +
  theme(legend.position = "bottom", text = element_text(family = "Gill Sans MT"))

# add the marginal histograms to the axes
ggMarginal(p, type = "histogram")