OHE Lunchtime Seminar with Sarah Karlsberg, Steven Paling, and Júlia González Esquerré on ‘What can NHS trusts do to reduce cancer waiting times?’ To be held on 14th November 2018 from 12 p.m. to 2 p.m.

Rapid diagnosis and access to treatment for cancer are vital for both clinical outcomes and patient experience of care. The NHS Constitution contains several waiting times targets, including that 85% of patients diagnosed with cancer should receive treatment within 62 days of referral. However, waiting times are increasing in England: the 62-day target has not been met since late 2013 and, in July 2018, the NHS recorded its worst performance since records began in October 2009.

This seminar will present evidence on where NHS trusts can take practical steps to reduce cancer waiting times. The work uses patient-level data (Hospital Episode Statistics) from 2016/17 and an econometric model to quantify the potential effects of several recommendations on the average length of patients’ cancer pathways. The project won the 2018 John Hoy Memorial Award for the best piece of economic analysis produced by government economists.

Sarah Karlsberg, Steven Paling, and Júlia González Esquerré work in the NHS Improvement Economics Team, which provides economics expertise to NHS Improvement (previously Monitor and the Trust Development Authority) and the provider sector. Their work covers all aspects of provider policy, including operational and financial performance, quality of care, leadership and strategic change. Sarah is also a Visiting Fellow at OHE.

The seminar will be held in the Sir Alexander Fleming Room, Southside, 7th Floor, 105 Victoria Street, London SW1E 6QT. A buffet lunch will be available from 12 p.m. The seminar will start promptly at 12:30 p.m. and finish promptly at 2 p.m.

If you would like to attend this seminar, please reply to ohegeneral@ohe.org.

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider key methodologies in widespread use, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is shared parameter models.

Principles

Missing data and data errors are an inevitability rather than a possibility. If data were missing as a result of a purely random computer error, there would be no problem: estimators based on the observed data would be unbiased. But this is usually not why data are missing. People drop out of surveys and trials because they choose to, because they move away, or, worse, because they die. The trouble is that the factors influencing these decisions and events are typically also those that affect the outcomes of interest in our studies, leading to bias. Unfortunately, missing data are often improperly dealt with. For example, a study of randomised controlled trials (RCTs) in the big four medical journals found that 95% had some missing data, and around 85% of those did not deal with it in a suitable way. An instructive article in the BMJ illustrated the potentially massive biases that dropout in RCTs can generate, and similar effects should be expected from dropout in panel studies and other analyses. Now, if the data are missing at random – i.e. the probability of missing data or dropout is independent of the data, conditional on observed covariates – then we can base our inferences on just the observed data. But this is often not the case, so what do we do in these circumstances?
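To make the distinction concrete, here is a small simulation of my own (not from any of the studies cited) contrasting data missing completely at random with informative dropout, where sicker people are more likely to be missing:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# True outcome: a health score with population mean 0
y = rng.normal(0, 1, n)

# Missing completely at random: a coin flip unrelated to y
mcar = rng.random(n) < 0.3

# Informative dropout: lower (sicker) values of y are more likely to be missing
p_drop = 1 / (1 + np.exp(2 * y))
mnar = rng.random(n) < p_drop

print(y.mean())         # close to the true mean of 0
print(y[~mcar].mean())  # still close to 0: complete cases are fine under MCAR
print(y[~mnar].mean())  # well above 0: the complete-case estimate is biased
```

Dropping incomplete cases is harmless in the first scenario but systematically overstates health in the second, which is the situation the models discussed below are designed to address.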

Implementation

If we denote the full set of data by $Y$ and a set of indicators for whether each observation is missing by $R$, plus some parameters $\theta$ and $\psi$, then we can factorise their joint distribution, $f(Y, R | \theta, \psi)$, in three ways:

Selection model

Perhaps most familiar to econometricians, this factorisation involves the marginal distribution of the full data and the conditional distribution of missingness given the data: $f(Y, R | \theta, \psi) = f(Y | \theta) f(R | Y, \psi)$. The Heckman selection model is an example of this factorisation. For example, one could specify a probit model for dropout and a normally distributed outcome, and the full likelihood would then involve the product of the two.
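As a sketch of this data-generating process (a toy example of my own, not the Heckman model itself), one can simulate an outcome and a probit observation rule that depends on the outcome, and see the bias in a complete-case regression:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + np.vectorize(erf)(z / sqrt(2)))

rng = np.random.default_rng(1)
n = 50_000

# Outcome model f(Y | theta): y depends on a covariate x
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# Missingness model f(R | Y, psi): the chance of observing y rises with y itself
r = rng.random(n) < Phi(-0.5 + 0.8 * y)  # r = True means y is observed

# Complete-case OLS: the intercept is pulled upwards because high-y cases
# are over-represented among the observed data
X = np.column_stack([np.ones(r.sum()), x[r]])
beta_cc, *_ = np.linalg.lstsq(X, y[r], rcond=None)
print(beta_cc)  # intercept noticeably above the true value of 1.0
```

All parameter values here are made up for illustration; the point is only that ignoring the selection mechanism biases the estimates, which is what modelling the full likelihood corrects.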

Pattern-mixture model

This approach specifies a marginal distribution for the missingness or dropout mechanism and then allows the distribution of the data to differ according to the type of missingness or dropout: $f(Y, R | \theta, \psi) = f(R | \psi) f(Y | R, \theta)$. The data are a mixture of different patterns, i.e. distributions. This type of model is implied when non-response is not considered missing data per se, and we’re interested in inferences within each sub-population. For example, when estimating quality of life at a given age, the quality of life of those who have died is not of interest, but their dying can bias the estimates.

Shared parameter model

Now, the final way we can model these data posits unobserved variables, $u$, conditional on which $Y$ and $R$ are independent: $f(Y, R | u, \theta, \psi) = f(Y | u, \theta) f(R | u, \psi)$. These models are most appropriate when the dropout or missingness is attributable to some underlying process changing over time, such as disease progression or household attitudes, or to an unobserved variable, such as health status.

At the simplest level, one could consider two separate models with correlated random effects. For example, adding in covariates $x_{it}$ and $z_{it}$, one could specify a linear mixed model and a probit selection model for person $i$ at time $t$:

$y_{it} = x_{it}'\beta + u_{1i} + \epsilon_{it}$

and

$\Phi^{-1}\{\Pr(r_{it} = 1)\} = z_{it}'\alpha + u_{2i}$

so that the random effects are multivariate normally distributed, $(u_{1i}, u_{2i})' \sim N(0, \Sigma)$.
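A quick way to get a feel for this specification is to simulate from it. The sketch below (my own illustration, with made-up parameter values) draws correlated random effects, generates a declining outcome, and makes dropout more likely for people with a low outcome effect; the complete-case means then drift above the true means as the panel progresses:

```python
import numpy as np
from math import erf, sqrt

Phi = np.vectorize(lambda z: 0.5 * (1 + erf(z / sqrt(2))))  # standard normal CDF

rng = np.random.default_rng(7)
n, T = 2_000, 5
t = np.arange(T)

# Correlated random effects: u1 (outcome) and u2 (dropout), negatively
# correlated so that people with lower outcomes are more likely to drop out
Sigma = np.array([[1.0, -0.6],
                  [-0.6, 1.0]])
u1, u2 = rng.multivariate_normal([0, 0], Sigma, size=n).T

# Linear mixed model: outcome declines over time
y = 2.0 - 0.3 * t + u1[:, None] + rng.normal(scale=0.5, size=(n, T))

# Probit dropout model sharing the correlated random effect
drop = rng.random((n, T)) < Phi(-1.5 + 0.2 * t + u2[:, None])
observed = ~np.maximum.accumulate(drop, axis=1)  # once dropped, gone for good

true_means = 2.0 - 0.3 * t
cc_means = np.array([y[observed[:, j], j].mean() for j in range(T)])
print(true_means)  # what we would see with no dropout
print(cc_means)    # complete-case means drift upwards over time
```

The widening gap between the two sets of means is exactly the informative-dropout bias that a shared parameter model, by estimating both equations jointly, is designed to remove.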

A more complex and flexible specification for longitudinal settings would permit the random effects to vary over time, differently between models and individuals:

$y_{it} = x_{it}'\beta + \lambda_t u_{1i} + \epsilon_{it}$

and

$\Phi^{-1}\{\Pr(r_{it} = 1)\} = z_{it}'\alpha + \delta_t u_{2i}$

As an example, if time were discrete in this model, then $\lambda_t$ could be a series of parameters, one for each time period $t = 0, 1, \ldots, T$, what are often referred to as ‘factor loadings’ in the structural equation modelling literature. We will run up against identifiability problems with these more complex models. For example, if the random effect were normally distributed, i.e. $u_{1i} \sim N(0, \sigma^2)$, then multiplying each factor loading by a constant $k$ and taking $u_{1i} \sim N(0, \sigma^2/k^2)$ would give us an equivalent model. So we have to put restrictions on the parameters. We can set the variance of the random effect to one, i.e. $\sigma^2 = 1$. We can also set one of the factor loadings to zero without loss of generality, i.e. $\lambda_0 = 0$.
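The scale non-identifiability is easy to verify numerically. In the sketch below (loading and variance values are my own, purely illustrative), scaling every factor loading by a constant k while dividing the random-effect variance by k² leaves the implied covariance of the latent term across time periods unchanged:

```python
import numpy as np

lam = np.array([0.0, 0.8, 1.1, 1.5])  # hypothetical loadings for 4 periods
sigma2 = 2.0                          # random-effect variance
k = 3.0                               # arbitrary rescaling constant

# Covariance across periods implied by the latent term (loading * random effect)
cov_original = sigma2 * np.outer(lam, lam)
cov_rescaled = (sigma2 / k**2) * np.outer(k * lam, k * lam)

print(np.allclose(cov_original, cov_rescaled))  # True: observationally equivalent
```

Since the data only identify this covariance, the two parameterisations cannot be told apart, hence the need for the restrictions described above.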

The distributional assumptions about the random effects can have potentially large effects on the resulting inferences. It is possible therefore to non-parametrically model these as well – e.g. using a mixture distribution. Ultimately, these models are a useful method to deal with data that are missing not at random, such as informative dropout from panel studies.

Software

Estimation can be tricky with these models given the need to integrate out the random effects. For frequentist inference, expectation maximisation (EM) is one way of estimating them, but as far as I’m aware the algorithm would have to be coded specifically for the problem at hand in Stata or R. An alternative is a quadrature-based method. The Stata package stjm fits shared parameter models for longitudinal and survival data, with similar specifications to those above.

Otherwise, Bayesian tools, such as Hamiltonian Monte Carlo, may have more luck with the more complex models. For the simpler correlated random effects specification above, one can use the stan_mvmer command in the rstanarm package; for more complex models, one would need to code the model in something like Stan.

Applications

For a health economics specific discussion of these types of models, one can look to the chapter Latent Factor and Latent Class Models to Accommodate Heterogeneity, Using Structural Equation in the Encyclopedia of Health Economics, although shared parameter models only get a brief mention. However, given that that book is currently on sale for £1,000, it may be beyond the wallet of the average researcher! Some health-related applications may be more helpful. Vonesh et al. (2011) used shared parameter models to look at the effects of diet and blood pressure control on renal disease progression. Wu and others (2011) look at how to model the effects of a ‘concomitant intervention’, which is one applied when a patient’s health status deteriorates and so is confounded with health, using shared parameter models. And, Baghfalaki and colleagues (2017) examine heterogeneous random effect specification for shared parameter models and apply this to HIV data.

When we think of the causal effect of living in one neighbourhood compared to another, we think of how the social interactions and lifestyle of that area produce better outcomes. Does living in an area with more obese people cause me to become fatter? (Quite possibly.) Or, if a family moves to an area where people earn more, will they earn more? (Read on.)

In a previous post, we discussed such effects in the context of slums, where the synergy of poor water and sanitation, low quality housing, small incomes, and high population density likely has a negative effect on residents’ health. However, we also discussed how difficult it is to estimate neighbourhood effects empirically, for a number of reasons. On top of this are the different ways neighbourhood effects can manifest. Social interactions may mean that behaviours leading to better health or incomes rub off on one another. But there may also be some underlying cause of the group’s, and hence each individual’s, outcomes. In the slum, low education may mean poor hygiene habits spread, or the shared environment may contain pathogens, for example. Both of these pathways may constitute a neighbourhood effect, but they imply very different explanations and potential policy remedies.

What, then, should we make of not one but two new articles by Raj Chetty and Nathaniel Hendren in a recent issue of the Quarterly Journal of Economics, both of which use observational data to estimate neighbourhood effects?

The authors have an impressive data set. They use federal tax records from the US between 1996 and 2012 and identify all children born between 1980 and 1988 and their parents (or parent). For each of these family units they determine household income and then the income of the children when they are older. To summarise a rather long exegesis of the methods used, I’ll try to describe the principal finding in one sentence:

Among families moving between commuting zones in the US, the average income percentile of children at age 26 is 0.04 percentile points higher per year spent and per additional percentile point increase in the average income percentile of the children of permanent residents at age 26 in the destination the family moves to. (Phew!)

They interpret this as the outcomes of in-migrating children ‘converging’ to the outcomes of permanently resident children at a rate of 4% per year. That should provide an idea of how the outcomes and treatments were defined, and who constituted the sample. The paper makes the assumption that the effect is the same regardless of the age of the child. Or to perhaps make it a bit clearer, the claim can be interpreted as that human capital, H, does something like this (ignoring growth over childhood due to schooling etc.):

$H_{good} = \alpha + \beta_{good} t$ and $H_{bad} = \alpha + \beta_{bad} t$, with $\beta_{good} > \beta_{bad}$

where $t$ is years of childhood exposure, and ‘good’ and ‘bad’ mean ‘good neighbourhood’ and ‘bad neighbourhood’. This could be called the ‘better neighbourhoods cause you to do better’ hypothesis.

The analyses also take account of parental income at the time of the move and look at families who moved due to a natural disaster or other ‘exogenous’ shock. The different analyses generally support the original estimate, putting the result in the region of 0.03 to 0.05 percentile points.
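As a back-of-the-envelope illustration of what a coefficient in this range implies (my own arithmetic, using the paper’s headline figure of 0.04):

```python
# Convergence rate: percentile points gained per year of exposure, per
# percentile-point gap between destination and origin permanent residents
RATE = 0.04

def predicted_gain(years_exposure, destination_gap):
    """Predicted change in a child's adult income percentile."""
    return RATE * years_exposure * destination_gap

# A child with ~10 years of childhood left who moves somewhere where the
# permanent residents' children rank 10 percentiles higher
print(predicted_gain(10, 10))  # 4.0 percentile points, i.e. 40% of the gap
```

The specific ages and gaps here are hypothetical; the point is just the linearity of the claim, with the gain scaling in both exposure and the destination gap.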

But are these neighbourhood effects?

A different way of interpreting these results is that there is an underlying effect driving incomes in each area. Areas with higher incomes for their children in the future are those that have a higher market price for labour in the future. So we could imagine that this is what is going on with human capital instead:

$H_{good} = H_{bad} = \alpha + \beta t$, but with area wage rates $w_{good} > w_{bad}$

where $t$ is years of exposure and $w$ is the wage paid per unit of human capital. This is the ‘those moving to areas where people will earn more in the future also earn more in the future because of differences in the labour market’ hypothesis. The Bureau of Labor Statistics, for example, cites the wage rate for a registered nurse as $22.61 in Iowa and $36.13 in California. But we can’t say from the data whether the children are sorting into different occupations or are being paid different amounts for the same occupations.

The reflection problem

Manski (1993) called this issue the ‘reflection problem’, which he described as arising when

a researcher observes the distribution of a behaviour in a population and wishes to infer whether the average behaviour in some group influences the behaviour of the individuals that compose the group.

What we have here is a linear-in-means model estimating the effect of average incomes on individual incomes. But we cannot distinguish between the competing explanations of, in Manski’s terms, endogenous effects, which result from interaction with families with higher incomes, and correlated effects, which lead to similar outcomes through exposure to the same underlying latent forces, i.e. the market. We could also add contextual effects, which manifest due to shared group characteristics (e.g. levels of schooling or experience). When we speak of a ‘neighbourhood effect’, I tend to think of the endogenous variety, i.e. the direct effects of living in a certain neighbourhood. For example, under different labour market conditions, both my income and the average income of the permanent residents of the neighbourhood I move to might be lower, but not because of the neighbourhood.
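A toy simulation (entirely my own, not from the papers) shows why such a regression cannot separate these explanations: regressing individual outcomes on the group mean produces a similar positive slope whether the truth is an endogenous effect or a correlated one:

```python
import numpy as np

rng = np.random.default_rng(0)
n_areas, n_per = 1_000, 25

def loo_slope(y):
    """Slope from regressing individual outcomes on the leave-one-out area mean."""
    loo = (y.sum(axis=1, keepdims=True) - y) / (n_per - 1)
    return np.polyfit(loo.ravel(), y.ravel(), 1)[0]

# World A, endogenous effect: individuals are pulled towards their group's mean
base = rng.normal(0, 1, (n_areas, n_per))
y_endog = base + 0.8 * base.mean(axis=1, keepdims=True)

# World B, correlated effect: a shared labour-market shock shifts everyone
shock = rng.normal(0, 0.3, (n_areas, 1))
y_corr = rng.normal(0, 1, (n_areas, n_per)) + shock

print(loo_slope(y_endog), loo_slope(y_corr))  # both positive and similar
```

The two worlds have entirely different policy implications, yet the linear-in-means evidence looks much the same in each, which is Manski’s point.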

The third hypothesis

There’s also the third hypothesis, that families that are better off move to better areas (i.e. the effects are accounted for by unobserved family differences):

$H = \alpha_i + \beta t$, with better-off families (higher $\alpha_i$) selecting into ‘good’ areas

The paper presents lots of modifications to the baseline model, but none of them can provide an exogenous choice of destination. The authors look at an exogenous cause of moving – natural disasters – and also instrument with the expected difference in income percentiles for parents from the same zip code, but I can’t see how this instrument is valid. Selection bias is acknowledged in the paper, but without some exogenous variation in where a family moves to it will be difficult to claim to have identified a causal effect. The choice of where to move is, for the vast majority of families, based on preferences over welfare and well-being, especially income. Indeed, why would a family move to a worse-off area unless their circumstances demanded it? So in reality, I would imagine the truth lies somewhere between these three explanations.

Robust analysis?

As a slight detour, we might want to consider whether these are causal effects, even if the underlying assumptions hold. The paper presents a range of analyses to show that the results are robust. But these analyses represent just a handful of those possible. Given that the key finding is relatively small in magnitude, one wonders what would have happened under different scenarios and choices – the so-called garden of forking paths problem. To illustrate, consider some of the choices that were made about the data and models, and all the possible alternatives. The sample included only those with a mean positive income between 1996 and 2004 and those living in commuting zones with populations of over 250,000 in the 2000 census. Those whose income was missing were assigned a value of zero. Average income over 1996 to 2000 was used as a proxy for lifetime income. If the marital status of the parents changed, the child was assigned to the mother’s location. Non-filers were coded as single. Income was measured in percentile ranks, not dollar terms. The authors justify each of these choices, but equally valid analyses would have resulted from different choices and possibly produced very different results.

The strategy of the second paper is much like the first, except that rather than estimating the average effect of moving to higher or lower income areas, the authors try to estimate the effect of moving to each of 3,000 counties in the US. To do this they assume that the number of years of exposure to a county is as good as random after taking account of (i) origin fixed effects, (ii) parental income percentile, and (iii) a quadratic function of birth cohort year and parental income percentile to try to control for some differences in labour market conditions. An even stronger assumption than before! The hierarchical model is estimated using a complex two-step method for ‘computational tractability’ (I’d have just used a Bayesian estimator). There are some further strange calculations, such as converting from percentile ranks into dollar terms by regressing the dollar amounts on average income ranks and multiplying everything by the coefficient, rather than just estimating the model with dollars as the outcome (I suspect this is to do with their complicated estimation strategy). Nevertheless, we are presented with some (noisy) county-level estimates of the effect of an additional year of childhood spent in each county. There is a weak correlation with the income ranks of permanent residents. Again, though, we have the issue of many competing explanations for the observed effects.

The differences in predicted causal effect by county don’t help distinguish between our hypotheses. Consider this figure:

Do children of poorer parents in the Southern states end up with lower human capital and lower-skilled jobs than in the Midwest? Or does the market mean that people get paid less for the same job in the South? Compare the map above to the maps below showing wage rates for two common lower-skilled professions, cashiers (right) and teaching assistants (left):

A similar pattern is seen. While this is obviously just a correlation, one suspects that such variation in wages is not driven by large differences in human capital generated through personal interaction with higher-earning individuals. This is also without taking into account any differences in purchasing power between geographic areas.

What can we conclude?

I’ve only discussed a fraction of the contents of these two enormous papers; they could fill many more blog posts to come. But it all hinges on whether we can interpret the results as the average causal effect of a person moving to a given place. Not nearly enough information is given to know whether families moving to areas with lower future incomes are comparable to those moving to areas with higher future incomes. We could also easily imagine a world where the same people were all induced to move to different areas – this might produce completely different sets of neighbourhood effects, since the movers themselves contribute to those effects. But I feel the greatest issue is the reflection problem, which even random assignment won’t get around. This is not to discount the value and interest these papers generate, but I can’t help feeling that too much time is devoted to trying to convince the reader of a ‘causal effect’. A detailed exploration of the relationships in the data between parental incomes, average incomes, spatial variation, later-life outcomes, and so forth might have been more useful for generating understanding and future analyses. Perhaps sometimes in economics we spend too long obsessing over unconvincing ‘causal effects’ and ‘quasi-experimental’ studies that really aren’t, and forget the value of a good exploration of the data with some nice plots.