Method of the month: Shared parameter models

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is shared parameter models.


Missing data and data errors are an inevitability rather than a possibility. If these data were missing as a result of a random computer error, then there would be no problem, no bias would result in estimators of statistics from these data. But, this is probably not why they’re missing. People drop out of surveys and trials often because they choose to, if they move away, or worse if they die. The trouble with this is that those factors that influence these decisions and events are typically also those that affect the outcomes of interest in our studies, thus leading to bias. Unfortunately, missing data is often improperly dealt with. For example, a study of randomised controlled trials (RCTs) in the big four medical journals found that 95% had some missing data, and around 85% of those did not deal with it in a suitable way. An instructive article in the BMJ illustrated the potentially massive biases that dropout in RCTs can generate. Similar effects should be expected from dropout in panel studies and other analyses. Now, if the data are missing at random – i.e. the probability of missing data or dropout is independent of the data conditional on observed covariates – then we could base our inferences on just the observed data. But this is often not the case, so what do we do in these circumstances?


If we have a full set of data Y and a set of indicators for whether each observation is missing R, plus some parameters \theta and \phi, then we can factorise their joint distribution, f(Y,R;\theta,\phi) in three ways:

Selection model


Perhaps most familiar to econometricians, this factorisation involves the marginal distribution of the full data and the conditional distribution of missingness given the data. The Heckman selection model is an example of this factorisation. For example, one could specify a probit model for dropout and a normally distributed outcome, and then the full likelihood would involve the product of the two.

Pattern-mixture model


This approach specifies a marginal distribution for the missingness or dropout mechanism and then the distribution of the data differs according to the type of missingness or dropout. The data are a mixture of different patterns, i.e. distributions. This type of model is implied when non-response is not considered missing data per se, and we’re interested in inferences within each sub-population. For example, when estimating quality of life at a given age, the quality of life of those that have died is not of interest, but their dying can bias the estimates.

Shared parameter model


Now, the final way we can model these data posits unobserved variables, \alpha, conditional on which Y and R are independent. These models are most appropriate when the dropout or missingness is attributable to some underlying process changing over time, such as disease progression or household attitudes, or an unobserved variable, such as health status.

At the simplest level, one could consider two separate models with correlated random effects, for example, adding in covariates x and having a linear mixed model and probit selection model for person i at time t

Y_{it} = x_{it}'\theta + \alpha_{1,i} + u_{it}

R_{it} = \Phi(x_{it}'\theta + \alpha_{2,i})

(\alpha_{1,i},\alpha_{2,i}) \sim MVN(0,\Sigma) and u_{it} \sim N(0,\sigma^2)

so that the random effects are multivariate normally distributed.

A more complex and flexible specification for longitudinal settings would permit the random effects to vary over time, differently between models and individuals:

Y_{i}(t) = x_{i}(t)'\theta + z_{1,i} (t)\alpha_i + u_{it}

R_{i}(t) = G(x_{i}'\theta + z_{2,i} (t)\alpha_i)

\alpha_i \sim h(.) and u_{it} \sim N(0,\sigma^2)

As an example, if time were discrete in this model then z_{1,i} could be a series of parameters for each time period z_{1,i} = [\lambda_1,\lambda_2,...,\lambda_T], what are often referred to as ‘factor loadings’ in the structural equation modelling literature. We will run up against identifiability problems with these more complex models. For example, if the random effect was normally distributed i.e. \alpha_i \sim N(0,\sigma^2_\alpha) then we could multiply each factor loading by \rho and then \alpha_i \sim N(0,\sigma^2_\alpha / \rho^2) would give us an equivalent model. So, we would have to put restrictions on the parameters. We can set the variance of the random effect to be one, i.e. \alpha_i \sim N(0,1). We can also set one of the factor loadings to zero, without loss of generality, i.e. z_{1,i} = [0,...,\lambda_T].

The distributional assumptions about the random effects can have potentially large effects on the resulting inferences. It is possible therefore to non-parametrically model these as well – e.g. using a mixture distribution. Ultimately, these models are a useful method to deal with data that are missing not at random, such as informative dropout from panel studies.


Estimation can be tricky with these models given the need to integrate out the random effects. For frequentist inferences, expectation maximisation (EM) is one way of estimating these models, but as far as I’m aware the algorithm would have to be coded for the problem specifically in Stata or R. An alternative is using some kind of quadrature based method. The Stata package stjm fits shared parameter models for longitudinal and survival data, with similar specifications to those above.

Otherwise, Bayesian tools, such as Hamiltonian Monte Carlo, may have more luck dealing with the more complex models. For the simpler correlated random effects specification specified above one can use the stan_mvmer command in the rstanarm package. For more complex models, one would need to code the model in something like Stan.


For a health economics specific discussion of these types of models, one can look to the chapter Latent Factor and Latent Class Models to Accommodate Heterogeneity, Using Structural Equation in the Encyclopedia of Health Economics, although shared parameter models only get a brief mention. However, given that that book is currently on sale for £1,000, it may be beyond the wallet of the average researcher! Some health-related applications may be more helpful. Vonesh et al. (2011) used shared parameter models to look at the effects of diet and blood pressure control on renal disease progression. Wu and others (2011) look at how to model the effects of a ‘concomitant intervention’, which is one applied when a patient’s health status deteriorates and so is confounded with health, using shared parameter models. And, Baghfalaki and colleagues (2017) examine heterogeneous random effect specification for shared parameter models and apply this to HIV data.


The trouble with estimating neighbourhood effects, part 2

When we think of the causal effect of living in one neighbourhood compared to another we think of how the social interactions and lifestyle of that area produce better outcomes. Does living in an area with more obese people cause me to become fatter? (Quite possibly). Or, if a family moves to an area where people earn more will they earn more? (Read on).

In a previous post, we discussed such effects in the context of slums, where the synergy of poor water and sanitation, low quality housing, small incomes, and high population density likely has a negative effect on residents’ health. However, we also discussed how difficult it is to estimate neighbourhood effects empirically for a number of reasons. On top of this, are the different ways neighbourhood effects can manifest. Social interactions may mean behaviours that lead to better health or incomes rub off on one another. But also there may be some underlying cause of the group’s, and hence each individual’s, outcomes. In the slum, low education may mean poor hygiene habits spread, or the shared environment may contain pathogens, for example. Both of these pathways may constitute a neighbourhood effect, but both imply very different explanations and potential policy remedies.

What should we make then of, not one, but two new articles by Raj Chetty and Nathaniel Henderen in the recent issue of Quarterly Journal of Economics? Both of which use observational data to estimate neighbourhood effects.

Paper 1: The Impacts of Neighborhoods on Intergenerational Mobility I: Childhood Exposure Effects.

The authors have an impressive data set. They use federal tax records from the US between 1996 and 2012 and identify all children born between 1980 and 1988 and their parents (or parent). For each of these family units they determine household income and then the income of the children when they are older. To summarise a rather long exegesis of the methods used, I’ll try to describe the principle finding in one sentence:

Among families moving between commuting zones in the US, the average income percentile of children at age 26 is 0.04 percentile points higher per year spent and per additional percentile point increase in the average income percentile of the children of permanent residents at age 26 in the destination where the family move to. (Phew!)

They interpret this as the outcomes of in-migrating children ‘converging’ to the outcomes of permanently resident children at a rate of 4% per year. That should provide an idea of how the outcomes and treatments were defined, and who constituted the sample. The paper makes the assumption that the effect is the same regardless of the age of the child. Or to perhaps make it a bit clearer, the claim can be interpreted as that human capital, H, does something like this (ignoring growth over childhood due to schooling etc.):


where ‘good’ and ‘bad’ mean ‘good neighbourhood’ and ‘bad neighbourhood’. This could be called the better neighbourhoods cause you to do better hypothesis.

The analyses also take account of parental income at the time of the move and looks at families who moved due to a natural disaster or other ‘exogenous’ shock. The different analyses generally support the original estimate putting the result in the region of 0.03 to 0.05 percentile points.

But are these neighbourhood effects?

A different way of interpreting these results is that there is an underlying effect driving incomes in each area. Areas with higher incomes for their children in the future are those that have a higher market price for labour in the future. So we could imagine that this is what is going on with human capital instead:


This is the those moving to areas where people will earn more in the future, also earn more in the future because of differences in the labour market hypothesis. The Bureau of Labour Statistics, for example, cites the wage rate for a registered nurse as $22.61 in Iowa and $36.13 in California. But we can’t say from the data whether the children are sorting into different occupations or are getting paid different amounts for the same occupations.

The reflection problem

Manksi (1993) called the issue the ‘reflection problem’, which he described as arising when

a researcher observes the distribution of a behaviour in a population and wishes to infer whether the average behaviour in some group influences the behaviour of the individuals that compose the group.

What we have here is a linear-in-means model estimating the effect of average incomes on individual incomes. But what we cannot distinguish between is the competing explanations of, what Manski called, endogenous effects that result from the interaction  with families with higher incomes, and correlated effects that lead to similar outcomes due to exposure to the same underlying latent forces, i.e. the market. We could also add contextual effects that manifest due to shared group characteristics (e.g. levels of schooling or experience). When we think of a ‘neighbourhood effect’ I tend to think of them as of the endogenous variety, i.e. the direct effects of living in a certain neighbourhood. For example, under different labour market conditions, both my income and the average income of the permanent residents of the neighbourhood I move to might be lower, but not because of the neighbourhood.

The third hypothesis

There’s also the third hypothesis, families that are better off move to better areas (i.e. effects are accounted for by unobserved family differences):


The paper presents lots of modifications to the baseline model, but none of them can provide an exogenous choice of destination. They look at an exogenous cause of moving – natural disasters – and also instrument with the expected difference in income percentiles for parents from the same zip code, but I can’t see how this instrument is valid. Selection bias is acknowledged in the paper but without some exogenous variation in where a family moves to it’ll be difficult to really claim to have identified a causal effect. The choice to move is in the vast majority of family’s cases based on preferences over welfare and well-being, especially income. Indeed, why would a family move to a worse off area unless their circumstances demanded it of them? So in reality, I would imagine the truth would lie somewhere in between these three explanations.

Robust analysis?

As a slight detour, we might want to consider if these are causal effects, even if the underlying assumptions hold. The paper presents a range of analyses to show that the results are robust. But these analyses represent just a handful of those possible. Given that the key finding is relatively small in magnitude, one wonders what would have happened under different scenarios and choices – the so-called garden of forking paths problem. To illustrate, consider some of the choices that were made about the data and models, and all the possible alternative choices. The sample included only those with a mean positive income between 1996 to 2004 and those living in commuter zones with populations of over 250,000 in the 2000 census. Those whose income was missing were assigned a value of zero. Average income over 1996 to 2000 is a proxy for lifetime income. If the marital status of the parents changed then the child was assigned to the mother’s location. Non-filers were coded as single. Income is measured in percentile ranks and not dollar terms. The authors justify each of the choices, but an equally valid analysis would have resulted from different choices and possibly produced very different results.


Paper 2The Impacts of Neighborhoods on Intergenerational Mobility II: County-Level Estimates

The strategy of this paper is much like the first one, except that rather than trying to estimate the average effect of moving to higher or lower income areas, they try to estimate the effect of moving to each of 3,000 counties in the US. To do this they assume that the number of years exposure to the county is as good as random after taking account of i) origin fixed effects, ii) parental income percentile, and iii) a quadratic function of birth cohort year and parental income percentile to try and control for some differences in labour market conditions. An even stronger assumption than before! The hierarchical model is estimated using some complex two-step method for ‘computational tractability’ (I’d have just used a Bayesian estimator). There’s some further strange calculations, like conversion from percentile ranks into dollar terms by regressing the dollar amounts on average income ranks and multiplying everything by the coefficient, rather than just estimating the model with dollars as the outcome (I suspect it’s to do with their complicated estimation strategy). Nevertheless, we are presented with some (noisy) county-level estimates of the effect of an additional year spent there in childhood. There is a weak correlation with the income ranks of permanent residents. Again, though, we have the issue of many competing explanations for the observed effects.

The differences in predicted causal effect by county don’t help distinguish between our hypotheses. Consider this figure:


Do children of poorer parents in the Southern states end up with lower human capital and lower-skilled jobs than in the Midwest? Or does the market mean that people get paid less for the same job in the South? Compare the map above to the maps below showing wage rates of two common lower-skilled professions, cashiers (right) or teaching assistants (left):

A similar pattern is seen. While this is obviously just a correlation, one suspects that such variation in wages is not being driven by large differences in human capital generated through personal interaction with higher earning individuals. This is also without taking into account any differences in purchasing power between geographic areas.

What can we conclude?

I’ve only discussed a fraction of the contents of these two enormous papers. The contents could fill many more blog posts to come. But it all hinges on whether we can interpret the results as the average causal effect of a person moving to a given place. Not nearly enough information is given to know whether families moving to areas with lower future incomes are comparable to those with higher future incomes. Also, we could easily imagine a world where the same people were all induced to move to different areas – this might produce completely different sets of neighbourhood effects since they themselves contribute to those effects. But I feel that the greatest issue is the reflection problem. Even random assignment won’t get around this. This is not to discount the value or interest these papers generate, but I can’t help but feel too much time is devoted to trying to convince the reader of a ‘causal effect’. A detailed exploration of the relationships in the data between parental incomes, average incomes, spatial variation, later life outcomes, and so forth, might have been more useful for generating understanding and future analyses. Perhaps sometimes in economics we spend too long obsessing over estimating unconvincing ‘causal effects’ and ‘quasi-experimental’ studies that really aren’t and forget the value of just a good exploration of data with some nice plots.


Image credits:

Chris Sampson’s journal round-up for 2nd July 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Choice in the presence of experts: the role of general practitioners in patients’ hospital choice. Journal of Health Economics [PubMed] [RePEc] Published 26th June 2018

In the UK, patients are in principle free to choose which hospital they use for elective procedures. However, as these choices operate through a GP referral, the extent to which the choice is ‘free’ is limited. The choice set is provided by the GP and thus there are two decision-makers. It’s a classic example of the principal-agent relationship. What’s best for the patient and what’s best for the local health care budget might not align. The focus of this study is on the applied importance of this dynamic and the idea that econometric studies that ignore it – by looking only at patient decision-making or only at GP decision-making – may give bias estimates. The author outlines a two-stage model for the choice process that takes place. Hospital characteristics can affect choices in three ways: i) by only influencing the choice set that the GP presents to the patient, e.g. hospital quality, ii) by only influencing the patient’s choice from the set, e.g. hospital amenities, and iii) by influencing both, e.g. waiting times. The study uses Hospital Episode Statistics for 30,000 hip replacements that took place in 2011/12, referred by 4,721 GPs to 168 hospitals, to examine revealed preferences. The choice set for each patient is not observed, so a key assumption is that all hospitals to which a GP made referrals in the period are included in the choice set presented to patients. The main findings are that both GPs and patients are influenced primarily by distance. GPs are influenced by hospital quality and the budget impact of referrals, while distance and waiting times explain patient choices. For patients, parking spaces seem to be more important than mortality ratios. The results support the notion that patients defer to GPs in assessing quality. In places, it’s difficult to follow what the author did and why they did it. But in essence, the author is looking for (and in most cases finding) reasons not to ignore GPs’ preselection of choice sets when conducting econometric analyses involving patient choice. Econometricians should take note. And policymakers should be asking whether freedom of choice is sensible when patients prioritise parking and when variable GP incentives could give rise to heterogeneous standards of care.

Using evidence from randomised controlled trials in economic models: what information is relevant and is there a minimum amount of sample data required to make decisions? PharmacoEconomics [PubMed] Published 20th June 2018

You’re probably aware of the classic ‘irrelevance of inference’ argument. Statistical significance is irrelevant in deciding whether or not to fund a health technology, because we ought to do whatever we expect to be best on average. This new paper argues the case for irrelevance in other domains, namely multiplicity (e.g. multiple testing) and sample size. With a primer on hypothesis testing, the author sets out the regulatory perspective. Multiplicity inflates the chance of a type I error, so regulators worry about it. That’s why triallists often obsess over primary outcomes (and avoiding multiplicity). But when we build decision models, we rely on all sorts of outcomes from all sorts of studies, and QALYs are never the primary outcome. So what does this mean for reimbursement decision-making? Reimbursement is based on expected net benefit as derived using decision models, which are Bayesian by definition. Within a Bayesian framework of probabilistic sensitivity analysis, data for relevant parameters should never be disregarded on the basis of the status of their collection in a trial, and it is up to the analyst to properly specify a model that properly accounts for the effects of multiplicity and other sources of uncertainty. The author outlines how this operates in three settings: i) estimating treatment effects for rare events, ii) the number of trials available for a meta-analysis, and iii) the estimation of population mean overall survival. It isn’t so much that multiplicity and sample size are irrelevant, as they could inform the analysis, but rather that no data is too weak for a Bayesian analyst.

Life satisfaction, QALYs, and the monetary value of health. Social Science & Medicine [PubMed] Published 18th June 2018

One of this blog’s first ever posts was on the subject of ‘the well-being valuation approach‘ but, to date, I don’t think we’ve ever covered a study in the round-up that uses this method. In essence, the method is about estimating trade-offs between (for example) income and some measure of subjective well-being, or some health condition, in order to estimate the income equivalence for that state. This study attempts to estimate the (Australian) dollar value of QALYs, as measured using the SF-6D. Thus, the study is a rival cousin to the Claxton-esque opportunity cost approach, and a rival sibling to stated preference ‘social value of a QALY’ approaches. The authors are trying to identify a threshold value on the basis of revealed preferences. The analysis is conducted using 14 waves of the Australian HILDA panel, with more than 200,000 person-year responses. A regression model estimates the impact on life satisfaction of income, SF-6D index scores, and the presence of long-term conditions. The authors adopt an instrumental variable approach to try and address the endogeneity of life satisfaction and income, using an indicator of ‘financial worsening’ to approximate an income shock. The estimated value of a QALY is found to be around A$42,000 (~£23,500) over a 2-year period. Over the long-term, it’s higher, at around A$67,000 (~£37,500), because individuals are found to discount money differently to health. The results also demonstrate that individuals are willing to pay around A$2,000 to avoid a long-term condition on top of the value of a QALY. The authors apply their approach to a few examples from the literature to demonstrate the implications of using well-being valuation in the economic evaluation of health care. As with all uses of experienced utility in the health domain, adaptation is a big concern. But a key advantage is that this approach can be easily applied to large sets of survey data, giving powerful results. However, I haven’t quite got my head around how meaningful the results are. SF-6D index values – as used in this study – are generated on the basis of stated preferences. So to what extent are we measuring revealed preferences? And if it’s some combination of stated and revealed preference, how should we interpret willingness to pay values?