# Sam Watson’s journal round-up for 27th May 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Spatial interdependence and instrumental variable models. Political Science Research and Methods Published 30th January 2019

Things that are closer to one another are more like one another. This could be the mantra of spatial statistics and econometrics. Countries, people, health outcomes, plants, and so forth can all display some form of spatial correlation. Ignoring these dependencies can have important consequences for model-based data analysis, but what those consequences are depends on how we conceive of the data generating process and, therefore, on the model we use. Spatial econometrics and geostatistics both deal with the same kind of empirical problem but do it in different ways. To illustrate this, consider an outcome $y = [y_1,...,y_n]'$ for some units (e.g. people, countries, etc.) $i$ at locations $l = [l_1,...,l_n]'$ in some area $A \subset \mathbb{R}^2$. We are interested in the effect of some variable $x$. The spatial econometric approach is typically to consider that the outcome is “simultaneously” determined along with its neighbours:

$y = \beta x + Wy + u$

where $W$ is a “connectivity” matrix typically indicating which units are neighbours of one another, and $u$ is a vector of random error terms. If the spatial correlation is ignored then the error term would become $v= Wy + u$, which would cause the OLS estimator to be biased since $x$ would be correlated with $v$ because of the presence of $y$.
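The bias can be seen in a small simulation. This is a minimal sketch rather than anything taken from the article: it assumes units arranged on a ring where each unit's neighbours are the two adjacent units, a row-normalised $W$, and a spatial autoregressive parameter `rho` multiplying $Wy$ (an addition to the equation above, needed so that $I - \rho W$ is invertible when $W$ is row-normalised). All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, rho = 2000, 1.0, 0.6  # illustrative values, not from the article

# Assumed connectivity: units on a ring, each with two neighbours,
# rows normalised to sum to one.
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

x = rng.normal(size=n)
u = rng.normal(size=n)

# Reduced form of y = beta*x + rho*W y + u, i.e. y = (I - rho W)^{-1} (beta*x + u)
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + u)

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

naive = ols_slope(x, y)
print(f"true beta = {beta}, OLS ignoring Wy = {naive:.2f}")
```

In this set-up the naive OLS slope should land above the true value, because the feedback through neighbours passes part of each unit's $x$ back into its own outcome.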

Contrast this to the model-based geostatistical approach. We assume that there is some underlying, unobserved process $S(l)$ from which we make observations with error:

$y = \beta x + S(l) + e$

Normally we would model $S$ as a zero-mean Gaussian process, which we’ve described in a previous blog post. As a result, if we don’t condition on $S$ the $y$ are multivariate-normally distributed, $y|x \sim MVN(\beta x, \Sigma)$. Under this model, OLS is not biased but it is inefficient since our effective sample size is $n\,\mathrm{tr}(\Sigma)/\mathbf{1}'\Sigma \mathbf{1}$, which is less than $n$ when the observations are positively correlated.
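To get a feel for how much information is lost, the effective sample size formula can be evaluated directly. The sketch below assumes an exponential covariance function $\Sigma_{ij} = \exp(-d_{ij}/\phi)$ with unit variance and a range parameter `phi` of 0.2, and random locations in the unit square. These are illustrative choices, not taken from any of the papers discussed.

```python
import numpy as np

def effective_sample_size(Sigma):
    """n * tr(Sigma) / (1' Sigma 1) for a covariance matrix Sigma."""
    n = Sigma.shape[0]
    ones = np.ones(n)
    return n * np.trace(Sigma) / (ones @ Sigma @ ones)

rng = np.random.default_rng(1)
n = 200
locs = rng.uniform(size=(n, 2))  # hypothetical locations in the unit square

# Pairwise Euclidean distances between locations
dists = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)

phi = 0.2                        # assumed range parameter
Sigma = np.exp(-dists / phi)     # assumed exponential covariance, unit variance

ess_spatial = effective_sample_size(Sigma)
ess_indep = effective_sample_size(np.eye(n))  # no correlation: recovers n

print(f"n = {n}, effective n (spatial) = {ess_spatial:.1f}, "
      f"effective n (independent) = {ess_indep:.1f}")
```

With independent observations the formula returns $n$ exactly; any positive off-diagonal covariance inflates the denominator and pulls the effective sample size below $n$.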

Another consequence of the spatial econometric model is that an instrumental variable estimator is also biased, particularly if the instrument is also spatially correlated. This article discusses the “spatial 2-stage least squares” estimator, which essentially requires an instrument for both $x$ and $Wy$. This latter instrument can simply be $Wx$. The article explores this by re-estimating the models of the well-known paper Revisiting the Resource Curse: Natural Disasters, the Price of Oil, and Democracy.
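A stripped-down version of the idea can be sketched as follows. To keep it short, this sketch treats $x$ as exogenous, so only $Wy$ needs an instrument and $Wx$ plays that role. The article's setting, where $x$ is itself endogenous and needs its own instrument, is not reproduced here. The ring-shaped $W$, the parameter `rho` on $Wy$, and all values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, rho = 2000, 1.0, 0.6  # illustrative values

# Assumed row-normalised connectivity matrix: two neighbours on a ring.
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

x = rng.normal(size=n)  # treated as exogenous in this sketch
u = rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + u)

# Regressors: constant, x, and the endogenous spatial lag W y.
X = np.column_stack([np.ones(n), x, W @ y])
# Instruments: constant, x, and W x as the instrument for W y.
Z = np.column_stack([np.ones(n), x, W @ x])

# Just-identified case, where 2SLS reduces to (Z'X)^{-1} Z'y.
const_hat, beta_hat, rho_hat = np.linalg.solve(Z.T @ X, Z.T @ y)

print(f"beta_hat = {beta_hat:.2f} (true {beta}), "
      f"rho_hat = {rho_hat:.2f} (true {rho})")
```

$Wx$ works as an instrument here because neighbours' $x$ shifts neighbours' outcomes, and hence $Wy$, while being unrelated to the unit's own error term under the exogeneity assumption.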

The spatial econometric approach clearly has limitations compared to the geostatistical approach. The matrix $W$ has to be pre-specified rather than estimated from the data and is usually limited to just allowing a constant correlation between direct neighbours. It would also be very tricky to interpolate outcomes at new places, and the approach is rarely used to deal with spatially continuous phenomena. However, its simplicity makes it easier to use these instrumental variable approaches to estimate average causal effects. Development of causal models within the geostatistical model framework is still an ongoing research question (of mine!).

Methodological challenges when studying distance to care as an exposure in health research. American Journal of Epidemiology [PubMed] Published 20th May 2019

If you read academic articles when you are sufficiently tired, what you think the authors are writing may start to drift from what they are actually writing. I must confess that this is what has happened to me with this article. I spent a good while debating in my head what the authors were saying about using distance to care as an instrument for exposure in health research rather than distance as an exposure itself. Unfortunately, the latter is not nearly as interesting a discussion for me as the former, but given I have run out of time to find another article I’ll try to weave together the two.

Distance is a very strong determinant of which health services, if any, somebody uses. In a place like the UK it may determine which clinic or hospital of many a patient will attend. In poorer settings it may determine whether a patient seeks health care at all. There is thus interest in understanding how distance affects use of services. This article provides a concise discussion of why the causal effect of distance might not be identified in a simple model. For example, whether we observe a patient depends on their attendance, and hence on distance, which induces selection bias in our study. The distance from a facility may also be associated with other key determinants like socioeconomic status, introducing further confounding. And finally, distance can be measured with some error. These issues are illustrated with maternity care in Botswana.

Since distance is such a strong determinant of health service use, it is also widely used as an instrumental variable for use. My very first published paper used it. So the question now to ask is, how do the above-mentioned issues with distance affect its use as an instrument? For the question of selection bias, it depends on the selection mechanism. Consider the standard causal model for an instrumental variable, where Y is the outcome, X the treatment, Z the instrument, and U the unobserved variable that affects both X and Y. If selection depends only on Z and/or U then the instrumental variables estimator is unbiased, whereas if selection depends on Y and/or X then it is biased. If distance is correlated with some other factor that also influences Y then it is no longer a valid instrument if we don’t condition on that factor. The typical criticism of distance as an instrument is that it is associated with socioeconomic status. In UK-based studies, we might condition on some deprivation index, like the Index of Multiple Deprivation. But these indices are not that precise and are averaged across small areas; there is still likely to be heterogeneity in status within areas. It is not possible to say what the extent of this potential bias is, but it could be substantial. Finally, if distance is measured with error then the instrumental variables estimator will be biased (probably).
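The difference between selection mechanisms can be illustrated with a quick simulation. This is a sketch under strong assumptions: a linear model with a constant treatment effect, standard normal variables, and two hypothetical selection rules (observing only units with a low value of the instrument, and observing only units with a positive outcome). It covers only selection on Z alone and on Y alone, not the other cases mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta = 200_000, 1.0  # illustrative values

z = rng.normal(size=n)                  # instrument, e.g. standardised distance
u = rng.normal(size=n)                  # unobserved confounder
x = z + u + rng.normal(size=n)          # treatment depends on Z and U
y = beta * x + u + rng.normal(size=n)   # outcome depends on X and U

def iv_estimate(z, x, y):
    """Simple IV (Wald-type) estimator: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

full = iv_estimate(z, x, y)

# Hypothetical rule 1: selection depends only on the instrument.
keep_z = z < 0.5
sel_on_z = iv_estimate(z[keep_z], x[keep_z], y[keep_z])

# Hypothetical rule 2: selection depends on the outcome.
keep_y = y > 0
sel_on_y = iv_estimate(z[keep_y], x[keep_y], y[keep_y])

print(f"full sample: {full:.2f}")
print(f"selected on Z: {sel_on_z:.2f}")
print(f"selected on Y: {sel_on_y:.2f}")
```

Selecting on Z leaves the instrument independent of U and the outcome error, so the estimate stays close to the true effect. Selecting on Y conditions on a common effect of Z and U, which induces an association between them in the selected sample and pulls the estimate away from the truth.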

This concise discussion was mainly about a paper that doesn’t actually exist. But I think it highlights that actually there is a lot to say about distance as an instrument and its potential weaknesses; the imagined paper could certainly materialise. Indeed, in a systematic review of instrumental variable analyses of health service access and use, in which most studies use distance to facility, only a tiny proportion of studies actually consider that distance might be confounded with unobserved variables.


## By

• Health economics, statistics, and health services research at the University of Warwick. Also like rock climbing and making noise on the guitar.
