Sam Watson’s journal round-up for 3rd June 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Limits to human life span through extreme value theory. Journal of the American Statistical Association [RePEc] Published 2nd April 2019

The oldest verified person ever to have lived was Jeanne Calment, who died in 1997 at the superlative age of 122. No-one else has ever been recorded as living longer than 120, but there have perhaps been a few hundred supercentenarians over 110. Whenever someone reaches such a stupendous age, some budding reporter will ask them what the secret was. They will reply that they have stuck to a regimen of three boiled eggs and a glass of scotch every day for 80 years. This information is, of course, completely meaningless due to survivorship bias. But as public health and health care improve, and life expectancy with them, there remains the question of whether people will ever exceed these extreme ages or whether there is actually a limit to human longevity.

Some studies have attempted to address the question of maximum human longevity by looking at how key biological systems, like the delivery of oxygen to the muscles or the vasculature, degrade. They suggest that there would be an upper limit because key systems of the body just cannot last, which is not to say that medicine might not find a way to fix or replace them in the future. Another way of addressing this question is to take a purely statistical approach: look at the distribution of the ages at death of the oldest people and try to make inferences about its upper limit. Such an analysis relies on extreme value theory.

There are two types of extreme value data. The first type consists of just the series of maximum values from the distribution (for example, the maximum in each block of observations). The Fisher-Tippett-Gnedenko theorem shows that, suitably normalised, these maxima can only converge to one of three distributions (the Gumbel, Fréchet, and Weibull families). The second type of data are all of the observations above a certain high threshold, and wonderfully there is another triple-barrelled theorem, the Pickands-Balkema-de Haan theorem, which shows that the excesses over the threshold approximately follow a generalised Pareto distribution. This article makes use of this latter type of data and theorem to address three questions: (i) is there an upper limit to the distribution of human life spans? (ii) If so, what is it? And (iii) does it change over time?
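For reference, here is the standard textbook form of the generalised Pareto distribution (notation assumed here, not taken from the article). For the excess y = a - u of an age at death a over a threshold u, with scale \sigma > 0 and shape \xi \neq 0, the distribution function is

F(y) = 1 - \left(1 + \xi y / \sigma\right)^{-1/\xi}

with the exponential distribution F(y) = 1 - \exp(-y/\sigma) as the \xi \to 0 limit. When \xi < 0 the support is bounded, 0 \le y \le -\sigma/\xi, so the implied upper limit of age at death is

a_{max} = u - \sigma / \xi

whereas for \xi \ge 0 there is no finite upper limit. Question (i) is therefore a question about whether \xi < 0.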

The authors use a dataset of the ages at death, measured in days, of all Dutch residents who died aged over 92 between 1986 and 2015. Using these data to estimate the parameters of the generalised Pareto distribution, they find strong evidence to suggest that, statistically at least, the distribution has an upper limit and that this limit is probably around 117-124. Over the years of the study there did not appear to be any change in this limit. This is not to say that it couldn’t change in the future if some new miraculous treatment appeared, but for now, we humans must put up with a short and finite existence.
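A minimal sketch of the estimation step, assuming simulated data rather than the Dutch records: it draws threshold excesses from a generalised Pareto distribution with a negative shape, estimates σ and ξ by the method of moments (valid only for ξ < 1/2), and converts them into an upper limit u - σ/ξ. The threshold of 92, the parameter values, and the function names are hypothetical, and the method of moments is a simple stand-in for the likelihood-based estimation a study like this would more typically use.

```python
import math
import random
import statistics


def gpd_sample(n, sigma, xi, rng):
    """Draw n threshold excesses from a GPD(sigma, xi) by inverting its CDF (requires xi != 0)."""
    # F(y) = 1 - (1 + xi*y/sigma)^(-1/xi), so y = (sigma/xi) * ((1 - p)^(-xi) - 1) for p ~ U[0, 1)
    return [sigma / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (sigma, xi); only valid when xi < 1/2 so the variance exists."""
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)          # from v = m^2 / (1 - 2*xi)
    sigma = 0.5 * m * (1.0 + ratio)   # from m = sigma / (1 - xi)
    return sigma, xi


def upper_endpoint(threshold, sigma, xi):
    """Finite upper limit threshold - sigma/xi when xi < 0; unbounded (inf) otherwise."""
    if xi >= 0:
        return math.inf
    return threshold - sigma / xi


if __name__ == "__main__":
    rng = random.Random(2019)
    threshold = 92.0                    # hypothetical threshold age in years
    true_sigma, true_xi = 3.0, -0.12    # hypothetical values; implied limit is 92 + 3/0.12 = 117
    excesses = gpd_sample(20_000, true_sigma, true_xi, rng)
    sigma_hat, xi_hat = fit_gpd_moments(excesses)
    print(f"sigma_hat = {sigma_hat:.3f}, xi_hat = {xi_hat:.3f}")
    print(f"estimated upper limit: {upper_endpoint(threshold, sigma_hat, xi_hat):.1f} years")
```

With real data the choice of threshold matters a great deal: too low and the generalised Pareto approximation is poor, too high and too few observations remain.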

Infant health care and long-term outcomes. Review of Economics and Statistics [RePEc] Published 13th May 2019

I haven’t covered an article on infant health, economic conditions, and longer-term outcomes for a while. It used to be that there would be one in every round-up I wrote; I could barely keep up with the literature, which I tried to summarise in a different blog post. Given that it has been a while, I thought I would include a new one. This time we are looking at the effect of mother and child health centres in Norway in the 1930s on adult outcomes later in the 20th century.

Fortunately, the health centres were built in different municipalities at different times. The authors note that the “key identifying assumption” is that the timing of construction was unrelated to the health of infants in those areas (well, this and the assumptions that the model is linear and additive, that time trends are linear, and so on, something that economists often forget). They don’t go into too much detail on this, but it seems plausible. Another gripe of mine with most empirical economics papers, and indeed with papers in the medical and public health fields, is that plotting the data is a secondary concern or doesn’t happen at all. It should be the most important thing. Indeed, much of the discussion in this article can be captured by the figure buried two thirds of the way through. The figure shows that the centres likely led to a big reduction in diarrhoeal disease, probably due to increased rates of breast feeding, but that effects on other outcomes are more ambiguous and probably quite small if they exist. Some evidence is provided to suggest that these differences were associated with very modest increases in educational attainment and adult wages. However, a cost-benefit calculation suggests that, on the basis of these wage increases, the intervention had an annualised rate of return of about 5%.
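An annualised rate of return of this kind is typically an internal rate of return: the discount rate at which the net present value of costs and benefits is zero. The sketch below finds one by bisection for a made-up stream of per-child costs and wage gains; the cash flows and function names are hypothetical and are not the article's figures or necessarily its exact method.

```python
def npv(rate, cash_flows):
    """Net present value of cash_flows[t] (t = 0, 1, 2, ... years) at annual discount `rate`."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-10, max_iter=200):
    """Internal rate of return: the annual rate at which NPV is zero, found by bisection on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if (f_lo < 0) == (npv(hi, cash_flows) < 0):
        raise ValueError("NPV has the same sign at both ends of [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = npv(mid, cash_flows)
        if (f_mid < 0) == (f_lo < 0):   # same sign as the lower end: root lies in the upper half
            lo, f_lo = mid, f_mid
        else:                            # sign change in the lower half
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical per-child flows: a cost of 1 unit at birth, then a wage gain of 0.1 units
    # per year over a working life from age 25 to 64 (not figures from the article).
    flows = [-1.0] + [0.0] * 24 + [0.1] * 40
    print(f"annualised rate of return: {irr(flows):.2%}")
```

Because the benefits arrive decades after the cost, the rate of return is very sensitive to how long and how large the wage gains are assumed to be.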

I should say that this study is well conducted and solid, so any gripes with it are minor. It certainly fits neatly into the wide literature on the topic, and I don’t think anyone would doubt that investing in childhood interventions is likely to have a number of short- and long-term benefits.

Relationship between poor olfaction and mortality among community-dwelling older adults: a cohort study. Annals of Internal Medicine [PubMed] Published 21st May 2019

I included this last study not because of any ground-breaking economics or statistics, but because it is interesting. This is one of a number of studies to have looked at the relationship between the ability to smell and risk of death. These studies have generally found a strong relationship between poor olfaction and risk of death in the following years (summarised briefly in this editorial). This study examines a cohort of a couple of thousand older people whose sense of smell was rigorously tested at baseline, among other things. If a participant died, their death was categorised by a medical examiner into one of four categories: dementia or Parkinson disease, cardiovascular disease, cancer, or respiratory illness.

There was a very strong relationship between poor ability to smell and all-cause death. The cumulative risk of death was 46% and 30% higher among persons with poor olfaction at 10 and 13 years, respectively. Delving into death by cause, they found that this relationship was strongest among those who died of dementia or Parkinson disease, which makes sense as the olfactory system is phylogenetically ancient, closely tied to the limbic system, and linked to many parts of the brain. Some relationship was seen with cardiovascular disease but not with cancer or respiratory illness. They then use a ‘mediation analysis’, i.e. conditioning on post-treatment variables to ‘block’ causal pathways, to identify how much of the association each pathway explains, and conclude that dementia, Parkinson disease, and weight loss account for about 30% of the observed relationship. However, I am usually suspicious of mediation analyses: standard arguments suggest the model parameters would be biased, because conditioning on a mediator that shares unmeasured causes with the outcome opens a non-causal path between the exposure and those causes.
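To see why, consider a hypothetical linear toy model, not the article's own models: an exposure X affects a mediator M and the outcome Y, and an unmeasured variable U affects both M and Y. Regressing Y on X recovers the total effect, but adding M as a covariate does not recover the direct effect, because M is a collider on the path X → M ← U → Y. With the illustrative parameter values below the true direct effect is 0.5, while the estimate after conditioning on M is pulled towards zero.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, *cols):
    """OLS slopes of y on the given columns; no intercept since all simulated variables have mean zero."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def simulate(n, rng, a=1.0, b=1.0, c=0.5):
    """Toy model: X -> M -> Y, X -> Y (direct effect c), and unmeasured U -> M, U -> Y."""
    xs, ms, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                      # exposure, e.g. standardised poor olfaction
        u = rng.gauss(0, 1)                      # unmeasured cause of both mediator and outcome
        m = a * x + u + rng.gauss(0, 1)          # mediator
        y = c * x + b * m + u + rng.gauss(0, 1)  # outcome
        xs.append(x)
        ms.append(m)
        ys.append(y)
    return xs, ms, ys


if __name__ == "__main__":
    xs, ms, ys = simulate(50_000, random.Random(1))
    (total,) = ols(ys, xs)         # targets the total effect c + a*b = 1.5
    direct, _ = ols(ys, xs, ms)    # 'controls for' M; the true direct effect is c = 0.5
    print(f"total effect estimate: {total:.2f}")
    print(f"direct effect estimate after conditioning on M: {direct:.2f}")
```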

Interestingly, smell is not routinely tested among the elderly, despite the sense of smell being one of the strongest predictors of mortality. People do not generally notice their sense of smell waning, as the decline is gradual, so they would be unlikely to mention it to a doctor. Perhaps it is time to start testing it routinely?


Sam Watson’s journal round-up for 27th May 2019


Spatial interdependence and instrumental variable models. Political Science Research and Methods Published 30th January 2019

Things that are closer to one another are more like one another. This could be the mantra of spatial statistics and econometrics. Countries, people, health outcomes, plants, and so forth can all display some form of spatial correlation. Ignoring these dependencies can have important consequences for model-based data analysis, but what those consequences are depends on how we conceive of the data generating process and, therefore, the model we use. Spatial econometrics and geostatistics both deal with the same kind of empirical problem but do so in different ways. To illustrate this, consider an outcome y = [y_1,...,y_n]' for some units i (e.g. people, countries, etc.) at locations l = [l_1,...,l_n]' in some area A \subset \mathbb{R}^2. We are interested in the effect of some variable x. The spatial econometric approach is typically to consider that the outcome is “simultaneously” determined along with its neighbours:

y = \beta x + Wy + u

where W is a “connectivity” matrix typically indicating which units are neighbours of one another, and u is a vector of random error terms. If the spatial correlation is ignored then the error term becomes v = Wy + u, which causes the OLS estimator to be biased: since y = (I - W)^{-1}(\beta x + u), the term Wy depends on x, so x is correlated with v.
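A small simulation can make this bias concrete. The sketch below places hypothetical units on a line with a row-standardised neighbour matrix, scales W by a spatial parameter ρ (the equation above absorbs this into W), generates y = (I - ρW)^{-1}(βx + u) by fixed-point iteration, and then fits OLS of y on x alone. The values β = 1 and ρ = 0.6 and all function names are assumptions for illustration. Even with x drawn independently across units, the OLS slope comes out above β, because part of each unit's outcome feeds back through its neighbours.

```python
import random


def line_neighbours(n):
    """Row-standardised contiguity weights for n units on a line, stored as (j, w_ij) lists."""
    rows = []
    for i in range(n):
        js = [j for j in (i - 1, i + 1) if 0 <= j < n]
        rows.append([(j, 1.0 / len(js)) for j in js])
    return rows


def spatial_lag(rows, v):
    """Compute the spatial lag Wv from the neighbour lists."""
    return [sum(w * v[j] for j, w in row) for row in rows]


def simulate_sar(n, beta, rho, rng, iters=100):
    """Draw y = (I - rho W)^{-1}(beta x + u) via the iteration y <- beta x + u + rho W y."""
    rows = line_neighbours(n)
    x = [rng.gauss(0, 1) for _ in range(n)]
    base = [beta * xi + rng.gauss(0, 1) for xi in x]   # beta x + u
    y = list(base)
    for _ in range(iters):                              # converges since |rho| < 1 and W is row-standardised
        y = [b + rho * wy for b, wy in zip(base, spatial_lag(rows, y))]
    return rows, x, y


def ols_slope(x, y):
    """Slope of y on x without an intercept (simulated variables have mean zero)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


if __name__ == "__main__":
    _, x, y = simulate_sar(5_000, beta=1.0, rho=0.6, rng=random.Random(42))
    print(f"true beta = 1.0, OLS ignoring Wy = {ols_slope(x, y):.3f}")
```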

Contrast this to the model-based geostatistical approach. We assume that there is some underlying, unobserved process S(l) from which we make observations with error:

y = \beta x + S(l) + e

Normally we would model S as a zero-mean Gaussian process, which we’ve described in a previous blog post. As a result, if we don’t condition on S, the y are multivariate-normally distributed, y|x \sim MVN(\beta x, \Sigma). Under this model, OLS is not biased but it is inefficient, since our effective sample size (for estimating a mean) is n\,\mathrm{tr}(\Sigma)/\mathbf{1}^T\Sigma\mathbf{1}, which is less than n when the correlations are positive.
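The effective sample size formula is easy to compute directly. In this sketch the locations are hypothetical points along a one-dimensional transect (rather than in A), Σ is built from an exponential covariance function with a nugget term standing in for the measurement error e, and the range parameter φ controls how quickly correlation decays with distance. The function names and parameter values are assumptions for illustration; larger φ means stronger correlation and so a smaller effective sample size.

```python
import math


def exponential_cov(locs, sill=1.0, phi=1.0, nugget=0.0):
    """Sigma_ij = sill * exp(-|l_i - l_j| / phi), plus a nugget variance on the diagonal."""
    n = len(locs)
    return [[sill * math.exp(-abs(locs[i] - locs[j]) / phi) + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def effective_sample_size(sigma):
    """n * tr(Sigma) / (1' Sigma 1): n for independent data, falling towards 1 as positive correlation grows."""
    n = len(sigma)
    trace = sum(sigma[i][i] for i in range(n))
    total = sum(sum(row) for row in sigma)
    return n * trace / total


if __name__ == "__main__":
    locs = [float(i) for i in range(100)]   # hypothetical: 100 equally spaced points on a transect
    for phi in (0.5, 5.0, 50.0):
        ess = effective_sample_size(exponential_cov(locs, phi=phi, nugget=0.5))
        print(f"range phi = {phi:>4}: effective sample size = {ess:.1f} of {len(locs)}")
```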

Another consequence of the spatial econometric model is that a standard instrumental variable estimator that ignores Wy is also biased, particularly if the instrument is itself spatially correlated. This article discusses the “spatial two-stage least squares” estimator, which essentially requires an instrument for both x and Wy. This latter instrument can simply be Wx. The article explores this by re-estimating the models of the well-known paper Revisiting the Resource Curse: Natural Disasters, the Price of Oil, and Democracy.
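A sketch of the simplest version of this idea, assuming x is itself exogenous so that it serves as its own instrument while Wx instruments Wy (a just-identified system; practical implementations often add further lags such as W²x). The article's setting, where x is also endogenous, is not covered here. The line-shaped neighbourhood, β = 1, ρ = 0.6, and the function names are hypothetical; passing the regressors as their own instruments turns the same solver into naive OLS that treats Wy as exogenous, for comparison.

```python
import random


def line_neighbours(n):
    """Row-standardised contiguity weights for n units on a line, stored as (j, w_ij) lists."""
    rows = []
    for i in range(n):
        js = [j for j in (i - 1, i + 1) if 0 <= j < n]
        rows.append([(j, 1.0 / len(js)) for j in js])
    return rows


def spatial_lag(rows, v):
    """Compute the spatial lag Wv from the neighbour lists."""
    return [sum(w * v[j] for j, w in row) for row in rows]


def simulate_sar(n, beta, rho, rng, iters=100):
    """Draw y = (I - rho W)^{-1}(beta x + u) via the iteration y <- beta x + u + rho W y."""
    rows = line_neighbours(n)
    x = [rng.gauss(0, 1) for _ in range(n)]
    base = [beta * xi + rng.gauss(0, 1) for xi in x]
    y = list(base)
    for _ in range(iters):
        y = [b + rho * wy for b, wy in zip(base, spatial_lag(rows, y))]
    return rows, x, y


def dot(p, q):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(p, q))


def iv2(instruments, regressors, y):
    """Solve Z'X theta = Z'y for two instruments and two regressors by Cramer's rule (OLS when Z = X)."""
    (z1, z2), (r1, r2) = instruments, regressors
    a11, a12 = dot(z1, r1), dot(z1, r2)
    a21, a22 = dot(z2, r1), dot(z2, r2)
    b1, b2 = dot(z1, y), dot(z2, y)
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


if __name__ == "__main__":
    rows, x, y = simulate_sar(5_000, beta=1.0, rho=0.6, rng=random.Random(42))
    wx, wy = spatial_lag(rows, x), spatial_lag(rows, y)
    print("OLS treating Wy as exogenous (beta, rho):", iv2((x, wy), (x, wy), y))
    print("spatial 2SLS with Wx as instrument (beta, rho):", iv2((x, wx), (x, wy), y))
```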

The spatial econometric approach clearly has limitations compared to the geostatistical approach. The matrix W has to be pre-specified rather than estimated from the data, and it usually allows only a constant correlation between direct neighbours. It would also be very tricky to interpolate outcomes at new locations, and the approach is rarely used for spatially continuous phenomena. However, its simplicity makes these instrumental variable approaches easier to use for estimating average causal effects. Development of causal models within the geostatistical framework is still an ongoing research question (of mine!).

Methodological challenges when studying distance to care as an exposure in health research. American Journal of Epidemiology [PubMed] Published 20th May 2019

If you read academic articles when you are sufficiently tired, what you think the authors are writing may start to drift from what they are actually writing. I must confess that this is what has happened to me with this article. I spent a good while debating in my head what the authors were saying about using distance to care as an instrument for exposure in health research rather than distance as an exposure itself. Unfortunately, the latter is not nearly as interesting a discussion for me as the former, but given I have run out of time to find another article I’ll try to weave together the two.

Distance is a very strong determinant of which health services, if any, somebody uses. In a place like the UK it may determine which of many clinics or hospitals a patient attends. In poorer settings it may determine whether a patient seeks health care at all. There is thus interest in understanding how distance affects use of services. This article provides a concise discussion of why the causal effect of distance might not be identified in a simple model. For example, a patient is only observed if they attend, and attendance depends on distance, which induces selection bias. Distance from a facility may also be associated with other key determinants, like socioeconomic status, introducing further confounding. And finally, distance can be measured with some error. These issues are illustrated with maternity care in Botswana.

Since distance is such a strong determinant of health service use, it is also widely used as an instrumental variable for service use. My very first published paper used it. So the question to ask now is: how do the above-mentioned issues with distance affect its use as an instrument? For the question of selection bias, it depends on the selection mechanism. Consider the standard instrumental variable causal model, where Y is the outcome, X the treatment, Z the instrument (affecting Y only through X), and U an unobserved confounder of X and Y. If selection depends only on Z, or only on U, then the instrumental variables estimator is unbiased (if it depends on both, selection acts as a collider and induces an association between Z and U), whereas if selection depends on Y and/or X then it is biased. If distance is correlated with some other factor that also influences Y, then it is no longer a valid instrument unless we condition on that factor. The typical criticism of distance as an instrument is that it is associated with socioeconomic status. In UK-based studies, we might condition on some deprivation index, like the Index of Multiple Deprivation. But these indices are not that precise and are averaged across small areas; there is still likely to be heterogeneity in status within areas. It is not possible to say what the extent of this potential bias is, but it could be substantial. Finally, if distance is measured with error, then the instrumental variables estimator will (probably) be biased.
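The selection argument can be checked with a hypothetical linear simulation, not taken from any of the papers discussed: Z shifts X, an unobserved U confounds X and Y, and the true effect of X on Y is 1. The Wald instrumental variable estimate is recomputed after keeping only the units that meet different selection rules; the coefficients, selection rules, and function names are all assumptions for illustration. Selecting on Z alone or on U alone leaves Z independent of U among those selected, so the estimate stays close to 1; selecting on X does not.

```python
import random


def cov(p, q):
    """Sample covariance of two equal-length sequences."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    return sum((a - mp) * (b - mq) for a, b in zip(p, q)) / (n - 1)


def wald_iv(z, x, y):
    """Instrumental variable (Wald) estimate cov(Z, Y) / cov(Z, X)."""
    return cov(z, y) / cov(z, x)


def simulate(n, rng, effect=1.0):
    """Linear IV model: Z -> X -> Y, with an unobserved U confounding X and Y."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)                          # instrument, e.g. standardised distance
        u = rng.gauss(0, 1)                          # unobserved confounder
        x = z + u + rng.gauss(0, 1)                  # treatment, e.g. service use
        y = effect * x + 2.0 * u + rng.gauss(0, 1)   # outcome
        rows.append((z, u, x, y))
    return rows


def iv_after_selection(rows, selected):
    """Wald estimate using only the rows for which selected(z, u, x, y) is true."""
    kept = [r for r in rows if selected(*r)]
    z, _, x, y = zip(*kept)
    return wald_iv(z, x, y)


if __name__ == "__main__":
    rows = simulate(200_000, random.Random(2019))
    rules = {
        "everyone": lambda z, u, x, y: True,
        "selection on Z only": lambda z, u, x, y: z > 0,
        "selection on U only": lambda z, u, x, y: u > 0,
        "selection on X": lambda z, u, x, y: x > 0,
    }
    for label, rule in rules.items():
        print(f"{label:>20}: IV estimate = {iv_after_selection(rows, rule):.2f} (truth 1.0)")
```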

This concise discussion was mainly about a paper that doesn’t actually exist. But I think it highlights that actually there is a lot to say about distance as an instrument and its potential weaknesses; the imagined paper could certainly materialise. Indeed, in a systematic review of instrumental variable analyses of health service access and use, in which most studies use distance to facility, only a tiny proportion of studies actually consider that distance might be confounded with unobserved variables.


Poor statistical communication means poor statistics

Statistics is a broad and complex field. For a given research question, any number of statistical approaches could be taken. In an article published last year, researchers asked 29 teams, involving 61 analysts, to use the same dataset to address the question of whether referees were more likely to give dark-skinned players a red card than light-skinned players. They got 29 different analyses. Each analysis had its advantages and disadvantages, and I’m sure each team would have defended its work. However, as many statisticians and economists may well know, the merit of an approach is not the only factor that matters in its adoption.

There has, for decades, been criticism of the misunderstanding and misuse of null hypothesis significance testing (NHST). P-values have been a common topic on this blog. Despite this, NHST remains the predominant paradigm for most statistical work. If used appropriately this needn’t be a problem, but if it were being used appropriately it wouldn’t be used nearly as much: p-values can’t perform the inferential role many expect of them. It’s not difficult to understand why things are this way: most published work uses NHST, we teach students NHST in order to understand the published work, students become researchers who use NHST, and so on. Part of statistical education involves teaching the arbitrary conventions that have gone before, such as that p-values are ‘significant’ if below 0.05 or that a study is ‘adequately powered’ if power is above 80%. One of the most pernicious consequences of this is that these heuristics become a substitute for thinking. The presence of these key figures is expected, and their absence is often met with a request from reviewers and other readers for their inclusion.

I have argued on this blog and elsewhere for a wider use of Bayesian methods (and less NHST) and I try to practice what I preach. For an ongoing randomised trial I am involved with, I adopted a Bayesian approach to design and analysis. Instead of the usual power calculation, I conducted a Bayesian assurance analysis (which Anthony O’Hagan has written some good articles on for those wanting more information). I’ll try to summarise the differences between ‘power’ and ‘assurance’ calculations by attempting to define them, which is actually quite hard!

Power calculation. If we were to repeat a trial infinitely many times, what sample size would we need so that in x% of trials the assumed data generating model produces data that would fall in the α% most extreme quantiles of the distribution of data that would be produced from the same data generating model but with one parameter set to exactly zero (or any equivalent hypothesis)? Typically we set x% to be 80% (power) and α% to be 5% (statistical significance threshold).
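For comparison with this definition, here is the textbook normal-approximation version of a power calculation for a two-arm trial comparing means with a two-sided test. The standardised effect of 0.5 and the function names are hypothetical design choices, and this is not the calculation for the trial mentioned above.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power(n_per_arm, delta, sd, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means with n_per_arm per arm."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # ignores the negligible chance of rejecting in the wrong direction
    return STD_NORMAL.cdf(delta / se - z_crit)


def sample_size(delta, sd, alpha=0.05, target=0.80):
    """Smallest n per arm with n >= 2 * (z_{1-alpha/2} + z_{target})^2 * sd^2 / delta^2."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(target)
    return math.ceil(2.0 * (z_sum * sd / delta) ** 2)


if __name__ == "__main__":
    n = sample_size(delta=0.5, sd=1.0)   # hypothetical standardised effect of 0.5
    print(f"n per arm = {n}, power = {power(n, 0.5, 1.0):.3f}")
```

Note that the effect size enters as a single fixed value: nothing in the calculation reflects uncertainty about it.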

Assurance calculation. For a given data generating model, what sample size do we need so that there is an x% probability that we will be (100-α)% certain that the parameter is positive (or any equivalent choice)?
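An assurance calculation can be sketched by simulation under a simple normal model. The sketch draws a 'true' effect from a design prior and simulates the trial's observed difference. It then updates an analysis prior and counts how often the posterior probability that the effect is positive exceeds the required certainty. The design prior N(0.5, 0.25²), the vague analysis prior, the known outcome SD, and the function names are all assumptions for illustration; a fuller analysis would also place distributions over the SD and other uncertain parameters.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_positive(d, se, prior_mean, prior_sd):
    """Posterior P(delta > 0) given a normal prior and an observed difference d ~ N(delta, se^2)."""
    precision = 1.0 / prior_sd**2 + 1.0 / se**2
    post_mean = (prior_mean / prior_sd**2 + d / se**2) / precision
    return STD_NORMAL.cdf(post_mean * math.sqrt(precision))


def assurance(n_per_arm, sd, design_mean, design_sd, analysis_mean=0.0,
              analysis_sd=10.0, certainty=0.95, sims=20_000, seed=1):
    """Monte Carlo probability that the trial ends with P(delta > 0 | data) > certainty."""
    rng = random.Random(seed)
    se = sd * math.sqrt(2.0 / n_per_arm)
    successes = 0
    for _ in range(sims):
        delta = rng.gauss(design_mean, design_sd)   # draw a 'true' effect from the design prior
        d = rng.gauss(delta, se)                    # simulate the trial's observed difference
        if prob_positive(d, se, analysis_mean, analysis_sd) > certainty:
            successes += 1
    return successes / sims


if __name__ == "__main__":
    for n in (30, 63, 120, 250):
        a = assurance(n, sd=1.0, design_mean=0.5, design_sd=0.25)
        print(f"n per arm = {n:>3}: assurance = {a:.3f}")
```

With a near-degenerate design prior and a vague analysis prior, this approximately reproduces a conventional power calculation; with genuine uncertainty about the effect, assurance is lower for the same sample size.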

The assurance calculation could be reframed in a decision framework as: what sample size do we need so that there is an x% probability that we will make the right decision about whether a parameter is positive (or any equivalent decision), given the costs of making the wrong decision?

Both of these are complex but I would argue it is the assurance calculation that gives us what we want to know most of the time when designing a trial. The assurance analysis also better represents uncertainty since we specify distributions over all the uncertain parameters rather than choose exact values. Despite this though, the funder of the trial mentioned above, who shall remain nameless, insisted on the results of a power calculation in order to be able to determine whether the trial was worth continuing with because that’s “what they’re used to.”

The main culprit for this issue is, I believe, communication. A simpler explanation with better presentation may have been easier to understand and accept. This is not to say that I do not believe the funder was substituting the heuristic ‘80% or more power = good’ for actually thinking about what we could learn from the trial. But until statisticians, economists, and other data analytic researchers start communicating better, how can we expect others to listen?
