Rachel Houten’s journal round-up for 11th November 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A comparison of national guidelines for network meta-analysis. Value in Health [PubMed] Published October 2019

The evolving treatment landscape results in a greater dependence on indirect treatment comparisons to generate estimates of clinical effectiveness, where the current practice has not been compared to the proposed new intervention in a head-to-head trial. This paper is a review of the guidelines of reimbursement bodies for conducting network meta-analyses. Reassuringly, the authors find that it is possible to meet the needs of multiple agencies with one analysis.

The authors assign the criteria to three categories: “assessment and analysis to test assumptions required for a network meta-analysis, presentation and reporting of results, and justification of modelling choices”. Heterogeneity of the included studies is highlighted as one of the key elements to include if the criteria need to be prioritised. I think this is a simple way of thinking about what needs to be presented, but the ‘justification’ category, in my experience, is often given less weight than the other two.

This paper is a useful resource for companies submitting to multiple HTA agencies, with the requirements of each national body displayed in tables that are easy to navigate. It meets a practical need but doesn’t really go far enough for me. The authors do signpost to the PRISMA criteria, but I think it would have been really good to think about the purpose of the submission guidelines: to encourage a logical and coherent summary of the approaches taken so the evidence can be evaluated by decision-makers.

Variation in responsiveness to warranted behaviour change among NHS clinicians: novel implementation of change detection methods in longitudinal prescribing data. BMJ [PubMed] Published 2nd October 2019

I really like this paper. Such a lot of work, from all sectors, is devoted to the production of relevant and timely evidence to inform practice, but if the guidance does not become embedded into the real world then its usefulness is limited.

The authors have managed to utilize a HUGE amount of data to identify the real reaction to two pieces of guidance recommending a change in practice in England. The authors used “trend indicator saturation”, which I’m not ashamed to admit I knew nothing about beforehand, but it is explained nicely. Their thoughtful use of the information available to them results in three indicators of response (in this case the deprescribing of two drugs) around when the change occurs, how quickly it occurs, and how much change occurs.

The authors discover variation in response to the recommendations but suggest an application of their methods could be used to generate feedback to clinicians and therefore drive further response. As some primary care practices took a while to embed the guidance change into their prescribing, the paper raises interesting questions as to where the barriers to the adoption of guidance have occurred.

What is next for patient preferences in health technology assessment? A systematic review of the challenges. Value in Health Published November 2019

It may be that patient preferences have a role to play in the uptake of guideline recommendations, as proposed by the authors of my final paper this week. This systematic review of the literature around embedding patient preferences into HTA decision-making groups the discussion in the academic literature into five broad areas: conceptual, normative, procedural, methodological, and practical. The authors state that their purpose was not to formulate their own views, merely to present the available literature, but they do a good job of indicating where to find more opinionated literature on this topic.

Methodological issues formed the biggest group, covering aspects such as sample selection, the internal and external validity of the preferences generated, and the generalisability of preferences collected from a sample to the entire population. In general, however, the topics covered in the literature are numerous and varied.

It’s a great summary of the challenges that are faced, and a ranking based on how frequently each topic is mentioned in the literature drives the authors’ proposed next steps. They recommend further research into the incorporation of preferences within or beyond the QALY and the use of multiple-criteria decision analysis as a method of integrating patient preferences into decision-making. I support the need for “a scientifically and valid manner” to integrate patient preferences into HTA decision-making but wonder if we can first learn what has worked well, and what hasn’t, from the attempts of HTA agencies thus far.


Method of the month: constrained randomisation

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is constrained randomisation.


Randomised experimental studies are one of the best ways of estimating the causal effects of an intervention. They have become more and more widely used in economics; Banerjee and Duflo are often credited with popularising them among economists. When done well, randomly assigning a treatment ensures both observable and unobservable factors are independent of treatment status and likely to be balanced between treatment and control units.

Many of the interventions economists are interested in are at a ‘cluster’ level, be it a school, hospital, village, or otherwise. So the appropriate experimental design would be a cluster randomised controlled trial (cRCT), in which the clusters are randomised to treatment or control and individuals within each cluster are observed either cross-sectionally or longitudinally. But, except in cases of large budgets, the number of clusters participating can be fairly small. When randomising a relatively small number of clusters we could by chance end up with a quite severe imbalance in key covariates between trial arms. This presents a problem if we suspect a priori that these covariates have an influence on key outcomes.

One solution to the problem of potential imbalance is covariate-based constrained randomisation. The principle here is to conduct a large number of randomisations, assess the balance of covariates in each one using some balance metric, and then to randomly choose one of the most balanced according to this metric. This method preserves the important random treatment assignment while ensuring covariate balance. Stratified randomisation also has a similar goal, but in many cases may not be possible if there are continuous covariates of interest or too few clusters to distribute among many strata.


Conducting covariate constrained randomisation is straightforward and involves the following steps:

  1. Specifying the important baseline covariates to balance the clusters on. For each cluster j we have L covariates x_{jl}; l=1,...,L.
  2. Characterising each cluster in terms of these covariates, i.e. creating the x_{jl}.
  3. Enumerating all potential randomisation schemes or simulating a large number of them. For each one, we will need to measure the balance of the x_{jl} between trial arms.
  4. Selecting a candidate set of randomisation schemes that are sufficiently balanced according to some pre-specified criterion, from which we can randomly choose our treatment allocation.

Balance scores

A key ingredient in the above steps is the balance score. This score needs to be some univariate measure of potentially multivariate imbalance between two (or more) groups. A commonly used score is that proposed by Raab and Butcher:

\sum_{l=1}^{L} \omega_l (\bar{x}_{1l}-\bar{x}_{0l})^2

where \bar{x}_{1l} and \bar{x}_{0l} are the mean values of covariate l in the treatment and control groups respectively, and \omega_l is some weight, which is often the inverse standard deviation of the covariate. Conceptually the score is a weighted sum of squared differences in means, so lower values indicate greater balance. But other scores would also work. Indeed, any statistic that measures the distance between the distributions of two variables would work and could be summed up over the covariates. This could include the maximum distance:

\max_l |\bar{x}_{1l} - \bar{x}_{0l}|

the Manhattan distance:

\sum_{l=1}^{L} |\bar{x}_{1l}-\bar{x}_{0l}|

or even the Symmetrised Bayesian Kullback-Leibler divergence (I can’t be bothered to type this one out). Grischott has developed a Shiny application to estimate all these distances in a constrained randomisation framework, detailed in this paper.
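
As a rough sketch of how the two simpler alternatives might be coded in base R, the snippet below computes the maximum and Manhattan distances between arm means. The helper names (`mean_diffs`, `max_distance`, `manhattan_distance`) and the toy data are made up for illustration, and the differences are left unweighted, so in practice one would probably standardise the covariates first.

```r
# Hypothetical helper: difference in covariate means between two arms.
# `arm` is a vector of 0/1 allocations, `covs` a cluster-by-covariate matrix.
mean_diffs <- function(arm, covs) {
  colMeans(covs[arm == 1, , drop = FALSE]) -
    colMeans(covs[arm == 0, , drop = FALSE])
}

# Maximum absolute difference in means across covariates
max_distance <- function(arm, covs) max(abs(mean_diffs(arm, covs)))

# Manhattan distance: sum of absolute differences in means
manhattan_distance <- function(arm, covs) sum(abs(mean_diffs(arm, covs)))

# Toy data (made up): 4 clusters, 2 covariates
covs <- cbind(c(1, 2, 3, 4), c(10, 10, 20, 20))
balanced <- c(1, 0, 1, 0)
unbalanced <- c(1, 1, 0, 0)

max_distance(balanced, covs)
manhattan_distance(balanced, covs)
max_distance(unbalanced, covs)
manhattan_distance(unbalanced, covs)
```

As with the Raab and Butcher score, lower values indicate better balance, so either function could be dropped into the same selection procedure.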

Things become more complex if there are more than two trial arms. All of the above scores are only able to compare two groups. However, there already exist a number of univariate measures of multivariate balance in the form of MANOVA (multivariate analysis of variance) test statistics. For example, if we have G trial arms and let X_{jg} = \left[ x_{jg1},...,x_{jgL} \right]' then the between group covariance matrix is:

B = \sum_{g=1}^G N_g(\bar{X}_{.g} - \bar{X}_{..})(\bar{X}_{.g} - \bar{X}_{..})'

and the within group covariance matrix is:

W = \sum_{g=1}^G \sum_{j=1}^{N_g} (X_{jg}-\bar{X}_{.g})(X_{jg}-\bar{X}_{.g})'

which we can use in a variety of statistics including Wilks’ Lambda, for example:

\Lambda = \frac{\det(W)}{\det(W+B)}

No trial has previously used covariate constrained randomisation with multiple groups, as far as I am aware, but this is the subject of an ongoing paper investigating these scores – so watch this space!
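
For what it’s worth, the matrices B and W and Wilks’ Lambda defined above can be computed in a few lines of base R. This is only a sketch: the function name `wilks_lambda` and the simulated data are assumptions for illustration, and it is not taken from the paper mentioned. One thing to keep in mind is that Lambda lies between 0 and 1 and values close to 1 indicate similar arms, so, unlike the scores above, higher is better here.

```r
# Hypothetical helper: Wilks' Lambda for a covariate matrix `covs` (clusters in
# rows) and an allocation vector `arm` with one label per trial arm.
wilks_lambda <- function(arm, covs) {
  covs <- as.matrix(covs)
  grand_mean <- colMeans(covs)
  L <- ncol(covs)
  B <- matrix(0, L, L)
  W <- matrix(0, L, L)
  for (g in unique(arm)) {
    xg <- covs[arm == g, , drop = FALSE]
    group_mean <- colMeans(xg)
    d <- group_mean - grand_mean
    # between-group term: N_g (Xbar_g - Xbar)(Xbar_g - Xbar)'
    B <- B + nrow(xg) * tcrossprod(d)
    # within-group term: sum over clusters of (X_jg - Xbar_g)(X_jg - Xbar_g)'
    centred <- sweep(xg, 2, group_mean)
    W <- W + crossprod(centred)
  }
  det(W) / det(W + B)
}

set.seed(1)
covs <- matrix(rnorm(24), ncol = 2)   # 12 simulated clusters, 2 covariates
arm <- rep(1:3, each = 4)             # three arms of four clusters each
lambda <- wilks_lambda(arm, covs)
lambda
```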

Once the scores have been calculated for all possible schemes, or for a very large number of them, we select from among those which are most balanced. The most balanced are defined according to some quantile of the balance score, say the 15% of schemes with the best (lowest) scores.

As a simple simulated example of how this might be coded in R, let’s consider a trial of 8 clusters with two standard-normally distributed covariates. We’ll use the Raab and Butcher score from above:

#simulate the covariates
n <- 8
x1 <- rnorm(n)
x2 <- rnorm(n)
x <- matrix(c(x1,x2),ncol=2)
#enumerate all possible schemes - you'll need the partitions package here
#each column of schemes is one allocation, with entries 1 or 2 giving the arm
schemes <- partitions::setparts(c(n/2,n/2))
#write a function that will estimate the score
#for each scheme which we can apply over our
#set of schemes
balance_score <- function(scheme,covs){
  treat.idx <- scheme==2
  control.idx <- scheme==1
  treat.means <- apply(covs[treat.idx,],2,mean)
  control.means <- apply(covs[control.idx,],2,mean)
  cov.sds <- apply(covs,2,sd)
  #Raab-butcher score
  score <- sum((treat.means - control.means)^2/cov.sds)
  return(score)
}
#apply the function
scores <- apply(schemes,2,function(i)balance_score(i,x))
#find top 15% of schemes (lowest scores)
scheme.set <- which(scores <= quantile(scores,0.15))
#choose one at random
scheme.number <- sample(scheme.set,1)
scheme.chosen <- schemes[,scheme.number]
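
With more clusters, full enumeration quickly becomes impractical (there are over 155 million ways of choosing 15 of 30 clusters), which is why the steps above allow for simulating a large number of schemes instead. A minimal sketch of that variant, using only base R, might look like the following. The number of clusters, the number of simulations, and the helper name `raab_butcher` are assumptions chosen for illustration.

```r
set.seed(2019)
n <- 30                                  # number of clusters (assumed even)
x <- matrix(rnorm(n * 2), ncol = 2)      # two simulated covariates
n_sims <- 10000                          # number of simulated schemes

# each column is one random allocation: 1 = treatment, 0 = control
schemes <- replicate(n_sims, sample(rep(c(0, 1), each = n / 2)))
# drop any duplicated allocations so that each scheme is counted once
schemes <- schemes[, !duplicated(t(schemes)), drop = FALSE]

# same score as above: squared mean differences weighted by inverse sd
cov_sds <- apply(x, 2, sd)
raab_butcher <- function(arm, covs) {
  diffs <- colMeans(covs[arm == 1, , drop = FALSE]) -
    colMeans(covs[arm == 0, , drop = FALSE])
  sum(diffs^2 / cov_sds)
}
scores <- apply(schemes, 2, raab_butcher, covs = x)

# keep the 15% of schemes with the lowest scores, then pick one at random
candidates <- which(scores <= quantile(scores, 0.15))
chosen <- schemes[, candidates[sample.int(length(candidates), 1)]]
table(chosen)
```

Indexing with `sample.int()` rather than calling `sample()` on the candidate vector directly avoids a well-known quirk: `sample(k, 1)` draws from 1 to k when given a single number, which would matter if the candidate set ever had only one member.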


A commonly used method of analysing cluster trials is to estimate a mixed model, i.e. a hierarchical model with cluster-level random effects. Two key questions are whether to control for the covariates used in the randomisation, and which test to use for treatment effects. Fan Li has two great papers answering these questions for linear models and binomial models. One key conclusion is that the appropriate type I error rates are only achieved in models adjusted for the covariates used in the randomisation. For non-linear models, type I error rates can be way off for many estimators, especially with small numbers of clusters, which is often the reason for doing constrained randomisation in the first place, so a careful choice is needed here. If in doubt, I would recommend adjusted permutation tests to ensure the appropriate type I error rates. Of course, one could take a Bayesian approach to analysis, although I’m not aware of any analysis of the performance of these models in this setting (another case of “watch this space!”).
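
To give a flavour of what an adjusted permutation test could look like, here is a deliberately simplified sketch. It assumes a single outcome per cluster (rather than the individual-level mixed models discussed above), adjusts for the randomisation covariates with a linear regression that leaves treatment out, and compares the observed difference in mean residuals with its distribution over the candidate set of schemes. All names and the simulated data are illustrative, and this is not necessarily the exact procedure evaluated in the papers mentioned.

```r
set.seed(42)
n <- 8
x <- matrix(rnorm(n * 2), ncol = 2)

# enumerate all 4-vs-4 allocations with base R (1 = treatment, 0 = control)
treated <- combn(n, n / 2)
schemes <- apply(treated, 2, function(idx) as.integer(seq_len(n) %in% idx))

# Raab and Butcher style score, as in the example above
cov_sds <- apply(x, 2, sd)
score <- function(arm) {
  d <- colMeans(x[arm == 1, , drop = FALSE]) -
    colMeans(x[arm == 0, , drop = FALSE])
  sum(d^2 / cov_sds)
}
scores <- apply(schemes, 2, score)

# candidate set: the 15% best balanced schemes; one is drawn as the allocation
candidates <- schemes[, scores <= quantile(scores, 0.15), drop = FALSE]
observed <- candidates[, sample.int(ncol(candidates), 1)]

# simulated cluster-level outcome with a true treatment effect of 1
y <- observed + as.vector(x %*% c(0.5, -0.5)) + rnorm(n, sd = 0.5)

# adjust for the randomisation covariates, leaving treatment out of the model
res <- residuals(lm(y ~ x))

# test statistic: difference in mean residuals between arms
stat <- function(arm) mean(res[arm == 1]) - mean(res[arm == 0])
obs_stat <- stat(observed)
perm_stats <- apply(candidates, 2, stat)

# two-sided p-value over the constrained randomisation space
# (small tolerance so the observed scheme always counts itself)
p_value <- mean(abs(perm_stats) >= abs(obs_stat) - 1e-12)
p_value
```

The permutations are restricted to the candidate set because those are the only allocations that could actually have been drawn. A side effect is that the smallest attainable p-value is limited by the size of that set, which is one reason not to make the balance criterion too strict when there are few clusters.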


There are many trials that have used this procedure and listing even a fraction would be a daunting task. But I would be remiss not to note a trial of my own that uses covariate constrained randomisation. It is investigating the effect of providing an incentive to small and medium sized enterprises to adhere to a workplace well-being programme. There are good applications used as examples in Fan Li’s papers mentioned above. A trial that featured in a journal round-up in February used covariate constrained randomisation to balance a very small number of clusters in a trial of a medicines access programme in Kenya.


Method of the month: Eye-tracking

This month’s method is eye-tracking.


Eye-tracking methods can be used to analyse how individuals acquire information and how they make decisions. The method has been extensively used by psychologists in a variety of applications, from identifying cases of dyslexia in children to testing aviation pilots’ awareness. It was made popular by Keith Rayner, but its growing use reflects changes in the availability and affordability of technology. The textbook ‘Eye Tracking: A Comprehensive Guide to Methods and Measures’ provides a great introduction and complements the course offered by Lund University.

Eye-tracking analyses typically depend on the ‘eye-mind hypothesis’ which states “there is no appreciable lag between what is fixated on and what is processed”. In addition to fixing their gaze, individuals make rapid eye movements called ‘saccades’ when they are searching for information or items of interest. There is also research into ‘pupillometry’, which relates pupil size to cognitive burden, where ‘hard’ tasks are hypothesised to cause dilation.

There is a growing interest in using quantitative elicitation methods to understand individuals’ preferences for healthcare goods or services. Quantitative methods, such as the standard gamble, time trade-off, contingent valuation or discrete choice experiments, often employ surveys which are increasingly self-completed and administered online. These valuation methods are often underpinned by economic theories either for utility (random utility theory, expected utility theory) or, in the case of attribute-based approaches, Lancaster’s Theory of consumer demand. If respondents do not answer in line with the supporting theories, the valuations derived from the survey data may be biased. Therefore, various approaches have been employed to understand whether people complete surveys in line with the analysts’ expectations – from restricted models to test for attribute non-attendance to qualitative ‘think-aloud’ interviews. Eye-tracking offers an alternative method of testing these hypotheses.


The research question will dictate the eye-tracking study design. If it’s a reading study or a survey, then accuracy will be key. If the experiment seeks to understand how participants respond to large visual stimuli, then it may be preferable to have comfortable equipment which can be used in a setting the participant is familiar with.


In its most basic form, eye-tracking research has involved researcher-individual observation of a participant’s eyes and manual notes on pupil dilation. However, more sophisticated methods have since developed in line with changes to, and availability of, technology. In the 1950s, magnetic search coils were used to track people’s eye movements, which involved placing two coils on the eye, with one circling the iris on a contact lens. Nowadays, most eye-tracking involves less invasive equipment, commonly with a camera recording data on a computer and complex algorithms to calculate the location of the individual’s gaze.

To track eyes, almost all modern devices record the corneal reflection on a camera positioned towards the individual’s pupil. The corneal reflection is a glint, usually in the iris, which allows the machine to calculate the direction of the gaze using the distance from 1) the camera to the eye and 2) the eye to the screen. From the corneal reflection, the X and Y (horizontal and vertical) coordinates, which provide the location of current focus on the screen, are then recorded. The number of times this is logged per second is referred to as the speed (or ‘frequency’) of the tracker. As the eye moves from one position to another, the magnitude of the movement is measured in visual degrees (θ), rather than millimetres, as studies may involve moving stimuli and so the distance between eye and object could change.
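
To make the point about visual degrees concrete, the standard geometric relationship is θ = 2·atan(s / 2d), where s is the size of the movement on the screen and d is the viewing distance. The short R sketch below applies it. The function name `visual_angle` and the example distances are made up, and the formula assumes the movement is centred on the line of sight.

```r
# Hypothetical helper: visual angle (in degrees) subtended by an on-screen
# distance `size_mm` when viewed from `distance_mm` away.
visual_angle <- function(size_mm, distance_mm) {
  2 * atan(size_mm / (2 * distance_mm)) * 180 / pi
}

far <- visual_angle(10, 600)    # a 10 mm movement viewed from 600 mm
near <- visual_angle(10, 300)   # the same movement viewed from 300 mm
c(far = far, near = near)
```

The same 10 mm movement is a little under one degree at 600 mm but roughly twice that at 300 mm, which is why millimetres alone are not comparable when the viewing distance changes.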

Popular manufacturers include Tobii, SensoMotoric Instruments (SMI) and SR Research. Eye-trackers are usually distinguished by their speed and, as a general rule, a ‘good’ eye-tracker has a high frequency and high-resolution camera. A higher frequency allows a more accurate estimation of the fixation duration, as the start of the fixation is revealed earlier and the end revealed later. There is some consensus that a sampling frequency of 500 Hz is sufficiently powerful to accurately determine fixations and saccades. Another determinant of a ‘good’ eye-tracking device is its ‘latency’, which is the time taken for the computer to make a recording. A substantial volume of processing from the headset to screen to recording is required and, for some devices, there is a measurable delay in this process.
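
As a quick back-of-the-envelope illustration of why frequency matters, the time between consecutive samples is simply 1000 divided by the frequency in Hz. The frequencies below are illustrative values rather than the specifications of any particular device.

```r
hz <- c(60, 250, 500, 1000)   # illustrative sampling frequencies
interval_ms <- 1000 / hz      # time between consecutive samples
data.frame(hz, interval_ms)
```

Because the start and the end of a fixation can each only be located to within about one sampling interval, a 500 Hz tracker pins each down to within roughly 2 ms, whereas a 60 Hz tracker leaves an uncertainty of around 17 ms at each end.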

There are three broad categories of modern eye-tracking devices.


Head-mounted

Head-mounted eye-trackers, such as smart glasses or helmet cameras, offer participants some freedom but are harder to calibrate and can be cumbersome to wear. These eye-trackers are often used to understand how objects are attended to in a dynamic situation, for example, whilst the participant is engaged in a shopping activity.


Remote

Remote eye-trackers let the participant move freely but in front of a screen, with algorithms used to detect non-eye movements. However, the additional calculations to distinguish head and eye movements are a burden on the processing capacity of the computer and generally result in a lower frequency and, as a consequence, decreased precision.

Head-supported towers

Head-supported towers involve the use of a forehead and chin-rest. Whilst being contactless, these can be uncomfortable and unnatural for some participants. These devices are also often immobile, due to their heavy processing power, and require stability of the head because of their high frequency. However, head-supported towers are the most accurate and precise equipment available for researchers. For studies where the individual is not required to move and the stimuli are stationary (such as a survey), a head-supported tower eye-tracker provides the best quality data. Head-supported towers also offer the most accurate recording of pupil size.

Data collection

Data collection will likely occur in a university lab if a head-supported tower tracker is chosen. Head-mounted and remote trackers are generally mobile and can, therefore, travel to people of interest.

Tracking devices can either record both eyes (binocular) or a single eye (monocular). When both eyes are recorded, an average of the horizontal and vertical coordinates from each eye is taken. However, most people have an ‘active’ and a ‘lazy’ eye, and the literature suggests that only the active, dominant eye should be tracked. If a participant performs poorly in the calibration, then the other eye should be tried.

It is crucial that the eye-tracker is calibrated for each individual to ensure the eye-tracker is recording correctly. The calibration procedure involves collecting fixation data from simple points on the screen in order to ascertain the true gaze position of the individual before the experiment begins. The points are often shown as dots or crosses which move around the screen whilst fixation data are collected. A test of the calibration can be conducted by re-running the sequence and comparing the secondary fixations to the tracker’s prediction based on the first calibration data.
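
The validation step described above amounts to comparing where the targets were with where the tracker says the participant looked. A minimal sketch in R follows, with made-up target and gaze positions expressed in visual degrees and an assumed tolerance of 0.5 degrees. The appropriate tolerance will depend on the device and the study, so that value is purely illustrative.

```r
# Made-up validation data: target positions and recorded gaze (visual degrees)
targets <- data.frame(x = c(-10, 10, -10, 10, 0),
                      y = c(-8, -8, 8, 8, 0))
gaze <- data.frame(x = c(-9.8, 10.3, -10.1, 9.6, 0.1),
                   y = c(-8.2, -7.9, 8.4, 8.1, -0.1))

# Euclidean distance between each target and the recorded gaze position
error <- sqrt((gaze$x - targets$x)^2 + (gaze$y - targets$y)^2)

tolerance <- 0.5   # assumed acceptable error in degrees (illustrative only)
summary_stats <- c(mean = mean(error), max = max(error))
needs_recalibration <- any(error > tolerance)

summary_stats
needs_recalibration
```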

The calibration should involve points in all corners of the screen to ensure that the tracker is able to record in all areas. In the corners and edges, the corneal reflection can disappear, which therefore invalidates the computer’s calculations as well as resulting in missing data. Similarly, for individuals with visual aids (glasses, contact lenses) or heavy eye-makeup, the far corners can often induce another reflection which may confuse the recording and create anomalous data.

If a respondent is completing a survey, between-page calibration called ‘drift correction’ can also be completed. In this procedure, a small dot is presented in the centre of the screen and the next page appears once the participant has focussed on the spot. If there has been too much movement, the experiment will not progress and the tracker must be recalibrated.

Data analysis

Saccades are easily identifiable as the eye moves quickly in response to or in search of visual ‘stimuli’ or objects of interest. Saccadic behaviour rarely indicates information processing as the movements are so rapid that the brain is unable to consciously realise everything that is scanned, a process known as ‘saccadic suppression’. Instead, saccades most often represent a search for information. Saccades are distinctly different to ‘micro-saccades’, which are involuntary movements whilst an individual is attempting to fixate, and the involuntary movements which occur when an individual blinks. Blinks are quite easily identifiable from regular saccades as they are immediately followed by a missing pupil image on the camera as the eyelid closes.

What constitutes a fixation varies from study to study and is dependent on the stimulus presented. For example, a familiar picture may be processed quicker than text, and a new diagram may be somewhere in between. Although complex algorithms exist for the identification of fixations in eye-tracking data, most studies define a threshold for a fixation as less than one degree of movement (a measure of distance) for between 50 and 200 milliseconds. Aggregation of the total time spent fixating, including recurrent fixations, is defined as the ‘dwell time’ to a stimulus.
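
A threshold rule of this kind can be sketched as a simple dispersion-based detector. The version below is only an illustration under some assumptions: gaze coordinates are already in visual degrees, dispersion is measured as the horizontal range plus the vertical range of the samples in a window, and the function name `detect_fixations` and its defaults (one degree, 50 ms) are choices made here rather than a standard API.

```r
# Hypothetical dispersion-threshold fixation detector.
# x, y: gaze position in visual degrees; hz: sampling frequency.
detect_fixations <- function(x, y, hz, max_disp = 1, min_dur_ms = 50) {
  min_samples <- ceiling(min_dur_ms * hz / 1000)
  dispersion <- function(i, j) diff(range(x[i:j])) + diff(range(y[i:j]))
  n <- length(x)
  fixations <- list()
  i <- 1
  while (i + min_samples - 1 <= n) {
    j <- i + min_samples - 1
    if (dispersion(i, j) <= max_disp) {
      # grow the window while the samples stay within the threshold
      while (j < n && dispersion(i, j + 1) <= max_disp) j <- j + 1
      fixations[[length(fixations) + 1]] <- data.frame(
        start = i, end = j,
        duration_ms = (j - i + 1) * 1000 / hz,
        x = mean(x[i:j]), y = mean(y[i:j])
      )
      i <- j + 1   # continue after the end of this fixation
    } else {
      i <- i + 1   # no fixation starting here; slide the window on
    }
  }
  do.call(rbind, fixations)   # NULL if no fixation was found
}

# Made-up recording at 500 Hz: two steady gazes separated by a rapid movement
x <- c(rep(5, 100), seq(7, 13, length.out = 4), rep(15, 100))
y <- c(rep(5, 100), seq(5.6, 7.4, length.out = 4), rep(8, 100))
fix <- detect_fixations(x, y, hz = 500)
fix
```

Summing `duration_ms` over all fixations that land on the same stimulus then gives the dwell time for that stimulus.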

Eye-tracking data provide a highly detailed record of all the locations that a user has looked at, so reducing these data to a level that can be easily analysed is challenging. One common approach in the analysis of eye-tracking data involves segmenting coordinates to defined regions or ‘areas of interest’ (AOI). AOI can be defined either prior to the experiment or post-experimentally once eye-movement data have been collected.
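
As a small illustration of the AOI approach, the sketch below assigns fixations to rectangular regions and sums the time spent in each. The AOI names, pixel coordinates, and fixation data are all invented, and real AOIs need not be rectangles.

```r
# Hypothetical rectangular AOIs in screen pixels (e.g. two options in a choice task)
aois <- data.frame(
  name = c("option_A", "option_B"),
  x_min = c(100, 700), x_max = c(600, 1200),
  y_min = c(200, 200), y_max = c(800, 800),
  stringsAsFactors = FALSE
)

# Made-up fixations with their screen position and duration
fixations <- data.frame(
  x = c(150, 900, 650, 300, 1000),
  y = c(300, 500, 400, 700, 250),
  duration_ms = c(180, 220, 90, 150, 200)
)

# Return the name of the first AOI containing the point, or NA if none does
assign_aoi <- function(x, y, aois) {
  hit <- which(x >= aois$x_min & x <= aois$x_max &
                 y >= aois$y_min & y <= aois$y_max)
  if (length(hit) == 0) NA_character_ else aois$name[hit[1]]
}

fixations$aoi <- mapply(assign_aoi, fixations$x, fixations$y,
                        MoreArgs = list(aois = aois))

# Dwell time per AOI: total fixation duration, ignoring fixations outside any AOI
dwell <- tapply(fixations$duration_ms, fixations$aoi, sum)
dwell
```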

Another approach to reducing the data is the generation of a ‘scan path’ describing the overall sequence of movements in terms of both saccades and fixations of a respondent, either imposed on a background image of the stimulus or as a colour-coded heat map.

Pupil size can be more difficult to interpret and analyse. Measurement of the pupil differs by equipment: some use an ellipse, whereas others count the number of black pixels on the camera image of the eye. Pupil dilation can be calculated as the difference in pupil size; however, analysing this as a percentage increase can cause inflated estimates when the baseline pupil size is small. Pupil size can also rarely be compared across studies, as it is highly affected by equipment set-up and the luminosity of the setting.
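
A tiny worked example shows how the percentage measure can mislead. The baseline diameters below are invented, and both hypothetical participants are given exactly the same absolute dilation.

```r
baseline <- c(small = 2, large = 5)   # hypothetical baseline pupil diameters (mm)
task <- baseline + 0.5                # identical absolute dilation of 0.5 mm

absolute_change <- task - baseline
percent_change <- 100 * absolute_change / baseline

rbind(absolute_change, percent_change)
```

The absolute change is 0.5 mm in both cases, yet the percentage change is 25% for the small baseline and 10% for the large one.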


Many eye-trackers come with manufacturer-written software for programming the experiment (such as the EyeLink Experiment Builder). However, access to eye-trackers and related software may be restricted. PsychoPy is open-source software, written in Python, for eye-tracking (and other neuroscience) experiments. Similarly, data can be analysed either in specialist software such as EyeLink’s Data Viewer (which is useful for creating scan paths) or exported to other statistical programmes such as Matlab, Stata or R.


Orquin & Loose review how eye-tracking methods have been used in decision-making generally. In health, there are a few examples of survey-based choice experiments which have employed eye-tracking methods to understand more about respondents’ decision-making. Spinks & Mortimer used a remote eye-tracker to identify attribute non-attendance. Krucien et al. combined eye-tracking and choice data to model information processing, and Ryan extended this analysis to focus on presentation biases in a subsequent publication. Vass et al. used eye-tracking to understand how respondents complete choice experiments and whether this differed with the presentation of risk. A forthcoming publication by Rigby et al. will provide an overview of approaches, including eye-tracking, to capturing decision-making in health care choice experiments.