Chris Sampson’s journal round-up for 30th September 2019

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

A need for change! A coding framework for improving transparency in decision modeling. PharmacoEconomics [PubMed] Published 24th September 2019

We’ve featured a few papers in recent round-ups that (I assume) will be included in an upcoming themed issue of PharmacoEconomics on transparency in modelling. It’s shaping up to be a good one. The value of transparency in decision modelling has been recognised, but simply making the stuff visible is not enough – it needs to make sense. The purpose of this paper is to help make that achievable.

The authors highlight that the writing of analyses, including coding, involves personal style and preferences. To aid transparency, we need a systematic framework of conventions that make the inner workings of a model understandable to any (expert) user. The paper describes a framework developed by the Decision Analysis in R for Technologies in Health (DARTH) group. The DARTH framework builds on a set of core model components, generalisable to all cost-effectiveness analyses and model structures. There are five components – i) model inputs, ii) model implementation, iii) model calibration, iv) model validation, and v) analysis – and the paper describes the role of each. Importantly, the analysis component can be divided into several parts relating to, for example, sensitivity analyses and value of information analyses.

Based on this framework, the authors provide recommendations for organising and naming files and on the types of functions and data structures required. The recommendations build on conventions established in other fields and in the use of R generally. The authors recommend the implementation of functions in R, and relate general recommendations to the context of decision modelling. We’re also introduced to unit testing, which will be unfamiliar to most Excel modellers but which can be implemented relatively easily in R. The roles of various tools are introduced, including RStudio, R Markdown, Shiny, and GitHub.
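The paper’s examples and packages are in R, but the unit-testing idea translates to any language. As a hedged, language-agnostic illustration – with function names, parameters, and values invented here, not taken from the DARTH materials – a model input wrapped in a function can be guarded by tests that check basic properties:

```python
import numpy as np

def make_transition_matrix(p_sick: float, p_recover: float) -> np.ndarray:
    """Build a two-state (Healthy, Sick) transition matrix from
    hypothetical input probabilities."""
    return np.array([
        [1 - p_sick, p_sick],       # from Healthy
        [p_recover, 1 - p_recover], # from Sick
    ])

def test_rows_sum_to_one():
    # Each row of a transition matrix must be a probability distribution.
    tm = make_transition_matrix(p_sick=0.1, p_recover=0.3)
    assert np.allclose(tm.sum(axis=1), 1.0)

def test_probabilities_in_unit_interval():
    tm = make_transition_matrix(p_sick=0.1, p_recover=0.3)
    assert ((tm >= 0) & (tm <= 1)).all()
```

Tests like these catch the silent errors (a row summing to 1.05, a negative probability after a parameter change) that are notoriously hard to spot in a spreadsheet.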

The real value of this work lies in the linked R packages and other online material, which you can use to test out the framework and consider its application to whatever modelling problem you might have. The authors provide an example using a basic Sick-Sicker model, which you can have a play with using the DARTH packages. In combination with the online resources, this is a valuable paper that you should have to hand if you’re developing a model in R.
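For a flavour of what the Sick-Sicker example involves, here is a minimal cohort Markov model in Python. The four-state structure (Healthy, Sick, Sicker, Dead) follows the standard Sick-Sicker set-up, but the transition probabilities, utility weights, and cycle count below are purely illustrative and are not those used in the DARTH materials (which also handle discounting, costs, and more):

```python
import numpy as np

# States: Healthy (H), Sick (S1), Sicker (S2), Dead (D).
# All parameter values are made up for illustration.
P = np.array([
    [0.85, 0.15, 0.00, 0.00],  # from Healthy
    [0.50, 0.35, 0.10, 0.05],  # from Sick
    [0.00, 0.00, 0.90, 0.10],  # from Sicker (no recovery)
    [0.00, 0.00, 0.00, 1.00],  # Dead is absorbing
])
utilities = np.array([1.0, 0.75, 0.5, 0.0])  # hypothetical utility weights

cohort = np.array([1.0, 0.0, 0.0, 0.0])  # everyone starts Healthy
qalys = 0.0
for cycle in range(30):
    qalys += cohort @ utilities  # QALYs accrued this cycle (undiscounted)
    cohort = cohort @ P          # advance the cohort one cycle

print(round(qalys, 2))
```

The `analysis` component of the DARTH framework would then re-run something like this loop under sampled parameter sets for sensitivity analysis.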

Accounts from developers of generic health state utility instruments explain why they produce different QALYs: a qualitative study. Social Science & Medicine [PubMed] Published 19th September 2019

It’s well known that different preference-based measures of health will generate different health state utility values for the same person. Yet, they continue to be used almost interchangeably. For this study, the authors spoke to people involved in the development of six popular measures: QWB, 15D, HUI, EQ-5D, SF-6D, and AQoL. Their goal was to understand the bases for the development of the measures and to explain why the different measures should give different results.

At least one original developer for each instrument was recruited, along with people involved at later stages of development. Semi-structured interviews were conducted with 15 people, with questions on the background, aims, and criteria for the development of the measure, and on the descriptive system, preference weights, performance, and future development of the instrument.

Five broad topics were identified as being associated with differences in the measures: i) knowledge sources used for conceptualisation, ii) development purposes, iii) interpretations of what makes a ‘good’ instrument, iv) choice of valuation techniques, and v) the context for the development process. The online appendices provide some useful tables that summarise the differences between the measures. The authors distinguish between measures based on ‘objective’ definitions (QWB) and items that people found important (15D). Some prioritised sensitivity (AQoL, 15D), others prioritised validity (HUI, QWB), and several focused on pragmatism (SF-6D, HUI, 15D, EQ-5D). Some instruments had modest goals and opportunistic processes (EQ-5D, SF-6D, HUI), while others had grand goals and purposeful processes (QWB, 15D, AQoL). The use of some measures (EQ-5D, HUI) extended far beyond what the original developers had anticipated. In short, different measures were developed with quite different concepts and purposes in mind, so it’s no surprise that they give different results.

This paper provides some interesting accounts and views on the process of instrument development. It might prove most useful in understanding different measures’ blind spots, which can inform the selection of measures in research, as well as future development priorities.

The emerging social science literature on health technology assessment: a narrative review. Value in Health Published 16th September 2019

Health economics provides a good example of multidisciplinarity, with economists, statisticians, medics, epidemiologists, and plenty of others working together to inform health technology assessment. But I still don’t understand what sociologists are talking about half of the time. Yet, it seems that sociologists and political scientists are busy working on the big questions in HTA, as demonstrated by this paper’s 120 references. So, what are they up to?

This article reports on a narrative review, based on 41 empirical studies. Three broad research themes are identified: i) what drove the establishment and design of HTA bodies? ii) what has been the influence of HTA? and iii) what have been the social and political influences on HTA decisions? Some have argued that HTA is inevitable, while others have argued that there are alternative arrangements. Either way, no two systems are the same and it is not easy to explain the differences. It’s important to understand HTA in the context of other social tendencies and trends, and to recognise that HTA both influences and is influenced by these. The authors provide a substantial discussion of the role of stakeholders in HTA and the potential for some to attempt to game the system. Uncertainty abounds in HTA; this necessarily requires negotiation and limits the extent to which HTA can rely on objectivity and rationality.

Something lacking is a critical history of HTA as a discipline and the question of what HTA is actually good for. There’s also not a lot of work out there on culture and values, which contrasts with medical sociology. The authors suggest that sociologists and political scientists could be more closely involved in HTA research projects. I suspect that such a move would be more challenging for the economists than for the sociologists.


Chris Sampson’s journal round-up for 29th August 2016


Health or happiness? A note on trading off health and happiness in rationing decisions. Value in Health Published 23rd August 2016

Health problems can impact both health and happiness. It seems obvious that individuals would attribute value to the happiness provided by a health technology over and above any health improvement. But what about ‘public’ views? Would people be willing to allocate resources to health care for other people on the basis of the happiness it provides? This study reports on a web-based survey in which 1015 people were asked to make resource allocation choices about groups of patients standing to gain varying degrees of health and/or happiness. Three scenarios were presented – one varying only happiness levels, one varying only health, and another varying both. Unfortunately, the third scenario was not analysed due to “the many inconsistent choices”. About half of respondents were not willing to make any trade-offs between happiness and health. Those who did make choices attached more weight to health on average. But there were some effects associated with the starting levels of health and happiness – people were less willing to discriminate between groups when starting health (or happiness) was lower, and more weight was given to health. There are several potential biases associated with the responses to the questions, which the authors duly discuss.

Determinants of change in the cost-effectiveness threshold. Medical Decision Making [PubMed] Published 23rd August 2016

Set aside for the moment any theoretical concerns you might have with the ‘threshold’ approach to decision making in health care resource allocation. If we are going to use a willingness to pay threshold, how might it alter over time and in response to particular stimuli? This paper tackles that question using comparative statics and the idea of the ‘cost-effectiveness bookshelf’. If you haven’t come across it before, simply imagine a bookshelf with a book for each technology. The height of the books is determined by the ICER and their width by the budget impact; they’re lined up from shortest to tallest. This paper focuses on the introduction of technologies with ‘marginal’ budget impact, requiring the displacement of one existing technology. But a key idea to remember is that for technologies with large ‘non-marginal’ budget impacts – that is, requiring displacement of more than one existing technology – the threshold will be a weighted average of those technologies that are displaced. The authors describe the impact of changes in four different determinants of the threshold: i) the health budget, ii) demand for existing technologies, iii) the technical efficiency of existing technologies, and iv) funding for new technologies. Some changes (e.g. an increase in the health budget) have unambiguous impacts on the threshold (e.g. to increase it). Others have ambiguous effects – for example, a decrease in the cost of a marginal technology might decrease the threshold through a reduction in the ICER, or increase the threshold by reducing the budget impact so much that an additional technology could be funded. There’s a nice discussion towards the end about relaxing the assumptions. What if the budget isn’t fixed? What if we aren’t sure we’ve got the books in the right order? The bookshelf analogy is a starting point for these kinds of discussions.
The article is an easy read and a good reference point for the threshold debate, even if its practical usefulness may be limited when lining up the NHS’s books seems like a pipe dream.
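The weighted-average logic for non-marginal budget impacts can be made concrete with a toy bookshelf. In this sketch, the shelf – its ICERs and budget impacts – is entirely invented for illustration; displacement starts from the tallest book (the least cost-effective technology):

```python
# Each tuple: (ICER in £/QALY, budget impact in £m).
# Ordered tallest (least cost-effective) first. Numbers are illustrative.
shelf = [(30000, 40), (25000, 60), (20000, 100)]

def displacement_threshold(new_budget_impact: float) -> float:
    """Weighted-average ICER of the technologies displaced to free up
    funds equal to the new technology's budget impact."""
    displaced = []
    remaining = new_budget_impact
    for icer, budget in shelf:
        take = min(budget, remaining)  # displace (part of) this book
        if take > 0:
            displaced.append((icer, take))
            remaining -= take
        if remaining <= 0:
            break
    total = sum(w for _, w in displaced)
    return sum(icer * w for icer, w in displaced) / total

# 'Marginal' budget impact: only the tallest book is partly displaced.
print(displacement_threshold(10))  # 30000.0
# 'Non-marginal': all of the first book plus half of the second,
# giving a weighted average of 30000 and 25000 (≈ 27857).
print(displacement_threshold(70))
```

The second result shows why a bigger new technology faces a lower effective threshold: it eats further down the shelf into more cost-effective care.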

Update to the report of nationally representative values for the noninstitutionalized US adult population for five health-related quality-of-life scores. Value in Health Published 21st August 2016

This paper does what it says on the tin, but it is a useful reference and worth knowing about. The last lot were published in 2006, so this paper is an update to that one using data from 2011. The measures reported are: i) self-rated health, ii) the SF-12 mental subscale, iii) the SF-12 physical subscale, iv) the SF-6D, and v) the Quality of Well-Being Scale. Data come from the Medical Expenditures Panel Survey and the National Health Interview Survey, with 23,906 subjects in the former and 32,242 in the latter. Results are presented by age group (in decades) and by sex. So, for example, we can see that 20-29 year old women reported an average SF-6D index score of 0.809 while for 80-89 year olds the mean was 0.698. For almost all age groups and all measures, men reported higher scores than women. Interestingly, mean SF-6D scores were on average lower than in the 2001 data reported in the previous study.

Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews. Systematic Reviews [PubMed] Published 17th August 2016

Health economists have (or at least should have) a bit of a comparative advantage when it comes to economic evaluation. I’ve often thought that we should be leading the way in methods of economic evaluation in economics beyond the subject matter of health, and maybe into other fields. So I was pleased to see this paper using cost-effectiveness analysis for a new purpose. Often, systematic reviews can be mammoth tasks and potentially end up being of little value. Certainly at the margin there are things often done as part of a review (let’s say, including EconLit in a principally clinical review) that in the end prove to be pretty pointless. This study evaluates the cost-effectiveness of four alternative approaches to screening titles and abstracts as part of a systematic review. The four alternatives are: i) ‘double screening’, which is the classic approach used by Cochrane et al. whereby two researchers independently review abstracts and then meet to consider disagreements; ii) ‘safety first’, which is a variation on double screening whereby citations are only excludable if both reviewers identify them as such; iii) ‘single screening’, with just one reviewer; and iv) ‘single screening with text mining’, in which a machine learning process ranks studies by the likelihood of their inclusion. The outcome measure was the number of citations saved from inappropriate exclusion. It’s a big review, starting with 12,477 citations. There wasn’t much in it outcomes-wise, with at most 169 eligible studies and at least 161. But the incremental cost of double screening, compared with single screening plus text mining, was £37,279. This meant an ICER of £4660 per extra study, which seems like a lot. There are some limitations to the study, and the results clearly aren’t generalisable to all reviews. But it’s easy to see how studies-within-studies like this can help guide future research.
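The back-of-the-envelope ICER is easy to verify from the figures quoted above:

```python
# Double screening found 169 eligible studies; single screening with
# text mining found 161, at an incremental cost of £37,279.
incremental_cost = 37279
incremental_studies = 169 - 161  # 8 extra studies identified
icer = incremental_cost / incremental_studies
print(round(icer))  # 4660 – i.e. ~£4,660 per additional study, as reported
```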

Photo credit: Antony Theobald (CC BY-NC-ND 2.0)

The ‘Q’ in the QALY: are we fudging it?

Some recent research from the Centre for Health Economics at Monash University has quantified something that we are all aware of: fudging in the measurement of health-related quality of life. They found that switching from one health-related quality of life measure to another changes cost-effectiveness results by 41% on average. This is clearly huge.

Health-related quality of life?

I am of the opinion that health-related quality of life is not something that, at least in any objective way, actually exists. The extent to which health-related aspects of life affect overall quality of life differs across people, places and time. Discrepancies can become apparent on two levels:

  1. What we perceive as dimensions of health may or may not affect an individual’s subjective level of overall health, and
  2. The relative importance of health in defining overall quality of life, compared with other aspects of life, can vary. This issue has been addressed in relation to adaptation.

These discrepancies translate into an inconsistency in the valuation processes we currently use. The people from whom values are being elicited are seeking to maximise utility (at least, this is what we assume), while the researcher’s chosen maximand is health-related quality of life. This means that any dimension of the chosen measure that can affect non-health-related quality of life will skew the results. As such, we end up with a fudge that combines (objective) health characteristics and (subjective) preferences. I believe that, eventually, we will have to settle on a stricter definition of the ‘Q’ in the QALY, and that this will have to be based entirely in either (objective) health or (subjective) utility.


An approach to measuring ‘health’ would not be entirely dissimilar to our current approach, but an ‘objective’ health measure would have to be more comprehensive than the EQ-5D, SF-6D, AQoL and other similar measures. Of existing measures, the 15D comes closest. It could include items such as mobility, sensory function, pain, sexual function, fatigue, anxiety, depression and cognition, which the individual may or may not consider dimensions of their health, but which could define health objectively. These would involve a level of subjectivity in that they are being assessed by the individual, but they are less contextual; items such as self-care, emotion, usual activities and relationships, from current measures, are heavily influenced by the context in which they are being completed. The instrument could then be valued using ranking exercises to establish a complete set of health states, ranked from best health to worst health. Death can be equated to the worst possible health state, as all other outcomes are, in terms of health, an improvement upon death. If all valuations are completed relative to other health states, rather than to ‘death’, much of the distortion from non-health-related considerations will be removed.

I see no reason why the process should involve the elicitation of preferences. A health service does not seek to maximise utility. Evidence-based medicine does not aim to make people happier. Health care – particularly that which is publicly funded – should seek to maximise health, not health-related quality of life. If a person does not wish to improve their own health in a given way, they can choose not to consume that health care (so long as this is not detrimental to the health of others). For example, an individual may choose not to have a cochlear implant if their social network consists largely of deaf people [thanks, Scrubs]. Surely this should be the role of preferences in the allocation of health care.

Quality of life

At the other end of the scale, we have a measure of general well-being. In some respects this is the easier approach, though unanswered questions remain; for example, do we wish to measure present well-being or satisfaction with life? These approaches are simpler insofar as they require only one question, such as ‘how happy are you right now?’ or ‘how satisfied are you with your life overall?’. These questions should be posed to patients. Again, I do not see any benefit of using preferences or capturing decision utility in this case; experienced utility gives a better indication of the impact of health states upon quality of life. This approach could provide us with a measure of utility, so we could implement a cost-utility analysis (which is by no means what we do currently).

The two approaches described here could be used in conjunction. They would provide very different results, as the early findings from Monash demonstrate. A public health service should maintain health as its maximand, but other government departments or private individuals could provide funding for interventions that also benefit people in ways other than their health, or improve the rate at which individuals can derive utility from their health (e.g. education, social housing, social care).

I have little doubt that our current fudging approach – of maximising mainly-but-not-completely-health-related quality of life – is the best thing to do in the meantime, but I suspect it isn’t a long-term solution.

DOI: 10.6084/m9.figshare.1186883