Method of the month: Coding qualitative data

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is coding qualitative data.


Health economists are increasingly stepping away from quantitative datasets and conducting interviews and focus groups, as well as collecting free text responses. Good qualitative analysis requires thought and rigour. In this blog post, I focus on coding of textual data – a fundamental part of analysis in nearly all qualitative studies. Many textbooks deal with this in detail. I have drawn on three in particular in this blog post (and my research): Coast (2017), Miles and Huberman (1994), and Ritchie and Lewis (2003).

Coding involves tagging segments of the text with salient words or short phrases. This assists the researcher with retrieving the data for further analysis and is, in itself, the first stage of analysing the data. Ultimately, the codes will feed into the final themes or model resulting from the research. So the codes – and the way they are applied – are important!


There is no ‘right way’ to code. However, I have increasingly found it useful to think of two phases of coding. First, ‘open coding’, which refers to the initial exploratory process of identifying pertinent phrases and concepts in the data. Second, formal or ‘axial’ coding, involving the application of a clear, pre-specified coding framework consistently across the source material.

Open coding

Any qualitative analysis should start with the researcher being very familiar with both the source material (such as interview transcripts) and the study objectives. This sounds obvious, but it is easy, as a researcher, to get drawn into the narrative of an interview and forget what exactly you are trying to get out of the research and, by extension, the coding. Open coding requires the researcher to go through the text carefully, line by line, tagging segments with codes that denote their meaning. It is important to be inquisitive. What is being said? Does this relate to the research question and, if so, how?

Take, for example, the excerpt below from a speech by the Secretary of State for Health, Jeremy Hunt, on safety and efficiency in the NHS in 2015:

Let’s look at those challenges. And I think we have good news and bad news. If I start with the bad news it is that we face a triple whammy of huge financial pressures because of the deficit that we know we have to tackle as a country, of the ageing population that will mean we have a million more over 70s by 2020, and also of rising consumer expectations, the incredible excitement that people feel when they read about immunotherapy in the newspapers that gives a heart attack to me and Simon Stevens but is very very exciting for the country. The desire for 24/7 access to healthcare. These are expectations that we have to recognise in the NHS but all of these add to a massive pressure on the system.

This excerpt might be analysed, for example, as part of a study into demand pressures on the NHS. In this case, codes such as “ageing population”, “consumer expectations”, “immunotherapy” and “24/7 access to healthcare” might initially be identified. However, if the study were investigating the nature of ministerial responsibility for the NHS, one might pull out very different codes, such as “tackle as a country”, “public demands vs. government stewardship” and “minister – chief exec shared responsibility”.
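In software terms, a coded segment is a piece of source text with one or more labels attached, and retrieval filters on those labels. The minimal Python sketch below assumes that structure. The `Segment` class and the `retrieve` function are hypothetical illustrations, not the way any particular qualitative analysis package stores codes. The codes are the demand-pressure codes suggested above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A stretch of source text together with the codes tagged onto it."""
    source: str                                   # transcript or document identifier
    text: str                                     # the segment, quoted verbatim
    codes: set[str] = field(default_factory=set)  # codes applied to this segment


def retrieve(segments: list[Segment], code: str) -> list[str]:
    """Return the text of every segment tagged with `code`, in source order."""
    return [s.text for s in segments if code in s.codes]


speech = "Hunt 2015"
segments = [
    Segment(speech,
            "the ageing population that will mean we have a million more over 70s by 2020",
            {"ageing population"}),
    Segment(speech,
            "the incredible excitement that people feel when they read about "
            "immunotherapy in the newspapers",
            {"consumer expectations", "immunotherapy"}),
    Segment(speech,
            "The desire for 24/7 access to healthcare.",
            {"consumer expectations", "24/7 access to healthcare"}),
]

# Pull together everything coded as a consumer expectation.
for quote in retrieve(segments, "consumer expectations"):
    print(quote)
```

Coding the same excerpt for the ministerial responsibility study would attach a different set of codes and would probably pick out different segments.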

Codes can be anything – attitudes, behaviours, viewpoints – so long as they relate to the research question. It is very useful to get (at least) one other person to code some of the same source material. Comparing codes will provide new ideas for the coding framework, a different perspective on the meaning of the source material and a check that key sections of the source material have not been missed. Researchers shouldn’t aim to code all (or even most) of the text of a transcript – there is always some redundancy. And, in general, initial codes should be as close to the source text as possible – some interpretation is fine, but it is important not to get too abstract too quickly!

Formal or ‘axial’ coding

When the researcher has an initial list of codes, it is a good time to develop a formal coding framework. The aim here is to devise an index of some sort to tag all the data in a logical, systematic and comprehensive way, and in a way that will be useful for further analysis.

One way to start is to chart how the initial codes can be grouped and relate to one another. For example, in analysing NHS demand pressures, a researcher may group “immunotherapy” with other medical innovations mentioned elsewhere in the study. It’s important to avoid having many disconnected codes, and at this stage, many codes will be changed, subdivided, or combined. Much like an index, the resulting codes could be organised into loose chapters (or themes) such as “1. Consumer expectations”, “2. Access” and/or there might be a hierarchical relationship between codes, for example, with codes relating to national and local demand pressures. A proper axial coding framework has categories and sub-categories of codes with interdependencies formally specified.
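A framework like this is essentially a tree of themes, codes and sub-codes. The Python sketch below stores a small hypothetical framework for the demand-pressures example as nested dictionaries. Codes beyond those mentioned in the text are invented for illustration. The sketch then prints the framework as a numbered index. Python dictionaries preserve insertion order, so the index follows the researcher's logical ordering rather than an alphabetical one.

```python
# Hypothetical coding framework: each key is a theme or code, and each value
# holds its sub-codes (an empty dict means no further subdivision).
framework = {
    "Consumer expectations": {
        "Medical innovation": {"Immunotherapy": {}},
        "Media coverage": {},
    },
    "Access": {
        "24/7 access to healthcare": {},
    },
    "Demand pressures": {
        "National": {"Ageing population": {}},
        "Local": {},
    },
}


def index_lines(tree: dict, prefix: str = "") -> list[str]:
    """Number every node hierarchically (1, 1.1, 1.1.1, ...) in insertion order,
    indenting two spaces per level of depth."""
    lines = []
    for i, (name, children) in enumerate(tree.items(), start=1):
        number = f"{prefix}{i}"
        depth = number.count(".")
        lines.append(f"{'  ' * depth}{number} {name}")
        lines.extend(index_lines(children, prefix=f"{number}."))
    return lines


print("\n".join(index_lines(framework)))
```

Keeping the framework in a structure like this also makes it easy to move, split or merge codes during this stage without losing track of where each one sits.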

There is no right number of codes. There could be as few as 10, or as many as 50, or more. It is crucial, however, that the list of codes is logically organised (not alphabetically listed) and sufficiently concise that the researcher can hold it in their head while coding transcripts. Alongside the coding framework itself – which may only be a page – it can be very helpful to put together an explanatory document with more detail on the meaning of each code and possibly some examples.


Once the formal coding framework is finalised, it can be applied to the source material. I find this a good stage to use software like NVivo. While coding in NVivo takes a similar amount of time to paper-based methods, it can speed up retrieving and comparing segments of the text later on. Other software packages are available, and some researchers prefer to use computer packages earlier in the process or not at all – it is a personal choice.

Again, it is a good idea to involve at least one other person. One possibility is for two researchers to apply the framework separately to the first, say, five pages of a transcript. Reliability between coders can then be compared, with any discrepancies discussed and used to adjust the coding framework accordingly. The researchers could then repeat the process. Once reliability is at an acceptable level, a researcher should be able to code the remaining transcripts in a much more reproducible way.
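The post does not prescribe a particular reliability statistic. One common choice, used here as an assumption, is Cohen's kappa. It compares the observed agreement between two coders with the agreement expected by chance. The minimal Python sketch below assumes that each coder assigns exactly one code to each of the same segments. The code labels are hypothetical. Segments tagged with several codes would need a per-code or alternative measure.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: for each code, multiply the two coders' marginal
    # proportions, then sum over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_expected == 1:
        return 1.0  # both coders used one identical code for every segment
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes applied by two researchers to the same six segments.
coder_a = ["ageing", "expectations", "expectations", "access", "ageing", "access"]
coder_b = ["ageing", "expectations", "access", "access", "ageing", "expectations"]

print(f"Cohen's kappa = {cohens_kappa(coder_a, coder_b):.2f}")  # Cohen's kappa = 0.50
```

The two coders agree on four of the six segments, but chance alone would produce agreement on a third of them, so kappa is 0.5. What counts as "acceptable" remains a judgement for the research team. The segments where the coders disagree are the ones to discuss when refining the framework.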

Even at this stage, the formal coding framework does not need to be set in stone. If it is based on a subset of interviews, new issues are likely to emerge in subsequent transcripts and these may need to be incorporated. Additionally, analyses may be conducted with sub-samples of participants or the analysis may move from more descriptive to explanatory work, and therefore the coding needs may change.


Published qualitative studies will often mention that transcript data were coded, with few details to discern how this was done. In the study I worked on to develop the ICECAP-A capability measure, we coded to identify influences on quality of life in the first batch of interviews and dimensions of quality of life in later batches of interviews. A recent study into disinvestment decisions highlights how a second rater can be used in coding. Reporting guidelines for qualitative research papers highlight three important items related to coding – number of coders, description of the coding tree (framework), and derivation of the themes – that ought to be included in study write-ups.

Coding qualitative data can feel quite laborious. However, the real benefit of a well organised coding framework comes when reconstituting transcript data under common codes or themes. Codes that relate clearly to the research question, and one another, allow the researcher to reorganise the data with real purpose. Juxtaposing previously unrelated text and quotes sparks the discovery of exciting new links in the data. In turn, this spawns the interpretative work that is the fundamental value of the qualitative analysis. In economics parlance, good coding can improve both the efficiency of retrieving text for analysis and the quality of the analytical output itself.
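The payoff described here, which is pulling together text from different transcripts under a shared theme, can be sketched as a simple grouping step. In the hypothetical Python example below, each coded segment records its transcript. A mapping from codes to themes is then used to collect the segments under each theme, so that quotes from different participants sit side by side. The transcripts, quotes, codes and themes are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical coded segments: (transcript id, code, quoted text). Quotes are invented.
coded = [
    ("P01", "waiting times", "I waited months to be seen."),
    ("P02", "out-of-hours care", "There was nobody to call at the weekend."),
    ("P03", "waiting times", "The appointment kept being pushed back."),
    ("P02", "new treatments", "I read about a new drug and asked for it."),
]

# Hypothetical mapping from codes to the themes they feed into.
code_to_theme = {
    "waiting times": "Access",
    "out-of-hours care": "Access",
    "new treatments": "Consumer expectations",
}


def reconstitute(segments, mapping):
    """Group (transcript, text) pairs under each theme, in first-seen theme order.

    A code missing from `mapping` raises KeyError, which flags a gap in the framework.
    """
    by_theme = defaultdict(list)
    for transcript, code, text in segments:
        by_theme[mapping[code]].append((transcript, text))
    return dict(by_theme)


for theme, quotes in reconstitute(coded, code_to_theme).items():
    print(theme)
    for transcript, text in quotes:
        print(f"  [{transcript}] {text}")
```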


Chris Sampson’s journal round-up for 8th January 2018

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

An empirical comparison of the measurement properties of the EQ-5D-5L, DEMQOL-U and DEMQOL-Proxy-U for older people in residential care. Quality of Life Research [PubMed] Published 5th January 2018

There is now a condition-specific preference-based measure of health-related quality of life that can be used for people with cognitive impairment: the DEMQOL-U. Beyond the challenge of appropriately defining quality of life in this context, cognitive impairment presents the additional difficulty that individuals may not be able to self-complete a questionnaire. There’s some good evidence that proxy responses can be valid and reliable for people with cognitive impairment. The purpose of this study is to try out the new(ish) EQ-5D-5L in the context of cognitive impairment in a residential setting. Data were taken from an observational study in 17 residential care facilities in Australia. A variety of outcome measures were collected, including the EQ-5D-5L (proxy where necessary), a cognitive bolt-on item for the EQ-5D, the DEMQOL-U and the DEMQOL-Proxy-U (from a family member or friend), the Modified Barthel Index, the cognitive impairment Psychogeriatric Assessment Scale (PAS-Cog), and the Neuropsychiatric Inventory Questionnaire (NPI-Q). The researchers tested the correlation, convergent validity, and known-group validity of the various measures. In total, 143 participants self-completed the EQ-5D-5L and DEMQOL-U, while 387 responses were available for the proxy versions. People with a diagnosis of dementia reported higher utility values on the EQ-5D-5L and DEMQOL-U than people without a diagnosis. Correlations between the measures were weak to moderate. Some people reported full health on the EQ-5D-5L despite identifying some impairment on the DEMQOL-U, and some vice versa. The EQ-5D-5L was more strongly correlated with clinical outcome measures than were the DEMQOL-U or DEMQOL-Proxy-U, though the associations were generally weak. The relationship between cognitive impairment and self-completed EQ-5D-5L and DEMQOL-U utilities was not in the expected direction; people with greater cognitive impairment reported higher utility values.
There was quite a lot of disagreement between utility values derived from the different measures, so the EQ-5D-5L and DEMQOL-U should not be seen as substitutes. An EQ-QALY is not a DEM-QALY. This is all quite perplexing when it comes to measuring health-related quality of life in people with cognitive impairment. What does it mean if a condition-specific measure does not correlate with the condition? It could be that for people with cognitive impairment the key determinant of their quality of life is only indirectly related to their impairment, and more dependent on their living conditions.

Resolving the “cost-effective but unaffordable” paradox: estimating the health opportunity costs of nonmarginal budget impacts. Value in Health Published 4th January 2018

Back in 2015 (as discussed on this blog), NICE started appraising drugs that were cost-effective but implied such high costs for the NHS that they seemed unaffordable. This forced a consideration of how budget impact should be handled in technology appraisal. But the matter is far from settled, and different countries have adopted different approaches. The challenge is to accurately estimate the opportunity cost of an investment, which will depend on the budget impact. A fixed cost-effectiveness threshold isn’t much use. This study builds on York’s earlier work that estimated cost-effectiveness thresholds based on health opportunity costs in the NHS. The researchers attempt to identify cost-effectiveness thresholds that are in accordance with different non-marginal (i.e. large) budget impacts. The idea is that a larger budget impact should imply a lower (i.e. more difficult to satisfy) cost-effectiveness threshold. NHS expenditure data were combined with mortality rates for different disease categories by geographical area. When primary care trusts’ (PCTs’) budget allocations change, they transition gradually. This means that – for a period of time – some trusts receive a larger budget than they are expected to need while others receive a smaller budget. The researchers identify these as over-target and under-target, respectively. The expenditure and outcome elasticities associated with changes in the budget are estimated for the different disease groups (defined by programme budgeting categories; PBCs). Expenditure elasticity refers to the proportional change in PBC expenditure given a proportional change in overall NHS expenditure. Outcome elasticity refers to the proportional change in PBC mortality given a proportional change in PBC expenditure. Two econometric approaches are used: an interaction term approach, whereby a subgroup interaction term is used with the expenditure and outcome variables, and a subsample estimation approach, whereby subgroups are analysed separately.
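The two elasticities can be written compactly. The notation here is ours, not the paper's: X is overall NHS expenditure in an area, x_p is expenditure on programme budgeting category p, and m_p is mortality in that category. If mortality depends on the overall budget only through PBC expenditure, the chain rule gives the proportional change in PBC mortality for a proportional change in the overall budget as the product of the two elasticities. Under log-log specifications, which are assumed here for illustration, the estimated coefficients would be these elasticities directly.

```latex
\varepsilon^{E}_{p} = \frac{\partial \ln x_{p}}{\partial \ln X}, \qquad
\varepsilon^{O}_{p} = \frac{\partial \ln m_{p}}{\partial \ln x_{p}}, \qquad
\frac{\partial \ln m_{p}}{\partial \ln X}
  = \frac{\partial \ln m_{p}}{\partial \ln x_{p}} \cdot \frac{\partial \ln x_{p}}{\partial \ln X}
  = \varepsilon^{O}_{p}\,\varepsilon^{E}_{p}
```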
Despite the limitations associated with a reduced sample size, the subsample estimation approach is preferred on theoretical grounds. Using this method, under-target PCTs face a cost-per-QALY of £12,047 and over-target PCTs face a cost-per-QALY of £13,464, reflecting diminishing marginal returns. The estimates are used as the basis for identifying a health production function that can approximate the association between budget changes and health opportunity costs. Going back to the motivating example of hepatitis C drugs, a £772 million budget impact would ‘cost’ 61,997 QALYs, rather than the 59,667 that we would expect without accounting for the budget impact. This means that the threshold should be lower (at £12,452 instead of £12,936) for a budget impact of this size. The authors discuss a variety of approaches for ‘smoothing’ the budget impact of such investments. Whether or not you believe the absolute size of the quoted numbers depends on whether you believe the stack of (necessary) assumptions used to reach them. But regardless of that, the authors present an interesting and novel approach to establishing an empirical basis for estimating health opportunity costs when budget impacts are large.
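The quoted thresholds appear to be the budget impact divided by the QALYs forgone, which is easy to check. The minimal Python sketch below uses the figures as quoted. The £772 million is presumably rounded, so the second ratio comes out at about £12,938 rather than exactly £12,936. The function name is ours.

```python
def implied_threshold(budget_impact_gbp: float, qalys_forgone: float) -> float:
    """Cost per QALY implied by displacing `qalys_forgone` to fund `budget_impact_gbp`."""
    return budget_impact_gbp / qalys_forgone


budget_impact = 772e6        # hepatitis C example, as quoted (rounded)
qalys_nonmarginal = 61_997   # QALYs forgone, accounting for the size of the budget impact
qalys_marginal = 59_667      # QALYs forgone, without accounting for it

print(f"£{implied_threshold(budget_impact, qalys_nonmarginal):,.0f} per QALY")  # £12,452 per QALY
print(f"£{implied_threshold(budget_impact, qalys_marginal):,.0f} per QALY")     # £12,938 per QALY
print(f"extra QALYs forgone: {qalys_nonmarginal - qalys_marginal:,}")           # extra QALYs forgone: 2,330
```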

First do no harm – the impact of financial incentives on dental x-rays. Journal of Health Economics [RePEc] Published 30th December 2017

If dentists move from fee-for-service to a salary, or if patients move from co-payment to full exemption, does it influence the frequency of x-rays? That’s the question that the researchers are trying to answer in this study. It’s important because x-rays always present some level of (carcinogenic) risk to patients and should therefore only be used when the benefits are expected to exceed the harms. Financial incentives shouldn’t come into it. If they do, then some dentists aren’t playing by the rules. And that seems to be the case. The authors start out by establishing a theoretical framework for the interaction between patient and dentist, which incorporates the harmful nature of x-rays, dentist remuneration, the patient’s payment arrangements, and the characteristics of each party. This model is used in conjunction with data from NHS Scotland, with 1.3 million treatment claims from 200,000 patients and 3,000 dentists. An x-ray occurs in 19% of treatments. Some dentists are salaried and some are not, while some people pay charges for treatment and some are exempt. A series of fixed effects models are used to take advantage of these differences in arrangements by modelling the extent to which switches (between arrangements, for patients or dentists) influence the probability of receiving an x-ray. The authors’ preferred model shows that both the dentist’s remuneration arrangement and the patient’s financial status influence the number of x-rays in the direction predicted by the model. That is, fee-for-service and charge exemption result in more x-rays. The combination of these two factors results in a 9.4 percentage point increase in the probability of an x-ray during treatment, relative to salaried dentists with non-exempt patients. While the results do show that financial incentives influence this treatment decision (when they shouldn’t), the authors aren’t able to link the behaviour to patient harm.
So we don’t know what percentage of treatments involving x-rays would correspond to the decision rule of benefits exceeding harms. Nevertheless, this is an important piece of work for informing the definition of dentist reimbursement and patient payment mechanisms.
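The following simulation, written in Python with NumPy, shows how switches between arrangements identify an effect in a fixed effects model. It is not the authors' specification. It assumes a linear probability model with patient fixed effects only, a single binary exemption indicator, and invented parameter values. Subtracting each patient's own mean removes the patient effect, so the within estimate relies only on variation when a patient's status changes. A pooled regression that ignores patient identity is biased in this set-up, because patients with higher baseline propensities are also exempt more often.

```python
import numpy as np

rng = np.random.default_rng(42)
n_patients, n_visits = 20_000, 6
true_effect = 0.05  # hypothetical effect of exemption on the probability of an x-ray

# Unobserved patient baselines (invented values). Patients with higher baselines
# are also exempt more often, which confounds a naive pooled comparison.
baseline = rng.uniform(0.10, 0.30, size=n_patients)
exempt = (rng.random((n_patients, n_visits)) < 2 * baseline[:, None]).astype(float)

# Linear probability model: P(x-ray) = patient baseline + effect * exempt.
p_xray = baseline[:, None] + true_effect * exempt
xray = (rng.random((n_patients, n_visits)) < p_xray).astype(float)


def pooled_slope(y: np.ndarray, x: np.ndarray) -> float:
    """OLS slope of y on x, ignoring which patient each observation belongs to."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def within_slope(y: np.ndarray, x: np.ndarray) -> float:
    """Fixed effects (within) slope: demean each patient's rows, then run OLS.

    Patients whose status never changes have zero variation in x after
    demeaning, so the estimate comes only from patients who switch.
    """
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd ** 2).sum())


pooled = pooled_slope(xray, exempt)
fe = within_slope(xray, exempt)
print(f"pooled OLS: {pooled:.3f}  within (patient fixed effects): {fe:.3f}")
```

With these invented values, the pooled slope should overstate the effect by roughly 0.03, while the within estimate should land close to 0.05. The study itself also models dentist arrangements, which this sketch leaves out.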


Paul Mitchell’s journal round-up for 1st January 2018


Does the approach to economic evaluation in health care depend on culture, values and institutional context? European Journal of Health Economics [PubMed] Published 5th December 2017

In last week’s round-up we looked at a paper that attempted to develop guidance for costing across European economic evaluations, even when the guidelines across countries vary as to what should and should not be included in an economic evaluation. Why is there such variation in health economic evaluation methods across countries? Why are economic outcomes like quality-adjusted life years (QALYs) standard practice in some countries yet frowned upon in others? This editorial argues that cultures, values and institutional context play a role in the economic evaluation methodologies applied across countries. It does so by comparing five large European countries in terms of 1. the organisation and governance of the agencies undertaking health technology assessments (HTAs) and economic evaluation, 2. the methods used for economic evaluation, and 3. the use of HTA and economic evaluation in decision making. The authors argue that, because of differences in these areas, it is difficult to see how a “one size fits all” economic evaluation framework can be implemented when health care systems, their regulations and social values towards health care differ. They also argue that the QALY is more likely to be applied in countries where greater social value is placed on horizontal equity (equal treatment of equals) than on vertical equity (unequal treatment of unequals). Of the five countries compared, they suggest that the German efficiency frontier model of economic analysis may offer the best off-the-shelf option for countries like the United States that have similar qualms about the use of QALYs in decision making. However, current economic evaluations may lack international application for reasons beyond the notable considerations raised in this paper.

Reconciling ethical and economic conceptions of value in health policy using the capabilities approach: a qualitative investigation of Non-Invasive Prenatal Testing. Social Science & Medicine [PubMed] [RePEc] Published 16th November 2017

The capability approach, initially developed by economist and philosopher Amartya Sen, provides an alternative evaluative framework to welfare economics. It shifts the focus of individual welfare away from utility and preferences and towards a person’s freedom to do and be the things they value in their life. It has more recently been used to critique the current approach to health economic evaluation, specifically which aspects of quality of life are included in the economic outcome. The measurement tools currently used to generate QALYs have been argued to focus too narrowly on health outcomes, and a number of capability measures have now been developed as alternatives. This study, on the other hand, applies the capability approach to health technologies that pose difficult ethical challenges, where the standard clinical and economic outcomes used in cost-effectiveness analysis may conflict with social values. The authors set out why they think the evaluative framework of the capability approach may be advantageous in such areas. Their case study is non-invasive prenatal testing (NIPT), a screening test that analyses cell-free fetal DNA circulating in maternal blood in order to gain information about the fetal genotype. The authors propose that adopting a capability evaluative framework for NIPT may account for the enhancement of valuable options available to prospective parents and families. It may also account for capabilities that may be diminished if NIPT were made routinely available, such as the option of refusing a test as an informed choice. A secondary analysis was conducted of qualitative data from women with experience of NIPT in Canada. Using a constructivist orientation to directed qualitative content analysis, the interviews were analysed to see how NIPT related to a pre-existing list of ten Central Human Capabilities developed by philosopher Martha Nussbaum.
From the analysis, they found that eight of the ten Nussbaum capabilities emerged from the interviews, even though interviewees were not directly asked to consider capabilities. As well as these eight (life; bodily health; bodily integrity; senses, imagination and thought; emotions; practical reason; affiliation; control over one’s environment), a new capability emerged, related to care-taking as a result of NIPT, both for potential children and for existing children. The next challenge for the authors will be to turn their findings into a usable outcome measure for decision making. However, the analysis undertaken here is a good example of how economists can attempt to assess ethically challenging technologies, as a way of dealing with standard economic outcomes that might be considered counter-productive in such evaluations.

Quality of life in a broader perspective: does ASCOT reflect the capability approach? Quality of Life Research [PubMed] Published 14th December 2017

The Adult Social Care Outcomes Toolkit (ASCOT) is a measure developed specifically for the economic assessment of social care interventions in the UK. A number of versions of ASCOT have been developed, and the most recent has been argued to be influenced by the capability approach, even though previous versions were not justified in this way. It therefore remains to be seen how influential the capability approach has been in the composition of this outcome measure. This study attempts to justify linking the capability approach with ASCOT by conducting a literature review on the capability approach. The review aims to identify key issues in quality of life measurement and how ASCOT deals with them. The methods for the literature review are not described in detail in this paper. However, the authors state that three primary issues with quality of life measurement emerge from the capability approach literature: 1. the measurement of capability, 2. non-reliance on adaptive preferences, and 3. a focus on a multidimensional evaluative space. The authors argue that ASCOT tackles capability measurement through the use of “as I want” phraseology at the top level of the ASCOT dimensions. Adaptive preferences are argued to be tackled by the use of general population preferences for different ASCOT states, and the outcome addresses several dimensions of quality of life. I would argue that there is much more to measuring capability than the three areas identified by the authors. In their conclusion, the authors rightly question whether the “as I want” phraseology is adequate to measure capability. However, the other two criteria could equally justify most measures used to generate QALYs, so the bar they set for a capability measure is very low. I remain unconvinced about how much of a capability measure ASCOT actually is in practice.