Method of the month: Coding qualitative data

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but to provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is coding qualitative data.

Principles

Health economists are increasingly stepping away from quantitative datasets and conducting interviews and focus groups, as well as collecting free text responses. Good qualitative analysis requires thought and rigour. In this blog post, I focus on coding of textual data – a fundamental part of analysis in nearly all qualitative studies. Many textbooks deal with this in detail. I have drawn on three in particular in this blog post (and my research): Coast (2017), Miles and Huberman (1994), and Ritchie and Lewis (2003).

Coding involves tagging segments of the text with salient words or short phrases. This assists the researcher with retrieving the data for further analysis and is, in itself, the first stage of analysing the data. Ultimately, the codes will feed into the final themes or model resulting from the research. So the codes – and the way they are applied – are important!

Implementation

There is no ‘right way’ to code. However, I have increasingly found it useful to think of two phases of coding. First, ‘open coding’, which refers to the initial exploratory process of identifying pertinent phrases and concepts in the data. Second, formal or ‘axial’ coding, involving the application of a clear, pre-specified coding framework consistently across the source material.

Open coding

Any qualitative analysis should start with the researcher being very familiar with both the source material (such as interview transcripts) and the study objectives. This sounds obvious, but it is easy, as a researcher, to get drawn into the narrative of an interview and forget exactly what you are trying to get out of the research and, by extension, the coding. Open coding requires the researcher to go through the text carefully, line by line, tagging segments with codes that denote their meaning. It is important to be inquisitive. What is being said? Does this relate to the research question and, if so, how?

Take, for example, the excerpt below from a speech by the Secretary of State for Health, Jeremy Hunt, on safety and efficiency in the NHS in 2015:

Let’s look at those challenges. And I think we have good news and bad news. If I start with the bad news it is that we face a triple whammy of huge financial pressures because of the deficit that we know we have to tackle as a country, of the ageing population that will mean we have a million more over 70s by 2020, and also of rising consumer expectations, the incredible excitement that people feel when they read about immunotherapy in the newspapers that gives a heart attack to me and Simon Stevens but is very very exciting for the country. The desire for 24/7 access to healthcare. These are expectations that we have to recognise in the NHS but all of these add to a massive pressure on the system.

This excerpt might be analysed, for example, as part of a study into demand pressures on the NHS. In that case, codes such as “ageing population”, “consumer expectations”, “immunotherapy” and “24/7 access to healthcare” might initially be identified. However, if the study were investigating the nature of ministerial responsibility for the NHS, one might pull out very different codes, such as “tackle as a country”, “public demands vs. government stewardship” and “minister – chief exec shared responsibility”.

Codes can be anything – attitudes, behaviours, viewpoints – so long as they relate to the research question. It is very useful to get (at least) one other person to code some of the same source material. Comparing codes will provide new ideas for the coding framework, a different perspective on the meaning of the source material, and a check that key sections of the source material have not been missed. Researchers shouldn’t aim to code all (or even most) of the text of a transcript; there is always some redundancy. And, in general, initial codes should stay as close to the source text as possible: some interpretation is fine, but it is important not to get too abstract too quickly!

Formal or ‘axial’ coding

When the researcher has an initial list of codes, it is a good time to develop a formal coding framework. The aim here is to devise an index of some sort to tag all the data in a logical, systematic and comprehensive way, and in a way that will be useful for further analysis.

One way to start is to chart how the initial codes can be grouped and how they relate to one another. For example, in analysing NHS demand pressures, a researcher may group “immunotherapy” with other medical innovations mentioned elsewhere in the study. It’s important to avoid having many disconnected codes, and at this stage many codes will be changed, subdivided, or combined. Much like an index, the resulting codes could be organised into loose chapters (or themes), such as “1. Consumer expectations” and “2. Access”, and/or the codes might be arranged hierarchically, for example with codes relating to national and local demand pressures. A proper axial coding framework has categories and sub-categories of codes, with the interdependencies between them formally specified.

There is no right number of codes. There could be as few as 10, or as many as 50, or more. It is crucial, however, that the list of codes is logically organised (not alphabetically listed) and sufficiently concise that the researcher can hold it in their head while coding transcripts. Alongside the coding framework itself, which may be only a page long, it can be very helpful to put together an explanatory document with more detail on the meaning of each code and possibly some examples.
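To make the idea of a logically ordered framework with an accompanying explanatory document more concrete, the structure can be sketched in Python. This is a minimal sketch, not an NVivo or other software format: the `Code` and `Theme` classes, the `outline` and `count_codes` helpers, and all of the example codes and definitions are hypothetical, loosely based on the Hunt excerpt above. Themes and codes are numbered in the order the researcher chose, and each code carries its definition and an optional example quote.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    label: str
    definition: str    # meaning of the code, for the explanatory document
    example: str = ""  # optional illustrative quote


@dataclass
class Theme:
    name: str
    codes: list[Code] = field(default_factory=list)


# Hypothetical framework for a study of NHS demand pressures.
FRAMEWORK = [
    Theme("Consumer expectations", [
        Code("Medical innovation",
             "Public excitement about new treatments raising demand",
             "the incredible excitement ... about immunotherapy"),
        Code("Rising expectations (general)",
             "Expectations of the NHS increasing over time"),
    ]),
    Theme("Access", [
        Code("24/7 access",
             "Desire for round-the-clock access to care",
             "The desire for 24/7 access to healthcare"),
    ]),
    Theme("Demand pressures", [
        Code("National", "Pressures affecting the whole system, e.g. ageing"),
        Code("Local", "Pressures specific to a trust or area"),
    ]),
]


def outline(framework: list[Theme]) -> list[str]:
    """Number themes and codes in the researcher's chosen order
    (deliberately not sorted alphabetically)."""
    lines = []
    for t, theme in enumerate(framework, start=1):
        lines.append(f"{t}. {theme.name}")
        for c, code in enumerate(theme.codes, start=1):
            lines.append(f"   {t}.{c} {code.label}: {code.definition}")
    return lines


def count_codes(framework: list[Theme]) -> int:
    """Total number of codes, a quick check that the list stays concise."""
    return sum(len(theme.codes) for theme in framework)


for line in outline(FRAMEWORK):
    print(line)
```

Printing the outline gives the one-page framework, while the definitions and examples stored with each code can double as the explanatory document.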

Software

Once the formal coding framework is finalised, it can be applied to the source material. I find this a good stage to use software like NVivo. While coding in NVivo takes a similar amount of time to paper-based methods, it can speed up the process of retrieving and comparing segments of the text later on. Other software packages are available, and some researchers prefer to use computer packages earlier in the process or not at all; it is a personal choice.

Again, it is a good idea to involve at least one other person. One possibility is for two researchers to apply the framework separately to the first, say, five pages of a transcript. Agreement between the coders can then be compared, with any discrepancies discussed and used to adjust the coding framework accordingly. The researchers could then repeat the process. Once inter-coder reliability is at an acceptable level, a researcher should be able to code the transcripts in a much more reproducible way.
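The post does not prescribe a particular reliability statistic. One common choice for two coders is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes that each coder assigns exactly one code per segment; the `cohens_kappa` function and the segment codes are illustrative, not taken from any study.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same, non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both coders used one identical code for every segment
    return (p_o - p_e) / (1 - p_e)


# Invented codes for ten segments from the first few pages of a transcript.
CODER_A = ["ageing", "expectations", "access", "ageing", "finance",
           "expectations", "access", "finance", "ageing", "expectations"]
CODER_B = ["ageing", "expectations", "expectations", "ageing", "finance",
           "expectations", "access", "finance", "ageing", "access"]

print(f"kappa = {cohens_kappa(CODER_A, CODER_B):.2f}")  # prints "kappa = 0.73"
```

The coders agree on 8 of 10 segments, but kappa is lower than 0.8 because some of that agreement would be expected by chance. Where segments can carry several codes, one workaround is to compute kappa separately for each code, treating it as present or absent in every segment. What counts as an acceptable level remains a judgement for the research team.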

Even at this stage, the formal coding framework does not need to be set in stone. If it is based on a subset of interviews, new issues are likely to emerge in subsequent transcripts and these may need to be incorporated. Additionally, analyses may be conducted with sub-samples of participants or the analysis may move from more descriptive to explanatory work, and therefore the coding needs may change.

Applications

Published qualitative studies will often mention that transcript data were coded, with few details about how this was done. In the study I worked on to develop the ICECAP-A capability measure, we coded to identify influences on quality of life in the first batch of interviews and dimensions of quality of life in later batches. A recent study into disinvestment decisions highlights how a second rater can be used in coding. Reporting guidelines for qualitative research papers highlight three important items related to coding that ought to be included in study write-ups: the number of coders, a description of the coding tree (framework), and the derivation of the themes.

Coding qualitative data can feel quite laborious. However, the real benefit of a well-organised coding framework comes when reconstituting transcript data under common codes or themes. Codes that relate clearly to the research question, and to one another, allow the researcher to reorganise the data with real purpose. Juxtaposing previously unrelated text and quotes can spark the discovery of exciting new links in the data. In turn, this spawns the interpretative work that is the fundamental value of qualitative analysis. In economics parlance, good coding can improve both the efficiency of retrieving text for analysis and the quality of the analytical output itself.
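Reconstituting coded data under themes is essentially a group-by operation. The plain-Python sketch below gathers segments from several transcripts under the theme each code belongs to, and it keeps the transcript identifier so every quote can be traced back to its source. The transcript IDs, quotes, codes and the code-to-theme mapping are all invented for illustration. Software such as NVivo offers this kind of retrieval through its own interface, so the sketch only shows the underlying idea.

```python
from collections import defaultdict

# Invented coded segments: (transcript id, quoted text, code applied).
SEGMENTS = [
    ("T01", "We will have a million more over 70s by 2020", "Ageing population"),
    ("T02", "Patients expect to see a GP at the weekend", "24/7 access"),
    ("T01", "People read about immunotherapy in the papers", "Medical innovation"),
    ("T03", "Our local A&E is full every winter", "Local pressures"),
]

# Hypothetical mapping from codes to the themes of the coding framework.
CODE_TO_THEME = {
    "Ageing population": "Demand pressures",
    "Local pressures": "Demand pressures",
    "Medical innovation": "Consumer expectations",
    "24/7 access": "Access",
}


def reconstitute(segments, code_to_theme):
    """Group coded segments from all transcripts under their theme.
    Codes missing from the framework are flagged as 'Uncategorised'."""
    by_theme = defaultdict(list)
    for transcript, text, code in segments:
        theme = code_to_theme.get(code, "Uncategorised")
        by_theme[theme].append((transcript, code, text))
    return dict(by_theme)


for theme, items in reconstitute(SEGMENTS, CODE_TO_THEME).items():
    print(theme)
    for transcript, code, text in items:
        print(f"  [{transcript}] {code}: {text}")
```

Reading the "Demand pressures" group places a national-level quote from one transcript next to a local-level quote from another, which is the kind of juxtaposition described above.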


Chris Sampson’s journal round-up for 9th October 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Evaluating the relationship between visual acuity and utilities in patients with diabetic macular edema enrolled in intravitreal aflibercept studies. Investigative Ophthalmology & Visual Science [PubMed] Published October 2017

Part of my day job involves the evaluation of a new type of screening programme for diabetic eye disease, including the use of a decision analytic model. Cost-effectiveness models usually need health state utility values as parameters in order to estimate QALYs. There are some interesting challenges in evaluating health-related quality of life in the context of vision loss. Does vision in the best eye or the worst eye affect quality of life most? Do different eye diseases have different impacts independent of sight loss? Do generic preference-based measures even work in this context? This study explores some of these questions. It combines baseline and follow-up EQ-5D and VFQ-UI (a condition-specific preference-based measure) responses from 1,320 patients from 4 different studies, along with visual acuity data. OLS and random effects panel models are used to predict utility values from visual acuity and other individual characteristics. Visual acuity in the best-seeing eye seems to be a more important determinant than in the worst-seeing eye, which is consistent with previous studies. But the worst-seeing eye is still important, with about a third of the impact of the best-seeing eye. So economic evaluations shouldn’t ignore the bilateral nature of eye disease. Visual acuity in both the best- and worst-seeing eye was more strongly associated with the condition-specific VFQ-UI than with the EQ-5D index, giving the VFQ-UI better predictive power, which is not a big surprise. One way to look at this is that the EQ-5D underestimates the impact of visual acuity on utility. An alternative view is that the VFQ-UI valuation process overestimates it. This study is a nice demonstration of the fact that selecting health state utility values for a model-based economic evaluation is not straightforward. Attention needs to be given to the choice of measure (e.g. generic or condition-specific), but also to the way states are defined, so that accurate utility values can be attached.
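To make the best-eye versus worst-eye comparison concrete, here is a minimal OLS sketch using NumPy on simulated data. Nothing here comes from the study itself. Visual acuity is drawn at random on a logMAR-like scale (higher is worse), and the "true" coefficients are chosen by assumption so that the worst-seeing eye has a third of the best-seeing eye's effect, mimicking the reported pattern. The sketch also ignores the repeated observations per patient that the study's random effects models account for.

```python
import numpy as np

# Simulated data only: n echoes the 1,320 patients in the study, nothing else does.
rng = np.random.default_rng(seed=1)
n = 1320

# Visual acuity in each eye on a logMAR-like scale (0 = good, higher = worse).
eye1 = rng.uniform(0.0, 1.0, n)
eye2 = rng.uniform(0.0, 1.0, n)
best = np.minimum(eye1, eye2)   # best-seeing eye has the lower (better) score
worst = np.maximum(eye1, eye2)  # worst-seeing eye has the higher score

# Hypothetical data-generating model: worst eye has a third of the best eye's effect.
utility = 0.85 - 0.30 * best - 0.10 * worst + rng.normal(0.0, 0.02, n)

# OLS: regress utility on an intercept plus acuity in each eye.
X = np.column_stack([np.ones(n), best, worst])
coef, *_ = np.linalg.lstsq(X, utility, rcond=None)
intercept, b_best, b_worst = coef

print(f"best-seeing eye:  {b_best:.3f}")
print(f"worst-seeing eye: {b_worst:.3f}")
print(f"worst/best ratio: {b_worst / b_best:.2f}")
```

Recovering a ratio of roughly one third here shows only that including both eyes lets a regression detect such a pattern when it exists. Dropping `worst` from `X` would fold part of its effect into the best-eye coefficient, which is one reason to model both eyes.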

Do capability and functioning differ? A study of U.K. survey responses. Health Economics [PubMed] Published 24th September 2017

I like the capability approach in theory, but not in practice. I’ve written before about some of my concerns. One of them is that we don’t know whether capability measures (such as the ICECAP) offer anything beyond semantic nuance. This study sought to address that. A ‘functioning and capability’ instrument was devised, which reworded the ICECAP-A by changing phrases like “I am able to be” to phrases like “I am”, so that each question could have a ‘functioning’ version as well as a ‘capability’ version. Both versions of each domain were then presented in tandem. Questionnaires were sent to 1,627 individuals who had participated in another study about spillover effects in meningitis. Respondents (n=1,022) were family members of people experiencing after-effects of meningitis. The analysis focusses on the instances where capabilities and functionings diverge. Across the sample, 34% of respondents reported a difference between capability and functioning on at least one domain. Of all domain-level responses, 12% indicated higher capability than functioning, while 2% indicated higher functioning than capability. Some differences were observed between different groups of people. Older people tended to be less likely to report excess capabilities, while those with degree-level education reported greater capabilities. Informal care providers had lower functionings and capabilities but were more likely to report a difference between the two. Women were more likely to report excess capabilities in the ‘attachment’ domain. These differences lead the author to conclude that the wording of the ICECAP measure enables researchers to capture something beyond functioning, and that the choice of a capability measure could lead to different resource allocation decisions. I’m not convinced. The study makes an error that is common in this field: it presupposes that the changes in wording successfully distinguish between capabilities and functionings.
This is implemented analytically by dropping from the primary analysis the cases where functionings exceeded capabilities, which are presumed to be illogical. If we don’t accept this presupposition (and we shouldn’t), then the meaning of the findings becomes questionable. The paper does outline most of the limitations of the study, but it doesn’t dedicate much space to alternative explanations. One is to do with the distinction between ‘can’ and ‘could’. If people answer ‘capability’ questions with reference to future possibilities, then the difference could simply be driven by optimism about future functionings. This future-reference problem is most obvious in the ‘achievement and progress’ domain, which, incidentally, was the domain in this study with the greatest probability of showing a discrepancy between capabilities and functionings. Another alternative explanation is that showing someone two slightly different questions coaxes them into making an artificial distinction that they wouldn’t otherwise make. In my previous writing on this, I suggested that two things needed to be identified. The first was whether people give different responses with the different wording. This study goes some way towards that, which is a good start. The second was whether people value states defined in these ways any differently. Until we have answers to both these questions, I will remain sceptical about the implications of the ICECAP’s semantic nuance.
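The headline percentages in this study are simple counts, some over respondents and some over domain-level response pairs. The sketch below shows the calculation on a handful of invented respondents, each answering five domains (as in the ICECAP-A) with a capability level and a functioning level. The data layout and the `divergence_summary` helper are hypothetical, not the study's code.

```python
# Invented responses: one list per respondent, one (capability, functioning)
# pair per domain, with higher levels meaning better states.
RESPONSES = [
    [(4, 4), (3, 3), (4, 3), (2, 2), (3, 3)],
    [(3, 3), (3, 3), (3, 3), (4, 4), (2, 2)],
    [(2, 3), (4, 4), (3, 3), (3, 2), (4, 4)],
    [(4, 4), (4, 4), (4, 4), (4, 4), (4, 4)],
]


def divergence_summary(responses):
    """Share of respondents with any capability/functioning difference, plus
    the share of all domain-level pairs diverging in each direction."""
    pairs = [pair for person in responses for pair in person]
    n_pairs = len(pairs)
    any_difference = sum(
        any(cap != func for cap, func in person) for person in responses
    ) / len(responses)
    return {
        "respondents_with_any_difference": any_difference,
        "capability_higher": sum(cap > func for cap, func in pairs) / n_pairs,
        "functioning_higher": sum(func > cap for cap, func in pairs) / n_pairs,
    }


for name, share in divergence_summary(RESPONSES).items():
    print(f"{name}: {share:.0%}")
```

The respondent-level and domain-level shares use different denominators, which is why a modest share of diverging domain responses can still translate into a much larger share of respondents with at least one difference.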

Estimating a constant WTP for a QALY—a mission impossible? The European Journal of Health Economics [PubMed] Published 21st September 2017

The idea of estimating willingness to pay (WTP) for a QALY has fallen out of fashion. It’s a nice idea in principle but, as the title of this paper suggests, it’s not easy to come up with a meaningful answer. One key problem has been that WTP for a QALY is not constant in the number of QALYs being gained; that is, people are willing to pay less (at the margin) for greater QALY gains. But maybe that’s OK. NICE and their counterparts tend not to use a fixed threshold but rather a range: £20,000-£30,000 per QALY, say. So maybe the variability in WTP for a QALY can be reflected in this range. This study explores some of the reasons, including uncertainty, for differences in elicited WTP values for a QALY. A contingent valuation exercise was conducted using a 2014 Internet panel survey of 1,400 Swedish citizens. The survey consisted of 21 questions about respondents’ own health, sociodemographics, prioritisation attitudes, WTP for health improvements, and a societal decision-making task. Respondents were randomly assigned to one of five scenarios with different magnitudes and probabilities of health gain, with yes/no responses for five different WTP ‘bids’. The estimated WTP for a QALY, using the UK EQ-5D-3L tariff, was €17,000. But across the different scenarios, WTP for a QALY ranged from €10,600 to over €1 million. Wide confidence intervals abound. The authors’ findings only partially support an assumption of weak scope sensitivity (that more QALYs are worth paying more for) and do not at all support the strong assumption of scope sensitivity, that WTP is proportional to QALY gain. This insensitivity to scope is known as scope bias, and it also applied to variation in the uncertainty (probability) of the health gain. The authors also found that using different EQ-5D or VAS tariffs to estimate health state values led to differences of varying size in the WTP estimates.
Consistent relationships between individuals’ characteristics and their WTP were not found, though income and education seemed to be associated with higher willingness to pay across the sample. It isn’t clear what the implications of these findings are, except for the reinforcement of any scepticism you might have about the sociomathematical validity (yes, I’m sticking with that) of the QALY.
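The scope-sensitivity argument rests on simple arithmetic. WTP per QALY is mean WTP divided by the expected QALY gain, and strong scope sensitivity means that ratio stays roughly constant across scenarios. The Python sketch below assumes that the expected gain is probability × utility gain × duration, and every scenario and WTP figure in it is hypothetical. It also skips the step of estimating mean WTP from the yes/no bid responses, which in practice requires a statistical model of the bid data.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    probability: float   # chance that the health gain materialises
    utility_gain: float  # change in health state value on some tariff
    years: float         # how long the gain lasts
    mean_wtp: float      # hypothetical mean WTP in euros

    def expected_qalys(self) -> float:
        return self.probability * self.utility_gain * self.years

    def wtp_per_qaly(self) -> float:
        return self.mean_wtp / self.expected_qalys()


def weakly_scope_sensitive(a: Scenario, b: Scenario) -> bool:
    """True if whichever scenario offers more expected QALYs also has the higher mean WTP."""
    small, large = sorted((a, b), key=lambda s: s.expected_qalys())
    return large.mean_wtp > small.mean_wtp


def strongly_scope_sensitive(scenarios: list[Scenario], rel_tol: float = 0.1) -> bool:
    """WTP proportional to QALY gain: WTP per QALY varies by at most rel_tol."""
    ratios = [s.wtp_per_qaly() for s in scenarios]
    return max(ratios) / min(ratios) <= 1 + rel_tol


scenarios = [
    Scenario("A: certain, small gain", 1.0, 0.1, 1.0, 2000.0),
    Scenario("B: certain, larger gain", 1.0, 0.2, 1.0, 2500.0),
    Scenario("C: uncertain, larger gain", 0.5, 0.2, 1.0, 2300.0),
]

for s in scenarios:
    print(f"{s.name}: {s.expected_qalys():.2f} QALYs, EUR {s.wtp_per_qaly():,.0f} per QALY")
print("weak scope sensitivity (A vs B):", weakly_scope_sensitive(scenarios[0], scenarios[1]))
print("strong scope sensitivity:", strongly_scope_sensitive(scenarios))
```

In this made-up example, WTP rises with the larger gain (weak sensitivity holds) but far less than proportionally, so WTP per QALY falls. Scenario C offers the same expected QALY gain as A at a lower probability. Under proportionality to expected QALYs it should attract the same mean WTP, so comparing such pairs is one way to check whether insensitivity to scope extends to uncertainty.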


Thesis Thursday: Lidia Engel

On the third Thursday of every month, we speak to a recent graduate about their thesis and their studies. This month’s guest is Dr Lidia Engel who graduated with a PhD from Simon Fraser University. If you would like to suggest a candidate for an upcoming Thesis Thursday, get in touch.

Title
Going beyond health-related quality of life for outcome measurement in economic evaluation
Supervisors
David Whitehurst, Scott Lear, Stirling Bryan
Repository link
https://theses.lib.sfu.ca/thesis/etd10264

Your thesis explores the potential for expanding the ‘evaluative space’ in economic evaluation. Why is this important?

I think there are two answers to this question. Firstly, methods for economic evaluation of health care interventions have existed for a number of years, but these evaluations have mainly been applied to more narrowly defined ‘clinical’ interventions, such as drugs. Interventions nowadays are more complex, and their benefits cannot simply be measured in terms of health. You can think of areas such as public health, mental health, social care, and end-of-life care, where interventions may result in broader benefits, such as increased control over daily life, independence, or aspects related to the process of health care delivery. Therefore, I believe there is a need to rethink the way we measure and value outcomes when we conduct an economic evaluation. Secondly, ignoring broader outcomes of health care interventions that go beyond the narrow focus of health-related quality of life can potentially lead to misallocation of scarce health care resources. Evidence has shown that the choice of outcome measure (such as a health outcome or a broader measure of wellbeing) can have a significant influence on the conclusions drawn from an economic evaluation.

You use both qualitative and quantitative approaches. Was this key to answering your research questions?

I mainly applied quantitative methods in my thesis research. However, Chapter 3 draws upon some qualitative methodology. To gain a better understanding of ‘benefits beyond health’, I came across a novel approach, called Critical Interpretive Synthesis. It is similar to meta-ethnography (i.e. a synthesis of qualitative research), with the difference that the synthesis is not of qualitative literature but of methodologically diverse literature. It involves an iterative approach, where searching, sampling, and synthesis go hand in hand. It doesn’t only produce a summary of existing literature but enables the development of new interpretations that go beyond those originally offered in the literature. I really liked this approach because it enabled me to synthesise the evidence in a more effective way compared with a conventional systematic review. Defining and applying codes and themes, as it is traditionally done in qualitative research, allowed me to organize the general idea of non-health benefits into a coherent thematic framework, which in the end provided me with a better understanding of the topic overall.

What data did you analyse and what quantitative methods did you use?

I conducted three empirical analyses in my thesis research, which all made use of data from the ICECAP measures (ICECAP-O and ICECAP-A). In my first paper, I used data from the ‘Walk the Talk (WTT)‘ project to investigate the complementarity of the ICECAP-O and the EQ-5D-5L in a public health context using regression analyses. My second paper used exploratory factor analysis to investigate the extent of overlap between the ICECAP-A and five preference-based health-related quality of life measures, using data from the Multi Instrument Comparison (MIC) project. I am currently finalizing submission of my third empirical analysis, which reports findings from a path analysis using cross-sectional data from a web-based survey. The path analysis explores three outcome measurement approaches (health-related quality of life, subjective wellbeing, and capability wellbeing) through direct and mediated pathways in individuals living with spinal cord injury. Each of the three studies addressed different components of the overall research question, which, collectively, demonstrated the added value of broader outcome measures in economic evaluation when compared with existing preference-based health-related quality of life measures.

Thinking about the different measures that you considered in your analyses, were any of your findings surprising or unexpected?

In my first paper, I found that the ICECAP-O is more sensitive to environmental features (i.e. social cohesion and street connectivity) than the EQ-5D-5L. As my second paper has shown, this was not surprising, as the ICECAP-A (a measure for adults rather than older adults) and the EQ-5D-5L measure different constructs and have only limited overlap in their descriptive classification systems. While a similar observation was made when comparing the ICECAP-A with three other preference-based health-related quality of life measures (15D, HUI-3, and SF-6D), a substantial overlap was observed between the ICECAP-A and the AQoL-8D, which suggests that it is possible for broader benefits to be captured by preference-based health-related measures (although some may not consider the AQoL-8D to be exclusively ‘health-related’, despite the label). The findings from the path analysis confirmed the similarities between the ICECAP-A and the AQoL-8D. However, the findings do not imply that the AQoL-8D and ICECAP-A are interchangeable instruments, as a mediation effect was found that requires further research.

How would you like to see your research inform current practice in economic evaluation? Is the QALY still in good health?

I am aware of the limitations of the QALY. Although there are increasing concerns that the QALY framework does not capture all benefits of health care interventions, it is important to understand that the evaluative space of the QALY is determined by the dimensions included in preference-based measures. From a theoretical point of view, the QALY can embrace any characteristics that are important for the allocation of health care resources. However, in practice, it seems that QALYs are currently defined by what is measured (e.g. the dimensions and response options of EQ-5D instruments) rather than by their conceptual origin. Therefore, although non-health benefits have been largely ignored when estimating QALYs, one should not dismiss the QALY framework but rather develop appropriate instruments that capture such broader benefits. I believe the findings of my thesis have particular relevance for national HTA bodies that set guidelines for the conduct of economic evaluation. While maintaining methodological consistency is important, the assessment of the real benefits of some health care interventions would be more accurate if we were less prescriptive about which outcome measure to use when conducting an economic evaluation. As my thesis has shown, some preference-based measures already adopt a broad evaluative space but are less frequently used.