Alastair Canaway’s journal round-up for 18th September 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Selection of key health domains from PROMIS® for a generic preference-based scoring system. Quality of Life Research [PubMed] Published 19th August 2017

The US Panel on Cost-Effectiveness recommends the use of QALYs. It doesn’t, however, instruct (unlike the UK) as to what measure should be used. This leaves the door ajar for both new and established measures. This paper sets about developing a new preference-based measure from the Patient-Reported Outcomes Measurement Information System (PROMIS). PROMIS is a US National Institutes of Health funded suite of person-centred measures of physical, mental, and social health. Across all the PROMIS measures there exist over 70 domains relevant to adult health. For all its promise, the PROMIS system does not produce a summary score amenable to the calculation of QALYs, nor one for general descriptive purposes such as measuring HRQL over time. This study aimed to reduce the 70 domains down to a number suitable for valuation. To do this, Delphi methods were used. The Delphi approach seems to be increasing in popularity in the health economics world. For those unfamiliar, it essentially involves obtaining the opinions of experts independently and iteratively conducting rounds of questioning (two or more) to reach a consensus. In this case nine health outcomes experts were recruited; they were presented with ‘all 37 domains’ (no mention is made of how they got from 70 to 37!) and asked to remove any domains that were not appropriate for inclusion in a general health utility measure or were redundant due to another PROMIS domain. If more than seven experts agreed, then the domain was removed. Responses were combined and presented until consensus was reached. This left 10 domains. The researchers then used a community sample of 50 participants to test for independence of domains using a pairwise independence evaluation test. Participants were given the option of removing a domain they felt was not important to overall HRQL and asked to rate the importance of the remaining domains using a VAS.
These findings were used by the research team to whittle down from nine domains to seven. The final domains were: Cognitive function – abilities; Depression; Fatigue; Pain interference; Physical function; Ability to participate in social roles and activities; and Sleep disturbance. Many of these are common to existing measures, but I did rather like the inclusion of cognitive function and fatigue, which are missing from many measures and, to me, appear important. The next step is valuation. Once valued, this will be a promising candidate for use in economic evaluation – particularly in the US, where the PROMIS measurement suite is already established.

Predictive validation and the re-analysis of cost-effectiveness: do we dare to tread? PharmacoEconomics [PubMed] Published 22nd August 2017

PharmacoEconomics treated us to a provocative editorial regarding predictive validation and re-analysis of cost-effectiveness models – a call to arms of sorts. For those (like me) who are not modelling experts, predictive validation (aka 4th order validation) refers to the comparison of model outputs with data that are collected after the initial analysis of the model. So essentially you’re comparing what you modelled would happen with what actually happened. The literature suggests that predictive validation is widely ignored. The importance of predictive validity is highlighted with a case study in which predictive validity was examined three years after the end of a trial: upon re-analysis, the model performed poorly. The model was then revised, which led to a much better fit to the prospective data. Predictive validation can, therefore, be used to identify sources of inaccuracy in models. If predictive validity were examined more commonly, model quality could improve more generally. Furthermore, it might be possible to identify specific contexts where poor predictive validity is prevalent and which thus require further research. The authors highlight the field of advanced cancers as a particularly relevant context where uncertainty around survival curves is prevalent. By actively scheduling further data collection and updating the survival curves we can reduce the uncertainty surrounding the value of high-cost drugs. Predictive validation can also inform other aspects of the modelling process, such as the best choice of time point from which to extrapolate, or credible rates of change in predicted hazards. The authors suggest using expected value of information analysis to identify the technologies with the largest costs of uncertainty, and so to prioritise where predictive validity should be assessed.
NICE and other reimbursement bodies require continued data collection for ‘some’ new technologies, so the processes are in place for future studies to be designed and implemented in a way that captures such data and allows later re-analysis. Assessing predictive validity seems eminently sensible; there are, however, barriers. Money is the obvious issue: extended prospective data collection and re-analysis of models require resources. It does, however, have the potential to save money and improve health in the long run. The authors note how in a recent study they demonstrated that a drug for osteoporosis that had been recommended by Australia’s Pharmaceutical Benefits Advisory Committee was not actually cost-effective when further data were examined. There is clearly value to be achieved in predictive validation and re-analysis – it’s hard to disagree with the authors, and we should probably be campaigning for longer term follow-ups, re-analysis and increased acknowledgement of the desirability of predictive validity.
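For readers who, like me, find it easier to see the idea in code, here is a minimal sketch of a predictive validation check in Python. It compares survival proportions predicted by a model with proportions observed in later follow-up. The numbers are entirely made up, and the function name and the use of mean absolute error as a summary are my own assumptions rather than anything taken from the editorial.

```python
def predictive_validation(predicted, observed):
    """Compare model predictions with data collected after the original analysis.

    Both arguments map a follow-up time (in years) to a survival proportion.
    Returns the per-time-point errors (predicted minus observed) and the
    mean absolute error across the shared time points.
    """
    shared_times = sorted(set(predicted) & set(observed))
    if not shared_times:
        raise ValueError("no overlapping time points to compare")
    errors = {t: predicted[t] - observed[t] for t in shared_times}
    mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)
    return errors, mean_abs_error


# Hypothetical values for illustration only (not from any real model or trial).
model_prediction = {1: 0.80, 2: 0.65, 3: 0.55}
later_follow_up = {1: 0.78, 2: 0.58, 3: 0.43}

errors, mean_abs_error = predictive_validation(model_prediction, later_follow_up)
for year, err in errors.items():
    print(f"year {year}: predicted minus observed survival = {err:+.2f}")
print(f"mean absolute error: {mean_abs_error:.3f}")
```

In this made-up example the gap between prediction and observation widens with time, which is the kind of pattern that might prompt a second look at how the survival curve was extrapolated.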

How should cost-of-illness studies be interpreted? The Lancet Psychiatry [PubMed] Published 7th September 2017

It’s a good question – cost of illness studies are commonplace, but are they useful from a health economics perspective? A comment piece in The Lancet Psychiatry examines this issue using the case study of self-harm and suicide. It focuses on a recent publication by Tsiachristas et al, which examines the hospital resource use and care costs for all presentations of self-harm in a UK hospital. Each episode of self-harm cost £809 which, when extrapolated to the UK, amounts to £162 million. Over 30% of these costs were for psychological assessments, which are recommended by NICE but were received by only 75% of self-harming patients. If all self-harming patients received assessments as recommended by NICE then another £51 million would be added to the bill. The author raises the question of how useful this information is for health economists. Nearly all cost of illness studies end up concluding that i) the illness costs a lot, and ii) money could be saved by reducing or ameliorating the underlying factors that cause the illness. Is this helpful? Well, not particularly. By focusing on only one illness there is no consideration of the opportunity cost: if you spend money preventing one condition then that money will be displacing resources elsewhere. Likewise, resources saved by reducing one illness will likely be balanced by increased spending on another illness. The author highlights this with a thought experiment: “imagine a world where a cost of illness study has been done for every possible diseases and that the total cost of illness was aggregated. The counterfactual from such an exercise is a world where nobody gets sick and everybody dies suddenly at some pre-determined age”. Another issue is that, more often than not, cost of illness studies identify that more, not less, should be spent on a problem; in the self-harm example it was that an extra £51 million should be spent on psychological assessments.
Similarly, the study highlights the extra cost of psychological assessments, rather than the glaring issue that 25% of those who attend hospital for self-harm are not getting the required psychological assessments. This very much links into the final point that cost of illness studies neglect the benefits being achieved. Now all the negatives are out of the way, there are at least a couple of positives I can think of off the top of my head: i) identification of key cost drivers, and ii) information for use in economic models. The take home message is that although there is some use to cost of illness studies, from a health economics perspective we (as a field) would be better off spending our time steering clear.


Widespread misuse of statistical significance in health economics

Despite widespread cautionary messages, p-values and claims of statistical significance are continuously misused. One of the most common errors is to mistake statistical significance for economic, clinical, or political significance. This error may manifest itself by authors interpreting only ‘statistically significant’ results as important, or even neglecting to examine the magnitude of estimated coefficients. For example, we’ve written previously about a claim of how statistically insignificant results are ‘meaningless’. Another common error is to ‘transpose the conditional’, that is to interpret the p-value as the posterior probability of a null hypothesis. For example, in an exchange on Twitter recently, David Colquhoun, whose discussions of p-values we’ve also previously covered, made the statement:

However, the p-value does not provide probability/evidence of a null hypothesis (that an effect ‘exists’). P-values are correlated with the posterior probability of the null hypothesis in a way that depends on statistical power, choice of significance level, and prior probability of the null. But observing a significant p-value only means that the data were unlikely to be produced by a particular model, not that the alternative hypothesis is true. Indeed, the null hypothesis may be a poor explanation for the observed data, but that does not mean the alternative is a better explanation. This is the essence of Lindley’s paradox.
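To make the dependence on power, significance level, and prior concrete, here is a rough sketch in Python using Bayes’ rule. It is a simplification: it conditions on the event ‘the result was significant at level alpha’ rather than on an exact p-value, and it treats the alternative as a single hypothesis with a known power. The function name and the example inputs are illustrative assumptions.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """Posterior probability of the null given a 'significant' result.

    prior_null: prior probability that the null hypothesis is true
    alpha:      significance level, i.e. P(significant | null true)
    power:      P(significant | alternative true)
    """
    for name, value in [("prior_null", prior_null), ("alpha", alpha), ("power", power)]:
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    false_positive = alpha * prior_null
    true_positive = power * (1 - prior_null)
    return false_positive / (false_positive + true_positive)


# Illustrative scenarios: same alpha, different priors and power.
for prior_null, power in [(0.5, 0.8), (0.9, 0.8), (0.9, 0.2)]:
    posterior = prob_null_given_significant(prior_null, alpha=0.05, power=power)
    print(f"prior P(null)={prior_null}, power={power}: "
          f"P(null | significant) = {posterior:.2f}")
```

With the same 5% significance level, the posterior probability of the null in these scenarios ranges from well under 10% to over two-thirds, which is why a significant p-value cannot on its own be read as the probability that an effect exists.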

So what can we say about p-values? The six principles of the ASA’s statement on p-values are:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
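Principle 5 is easy to illustrate with a short calculation. The Python sketch below computes a two-sided p-value for a simple z-test of a mean, assuming a known standard deviation; the effect sizes and sample sizes are invented for illustration.

```python
import math


def two_sided_p_value(effect, sd, n):
    """Two-sided p-value for a z-test of a mean against zero, with known sd."""
    z = effect / (sd / math.sqrt(n))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# A negligible effect in a huge sample versus a large effect in a tiny sample.
tiny_effect_big_sample = two_sided_p_value(effect=0.01, sd=1.0, n=1_000_000)
big_effect_small_sample = two_sided_p_value(effect=0.5, sd=1.0, n=10)

print(f"effect 0.01, n = 1,000,000: p = {tiny_effect_big_sample:.3g}")
print(f"effect 0.5,  n = 10:        p = {big_effect_small_sample:.3g}")
```

The effect fifty times smaller is ‘highly significant’ while the larger one is not, purely because of sample size. The p-value says nothing about which of the two would matter economically or clinically.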


In 1996, Deirdre McCloskey and Stephen Ziliak surveyed economics papers published in the American Economic Review in the 1980s for p-value misuse. Overall, 70% did not distinguish statistical from economic significance and 96% misused a test statistic in some way. Things hadn’t improved when they repeated the study ten years later. Unfortunately, these problems are not exclusive to the AER. A quick survey of a top health economics journal, Health Economics, finds similar misuse as we discuss below. This journal is not singled out for any particular reason beyond that it’s one of the key journals in the field covered by this blog, and frequently features in our journal round-ups. Similarly, no comment is made on the quality of the studies or authors beyond the claims and use of statistical significance. Nevertheless, where there are p-values, there are problems. For such a pivotal statistic, one that careers can be made or broken on, we should at least get it right!

Nine studies were published in the May 2017 issue of Health Economics. The list below shows some examples of p-value errors in the text of the articles. The most common issue was using the p-value to interpret whether an effect exists or not, or using it as the (only) evidence to support or reject a particular hypothesis. As described above, the statistical significance of a coefficient does not imply the existence of an effect. Some of the statements claimed below to be erroneous may be contentious as, in the broader context of the paper, they may make sense. For example, claiming that a statistically significant estimate is evidence of an effect may be correct where the broader totality of the evidence suggests that any observed data would be incompatible with a particular model. However, this is generally not the way the p-values are used.

Examples of p-value (mis-)statements

Even the CMI has no statistically significant effect on the facilitation ratio. Thus, the diversity and complexity of treated patients do not play a role for the subsidy level of hospitals.

the coefficient for the baserate is statistically significant for PFP hospitals in the FE model, indicating that a higher price level is associated with a lower level of subsidies.

Using the GLM we achieved nine significant effects, including, among others, Parkinson’s disease and osteoporosis. In all components we found more significant effects compared with the GLM approach. The number of significant effects decreases from component 2 (44 significant effects) to component 4 (29 significant effects). Although the GLM lead to significant results for intestinal diverticulosis, none of the component showed equivalent results. This might give a hint that taking the component based heterogeneity into account, intestinal diverticulosis does not significantly affect costs in multimorbidity patients. Besides this, certain coefficients are significant in only one component.

[It is unclear what ‘significant’ and ‘not significant’ refer to or how they are calculated but appear to refer to t>1.96. Not clear if corrections for multiple comparisons.]

There is evidence of upcoding as the coefficient of spreadp_pos is statistically significant.

Neither [variable for upcoding] is statistically significant. The incentive for upcoding is, according to these results, independent of the statutory nature of hospitals.

The checkup significantly raises the willingness to pay any positive amount, although it does not significantly affect the amount reported by those willing to pay some positive amount.

[The significance is with reference to statistical significance].

Similarly, among the intervention group, there were lower probabilities of unhappiness or depression (−0.14, p = 0.045), being constantly under strain (0.098, p = 0.013), and anxiety or depression (−0.10, p = 0.016). There was no difference between the intervention group and control group 1 (eligible non-recipients) in terms of the change in the likelihood of hearing problems (p = 0.64), experiencing elevate blood pressure (p = 0.58), and the number of cigarettes smoked (p = 0.26).

The ∆CEs are also statistically significant in some educational categories. At T + 1, the only significant ∆CE is observed for cancer survivors with a university degree for whom the cancer effect on the probability of working is 2.5 percentage points higher than the overall effect. At T + 3, the only significant ∆CE is observed for those with no high school diploma; it is 2.2 percentage points lower than the overall cancer effect on the probability of working at T + 3.

And, just for balance, here are a couple from this year’s winner of the Arrow prize at iHEA, which gets bonus points for the phrase ‘marginally significant’, which can be used both to confirm and refute a hypothesis depending on the inclination of the author:

Our estimated net effect of waiting times for high-income patients (i.e., adding the waiting time coefficient and the interaction of waiting times and high income) is positive, but only marginally significant (p-value 0.055).

We find that patients care about distance to the hospital and both of the distance coefficients are highly significant in the patient utility function.


As we’ve argued before, p-values should not be the primary result reported. Their interpretation is complex and so often leads to mistakes. Our goal is to understand economic systems and to determine the economic, clinical, or policy relevant effects of interventions or modifiable characteristics. The p-value does provide some useful information but not enough to support the claims made from it.


Chris Sampson’s journal round-up for 3rd July 2017

Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.

Role of cost on failure to access prescribed pharmaceuticals: the case of statins. Applied Health Economics and Health Policy [PubMed] Published 28th June 2017

Outside work, I find that people often like to tell me how to solve health economics problems. A common one is the idea that the NHS could save a load of money by enforcing prescription charges. It’s a textbook life-ain’t-that-simple situation. One of the reasons it isn’t that simple is that, if you start charging for prescriptions, people will be less likely to take their meds. That’s probably bad news for patients and for doctors. “But it’s only a few quid”. Well… As in many countries, Australians have to cough up a co-payment to fill their prescriptions. The size of the copayment depends on i) whether or not the patient is concessional (e.g. a pensioner) and ii) whether or not a threshold has been reached for total family prescription expenditure in one year. Concessional patients have a lower co-payment, a lower threshold and no co-payment once the threshold is met. This study looks at statin use in this context for 94,000 over-45s in New South Wales from 2005-2011. Separate logistic regressions are run for each of the 4 groups (concessional/non-concessional, pre-threshold/post-threshold) to predict statin adherence, controlling for a good range of sociodemographic and health-related variables. The size of the copayment comes out as the biggest barrier to adherence. More than 75% of people who weren’t adherent before reaching their threshold became so after reaching it – that is, once their co-payment was either much-reduced or zero. Poorest adherence was observed in non-concessional low-income people who hadn’t reached the threshold, who faced the highest co-payment. Income, age group and holding private insurance were also important determinants. In short, charging people for their statins, even if it isn’t much money, reduces the likelihood that they will take them. There is the possibility that adherence is correlated with the likelihood of having reached the threshold, which could undermine these results. 
I’m not entirely convinced that the analysis cuts the mustard, but I’ll let the more econometrically minded amongst you figure that out.

Conceptualizations of the societal perspective within economic evaluations: a systematic review. International Journal of Technology Assessment in Health Care [PubMed] Published 23rd June 2017

In my last round-up, I included a study looking at resource use measures for intersectoral costs and benefits; costs and benefits that occur outside the health sector. This week we have a study looking at how the inclusion of intersectoral costs and benefits influences results, and how researchers have interpreted the ‘societal perspective’. A systematic review was conducted for economic evaluations purporting to use a societal perspective, published since the CHEERS statement was released, including 107 studies. Only 74 provided a conceptualisation of the societal perspective. Reported conceptualisations of the societal perspective were grouped according to the specificity of their definition – 18 general, 50 specific, 6 both – and assessed using content analysis. Of these, 25 referred to a guideline or other source in their conceptualisation. A total of 10 general and 56 specific clusters of conceptualisations were identified, demonstrating major inconsistency. For some studies – namely trial-based economic evaluations in musculoskeletal or mental disorders – the authors dug deeper and extracted additional information. In both cases, where data were adequately reported, the intersectoral costs tended to make up more than 50% of total costs. But in general the specific intersectoral items were not fully reported and relevant costs (e.g. in education or criminal justice) were not identified. It probably won’t come as a surprise that the general impression is that a lot of researchers interpret the societal perspective – in practice, if not in theory – as health costs plus productivity losses. And usually, that’s not really good enough.

Annual direct medical costs associated with diabetes-related complications in the event year and in subsequent years in Hong Kong. Diabetic Medicine [PubMed] Published 21st June 2017

There are a lot of high-quality decision models built for the evaluation of interventions in diabetes. See Mt Hood. But some are still a bit primitive when it comes to estimating the costs associated with the many clinical pathways and complications associated with diabetes, especially when multimorbidity can be important. So studies like this are very welcome. This study contributes cost estimates for a wide range of complications (13, to be precise) for what should be a representative sample of (Chinese) people with diabetes. It includes public health care expenditure for more than 120,000 people with diabetes in Hong Kong, with 5-year follow-up. For private health care costs, a cross-section of 1275 people was recruited through other studies and provided information about service use by telephone. Fixed effects panel data regressions were used for the public medical costs. During the follow-up, 17% developed at least one complication. The models estimate the impact on total cost of new disease and existing disease separately, in order to identify first-year and subsequent-year cost estimates. Generalised linear models were used for the private health care costs. The base case of a 65-year old with no complications was US$1500/year in costs to the public purse. The biggest effect on costs was a first-year multiplier of 9.38 for lower limb ulcer (1.62 in subsequent years). Other costly complications were stroke, heart failure, end-stage renal disease and acute myocardial infarction. Private costs were much smaller, at $187 for the base case. These figures may prove useful to decision modellers, even outside the Hong Kong setting.
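As a rough illustration of how a decision modeller might plug these figures in, the Python sketch below builds a yearly public cost profile for the base case with a single lower limb ulcer. It uses only the three numbers quoted above (US$1500, 9.38 and 1.62). The function name, the five-year horizon, the absence of discounting, and the restriction to one complication are all simplifying assumptions of mine, not features of the paper.

```python
BASE_ANNUAL_COST = 1500.0      # US$ per year, base case with no complications
ULCER_FIRST_YEAR = 9.38        # multiplier in the year the ulcer occurs
ULCER_SUBSEQUENT = 1.62        # multiplier in each later year


def yearly_costs(horizon_years, event_year):
    """Undiscounted public cost per year for one person with one ulcer.

    event_year is 1-indexed; years before the event incur the base cost.
    """
    costs = []
    for year in range(1, horizon_years + 1):
        if year < event_year:
            multiplier = 1.0
        elif year == event_year:
            multiplier = ULCER_FIRST_YEAR
        else:
            multiplier = ULCER_SUBSEQUENT
        costs.append(BASE_ANNUAL_COST * multiplier)
    return costs


# Hypothetical scenario: ulcer in year 2 of a 5-year horizon.
profile = yearly_costs(horizon_years=5, event_year=2)
for year, cost in enumerate(profile, start=1):
    print(f"year {year}: US${cost:,.0f}")
print(f"total over 5 years: US${sum(profile):,.0f}")
```

A real model would need to handle multiple complications, other patient characteristics, and discounting, so this is only a starting point for thinking about how event-year and subsequent-year multipliers enter a cost calculation.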

Financing and distribution of pharmaceuticals in the United States. JAMA [PubMed] Published 15th May 2017

The purpose of this article seems to be to demonstrate the complexity of the financing and distribution of pharmaceuticals in the US. It describes distributors, retailers and patients on the distribution side, and pharmacy benefit managers and health insurers on the financing side, with manufacturers in the middle. But the system that is shown in the article’s figure strikes me as surprisingly simple for an industry in which such vast amounts of money are sloshing around. It’s far more straightforward than any diagram you might see relating to the organisation of NHS services. I would imagine that a freer market would be associated with more complexity as upstarts might muscle-in on smaller corners of the market and become new intermediaries. But the article is still enlightening. It outlines some of the features of the market, particularly the high levels of concentration, characteristics of the key players and the staggering sums of money changing hands.