WinBUGS is a widely used free software program in health economics. It allows for Bayesian statistical modelling using Gibbs sampling (hence the name: the Windows version of Bayesian inference Using Gibbs Sampling). One of the drawbacks of WinBUGS is the notoriously uninformative error messages it can produce. While Google is usually a Fountain of Knowledge for solving errors, where WinBUGS is concerned it often only serves up other people asking the same question, and hardly any answers. This post is about one such error message, the solution that’s sometimes offered (which I think only partly solves the problem) and the solution I found (which solves it completely).
The error message itself is “Trap 66 (postcondition violated)”. Variance priors are usually identified as the culprits. The suggested solutions I could find (for example here, here and here) all point towards those priors being too big. The standard advice is then to reduce the priors (for example from dunif(0,100) to dunif(0,10)) and rerun the model. This usually solves the problem.
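In BUGS notation, that advice amounts to replacing a prior line like the first below with the second (tau is an illustrative name here; the posts linked above use various parameterisations):

tau ~ dunif(0,100)   # prior on the standard deviation, deemed too big
tau ~ dunif(0,10)    # the commonly suggested, narrower replacement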
However, to me, this doesn’t make a whole lot of sense theoretically. And, in a rare case of the two aligning, it also didn’t solve my problem in practice. I have been performing a simulation study with about 8,000 similar but different data sets (8 scenarios, 1,000 repetitions each). They all represent mixed treatment comparisons (MTC), which are analysed by WinBUGS. I used SAS to create the data, send it to WinBUGS, and collect and analyse the outcomes. When I started the random effects MTC, “Trap 66 (postcondition violated)” popped up around data set 45. Making the priors smaller, as suggested, solved the problem for this particular data set, but it came back on data set 95. The funny thing is that making the priors larger also solved the problem for the original data set, but once again the same problem arose at a different data set (this time number 16).
Whenever I tried to recreate the problem, it would give the same error message at the exact same point, even though it’s a random sampler. From this it seems to me that the reason the suggested solution works for a given data set is that the generated ‘chains’, as they are called in WinBUGS, are identical given the same priors and initial values. Defining a smaller prior will give a different chain, which is likely not to cause problems. But so will a larger prior, or a different initial value. In other words, it doesn’t really solve the underlying problem.
The solution I have found to work for all 8,000 data sets is to look not at the maximum value of the prior, but at the minimum. The prior that is given for a standard error usually looks something like dunif(0,X). In my case there was an extra step: I specified a uniform prior on a variable called tau. The precision (one divided by the variance) that goes into the link function is then defined by
prec <- pow(tau,-2)
This extra step makes no difference for the problem or the solution. My hypothesis is that when Trap 66 comes up, the chain has generated a tau (or standard error, if that’s what you modelled directly) equal to 0, which resolves into a precision of 1 divided by 0, or infinity. The solution is to let the prior start not at 0, but at a small epsilon. I used dunif(0.001,10), which solved all my problems.
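Putting the two lines together, the relevant part of the model becomes (only the lower bound of the prior changed; the rest of the model is untouched):

tau ~ dunif(0.001,10)   # lower bound moved from 0 to a small epsilon
prec <- pow(tau,-2)     # precision = 1/tau^2, now always finite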
This solution is related to a different problem I once had when programming a probability. I mistakenly used a dunif(0,1) prior. Every now and then the chain would generate exactly 0 or 1, which does not sit well with the binomial link function. The error message is different (“Undefined real result”), but the solution is again to use a prior that does not include the extremes. In my case, using a flat dbeta instead (which I should have done to begin with) solved that problem.
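Sketched the same way, with p as an illustrative name for the probability and dbeta(1,1) as the flat beta:

p ~ dunif(0,1)   # problematic: the chain occasionally hits exactly 0 or 1
p ~ dbeta(1,1)   # flat beta alternative that avoided the error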
Any suggestions and comments are much appreciated. You can download WinBUGS, the free immortality key and lots of examples from the BUGS project website here. It also contains a list of common error messages and their solutions. Not Trap 66, obviously.
I found this quite interesting and thought I would share my thoughts on what you have written.
It is true that WinBUGS error messages are uninformative, although there are usually obvious causes for Trap messages. Traps usually occur because of prior-data conflicts (i.e. the data are inconsistent with the prior distribution), because the prior distribution is too vague, or because the initial values are inconsistent with the posterior distributions.
You say that you are performing a random effects network meta-analysis (mixed treatment comparison) but you do not make it clear what type of data you are analysing. Values of the between-study standard deviation (it is not a standard error) for a log odds ratio, log relative risk or log hazard ratio between 0 and 0.5 are indicative of mild heterogeneity, between 0.5 and 1 of moderate heterogeneity, and greater than 1 of extreme heterogeneity. For a continuous outcome measure, plausible values depend on the scale of the data.
There is no such thing as a non-informative prior except under exceptional circumstances. A proper Bayesian analysis requires genuine specification of a prior distribution. However, when there is sufficient sample data to dominate the prior information, the effort involved in defining a genuine prior distribution may be unnecessary. In contrast, when there is limited sample data, the prior distribution will be informative.
Prior distributions should not be used unthinkingly and, in the absence of sufficient sample data, weakly informative prior distributions (my phrase, in preference to ‘non-informative prior distributions’) will be influential. Many examples exist of so-called Bayesian analyses done without proper consideration of the prior distribution but with limited data, giving inappropriate inferences.
Whilst so-called non-informative prior distributions for the between-study standard deviation of log odds ratios, log relative risks and log hazard ratios are often specified using a U(0, 2) distribution, this is only meaningful when there is sufficient sample data to allow Bayesian updating (unless you genuinely believe that extreme heterogeneity is as plausible as mild heterogeneity). If you are estimating such parameters using a U(0, 10) distribution, then you are giving equal weight to mild and extremely large values, which will only give meaningful inferences when there is sufficient sample data.
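In BUGS notation, the contrast is between priors such as (tau again denoting the between-study standard deviation):

tau ~ dunif(0,2)    # spans mild through extreme heterogeneity
tau ~ dunif(0,10)   # equal weight on implausibly extreme values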
It is unlikely in most practical meta-analyses that the between-study standard deviation will be zero, and there is evidence that supports this. Incidentally, I have never had a problem with MCMC generating a posterior distribution for a between-study standard deviation, and I have never had to truncate small values.
Finally, simulation is not appropriate for evaluating Bayesian methods.