How to understand medical decisions

Replication

It is not possible for every doctor to collect the careful, controlled, well-documented personal experience needed to estimate the probability of what will happen to each new patient.  Special training and support are required to collect such information, which is then published in professional journals.  A doctor who reads such a publication has to decide whether the information provided by the author would apply to the doctor's own patients in their own setting.  In other words, if the reader were to repeat the observations described in the paper, would the results be replicated?

 

A number of questions have to be answered to decide this.  The aim of these questions is to show that the probability of non-replication for each of a variety of reasons is low, so that the probability of replication is high.  These questions are:

 

1. Would the probability of non-replication due to poor reporting of the results be low?  The answer to this question depends on the training, competence and honesty of the authors and also on the clarity of their writing.  (Journal editors and peer reviewers would usually refuse publication if there was evidence of poor reporting.)

 

2. Would the methods and types of patients be similar in the reader’s setting, so that the probability of non-replication due to such differences would be low?  Only the reader can answer this question, by comparing the patients, settings and methods used by the author with those of the reader.

 

3. Are there contradictory results in other studies, so that the probability of replicating the result in this study would be low?  In other words, if the other results were pooled with those of the current study, would the pooled data give a different overall result?  This can be explored formally using ‘meta-analysis’.  In addition to the results of other studies, Bayesian statisticians think that the reader’s prior expectation of what the result should be, based on theory or on informal, anecdotal or unpublished experience, should also be taken into account by specifying a prior probability for each of the various possible study outcomes.

 

4. Have an adequate number of observations been made in the study so that if the study were repeated, the probability would be low that a different result could be found by chance?  This question can be answered by using statistical tests of different kinds to estimate the probability of replication.  There are different ways of doing this.  

 

Conventional ‘frequentist’ statistics estimates the degree of ‘confidence’ (e.g. of 99%) with which the result of a study (e.g. an average blood sugar of 6.0mmol/l from 78 patients) would lie in an interval between two limits (e.g. 5.6 and 6.4mmol/l) if the study were repeated an infinite number of times with an infinite number of subjects.  This ‘confidence’ is widely interpreted as a ‘probability’ (e.g. of 0.99), although this is not how a 99% confidence is defined.  The precise meaning of an upper 99% confidence limit of 6.4mmol/l is that if 6.4 were the true population average and 78 subjects were selected at random from this population, there would be a 0.5% chance that the average of the selected 78 patients would be 6.0mmol/l or less.  The meaning of a lower 99% confidence limit of 5.6mmol/l is that if 5.6 were the true population average and 78 subjects were selected at random from this population, there would be a 0.5% chance that the average of the selected 78 patients would be 6.0mmol/l or more.  In each case the true population spread is assumed to be similar to that of the 78 observed patients with an average of 6.0mmol/l.  This is described as the ‘frequentist’ approach.
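The arithmetic behind this example can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the text does not give the spread of the 78 blood sugar results, so the standard deviation of 1.37mmol/l used here is a hypothetical value chosen so that the limits come out close to 5.6 and 6.4mmol/l, and the function names are invented for the sketch.  A Gaussian sampling distribution is also assumed.

```python
from math import sqrt
from statistics import NormalDist


def confidence_limits(mean, sd, n, confidence=0.99):
    """Two-sided confidence limits for a mean, assuming a Gaussian model."""
    standard_error = sd / sqrt(n)
    # z is the number of standard errors that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z * standard_error, mean + z * standard_error


def tail_chance_at_limit(limit, observed_mean, sd, n):
    """Chance that a random sample of n subjects gives a mean at least as far
    from `limit` as the observed mean, if `limit` were the true population mean."""
    sampling = NormalDist(mu=limit, sigma=sd / sqrt(n))
    if observed_mean <= limit:
        return sampling.cdf(observed_mean)      # observed mean or less
    return 1 - sampling.cdf(observed_mean)      # observed mean or more


# Hypothetical spread: NOT given in the text, chosen to reproduce its limits.
ASSUMED_SD = 1.37

lower, upper = confidence_limits(mean=6.0, sd=ASSUMED_SD, n=78)
upper_tail = tail_chance_at_limit(upper, observed_mean=6.0, sd=ASSUMED_SD, n=78)
lower_tail = tail_chance_at_limit(lower, observed_mean=6.0, sd=ASSUMED_SD, n=78)

print(f"99% confidence limits: {lower:.1f} to {upper:.1f} mmol/l")
print(f"chance of 6.0 or less if the true mean is the upper limit: {upper_tail:.3f}")
print(f"chance of 6.0 or more if the true mean is the lower limit: {lower_tail:.3f}")
```

With the assumed spread, each tail chance comes to 0.5%, which is how the two limits together give a 99% interval.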

 

The Bayesian school of statisticians advocates calculating the probability of replication within such an interval of 5.6 to 6.4mmol/l (which they call a ‘credibility interval’) after repeating a study with an infinite number of subjects, by also taking into account prior information that was available before the study was done (see 3 above).  If the prior information was that all possible values of the blood sugar were equally probable, then the probability of replicating the study within 5.6 and 6.4mmol/l would be about 99%, similar to the degree of ‘confidence’ that it would lie within the 99% confidence interval.
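A minimal sketch of this Bayesian calculation follows, assuming a Gaussian model with a known spread.  The standard deviation of 1.37mmol/l, the informative prior (mean 5.5, standard deviation 0.5mmol/l) and the function names are all hypothetical values introduced for illustration; none of them comes from the text.  With a flat prior the posterior distribution is centred on the observed average, so the probability of lying within 5.6 to 6.4mmol/l comes out at about 0.99.

```python
from math import sqrt
from statistics import NormalDist


def posterior_for_mean(sample_mean, sd, n, prior_mean=None, prior_sd=None):
    """Posterior distribution of the true mean under a Gaussian model.

    With no prior given, a flat prior is assumed (all values equally probable),
    so the posterior is centred on the sample mean.  Otherwise a Gaussian prior
    is combined with the data by weighting each by its precision (1 / variance).
    """
    standard_error = sd / sqrt(n)
    if prior_mean is None or prior_sd is None:
        return NormalDist(mu=sample_mean, sigma=standard_error)
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / standard_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return NormalDist(mu=post_mean, sigma=sqrt(1 / post_precision))


def probability_within(dist, lower, upper):
    """Probability that the true mean lies between the two limits."""
    return dist.cdf(upper) - dist.cdf(lower)


ASSUMED_SD = 1.37  # hypothetical spread, not given in the text

flat = posterior_for_mean(6.0, ASSUMED_SD, 78)
# Hypothetical informative prior, e.g. from earlier informal experience.
informed = posterior_for_mean(6.0, ASSUMED_SD, 78, prior_mean=5.5, prior_sd=0.5)

p_flat = probability_within(flat, 5.6, 6.4)
p_informed = probability_within(informed, 5.6, 6.4)
print(f"flat prior: probability within 5.6 to 6.4 = {p_flat:.3f}")
print(f"informative prior: posterior mean = {informed.mean:.2f}, "
      f"probability within 5.6 to 6.4 = {p_informed:.3f}")
```

An informative prior pulls the posterior average towards the prior expectation and narrows its spread, which is why the Bayesian answer can differ from the frequentist one when prior information is not flat.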

 

Another approach is the ‘normalised likelihood’ method, which involves finding the probability of replication within an interval (e.g. 5.6 to 6.4mmol/l), that is, the frequency with which the result would fall within that interval if an infinite number of studies with the same average and spread of results were repeated with the same number of subjects (e.g. 78).  If the number of subjects is high (e.g. over 100) and the method of calculation involves assumptions based on mathematical models (e.g. the Gaussian distribution), then the probability of replication within the confidence interval will be the same as the degree of confidence (e.g. 95% or 99%).

 

All these approaches (the ‘Bayesian’, the ‘frequentist’ and the ‘identical number’ approaches) may give similar results (e.g. a probability of about 0.99) for the result being replicated between the same limits (e.g. 5.6 and 6.4mmol/l).  In other words, the result of the calculation would be similar whether it was based on the study being repeated an infinite number of times with the same number of patients (e.g. 78) or on the study being repeated an infinite number of times using an infinite number of patients each time.  If the probability of replication based on the study numbers was high, then the probability of non-replication would be low using each method (e.g. about 1 - 0.99 = 0.01).

 

In conclusion, if (1) the probability of non-replication within some interval due to poor reporting of results was low, (2) the probability of non-replication when the study was repeated with local methods and different patients was low, (3) the probability of non-replication due to contradictory results in other studies was low and (4) the probability of non-replication due to the number of measurements made was low (e.g. about 0.01), then the reader would conclude that the probability of replicating the result would be high.
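One way to see how the four probabilities might combine is the rough sketch below.  It rests on a strong assumption that the text does not make: that the four sources of non-replication act independently, so that the probability of replication is the product of the four probabilities of avoiding each one.  Only the value of 0.01 for the number of measurements is taken from the text; the other three values are hypothetical, as is the function name.

```python
from math import prod


def probability_of_replication(non_replication_probabilities):
    """Probability that none of the listed causes prevents replication,
    ASSUMING the causes act independently of one another."""
    return prod(1 - p for p in non_replication_probabilities)


# Only the last value (0.01) comes from the text; the rest are hypothetical.
causes = {
    "poor reporting": 0.05,
    "different methods or patients": 0.10,
    "contradictory results elsewhere": 0.05,
    "too few observations": 0.01,
}

overall = probability_of_replication(causes.values())
print(f"overall probability of replication: {overall:.2f}")
```

With these illustrative figures the overall probability is about 0.80, which shows that all four probabilities of non-replication have to be low before the probability of replication can be called high.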

 

© Huw Llewelyn 2016