How to understand medical decisions

The problem with 'specificity'

The concepts of 'sensitivity' and 'specificity' were used for radar during World War II.  If the 'receiver-operator' turned the detection knob one way, it increased sensitivity but tended to detect unwanted objects (e.g. sea birds) by decreasing the specificity.  If it was turned the other way, it decreased sensitivity (and was at risk of missing smaller enemy aircraft).  However, this reduced the risk of detecting unwanted objects (e.g. sea birds) by being more specific.

 

It was assumed by some that this is how diagnosis works too. The 'sensitivity' of a finding with respect to a diagnosis is the frequency with which that finding occurs in those with the diagnosis, e.g. the frequency of localised right lower quadrant (LRLQ) pain in patients with appendicitis might be 50/100.

 

However, the 'specificity' is the frequency with which the finding is ABSENT in those WITHOUT the diagnosis.  In turn, the false positive rate is '1 minus the specificity': the frequency with which a finding, e.g. LRLQ pain, occurs in people WITHOUT appendicitis.  It is also assumed that if a finding occurs equally frequently in those with and without a diagnosis, then it is useless when used alone or in combination with other findings.
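The definitions above can be written out as a small Python sketch. The 50/100 figure for LRLQ pain in appendicitis comes from the text; the counts for patients without appendicitis (50 with the pain, 50 without) are assumed purely for illustration, and the function names are hypothetical.

```python
def sensitivity(with_finding, without_finding):
    """Frequency of the finding among patients WITH the diagnosis."""
    return with_finding / (with_finding + without_finding)


def specificity(with_finding, without_finding):
    """Frequency of the finding being ABSENT among patients WITHOUT the diagnosis."""
    return without_finding / (with_finding + without_finding)


def false_positive_rate(with_finding, without_finding):
    """Frequency of the finding among patients WITHOUT the diagnosis (1 minus specificity)."""
    return 1 - specificity(with_finding, without_finding)


# Patients WITH appendicitis: 50 have LRLQ pain, 50 do not (from the text).
sens_lrlq = sensitivity(50, 50)

# Patients WITHOUT appendicitis: 50 have LRLQ pain, 50 do not (assumed counts).
spec_lrlq = specificity(50, 50)
fpr_lrlq = false_positive_rate(50, 50)

print(sens_lrlq, spec_lrlq, fpr_lrlq)
```

With these counts all three values come out at 0.5, which is the situation the rest of the argument builds on.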

 

There is a big problem with 'specificity' and the 'false positive rate' (i.e. '1 minus the specificity'). It is the issue of who we should regard as 'those without a diagnosis', e.g. who should we regard as those 'without appendicitis'?  Is it those in a ward, in a whole hospital or in the whole community?  In other words, these values depend on the population in which 'those without' the diagnosis or finding were counted.  Another problem is that 'those without a diagnosis' will include patients with other diagnoses (the differential diagnosis of the findings), and this represents important knowledge during the diagnostic process. To understand this, look at Figure 1 below and then read on.
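The dependence on the reference population can be made concrete with a rough sketch. All numbers here are invented for illustration: it is assumed that the same 50 people with LRLQ pain but no appendicitis are counted against a ward, a hospital and a community of arbitrary sizes. The function name is hypothetical.

```python
def specificity_in(false_positives, total_without_diagnosis):
    """Specificity when the finding is present in `false_positives` of the
    `total_without_diagnosis` people who do not have the diagnosis."""
    return 1 - false_positives / total_without_diagnosis


# Assumed: the same 50 people with LRLQ pain but no appendicitis.
people_with_lrlq_without_appendicitis = 50

# Assumed sizes of three possible 'without appendicitis' reference groups.
reference_groups = {"ward": 100, "hospital": 1_000, "community": 100_000}

specificity_by_group = {
    name: specificity_in(people_with_lrlq_without_appendicitis, size)
    for name, size in reference_groups.items()
}

for name, value in specificity_by_group.items():
    print(f"{name}: specificity = {value:.4f}")
```

The finding and the patients are unchanged, yet the apparent specificity rises from one half towards 1 simply because more people without the finding are included in the denominator.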

 

If localised right lower quadrant (LRLQ) pain occurs in 50% of those with appendicitis and 50% of those without appendicitis, the likelihood ratio is 1 and it seems unhelpful.  Similarly, if guarding occurs in 50% of those with appendicitis and 50% of those without appendicitis, the likelihood ratio is 1 and guarding also seems to be unhelpful.  These two findings will also seem unhelpful if used in combination as the combined likelihood ratio assuming statistical independence is 1 x 1 = 1.  
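The conventional calculation described above can be sketched as follows. The sensitivities and specificities of 0.5 come from the text; the pre-test probability of 0.5 is an assumption made only so that there is something to update, and the function names are hypothetical.

```python
def likelihood_ratio(sensitivity, specificity):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability, combined_lr):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * combined_lr
    return post_test_odds / (1 + post_test_odds)


lr_lrlq = likelihood_ratio(0.5, 0.5)
lr_guarding = likelihood_ratio(0.5, 0.5)

# Combining the two findings under the assumption of statistical independence.
combined_lr = lr_lrlq * lr_guarding

pre_test = 0.5  # assumed pre-test probability of appendicitis
post_test = post_test_probability(pre_test, combined_lr)

print(combined_lr, post_test)
```

Because the combined likelihood ratio is 1, the post-test probability equals the pre-test probability, which is why the two findings appear to add nothing on this approach.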

 

However, assume that 50% of those without appendicitis had ‘non-specific abdominal pain’ (NSAP), that all these patients with NSAP had LRLQ pain, and that the others without appendicitis or NSAP never had LRLQ pain. Also assume that none of those with NSAP had guarding (see Figure 1).

 

This means that if a patient has LRLQ pain, he or she must have appendicitis or NSAP.  If the patient also has guarding then, as this never occurs in NSAP but often occurs in appendicitis, the diagnosis must be appendicitis.  So despite all the likelihood ratios being 1 (and apparently being useless), the combination of LRLQ pain and guarding predicts appendicitis with certainty (showing that the findings are very useful indeed).  This is how reasoning by elimination using LRLQ pain and guarding works.
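One way to check this argument is to count patients in a made-up population that satisfies the stated assumptions. The sketch below assumes 100 patients with appendicitis and 100 without, assumes that LRLQ pain and guarding occur independently within appendicitis (so 25 patients have both), and assumes that all the 'other' patients have guarding so that guarding still occurs in 50% of those without appendicitis. None of these counts come from real data.

```python
from fractions import Fraction

# Hypothetical population: (diagnosis, has_lrlq_pain, has_guarding, patients)
population = [
    ("appendicitis", True,  True,  25),
    ("appendicitis", True,  False, 25),
    ("appendicitis", False, True,  25),
    ("appendicitis", False, False, 25),
    ("NSAP",         True,  False, 50),  # always LRLQ pain, never guarding
    ("other",        False, True,  50),  # never LRLQ pain
]


def count(appendicitis=None, lrlq=None, guarding=None):
    """Count patients matching every criterion that is not None."""
    total = 0
    for diagnosis, has_lrlq, has_guarding, patients in population:
        if appendicitis is not None and (diagnosis == "appendicitis") != appendicitis:
            continue
        if lrlq is not None and has_lrlq != lrlq:
            continue
        if guarding is not None and has_guarding != guarding:
            continue
        total += patients
    return total


def likelihood_ratio(**finding):
    """Frequency of the finding with appendicitis divided by that without."""
    with_dx = Fraction(count(True, **finding), count(True))
    without_dx = Fraction(count(False, **finding), count(False))
    return with_dx / without_dx


lr_lrlq = likelihood_ratio(lrlq=True)
lr_guarding = likelihood_ratio(guarding=True)

# Proportion with appendicitis among patients who have BOTH findings.
posterior_both = Fraction(
    count(True, lrlq=True, guarding=True),
    count(None, lrlq=True, guarding=True),
)

print(lr_lrlq, lr_guarding, posterior_both)
```

In this population each finding has a likelihood ratio of 1 on its own, yet every patient with both findings has appendicitis, because the two findings are far from independent among those without appendicitis.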

 

This is a serious issue because 'sensitivity' and 'specificity' currently play a central role in deciding whether the results of new tests are going to be useful for diagnosis and therefore whether use of that test should be allowed. However, from this example, it can be seen that 'specificity' is unreliable and that in reasoning by elimination, only the 'sensitivities' are used: the frequency of patients with the finding in those with each diagnosis (e.g. the frequency of guarding in those with appendicitis (50%) and in those with NSAP (0%)).
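A minimal sketch of this step, using only sensitivities, might look like the following. It assumes that the differential diagnosis of LRLQ pain contains just appendicitis and NSAP, and that the frequency of guarding within each diagnosis is the same whether or not LRLQ pain is present. The starting probabilities and the function name `eliminate` are hypothetical.

```python
def eliminate(differential, sensitivities):
    """Re-weight each diagnosis in the differential by the frequency of the
    new finding in that diagnosis, then rescale so the weights sum to 1."""
    weights = {dx: differential[dx] * sensitivities[dx] for dx in differential}
    total = sum(weights.values())
    return {dx: weight / total for dx, weight in weights.items()}


# Frequency of guarding in each diagnosis (from the text).
guarding_sensitivity = {"appendicitis": 0.5, "NSAP": 0.0}

# Assumed probabilities of each diagnosis in a patient with LRLQ pain.
even_differential = {"appendicitis": 0.5, "NSAP": 0.5}
skewed_differential = {"appendicitis": 0.1, "NSAP": 0.9}

after_guarding_even = eliminate(even_differential, guarding_sensitivity)
after_guarding_skewed = eliminate(skewed_differential, guarding_sensitivity)

print(after_guarding_even)
print(after_guarding_skewed)
```

Because guarding never occurs in NSAP, that diagnosis is eliminated whatever its starting probability, and no specificity or 'without appendicitis' population is needed for the calculation.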

Figure 1:

R by E for A using R & G