11  Screening and Diagnostic Tests

11.1 Introduction

In clinical practice it is desirable to have a simple test which, depending on the presence or absence of an indicator (for example, faecal occult blood), provides a good prediction of whether or not a patient has a particular condition (for example, colorectal cancer).

To evaluate a potential diagnostic test, we apply the test to a group of individuals whose true disease status is known. We then draw up a 2 × 2 table of frequencies.

11.2 The 2 × 2 Table

library(tibble)
library(knitr)
library(kableExtra)

# Layout of the 2 x 2 table; the totals row is bolded with row_spec() below
two_by_two <- tibble(
  ` ` = c("Test positive", "Test negative", "Total"),
  `Disease` = c("a (true positive)", "c (false negative)", "a + c"),
  `No disease` = c("b (false positive)", "d (true negative)", "b + d"),
  `Total` = c("a + b", "c + d", "n = a + b + c + d")
)

two_by_two |>
  kable() |>
  kable_styling(bootstrap_options = c("striped", "hover")) |>
  row_spec(3, bold = TRUE) |>
  add_header_above(c(" " = 1, "True Disease Status" = 2, " " = 1))

Table 11.1: Structure of a 2 × 2 table for evaluating diagnostic tests

| | True status: Disease | True status: No disease | Total |
|---|---|---|---|
| Test positive | a (true positive) | b (false positive) | a + b |
| Test negative | c (false negative) | d (true negative) | c + d |
| **Total** | **a + c** | **b + d** | **n = a + b + c + d** |

Of the n individuals studied:

  • a + c individuals have the disease
  • b + d do not have the disease
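In practice the four cells are usually counted from individual-level records rather than written down directly. The following is a minimal sketch in base R, assuming two hypothetical logical vectors (`disease` and `test`, invented here for illustration) with one entry per individual.

```r
# Hypothetical individual-level data: TRUE = disease present / test positive
disease <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
test    <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Cross-tabulate with positives first so the layout matches Table 11.1
tab <- table(
  Test    = factor(test,    levels = c(TRUE, FALSE),
                   labels = c("Positive", "Negative")),
  Disease = factor(disease, levels = c(TRUE, FALSE),
                   labels = c("Disease", "No disease"))
)

cell_a <- tab["Positive", "Disease"]     # true positives
cell_b <- tab["Positive", "No disease"]  # false positives
cell_c <- tab["Negative", "Disease"]     # false negatives
cell_d <- tab["Negative", "No disease"]  # true negatives

addmargins(tab)  # appends the row and column totals
```

The names `cell_a` to `cell_d` are used instead of `a` to `d` only to avoid reusing `c`, which is also the name of a base R function.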

11.3 Sensitivity and Specificity

Sensitivity, specificity and predictive values are measures for assessing the effectiveness of the test.

Sensitivity

Sensitivity is the proportion of individuals with the disease who are correctly identified by the test.

\[\text{Sensitivity} = \frac{a}{a + c}\]

A highly sensitive test will detect most people with the disease (few false negatives).

Specificity

Specificity is the proportion of individuals without the disease who are correctly identified by the test.

\[\text{Specificity} = \frac{d}{b + d}\]

A highly specific test will correctly identify most people without the disease (few false positives).

Key Point

Sensitivity and specificity quantify the diagnostic ability of the test. They are properties of the test itself and do not change with disease prevalence.
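The two formulas can be sketched as a small helper. The function name `sens_spec` and the counts below are hypothetical and chosen only to illustrate the point about prevalence: the second call uses ten times as many non-diseased individuals, with the test behaving identically within each group.

```r
# tp, fp, fn, tn correspond to cells a, b, c, d of the 2 x 2 table
sens_spec <- function(tp, fp, fn, tn) {
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (fp + tn))
}

# Hypothetical study: 100 diseased and 900 non-diseased individuals
high_prev <- sens_spec(tp = 90, fp = 90, fn = 10, tn = 810)

# Same test, ten times as many non-diseased individuals (lower prevalence)
low_prev <- sens_spec(tp = 90, fp = 900, fn = 10, tn = 8100)

rbind(high_prev, low_prev)
```

Both rows give the same sensitivity and specificity, because each measure is calculated within a single column of the table.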

11.4 Predictive Values

Positive Predictive Value

Positive predictive value (PPV) is the proportion of individuals with a positive test result who have the disease.

\[\text{Positive predictive value} = \frac{a}{a + b}\]

Negative Predictive Value

Negative predictive value (NPV) is the proportion of individuals with a negative test result who do not have the disease.

\[\text{Negative predictive value} = \frac{d}{c + d}\]

The predictive values indicate how likely it is that the individual has or does not have the disease, given the test result.

Effect of Prevalence

Prevalence and Predictive Values

Predictive values are dependent on the prevalence of the disease in the population being studied. Prevalence is the proportion of the population who have the disease.

\[\text{Prevalence} = \frac{a + c}{n}\]

In populations where the disease is common, the positive predictive value of a given test will be higher, and the negative predictive value lower, than in populations where the disease is rare.
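The dependence on prevalence can be made explicit by writing the predictive values in terms of sensitivity, specificity and prevalence (Bayes' theorem). The sketch below assumes a hypothetical test with 90% sensitivity and 90% specificity; the function name `predictive_values` is illustrative, not part of any package.

```r
predictive_values <- function(sensitivity, specificity, prevalence) {
  # Expected proportion of true positives among all positives
  ppv <- sensitivity * prevalence /
    (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
  # Expected proportion of true negatives among all negatives
  npv <- specificity * (1 - prevalence) /
    (specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
  data.frame(prevalence = prevalence, ppv = ppv, npv = npv)
}

pv <- predictive_values(sensitivity = 0.90, specificity = 0.90,
                        prevalence = c(0.01, 0.10, 0.30))
round(pv, 3)
```

With these assumed values the PPV rises from about 8% at 1% prevalence to 50% at 10% prevalence and about 79% at 30% prevalence, while the NPV falls slightly over the same range.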

11.5 Likelihood Ratios

The likelihood ratio (LR) for a positive test result is the ratio of the probability of a positive result if the patient has the disease (sensitivity) to the probability of a positive result if the patient does not have the disease (1 - specificity).

\[\text{Likelihood ratio} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}\]

For example, an LR of 4 for a positive result indicates that a positive result is four times as likely to occur in an individual with the disease as in one without it.
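As a small illustration, the likelihood ratio can be computed either from the two positive-result proportions or from sensitivity and specificity. The counts below are hypothetical and were chosen so that the result is an LR of 4.

```r
# Hypothetical counts: 100 diseased and 200 non-diseased individuals
tp <- 80;  fn <- 20    # diseased: test positive / test negative
fp <- 40;  tn <- 160   # non-diseased: test positive / test negative

# Probability of a positive result in each group
p_pos_diseased     <- tp / (tp + fn)   # sensitivity
p_pos_non_diseased <- fp / (fp + tn)   # 1 - specificity

lr_positive <- p_pos_diseased / p_pos_non_diseased

# Equivalent form using specificity
specificity <- tn / (fp + tn)
lr_from_spec <- p_pos_diseased / (1 - specificity)

c(lr_positive = lr_positive, lr_from_spec = lr_from_spec)
```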

11.6 Cut-off Values

Sometimes a diagnostic test is based on a continuous numerical measurement. Often there is no threshold above (or below) which the disease definitely occurs. In this situation, a cut-off value is chosen beyond which an individual is believed to have a very high chance of having the disease, and measurements beyond the cut-off are treated as a positive test result.
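The consequence of moving the cut-off can be seen in a short simulation. This is a sketch under stated assumptions: the biomarker values are simulated from normal distributions invented for illustration, and a result at or above the cut-off is treated as positive.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical biomarker: higher on average in diseased individuals
diseased     <- rnorm(200, mean = 6, sd = 2)
non_diseased <- rnorm(800, mean = 4, sd = 2)

# Classify as "test positive" when the measurement is at or above the cut-off
evaluate_cutoff <- function(cutoff) {
  c(cutoff      = cutoff,
    sensitivity = mean(diseased >= cutoff),
    specificity = mean(non_diseased < cutoff))
}

cutoff_results <- t(sapply(c(3, 4, 5, 6, 7), evaluate_cutoff))
round(cutoff_results, 2)
```

Raising the cut-off can only lower sensitivity and raise specificity, so the choice of cut-off is a trade-off between missed cases and false alarms.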

11.7 ROC Curves

The receiver operating characteristic (ROC) curve provides a way of assessing an optimal cut-off value for a test. A ROC curve plots sensitivity against (1 - specificity) at all potential cut-off points. It essentially compares the probabilities of a positive test result in those with and without disease.

The overall accuracy can be assessed by the area under the curve (AUC):

  • AUC = 0.5: No discrimination (test is no better than chance)
  • AUC = 1.0: Perfect discrimination
  • AUC > 0.7: Generally considered acceptable
  • AUC > 0.8: Good discrimination
  • AUC > 0.9: Excellent discrimination

library(tibble)
library(ggplot2)

# Create example ROC curve data (illustrative shape, not real measurements)
roc_data <- tibble(
  specificity = seq(1, 0, by = -0.01),
  # The Beta(1, 2) CDF is concave, so the curve lies above the chance diagonal
  sensitivity = pbeta(1 - specificity, 1, 2)
)

# Approximate the AUC with the trapezoidal rule
fpr <- 1 - roc_data$specificity
sens <- roc_data$sensitivity
auc <- round(sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2), 2)

ggplot(roc_data, aes(x = 1 - specificity, y = sensitivity)) +
  geom_line(colour = "#3498db", linewidth = 1.2) +
  # Dashed diagonal: a test that performs no better than chance
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", colour = "grey50") +
  # Shade the region between the chance diagonal and the curve
  geom_ribbon(aes(ymin = 1 - specificity, ymax = sensitivity),
              fill = "#3498db", alpha = 0.2) +
  annotate("text", x = 0.6, y = 0.3,
           label = paste("AUC =", auc), size = 5) +
  labs(x = "1 - Specificity (False Positive Rate)",
       y = "Sensitivity (True Positive Rate)") +
  coord_equal() +
  theme_minimal(base_size = 14)
Figure 11.1: Example ROC curve showing trade-off between sensitivity and specificity
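With real data the ROC curve is built from observed measurements rather than a smooth function. The sketch below uses a small hypothetical data set (nine invented scores, four from diseased individuals) to compute the empirical ROC points at every possible cut-off and the AUC by the trapezoidal rule, using base R only.

```r
# Hypothetical test scores; disease = 1 for diseased, 0 for non-diseased
score   <- c(0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1)
disease <- c(1,   1,   1,   1,   0,   0,   0,   0,   0)

# Candidate cut-offs: every observed score, plus Inf so the curve starts at (0, 0)
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))

roc <- data.frame(
  cutoff      = cutoffs,
  sensitivity = sapply(cutoffs, function(k) mean(score[disease == 1] >= k)),
  fpr         = sapply(cutoffs, function(k) mean(score[disease == 0] >= k))
)

# Area under the empirical curve by the trapezoidal rule
auc <- sum(diff(roc$fpr) *
             (head(roc$sensitivity, -1) + tail(roc$sensitivity, -1)) / 2)

# Cross-check: proportion of (diseased, non-diseased) pairs ranked correctly
auc_pairs <- mean(outer(score[disease == 1], score[disease == 0], ">"))

c(auc = auc, auc_pairs = auc_pairs)
```

The cross-check reflects a common interpretation of the AUC: the probability that a randomly chosen diseased individual has a higher score than a randomly chosen non-diseased individual (there are no tied scores in this example).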

11.8 Worked Example: PSA Testing for Prostate Cancer

library(tibble)
library(knitr)
library(kableExtra)

# Counts for the PSA example; the totals row is bolded with row_spec() below
psa_table <- tibble(
  ` ` = c("PSA ≥2.1 ng/ml (positive)", "PSA <2.1 ng/ml (negative)", "Total"),
  `Prostate cancer` = c("167", "282", "449"),
  `No prostate cancer` = c("508", "1993", "2501"),
  `Total` = c("675", "2275", "2950")
)

psa_table |>
  kable() |>
  kable_styling(bootstrap_options = c("striped", "hover")) |>
  row_spec(3, bold = TRUE) |>
  add_header_above(c(" " = 1, "True Disease Status" = 2, " " = 1))

Table 11.2: PSA test results for prostate cancer detection (threshold ≥2.1 ng/ml)

| | True status: Prostate cancer | True status: No prostate cancer | Total |
|---|---|---|---|
| PSA ≥2.1 ng/ml (positive) | 167 | 508 | 675 |
| PSA <2.1 ng/ml (negative) | 282 | 1993 | 2275 |
| **Total** | **449** | **2501** | **2950** |

Calculating the Measures

# Values from Table 11.2
a <- 167  # True positive
b <- 508  # False positive
c <- 282  # False negative (calls such as c(1, 2) still resolve to the base function)
d <- 1993 # True negative
n <- a + b + c + d

# Calculate measures
sensitivity <- a / (a + c)
specificity <- d / (b + d)
ppv <- a / (a + b)
npv <- d / (c + d)
prevalence <- (a + c) / n
lr <- sensitivity / (1 - specificity)  # likelihood ratio for a positive result

Sensitivity:

\[\text{Sensitivity} = \frac{167}{167 + 282} = 0.37\]

Using this test, if prostate cancer is present there is a 37% chance of detecting it.

Specificity:

\[\text{Specificity} = \frac{1993}{508 + 1993} = 0.80\]

If there is no prostate cancer, there is an 80% chance of a negative result. The remaining 20% of people without prostate cancer will have a false positive result.

Positive Predictive Value:

\[\text{PPV} = \frac{167}{167 + 508} = 0.25\]

There is a 25% chance that if the test is positive the patient actually has prostate cancer.

Negative Predictive Value:

\[\text{NPV} = \frac{1993}{282 + 1993} = 0.88\]

There is an 88% chance, if the test is negative, that the patient does not have prostate cancer. This means that 12% of patients with a negative result do in fact have prostate cancer (false negatives).

Likelihood Ratio:

\[\text{LR} = \frac{0.372}{1 - 0.797} = 1.83\]

A positive result is 1.83 times (almost twice) as likely to occur in a patient with prostate cancer as in a patient without it.
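The likelihood ratio describes how a positive result shifts the odds of disease; it is not itself the probability of disease. A minimal sketch of this, assuming the study prevalence is used as the pre-test probability, multiplies the pre-test odds by the likelihood ratio and converts back to a probability, which should reproduce the positive predictive value from Table 11.2.

```r
# Counts from Table 11.2 (named tp/fp/fn/tn to avoid reusing `c`)
tp <- 167; fp <- 508; fn <- 282; tn <- 1993
n  <- tp + fp + fn + tn

sens   <- tp / (tp + fn)
spec   <- tn / (fp + tn)
lr_pos <- sens / (1 - spec)

# Pre-test probability taken as the study prevalence (an assumption)
pre_prob  <- (tp + fn) / n
pre_odds  <- pre_prob / (1 - pre_prob)

# A positive result multiplies the odds of disease by the likelihood ratio
post_odds <- pre_odds * lr_pos
post_prob <- post_odds / (1 + post_odds)

ppv <- tp / (tp + fp)
round(c(pre_prob = pre_prob, lr_pos = lr_pos,
        post_prob = post_prob, ppv = ppv), 3)
```

A positive PSA result raises the probability of prostate cancer from about 15% to about 25%, so most patients with a positive result still do not have the disease.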

Summary of Results

library(tibble)
library(knitr)
library(kableExtra)

# Collect the measures calculated above into one table
summary_table <- tibble(
  Measure = c("Sensitivity", "Specificity", "Positive Predictive Value",
              "Negative Predictive Value", "Prevalence", "Likelihood Ratio"),
  Value = c(
    paste0(round(sensitivity * 100, 1), "%"),
    paste0(round(specificity * 100, 1), "%"),
    paste0(round(ppv * 100, 1), "%"),
    paste0(round(npv * 100, 1), "%"),
    paste0(round(prevalence * 100, 1), "%"),
    round(lr, 2)
  ),
  Interpretation = c(
    "37% of cancers detected",
    "80% of non-cancers correctly identified",
    "25% of positive tests are true cancers",
    "88% of negative tests are truly cancer-free",
    "15% of those studied have prostate cancer",
    "Positive test about 1.8× as likely in cancer"
  )
)

summary_table |>
  kable() |>
  kable_styling(bootstrap_options = c("striped", "hover"))

Table 11.3: Summary of diagnostic test measures for PSA ≥2.1 ng/ml

| Measure | Value | Interpretation |
|---|---|---|
| Sensitivity | 37.2% | 37% of cancers detected |
| Specificity | 79.7% | 80% of non-cancers correctly identified |
| Positive Predictive Value | 24.7% | 25% of positive tests are true cancers |
| Negative Predictive Value | 87.6% | 88% of negative tests are truly cancer-free |
| Prevalence | 15.2% | 15% of those studied have prostate cancer |
| Likelihood Ratio | 1.83 | Positive test about 1.8× as likely in cancer |

11.9 Summary

| Measure | Formula | Interpretation |
|---|---|---|
| Sensitivity | a / (a + c) | Proportion of diseased correctly identified |
| Specificity | d / (b + d) | Proportion of non-diseased correctly identified |
| PPV | a / (a + b) | Probability of disease given positive test |
| NPV | d / (c + d) | Probability of no disease given negative test |
| Prevalence | (a + c) / n | Proportion of population with disease |
| Likelihood ratio (positive test) | Sensitivity / (1 - Specificity) | How much more likely a positive test is in those with disease than in those without |
| AUC | Area under ROC curve | Overall discriminative ability of test |