13  Epidemiological Studies

13.1 Introduction

Epidemiology is the study of the occurrence and determinants of ill health in the population.

Epidemiological studies assess the relationship between factors of interest (which may be biological, social, behavioural, or environmental) and the occurrence of disease in the population (for example, the incidence of cancer, heart disease, or hypertension).

Epidemiological studies are mostly observational in design (in contrast to experimental studies which involve interventions to affect an outcome).

13.2 Routinely Collected Data

Routinely collected administrative data such as death certification, cancer registration and hospital discharge records can be used for epidemiological purposes and provide insights into the health of the population.

Death certification provides estimates of the annual death rates in the population. These rates are usually age standardised to take account of differences in the age structure of the population over time or between places.
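As a minimal sketch of how age standardisation works, the example below applies direct standardisation, one common approach (the text does not say which method is used). All counts and the standard population weights are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical illustration only: none of these numbers come from real data.
age_band    <- c("0-44", "45-64", "65+")
deaths      <- c(50, 300, 1200)        # deaths in one year, by age band
population  <- c(60000, 30000, 10000)  # mid-year population, by age band
std_weights <- c(0.60, 0.25, 0.15)     # assumed standard population shares (sum to 1)

# Age-specific death rates per 100,000 population
age_specific <- deaths / population * 100000

# Crude rate: ignores the age structure of the population
crude_rate <- sum(deaths) / sum(population) * 100000

# Directly standardised rate: age-specific rates weighted by the standard population
standardised_rate <- sum(age_specific * std_weights)

data.frame(age_band, age_specific)
c(crude = crude_rate, standardised = standardised_rate)
```

The crude and standardised rates differ because the hypothetical population is younger than the assumed standard. Applying the same weights to two places or periods is what makes their rates comparable.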

13.3 Types of Observational Studies

Observational epidemiological studies fall into three broad groups:

  • Cross-sectional studies
  • Cohort studies
  • Case-control studies

The unit of observation is usually the individual, but it can also be a group of individuals, for example populations defined by a shared geography. Studies that use groups as the unit of observation are called ecological studies.

  • Studies in which the health event of interest has yet to happen are called prospective
  • Studies in which the health event has already occurred are called retrospective
```r
# Packages used by the table code in this chapter
library(tibble)
library(knitr)
library(kableExtra)

# Create a visual comparison of study designs
study_designs <- tibble(
  Design = c("Cross-sectional", "Cohort", "Case-control"),
  Direction = c("Single time point", "Forward in time", "Backward in time"),
  `Time Frame` = c("Present", "Prospective", "Retrospective"),
  `Outcome Measure` = c("Prevalence", "Incidence/Relative Risk", "Odds Ratio")
)

study_designs |>
  kable() |>
  kable_styling(bootstrap_options = c("striped", "hover"))
```

| Design | Direction | Time Frame | Outcome Measure |
|---|---|---|---|
| Cross-sectional | Single time point | Present | Prevalence |
| Cohort | Forward in time | Prospective | Incidence/Relative Risk |
| Case-control | Backward in time | Retrospective | Odds Ratio |

Figure 13.1: Comparison of epidemiological study designs

13.4 Cross-sectional Study

A cross-sectional study is carried out at a single point in time. A health survey is a type of cross-sectional study where the aim is to describe health behaviours or health status in a large sample of the population.

A cross-sectional study is suitable for estimating the prevalence of a condition in the population.

Definition

Prevalence is the proportion (or percent) of individuals with a particular condition in the population at a point in time.
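The definition can be illustrated with a short calculation. This is a sketch using hypothetical survey counts (not data from any study mentioned here); the exact binomial interval from `binom.test()` is one of several reasonable choices for the confidence interval.

```r
# Hypothetical cross-sectional survey: counts are invented for illustration
n_surveyed       <- 2000  # people surveyed at a single point in time
n_with_condition <- 260   # of whom this many have the condition

# Prevalence = proportion with the condition at the time of the survey
prevalence <- n_with_condition / n_surveyed

# Exact (Clopper-Pearson) 95% confidence interval for the proportion
prevalence_ci <- binom.test(n_with_condition, n_surveyed)$conf.int

round(100 * c(prevalence = prevalence,
              lower = prevalence_ci[1],
              upper = prevalence_ci[2]), 1)  # expressed as percentages
```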

13.5 Cohort Study

A cohort study takes a group of individuals and follows them forward in time. These studies are usually prospective.

The aim is to assess whether exposure to a particular factor affects the incidence of disease in the future.

Definitions

Incidence is the number of new cases of a condition occurring in a population over a set time period.

Incidence rate is the number of new cases divided by the person-time at risk, and is usually expressed in terms of person-years.
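To make the person-time denominator concrete, the sketch below computes an incidence rate for a small hypothetical cohort in which participants contribute different lengths of follow-up. The follow-up times and case indicators are invented for illustration.

```r
# Hypothetical cohort of eight people (values invented for illustration)
followup_years <- c(5, 5, 2.5, 4, 1, 5, 3.5, 5)  # time at risk for each person
new_case       <- c(0, 0, 1, 0, 1, 0, 1, 0)      # 1 = developed the condition

new_cases    <- sum(new_case)        # incidence: number of new cases
person_years <- sum(followup_years)  # total person-time at risk

# Incidence rate = new cases / person-time at risk
incidence_rate <- new_cases / person_years
rate_per_1000  <- incidence_rate * 1000  # per 1,000 person-years

c(new_cases = new_cases, person_years = person_years,
  rate_per_1000 = round(rate_per_1000, 1))
```

Each person who develops the condition stops contributing time at risk at that point, which is why their follow-up is shorter in this example.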

Analysis of Cohort Studies

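The analysis of cohort studies can be summarised by comparing the incidence of disease in the exposed and non-exposed groups. When every participant is followed for the same period, the incidence in each group can be expressed as a risk (the proportion who develop the disease), as in the table below, and the ratio of the two risks is the relative risk. When the denominators are person-time at risk, the corresponding measure is the incidence rate ratio.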

```r
cohort_table <- tibble(
  ` ` = c("Exposed: Yes", "Exposed: No", "**Total**"),
  `Disease: Yes` = c("a", "c", "a + c"),
  `Disease: No` = c("b", "d", "b + d"),
  `Total` = c("a + b", "c + d", "n"),
  `Incidence Rate` = c("a / (a + b)", "c / (c + d)", "")
)

cohort_table |>
  kable(escape = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
```

Table 13.1: Structure of a cohort study analysis table

|   | Disease: Yes | Disease: No | Total | Incidence Rate |
|---|---|---|---|---|
| Exposed: Yes | a | b | a + b | a / (a + b) |
| Exposed: No | c | d | c + d | c / (c + d) |
| **Total** | a + c | b + d | n | |

Relative Risk

The relative risk (RR) indicates the increased (or decreased) risk of disease associated with exposure to the factor of interest.

\[RR = \frac{a/(a+b)}{c/(c+d)}\]

| Relative Risk | Interpretation |
|---|---|
| > 1 | An increased risk in the exposed group |
| = 1 | Risk is the same in the exposed and unexposed groups |
| < 1 | A reduced risk in the exposed group |

A relative risk of one indicates that the risk is the same in the exposed and unexposed groups. A relative risk greater than one indicates that there is an increased risk in the exposed group compared with the unexposed group; a relative risk less than one indicates a reduction in the risk of disease in the exposed group.

Worked Example: Serum Ferritin and Cancer Mortality

A cohort study investigated whether elevated serum ferritin levels (a marker of iron overload) were associated with increased cancer mortality. Researchers followed 1,420 patients for 5 years after measuring their baseline serum ferritin levels.

```r
ferritin_cohort <- tibble(
  `Serum Ferritin` = c("High (>400 μg/L)", "Normal (≤400 μg/L)", "**Total**"),
  `Cancer Death` = c("48", "82", "**130**"),
  `Alive` = c("352", "938", "**1290**"),
  `Total` = c("**400**", "**1020**", "**1420**"),
  `Incidence Rate` = c("48/400 = 0.120", "82/1020 = 0.080", "")
)

ferritin_cohort |>
  kable(escape = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
```

Table 13.2: Cohort study of serum ferritin and cancer mortality

| Serum Ferritin | Cancer Death | Alive | Total | Incidence Rate |
|---|---|---|---|---|
| High (>400 μg/L) | 48 | 352 | **400** | 48/400 = 0.120 |
| Normal (≤400 μg/L) | 82 | 938 | **1020** | 82/1020 = 0.080 |
| **Total** | **130** | **1290** | **1420** | |

Calculating the Relative Risk

```r
# Define the values from the 2×2 table
exposed_disease <- 48      # a: High ferritin with cancer death
exposed_no_disease <- 352  # b: High ferritin, alive
unexposed_disease <- 82    # c: Normal ferritin with cancer death
unexposed_no_disease <- 938 # d: Normal ferritin, alive

# Calculate the incidence (risk) of cancer death in each group
ir_exposed <- exposed_disease / (exposed_disease + exposed_no_disease)
ir_unexposed <- unexposed_disease / (unexposed_disease + unexposed_no_disease)

# Calculate relative risk and display it to two decimal places
rr <- ir_exposed / ir_unexposed
round(rr, 2)
```

\[RR = \frac{48/400}{82/1020} = \frac{0.1200}{0.0804} = 1.49\]

The relative risk of 1.49 indicates that, over the 5-year follow-up period, patients with high serum ferritin levels had 1.49 times the risk of cancer death of those with normal ferritin levels.

Interpretation

Since RR = 1.49 > 1, there is an increased risk in the exposed group (high ferritin). Specifically:

  • The risk of cancer death in the high ferritin group is 12%
  • The risk of cancer death in the normal ferritin group is 8%
  • Patients with high ferritin are 49% more likely to die from cancer

If a 95% confidence interval for this RR does not include 1.0, we would conclude the association is statistically significant.
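A confidence interval for the relative risk can be sketched as follows. The chapter does not specify a method, so the large-sample approximation on the log scale used here (sometimes called the Katz method) is an assumption; the counts are those of the serum ferritin cohort above.

```r
# Counts from the serum ferritin cohort
exposed_disease   <- 48    # high ferritin, cancer death
exposed_total     <- 400   # all high ferritin patients
unexposed_disease <- 82    # normal ferritin, cancer death
unexposed_total   <- 1020  # all normal ferritin patients

# Relative risk
rr <- (exposed_disease / exposed_total) / (unexposed_disease / unexposed_total)

# Approximate standard error of log(RR) (large-sample formula; an assumed method)
se_log_rr <- sqrt(1 / exposed_disease - 1 / exposed_total +
                  1 / unexposed_disease - 1 / unexposed_total)

# 95% confidence interval, computed on the log scale and back-transformed
z     <- qnorm(0.975)
ci_rr <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

round(c(RR = rr, lower = ci_rr[1], upper = ci_rr[2]), 2)
```

With these counts the interval is roughly 1.07 to 2.09. Because it excludes 1.0, this approximation suggests the association is statistically significant at the 5% level.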

13.6 Case-Control Study

A case-control study compares the characteristics of a group of patients with a particular disease (the cases) to a group of individuals without the disease (the controls), to see whether exposure to a factor occurred more or less frequently in the cases than the controls.

Because patients are selected on the basis of their disease status, it is not possible to estimate the risk of disease. For cases and controls we can estimate the odds of being exposed to the risk factor.

```r
cc_table <- tibble(
  ` ` = c("Exposed: Yes", "Exposed: No", "**Total**", "Odds of exposure"),
  `Case (Disease)` = c("a", "c", "a + c", "a / c"),
  `Control (No Disease)` = c("b", "d", "b + d", "b / d"),
  `Total` = c("a + b", "c + d", "n", "")
)

cc_table |>
  kable(escape = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
```

Table 13.3: Structure of a case-control study analysis table

|   | Case (Disease) | Control (No Disease) | Total |
|---|---|---|---|
| Exposed: Yes | a | b | a + b |
| Exposed: No | c | d | c + d |
| **Total** | a + c | b + d | n |
| Odds of exposure | a / c | b / d | |

Odds Ratio

The odds ratio (OR) gives an indication of the increased (or decreased) odds associated with exposure to the factor of interest.

\[OR = \frac{a/c}{b/d} = \frac{ad}{bc}\]

| Odds Ratio | Interpretation |
|---|---|
| < 1 | Reduced odds of disease in the exposed group |
| = 1 | Odds are the same in the exposed and unexposed groups |
| > 1 | Increased odds of disease in the exposed group |

An odds ratio of one indicates that the odds are the same in the exposed and unexposed groups. An odds ratio greater than one indicates that the odds of disease are greater in the exposed group than in the unexposed group; an odds ratio less than one indicates that they are lower.
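The reason a case-control study reports an odds ratio rather than a risk can be shown with a small sketch. The counts below are hypothetical, and the helper functions `odds_ratio()` and `apparent_risk_exposed()` are names introduced here for illustration only. Doubling the number of controls sampled changes the apparent "risk" a / (a + b), which is an artefact of the sampling, but leaves the odds ratio unchanged.

```r
# Helper functions (hypothetical names, introduced for this illustration)
odds_ratio <- function(cases_exp, controls_exp, cases_unexp, controls_unexp) {
  (cases_exp * controls_unexp) / (controls_exp * cases_unexp)
}
apparent_risk_exposed <- function(cases_exp, controls_exp) {
  cases_exp / (cases_exp + controls_exp)
}

# Hypothetical study: 100 cases and 100 controls
or_1   <- odds_ratio(40, 20, 60, 80)
risk_1 <- apparent_risk_exposed(40, 20)

# Same cases, but twice as many controls sampled with the same exposure pattern
or_2   <- odds_ratio(40, 40, 60, 160)
risk_2 <- apparent_risk_exposed(40, 40)

round(c(or_1 = or_1, or_2 = or_2, risk_1 = risk_1, risk_2 = risk_2), 3)
```

The two odds ratios are identical, while the apparent risk falls when more controls are included. This is why the proportion a / (a + b) has no meaning as a risk in this design.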

13.7 Worked Example: Oral Contraceptives and Breast Cancer

```r
oc_table <- tibble(
  `Oral contraceptives` = c("Ever used", "Never used", "**Total**"),
  `Case (Breast Cancer)` = c("537", "639", "**1176**"),
  `Control` = c("554", "622", "**1176**")
)

oc_table |>
  kable(escape = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
```

Table 13.4: Case-control study of oral contraceptives and breast cancer

| Oral contraceptives | Case (Breast Cancer) | Control |
|---|---|---|
| Ever used | 537 | 554 |
| Never used | 639 | 622 |
| **Total** | **1176** | **1176** |

The cases were women recently diagnosed with breast cancer in a certain hospital. The controls were women inpatients in the same hospital.

Calculating the Odds Ratio

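```r
# Cell counts from Table 13.4 (descriptive names avoid masking base::c)
cases_exposed      <- 537  # a: cases who ever used oral contraceptives
controls_exposed   <- 554  # b: controls who ever used oral contraceptives
cases_unexposed    <- 639  # c: cases who never used oral contraceptives
controls_unexposed <- 622  # d: controls who never used oral contraceptives

# Odds ratio as the cross-product ratio (a * d) / (b * c)
or <- (cases_exposed * controls_unexposed) / (controls_exposed * cases_unexposed)
round(or, 2)
```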

\[OR = \frac{537/639}{554/622} = \frac{537 \times 622}{554 \times 639} = 0.94\]

Because the OR is less than 1, the odds of having used oral contraceptives are 6% lower (1 - 0.94) among breast cancer patients than among controls.

Equivalently, because the odds ratio is symmetric in exposure and disease, the odds of breast cancer are 6% lower among women who have ever used oral contraceptives than among those who have never used them.
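Whether an odds ratio this close to 1 is distinguishable from no association can be checked with a confidence interval. The sketch below uses the large-sample log-scale approximation (often called Woolf's method); the chapter does not state a method, so this choice is an assumption. The counts are those of Table 13.4.

```r
# Cell counts from Table 13.4
cases_exposed      <- 537
controls_exposed   <- 554
cases_unexposed    <- 639
controls_unexposed <- 622

# Odds ratio as the cross-product ratio
or_oc <- (cases_exposed * controls_unexposed) / (controls_exposed * cases_unexposed)

# Approximate standard error of log(OR) (Woolf's formula; an assumed method)
se_log_or <- sqrt(1 / cases_exposed + 1 / controls_exposed +
                  1 / cases_unexposed + 1 / controls_unexposed)

# 95% confidence interval, computed on the log scale and back-transformed
z     <- qnorm(0.975)
ci_or <- exp(log(or_oc) + c(-1, 1) * z * se_log_or)

round(c(OR = or_oc, lower = ci_or[1], upper = ci_or[2]), 2)
```

With these counts the interval is roughly 0.80 to 1.11. Because it includes 1.0, the data are compatible with no association between oral contraceptive use and breast cancer.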

13.8 Pros and Cons of Study Designs

Case-Control Studies

Advantages:

  • Useful for investigating rare diseases
  • Relatively quick, cheap, and easy to perform
  • A wide range of risk factors can be investigated in each study
  • There is no loss to follow-up

Disadvantages:

  • Selection of appropriate controls can be difficult
  • Not efficient when exposures are rare
  • Cannot be used to estimate incidence, because participants are selected on the basis of their disease status
  • Subject to recall bias and data inaccuracy
  • Cannot infer causation if onset of disease preceded exposure

Cohort Studies

Advantages:

  • Data recording tends to be more accurate
  • Multiple outcomes can be studied
  • Incidence rates can be established
  • The time sequence of events can be assessed
  • Can study exposure to factors that are rare
  • Reduced recall and selection bias compared with case-control studies

Disadvantages:

  • Need to follow up subjects over a long period of time
  • Expensive
  • Prone to subjects dropping out (loss to follow-up)
  • Not efficient for rare diseases
  • Difficult to maintain consistency of measurements over time

13.9 Summary

| Study Design | Direction | Main Measure | Best For |
|---|---|---|---|
| Cross-sectional | Single time point | Prevalence | Describing current health status |
| Cohort | Forward (prospective) | Relative Risk | Assessing incidence and causation |
| Case-control | Backward (retrospective) | Odds Ratio | Investigating rare diseases |

| Measure | Formula | Interpretation |
|---|---|---|
| Prevalence | Cases / Population | Proportion with disease at a point in time |
| Incidence rate | New cases / Person-time at risk | Rate of new cases over time |
| Relative Risk | Risk (exposed) / Risk (unexposed) | How much more likely disease is in exposed group |
| Odds Ratio | (a×d) / (b×c) | How much higher odds of exposure in cases vs controls |