Epidemiology is the study of the occurrence and determinants of ill health in the population.
Epidemiological studies assess the relationship between factors of interest (which may be biological, social, behavioural, or environmental) and the occurrence of disease in the population (for example, the incidence of cancer, heart disease, or hypertension).
Epidemiological studies are mostly observational in design (in contrast to experimental studies which involve interventions to affect an outcome).
13.2 Routinely Collected Data
Routinely collected administrative data such as death certification, cancer registration and hospital discharge records can be used for epidemiological purposes and provide insights into the health of the population.
Death certification provides estimates of the annual death rates in the population. These rates are usually age standardised to take account of differences in the age structure of the population over time or between places.
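As a rough illustration of what age standardisation does, the sketch below applies direct standardisation: the age-specific death rates are weighted by a fixed standard population rather than by the study population's own age structure. The three age bands, all counts, and the standard population are hypothetical values chosen only to make the arithmetic easy to follow.

```r
# Hypothetical illustration only: none of these numbers come from real data.
deaths     <- c(50, 200, 750)         # deaths in each age band
population <- c(50000, 40000, 10000)  # study population in each age band
standard   <- c(60000, 30000, 10000)  # assumed standard population in each band

# Age-specific death rates in the study population
age_specific_rate <- deaths / population

# Crude rate: ignores the age structure
crude_rate <- sum(deaths) / sum(population)

# Directly standardised rate: age-specific rates weighted by the standard population
standardised_rate <- sum(age_specific_rate * standard) / sum(standard)

# Express both per 100,000 population
c(crude = crude_rate, standardised = standardised_rate) * 100000
```

Because the same standard weights can be applied to different years or places, the standardised rates can be compared without the comparison being driven by differences in age structure.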
13.3 Types of Observational Studies
Observational epidemiological studies fall into three broad groups:
Cross-sectional studies
Cohort studies
Case-control studies
The unit of observation is usually the individual, but it can also be a group of individuals, for example populations defined by a shared geography. Studies that use groups as the unit of observation are called ecological studies.
Studies in which the health event of interest has yet to happen are called prospective.
Studies in which the health event has already occurred are called retrospective.
```r
# Packages used for the table (skip if already attached)
library(tibble)
library(knitr)
library(kableExtra)

# Create a visual comparison of study designs
study_designs <- tibble(
  Design = c("Cross-sectional", "Cohort", "Case-control"),
  Direction = c("Single time point", "Forward in time", "Backward in time"),
  `Time Frame` = c("Present", "Prospective", "Retrospective"),
  `Outcome Measure` = c("Prevalence", "Incidence/Relative Risk", "Odds Ratio")
)

study_designs |>
  kable() |>
  kable_styling(bootstrap_options = c("striped", "hover"))
```
| Design | Direction | Time Frame | Outcome Measure |
|---|---|---|---|
| Cross-sectional | Single time point | Present | Prevalence |
| Cohort | Forward in time | Prospective | Incidence/Relative Risk |
| Case-control | Backward in time | Retrospective | Odds Ratio |
Figure 13.1: Comparison of epidemiological study designs
13.4 Cross-sectional Study
A cross-sectional study is carried out at a single point in time. A health survey is a type of cross-sectional study where the aim is to describe health behaviours or health status in a large sample of the population.
A cross-sectional study is suitable for estimating the prevalence of a condition in the population.
Definition
Prevalence is the proportion (or percent) of individuals with a particular condition in the population at a point in time.
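The definition amounts to a single division. A minimal sketch, using hypothetical survey counts that do not come from any real study:

```r
# Hypothetical survey results, for illustration only
n_surveyed <- 2000        # people examined at the survey date
n_with_condition <- 150   # of those, people who have the condition on that date

# Prevalence as a proportion and as a percentage
prevalence <- n_with_condition / n_surveyed
prevalence_percent <- 100 * prevalence

c(proportion = prevalence, percent = prevalence_percent)
```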
13.5 Cohort Study
A cohort study takes a group of individuals and follows them forward in time. These studies are usually prospective.
The aim is to assess whether exposure to a particular factor affects the incidence of disease in the future.
Definitions
Incidence is the number of new cases of a condition occurring in a population over a set time period.
Incidence rate is the number of new cases divided by the person-time at risk, and is usually expressed in terms of person-years.
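To make the role of person-time concrete, the sketch below computes an incidence rate from individual follow-up times. The five participants, their follow-up durations, and their event indicators are hypothetical, and the variable names are chosen for illustration.

```r
# Hypothetical follow-up data, for illustration only
follow_up_years <- c(5, 3.5, 5, 2, 4.5)  # time at risk for each participant
new_case        <- c(0, 1, 0, 1, 0)      # 1 = developed the condition during follow-up

# Total person-time at risk and number of new cases
person_years <- sum(follow_up_years)
cases <- sum(new_case)

# Incidence rate per person-year, and per 1,000 person-years
incidence_rate <- cases / person_years
incidence_rate_per_1000 <- 1000 * incidence_rate

c(person_years = person_years, cases = cases, rate_per_1000_py = incidence_rate_per_1000)
```

Participants who develop the condition or leave the study early contribute only the time they were actually at risk, which is why the denominator is person-years rather than a count of people.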
Analysis of Cohort Studies
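The analysis of cohort studies can be summarised by the ratio of the incidence in the exposed group to the incidence in the non-exposed group (the incidence rate ratio or relative risk). In the table below, the incidence in each group is expressed as the proportion of that group who develop the disease during follow-up.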
Table 13.1: Structure of a cohort study analysis table
| | Disease: Yes | Disease: No | Total | Incidence Rate |
|---|---|---|---|---|
| Exposed: Yes | a | b | a + b | a / (a + b) |
| Exposed: No | c | d | c + d | c / (c + d) |
| **Total** | a + c | b + d | n | |
Relative Risk
The relative risk (RR) indicates the increased (or decreased) risk of disease associated with exposure to the factor of interest.
\[RR = \frac{a/(a+b)}{c/(c+d)}\]
| Relative Risk | Interpretation |
|---|---|
| > 1 | An increased risk in the exposed group |
| = 1 | Risk is the same in the exposed and unexposed groups |
| < 1 | A reduced risk in the exposed group |
A relative risk of one indicates that the risk is the same in the exposed and unexposed groups. A relative risk greater than one indicates that there is an increased risk in the exposed group compared with the unexposed group; a relative risk less than one indicates a reduction in the risk of disease in the exposed group.
Worked Example: Serum Ferritin and Cancer Mortality
A cohort study investigated whether elevated serum ferritin levels (a marker of iron overload) were associated with increased cancer mortality. Researchers followed 1,420 patients for 5 years after measuring their baseline serum ferritin levels.
Table 13.2: Cohort study of serum ferritin and cancer mortality
| Serum Ferritin | Cancer Death | Alive | Total | Incidence Rate |
|---|---|---|---|---|
| High (>400 μg/L) | 48 | 352 | **400** | 48/400 = 0.120 |
| Normal (≤400 μg/L) | 82 | 938 | **1020** | 82/1020 = 0.080 |
| **Total** | **130** | **1290** | **1420** | |
Calculating the Relative Risk
```r
# Define the values from the 2×2 table
exposed_disease <- 48        # a: High ferritin with cancer death
exposed_no_disease <- 352    # b: High ferritin, alive
unexposed_disease <- 82      # c: Normal ferritin with cancer death
unexposed_no_disease <- 938  # d: Normal ferritin, alive

# Calculate incidence rates
ir_exposed <- exposed_disease / (exposed_disease + exposed_no_disease)
ir_unexposed <- unexposed_disease / (unexposed_disease + unexposed_no_disease)

# Calculate relative risk
rr <- ir_exposed / ir_unexposed
```
The relative risk of 1.49 indicates that, over the 5-year follow-up period, the risk of cancer death in patients with high serum ferritin levels was 1.49 times the risk in patients with normal ferritin levels.
Interpretation
Since RR = 1.49 > 1, there is an increased risk in the exposed group (high ferritin). Specifically:
The risk of cancer death in the high ferritin group is 12%
The risk of cancer death in the normal ferritin group is 8%
Patients with high ferritin are 49% more likely to die from cancer than patients with normal ferritin (RR − 1 = 0.49)
If a 95% confidence interval for this RR does not include 1.0, we would conclude that the association is statistically significant at the 5% level.
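One common way to obtain such an interval is the large-sample (Wald) approximation on the log scale, in which the standard error of log(RR) is taken as sqrt(1/a − 1/(a+b) + 1/c − 1/(c+d)). The sketch below applies it to the counts in Table 13.2. The choice of this approximation and the variable names are assumptions made for illustration; the original study may have used a different method.

```r
# Counts from Table 13.2
exposed_disease   <- 48    # a: high ferritin, cancer death
exposed_total     <- 400   # a + b
unexposed_disease <- 82    # c: normal ferritin, cancer death
unexposed_total   <- 1020  # c + d

# Relative risk
rr <- (exposed_disease / exposed_total) / (unexposed_disease / unexposed_total)

# Standard error of log(RR) under the large-sample approximation
se_log_rr <- sqrt(1 / exposed_disease - 1 / exposed_total +
                  1 / unexposed_disease - 1 / unexposed_total)

# 95% confidence interval, computed on the log scale and back-transformed
z <- qnorm(0.975)
ci_lower <- exp(log(rr) - z * se_log_rr)
ci_upper <- exp(log(rr) + z * se_log_rr)

round(c(RR = rr, lower = ci_lower, upper = ci_upper), 2)
```

Under this approximation the interval comes out at roughly 1.07 to 2.09, which does not include 1.0.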
13.6 Case-Control Study
A case-control study compares the characteristics of a group of patients with a particular disease (the cases) to a group of individuals without the disease (the controls), to see whether exposure to a factor occurred more or less frequently in the cases than the controls.
Because patients are selected on the basis of their disease status, it is not possible to estimate the risk of disease. Instead, we can estimate the odds of exposure to the risk factor separately among the cases and among the controls.
```r
# Layout of a case-control 2×2 table, with the odds of exposure in the last row
cc_table <- tibble(
  ` ` = c("Exposed: Yes", "Exposed: No", "**Total**", "Odds of exposure"),  # blank header for the row labels
  `Case (Disease)` = c("a", "c", "a + c", "a / c"),
  `Control (No Disease)` = c("b", "d", "b + d", "b / d"),
  `Total` = c("a + b", "c + d", "n", "")
)

cc_table |>
  kable(escape = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
```
Table 13.3: Structure of a case-control study analysis table
| | Case (Disease) | Control (No Disease) | Total |
|---|---|---|---|
| Exposed: Yes | a | b | a + b |
| Exposed: No | c | d | c + d |
| **Total** | a + c | b + d | n |
| Odds of exposure | a / c | b / d | |
Odds Ratio
The odds ratio (OR) gives an indication of the increased (or decreased) odds associated with exposure to the factor of interest.
\[OR = \frac{a/c}{b/d} = \frac{ad}{bc}\]
| Odds Ratio | Interpretation |
|---|---|
| < 1 | Reduced odds of disease in the exposed group |
| = 1 | Odds are the same in the exposed and unexposed groups |
| > 1 | Increased odds of disease in the exposed group |
An odds ratio of one indicates that the odds are the same in the exposed and unexposed groups. An odds ratio greater than one indicates that the odds of disease are greater in the exposed group than in the unexposed group; an odds ratio less than one indicates that they are lower.
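The two forms of the odds ratio formula, (a/c)/(b/d) and ad/bc, can be checked numerically. The sketch below uses hypothetical counts that do not come from any study, and adds a 95% confidence interval based on the standard large-sample (Woolf) approximation, sqrt(1/a + 1/b + 1/c + 1/d) for the standard error of log(OR). That choice of interval is an assumption for illustration.

```r
# Hypothetical counts, for illustration only (not from any study)
cases_exposed      <- 30  # a
controls_exposed   <- 20  # b
cases_unexposed    <- 70  # c
controls_unexposed <- 80  # d

# Odds of exposure among cases and among controls
odds_cases    <- cases_exposed / cases_unexposed        # a / c
odds_controls <- controls_exposed / controls_unexposed  # b / d

# Odds ratio computed both ways; the two values should agree
or_from_odds <- odds_cases / odds_controls
or_cross <- (cases_exposed * controls_unexposed) /
  (controls_exposed * cases_unexposed)                  # ad / bc

# 95% confidence interval on the log scale (Woolf approximation)
se_log_or <- sqrt(1 / cases_exposed + 1 / controls_exposed +
                  1 / cases_unexposed + 1 / controls_unexposed)
z <- qnorm(0.975)
ci_lower <- exp(log(or_cross) - z * se_log_or)
ci_upper <- exp(log(or_cross) + z * se_log_or)

round(c(OR = or_cross, lower = ci_lower, upper = ci_upper), 2)
```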
13.7 Worked Example: Oral Contraceptives and Breast Cancer