The goal of this workshop is to introduce core statistical concepts using simple base R syntax and a real clinical dataset from the OncoDataSets package.
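Topics covered: exploring and visualising data, the two-sample t-test, the chi-squared test, simple linear regression, Kaplan-Meier survival curves, and the log-rank test.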
Task: Install and load the required packages.
Suggested commands:
# Run the two install lines once (remove the leading #) if the packages are not yet installed
# install.packages("OncoDataSets")
# install.packages("survival")
library(OncoDataSets)
library(survival)
We will use the WBreastCancer_tbl_df dataset from OncoDataSets, which contains
demographic, tumour, and survival information for patients with breast
cancer. Paste the commands below into your script to load the dataset,
rename it to BreastCancer_df, and generate an (artificial) tumour size
variable, which is not otherwise present in the dataset.
Task: Load the dataset into R.
Suggested commands:
# Load the dataset and give it a shorter name
data("WBreastCancer_tbl_df")
BreastCancer_df <- WBreastCancer_tbl_df
# Simulate an (artificial) tumour size that depends on age and histgrad
n <- nrow(BreastCancer_df)
BreastCancer_df$tumour_size <- abs(round(
  15 + rnorm(n, 25, 10) - 0.25 * BreastCancer_df$age -
    BreastCancer_df$histgrad * rgamma(n, 3), 0))
Task: Inspect the structure and contents of the dataset.
Suggested commands:
head() str() summary()
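As a rough illustration of what these three functions report, the sketch below applies them to a small made-up data frame. The object toy_df and its columns are hypothetical and are not part of OncoDataSets; the same calls should work on BreastCancer_df.

```r
# Hypothetical toy data, for illustration only
toy_df <- data.frame(
  age    = c(45, 52, 61, 38, 70, 57),
  er     = factor(c("pos", "neg", "pos", "pos", "neg", "pos")),
  status = c(0, 1, 0, 0, 1, 1)
)

first_rows <- head(toy_df, 3)  # first three rows
first_rows
str(toy_df)                    # dimensions and the type of each column
summary(toy_df)                # quartiles for numeric columns, counts for factors
```

str() is usually the quickest way to spot a variable stored with an unexpected type, for example a grouping variable read in as a number.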
Task: Create a histogram of patient age.
Suggested commands:
hist()
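A minimal sketch of a labelled histogram is shown here on simulated ages, so it does not depend on the column names in the real dataset. The vector toy_age and the choice of 15 bins are assumptions made for illustration.

```r
set.seed(42)                                # make the simulated values reproducible
toy_age <- rnorm(200, mean = 55, sd = 12)   # hypothetical ages, not real data

h <- hist(toy_age,
          breaks = 15,                      # suggested number of bins; R may adjust it
          main   = "Distribution of age (simulated)",
          xlab   = "Age (years)",
          col    = "grey80")
abline(v = mean(toy_age), lty = 2)          # dashed vertical line at the mean
```

To plot the real data, replace toy_age with the age column of BreastCancer_df.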
Questions:
Task: Create a bar chart showing ER-positive vs ER-negative patients.
Suggested commands:
table() barplot()
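The two functions are typically chained: table() counts the categories and barplot() draws the counts. The sketch below uses a made-up vector, toy_er, as a stand-in for the ER status column, whose actual name and coding should be checked with str().

```r
# Hypothetical ER status values, for illustration only
toy_er <- c("Positive", "Negative", "Positive", "Positive",
            "Negative", "Positive", "Positive", "Negative")

er_counts <- table(toy_er)   # count the patients in each category
er_counts

barplot(er_counts,
        main = "ER status (toy data)",
        ylab = "Number of patients",
        col  = c("grey40", "grey80"))
```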
Questions:
Task: Visualize tumour size.
Suggested commands:
hist()
Questions:
Is patient age different between ER-positive and ER-negative breast cancer?
Tasks:
Suggested commands:
table() t.test()
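One possible way to set up the comparison is sketched below on simulated data. The data frame toy, its columns, and the group means are assumptions chosen for illustration, not properties of the real dataset.

```r
set.seed(1)
# Hypothetical data: 40 ER-negative and 40 ER-positive patients
toy <- data.frame(
  er  = rep(c("neg", "pos"), each = 40),
  age = c(rnorm(40, mean = 52, sd = 10), rnorm(40, mean = 58, sd = 10))
)

table(toy$er)                  # group sizes
tapply(toy$age, toy$er, mean)  # mean age in each group

tt <- t.test(age ~ er, data = toy)  # Welch two-sample t-test (R's default)
tt$estimate                         # the two group means
tt$conf.int                         # 95% CI for the difference in means
tt$p.value
```

By default t.test() does not assume equal variances in the two groups; setting var.equal = TRUE gives the classical Student's t-test instead.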
Questions:
Is ER status associated with nodal involvement?
Tasks:
Suggested commands:
table() chisq.test()
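A minimal sketch of the workflow follows: cross-tabulate the two categorical variables, then pass the table to chisq.test(). The counts below are invented for illustration, and the column names er and nodes are hypothetical.

```r
# Hypothetical data: 40 ER-negative and 60 ER-positive patients
toy <- data.frame(
  er    = rep(c("neg", "pos"), times = c(40, 60)),
  nodes = c(rep(c("no", "yes"), times = c(15, 25)),   # ER-negative patients
            rep(c("no", "yes"), times = c(40, 20)))   # ER-positive patients
)

tab <- table(ER = toy$er, Nodes = toy$nodes)  # 2 x 2 contingency table
tab
prop.table(tab, margin = 1)                   # row proportions

ct <- chisq.test(tab)  # Yates' continuity correction is applied to 2 x 2 tables by default
ct$expected            # expected counts under independence
ct$p.value
```

Checking ct$expected is worthwhile because the chi-squared approximation becomes unreliable when expected counts are small; fisher.test() is a common alternative in that case.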
Questions:
Does age predict tumour size?
Tasks:
Suggested commands:
lm() summary() plot() abline()
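The four functions fit together as sketched below. The data are simulated with a known slope of -0.25 so the fitted coefficient can be compared against it; the data frame toy and its columns are assumptions for illustration only.

```r
set.seed(2)
# Hypothetical data: size decreases with age by construction
toy <- data.frame(age = runif(100, min = 30, max = 80))
toy$size <- 40 - 0.25 * toy$age + rnorm(100, sd = 8)

fit <- lm(size ~ age, data = toy)  # outcome ~ predictor
summary(fit)                       # coefficients, standard errors, p-values, R-squared

plot(toy$age, toy$size,
     xlab = "Age (years)", ylab = "Tumour size (simulated)")
abline(fit, col = "red", lwd = 2)  # overlay the fitted regression line
```

In the summary() output, the row labelled age gives the estimated change in the outcome per one-year increase in age.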
Questions:
Does ER status affect overall survival?
Tasks:
Suggested commands:
Surv() survfit() plot() legend()
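As a sketch of how these calls combine, the example below simulates follow-up times with censoring and draws Kaplan-Meier curves for two groups. All names (toy, time, status, er) and the 1 = event, 0 = censored coding are assumptions; check how the survival time and event indicator are stored in the real dataset before adapting it.

```r
library(survival)

set.seed(3)
n <- 100
toy <- data.frame(er = rep(c("neg", "pos"), each = n / 2))

# Hypothetical event and censoring times in months
event_time  <- rexp(n, rate = ifelse(toy$er == "neg", 1 / 40, 1 / 80))
censor_time <- runif(n, min = 20, max = 120)
toy$time    <- pmin(event_time, censor_time)           # observed follow-up time
toy$status  <- as.integer(event_time <= censor_time)   # 1 = event, 0 = censored

km <- survfit(Surv(time, status) ~ er, data = toy)     # one curve per ER group
km

plot(km, col = c("red", "blue"),
     xlab = "Time (months)", ylab = "Overall survival probability")
legend("topright", legend = names(km$strata),
       col = c("red", "blue"), lty = 1)
```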
Questions:
Is the difference in survival between ER groups statistically significant?
Task: Perform a log-rank test.
Suggested command:
survdiff()
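survdiff() takes the same formula as survfit(). A minimal sketch on simulated data is given below; the data frame toy and its columns are hypothetical, and the p-value is computed by hand from the chi-squared statistic so the example does not rely on any particular version of the survival package.

```r
library(survival)

set.seed(4)
n <- 100
toy <- data.frame(er = rep(c("neg", "pos"), each = n / 2))

# Hypothetical event and censoring times in months
event_time  <- rexp(n, rate = ifelse(toy$er == "neg", 1 / 40, 1 / 80))
censor_time <- runif(n, min = 20, max = 120)
toy$time    <- pmin(event_time, censor_time)
toy$status  <- as.integer(event_time <= censor_time)   # 1 = event, 0 = censored

lr <- survdiff(Surv(time, status) ~ er, data = toy)    # log-rank test
lr                                                     # observed vs expected events per group

# p-value from the chi-squared statistic, with (number of groups - 1) degrees of freedom
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
p_value
```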
Questions: