BIOS/BCB 784
Fall 2024 – 1:25-2:40 MoWe – McGavran 2306
This course makes extensive use of R and assumes basic familiarity with base R (not packages) as a prerequisite. A self-quiz is available here, with answers provided here. A list of base R functions that you should be familiar with is also available.
For 2024 BCB students: BCB 720 (background on statistical inference, i.e., working with the likelihood for parameter inference, and conditional probabilities) is a suitable prerequisite for BIOS/BCB 784.
For Rmd files, go to the course repo and navigate the directories or, best of all, clone the repo and navigate within RStudio.
| Week | Topic | Dir. | HW | HTML | Title |
|---|---|---|---|---|---|
| | Bio Intro / GitHub | - | | github | RStudio, git, and GitHub |
| | Simple EDA | eda | | EDA | Exploratory data analysis |
| | | | | NAs | Missing values in R |
| | | | | brain RNA | Exploring brain RNA |
| | Bioconductor I | bioc | | objects | Bioc data objects |
| | | | | ranges | Genomic ranges |
| | | | | GRL | GRangesList: lists of ranges |
| | Bioconductor II | | | anno | Accessing annotations |
| | | | | strings | Manipulating DNA strings |
| | Multiple testing | test | | multtest | FDR and Benjamini-Hochberg |
| | | | | localfdr | Local false discovery rate |
| | | | | IDR | Irreproducible discovery rate |
| | Distances & norm. I | dist | | distances | Distances in high dimensions |
| | | | | hclust | Hierarchical clustering |
| | Models and EM | model | | EM | Expectation maximization |
| | | | | motif | EM for finding DNA motifs |
| | | | | ChIP-seq | (Slides on Sakai) |
| | | | | Motifs part II | (In-class EM notes posted to GH) |
| | Distances & norm. II | dist | | batch | Batch effects and sources |
| | | | | sva | Surrogate variable analysis |
| | | | | | Batch effect solutions |
| | Hierarchical models | hier | | hierarchical | Hierarchical models |
| | | | | jamesstein | James-Stein estimator app |
| | Signal processing | signal | | hmm | Hidden Markov Models |
| | Tidy genomics | - | | tidy | Tidy ranges tutorial |
| | Network analysis | net | | network | Network analysis |
Some R resources
This is far from a complete list of topics in computational biology. Most students taking the course are graduate students in biostatistics, who have a statistical background but little exposure to genomic or biological datasets. Classic computational biology topics, such as alignment algorithms or molecular dynamics, are not covered; instead, the focus is on exploring genomic datasets and introducing the key statistical models that flourish in the high-throughput setting (normalization, false discovery rate calculation, the EM algorithm, hierarchical models, HMMs, etc.). The course also focuses on R/Bioconductor, as this is a familiar tool for most of the students and allows them to jump into the data analysis. The goal is that exposure to these topics and these datasets will allow them to read the literature more effectively and pursue topics in biology and biomedical research.
This page was last updated on 03/20/2024.