Course information


Fall 2024 – 1:25-2:40 MoWe – McGavran 2306

2022 syllabus

This course makes extensive use of R and assumes basic familiarity with base R (not packages) as a prerequisite. A self-quiz is available here, with answers provided here. A list of base R functions that you should be familiar with is also available.

For 2024 BCB students: BCB 720 (background on statistical inference, i.e. working with the likelihood for parameter inference, and conditional probabilities) is a suitable prerequisite for BIOS/BCB 784.

Schedule and course notes

For Rmd files, go to the course repo and navigate the directories or, best of all, clone the repo and navigate within RStudio.

| Week | Topic | Dir. | HW | HTML | Title |
|---|---|---|---|---|---|
| | Bio Intro / GitHub | - | | github | RStudio, git, and GitHub |
| | Simple EDA | eda | | EDA | Exploratory data analysis |
| | | | | NAs | Missing values in R |
| | | | | brain RNA | Exploring brain RNA |
| | Bioconductor I | bioc | | objects | Bioc data objects |
| | | | | ranges | Genomic ranges |
| | | | | GRL | GRangesList: lists of ranges |
| | Bioconductor II | | | anno | Accessing annotations |
| | | | | strings | Manipulating DNA strings |
| | Multiple testing | test | | multtest | FDR and Benjamini-Hochberg |
| | | | | localfdr | Local false discovery rate |
| | | | | IDR | Irreproducible discovery rate |
| | Distances & norm. I | dist | | distances | Distances in high dimensions |
| | | | | hclust | Hierarchical clustering |
| | Models and EM | model | | EM | Expectation maximization |
| | | | | motif | EM for finding DNA motifs |
| | ChIP-seq | | | | (Slides on Sakai) |
| | Motifs part II | | | | (In-class EM notes posted to GH) |
| | Distances & norm. II | dist | | batch | Batch effects and sources |
| | | | | sva | Surrogate variable analysis |
| | | | | | Batch effect solutions |
| | Hierarchical models | hier | | hierarchical | Hierarchical models |
| | | | | jamesstein | James-Stein estimator app |
| | Signal processing | signal | | hmm | Hidden Markov Models |
| | Tidy genomics | - | | tidy | Tidy ranges tutorial |
| | Network analysis | net | | network | Network analysis |

Reading list


Some R resources

Wait, how come ___ is missing?

This is far from a complete list of topics in computational biology. The students taking the course are mostly graduate students in biostatistics, who have a statistical background but not much exposure to genomic or biological datasets. Classic computational biology topics, such as alignment algorithms or molecular dynamics, are not covered. Instead, the focus is on exploring genomic datasets and introducing the key statistical models that flourish in the high-throughput setting (normalization, false discovery rate calculation, the EM algorithm, hierarchical models, HMMs, etc.). The course also focuses on R/Bioconductor, as this is a familiar tool for most of the students and allows them to jump into the data analysis. The goal is that exposure to these topics and datasets will allow them to read the literature more effectively and pursue topics in biology and biomedical research.

This page was last updated on 06/03/2024.