Intro to Computational Biology

Course information

BIOS/BCB 784

Instructor details:

Assoc. Prof. Michael Love (he/him), love [at] unc [dot] edu

Course time and place:

Fall 2026
8/17/2026-12/11/2026
Tue/Thur 12:30-1:45pm
Michael Hooker Research Center room 0003
https://sph.unc.edu/room/0003-0004-mhrc/

Previous year course details:

2024

Pre-reqs and expectations:

This course makes extensive use of R and assumes basic familiarity with base R (not packages) as a prerequisite. A self-quiz is available here, with answers provided here. You can also find a list of base R functions that you should be familiar with.
This course also assumes basic familiarity with statistical concepts such as parameter inference, hypothesis testing, and basic probability (e.g., conditional probability and conditional expectation). For BCB students, BCB 720, which covers statistical inference, is a suitable prerequisite for BIOS/BCB 784. Other students who are unsure about prerequisites can email me with questions, putting BIOS 784 in the subject line.

Schedule and course notes

To get the Rmd or qmd source files, browse the directories of the course repo, or clone the repo and navigate it within RStudio.

Week	Topic	Repo dir.	HW	Notes (HTML)	Title
	Data analysis
Aug 19	Biological intro / GitHub	`-`	HW0	github	RStudio, git, and GitHub
Aug 21	Exploring bio data	`eda`		EDA	Exploratory data analysis
				brain RNA	Exploring brain RNA
Aug 26 & 28	R/Bioconductor I	`bioc`	HW1	objects	Bioc data objects
				ranges	Genomic ranges
				GRL	GRangesList: lists of ranges
Sep 2	Labor Day
Sep 4	“tidyomics”	`TO@GH`		tidy intro	Tidiness-in-Bioconductor intro
				tidy ranges	Tidy ranges tutorial
Sep 9 & 11	R/Bioconductor II	`bioc`	HW2	anno	Accessing annotations
				strings	Manipulating DNA strings
Sep 16 & 18	Distances & norm. I	`dist`	HW3	distances	Distances in high dimensions
				transform_clust	Transformations and clustering
				vst_math	VST math
Sep 23	Wellness Day
Sep 25	Spatial data analysis (guest lecture)				Peyton Kuhlers (Hoadley & Raab labs, UNC)
Sep 30	BCB retreat
Oct 2	Sources of technical bias
Oct 7 & 9	Distances & norm. II	`dist`		batch	Batch effects and sources
				sva	Surrogate variable analysis
					Batch effect solutions
				ruv script	RUV and friends
Oct 7	Midterm assigned, due Oct 21
	Data modeling
Oct 14 & 16	Hierarchical models	`hier`		hierarchical	Hierarchical models
				jamesstein	James-Stein estimator
Oct 21 & 23	Models and EM	`model`		EM	Expectation maximization
				motif	EM for finding DNA motifs
Oct 28 & 30	Markov models	`markov`	HW4	hmm	Hidden Markov Models
Nov 4	HMM: Baum-Welch
Nov 6	ASHG
Nov 11	Biostatistics 75th event
Nov 13	Class time for projects
Nov 18 & 20	Class time for projects
Nov 25	Gene regulatory networks	`net`		network	Network analysis
Nov 27	Thanksgiving
Dec 2 & 4	Final project presentations
	Extra lectures
	multiple testing	`multiple`		multtest	FDR and Benjamini-Hochberg
				localfdr	Local false discovery rate

Reading list

What is the role of the computational biologist / statistician?
- All biology is computational biology (Florian Markowetz)
- Questions, Answers and Statistics (Deborah Nolan)
- 50 Years of Data Science (David Donoho)
- The Future of Data Analysis (John Tukey; this article, discussed by Donoho, is from 1962)
- Ten Simple Rules for Effective Statistical Practice (Kass, Caffo, Davidian, Meng, Yu, and Reid)
- Statistical Modeling: The Two Cultures (Leo Breiman)
Exploratory data analysis
Bioconductor
- Orchestrating high-throughput genomic analysis with Bioconductor (Huber et al.)
Distances and normalization
- Differential expression analysis for sequence count data (Simon Anders and Wolfgang Huber)
- Tackling the widespread and critical impact of batch effects in high-throughput data (Leek et al.)
- Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis (Jeffrey Leek and John Storey)
- Normalization of RNA-seq data using factor analysis of control genes or samples (Risso et al.)
- Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses (Stegle et al.)
- More on factor analysis methods:
  - RUV: Using control genes to correct for unwanted variation in microarray data (Gagnon-Bartsch and Speed, 2012)
  - RUV: Removing Unwanted Variation from High Dimensional Data with Negative Controls (Gagnon-Bartsch et al., 2013)
  - RUVSeq: Normalization of RNA-seq data using factor analysis of control genes or samples (Risso et al., 2014)
  - ZINB-WaVE: A general and flexible method for signal extraction from single-cell RNA-seq data (Risso, Perraudeau et al., 2018)
  - NewWave: A scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA-seq data (Agostinis et al., 2022)
  - GLM-PCA: Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model (Townes et al., 2019)
Multiple testing
- Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing (Yoav Benjamini and Yosef Hochberg)
- A direct approach to false discovery rates (John Storey)
- Statistical significance for genomewide studies (John Storey and Robert Tibshirani)
- Large-scale simultaneous hypothesis testing (Bradley Efron)
- Empirical Bayes Analysis of a Microarray Experiment (Efron et al.)
- Measuring reproducibility of high-throughput experiments (Li et al.)
Expectation maximization
- What is the expectation maximization algorithm? (Chuong B Do and Serafim Batzoglou)
- Gaussian mixture models and the EM algorithm (Ramesh Sridharan)
- EM algorithm notes (Andrew Ng)
- MEME: discovering and analyzing DNA and protein sequence motifs (Bailey et al.)
Hierarchical models
- Linear models and empirical Bayes methods for assessing differential expression in microarray experiments (Gordon Smyth)
- Analyzing ’omics data using hierarchical models (Hongkai Ji and X Shirley Liu)
- Stein’s Paradox in Statistics (Bradley Efron and Carl Morris)
- Stein’s estimation rule and its competitors: an empirical Bayes approach (Bradley Efron and Carl Morris)
Signal processing
- An Introduction to Hidden Markov Models (Lawrence Rabiner and Biing-Hwang Juang)
- Hidden Markov models approach to the analysis of array CGH data (Fridlyand et al.)
Network analysis
- Static And Dynamic DNA Loops Form AP-1 Bound Activation Hubs During Macrophage Development (Phanstiel et al.)

Resources

Online R Classes and Resources
Rafael Irizarry and Michael Love, “Data Analysis for the Life Sciences” (free PDF and HTML versions)
Kasper Hansen, “Bioconductor for Genomic Data Science”
Aaron Quinlan, “Applied Computational Genomics” (Slides)
Jennifer Bryan et al., Stat 545
Florian Markowetz, “You Are Not Working for Me; I Am Working with You”
Tips to succeed in Computational Biology research

Some R resources

This page was last updated on 05/13/2026.
