Course information

The goals of this class will be to:

  1. Achieve proficiency in R (specifically the “Tidyverse”)
  2. Gain familiarity with a suite of data science tools
  3. Master the practices of good data science

Topics will include gaining proficiency with R, data wrangling, data quality control and cleaning, data visualization, exploratory data analysis, modeling, collaboration, reproducible research and excellent communication.

The course will also develop familiarity with a second programming language, Python, and with several software tools that support data science best practices, such as Git, Docker, Jupyter, and Make.

The course will emphasize “learning by doing”, with the bulk of the grade coming from several creative data science projects.

Portfolio

By the end of the semester, students will have produced a portfolio of work hosted on GitHub. The three projects that form the foundation of the portfolio are:

  1. A “complete” analysis in R, demonstrating data wrangling, modeling, visualization and delivery using R Markdown.
  2. An interactive dashboard built with Shiny.
  3. A hybrid analysis using R, Python, Make and Docker.

Course Schedule

In Fall 2019, class will be held on Mondays and Wednesdays from 3:35 to 4:50 PM. On Tuesday afternoons there will be an additional lab period from 2:00 to 3:00 PM, run by the teaching assistant(s). Lab is held in Rosenau Rm 0235. The lab schedule can be found here.

Before class, students will have read or listened to the assigned material (for example, from the textbook R for Data Science). Class time will be spent discussing and practicing new material, and will generally include a quiz.

Lab time will be spent working on the assigned data science projects and reviewing material from class (as needed).

The final exam block for the course is scheduled for Saturday, December 7 at 4:00 PM. We will likely use this time for final project presentations (yes, we have to use that time).

Class Num | Date | Material | Pre-class Prep | Homework Due | Portfolio
1 | 8/21/2019 | Intro to course, syllabus, purpose of course, R studio | NA | Quiz | NA
2 | 8/26/2019 | Github, project demo | NA | Quiz | NA
3 | 8/28/2019 | ggplot | R4DS Chp. 3 | Quiz | NA
NA | 9/2/2019 | Labor day: no class | NA | NA | NA
4 | 9/4/2019 | ggplot, Rmd | R4DS Chp. 27–28 | Quiz | NA
5 | 9/9/2019 | Project organization | Cookiecutter post | Quiz, hw1-github, ggplot due | Start github repo for project1 with correct directory structure
6 | 9/11/2019 | Wrangling 1 | R4DS Chp. 5 | Quiz | NA
7 | 9/16/2019 | Wrangling 2 | R4DS Chps. 12, 13 & 18 | Quiz | NA
8 | 9/18/2019 | Agile Data Science | Agile Manifesto | Quiz | Project concept README due
9 | 9/23/2019 | Modeling: clustering | Clustering Post | Quiz, hw2-wrangling due | NA
10 | 9/25/2019 | Modeling: classification | TDS Classification | Quiz, feedback practice | Project draft due
11 | 9/30/2019 | Modeling: Parameter Fitting and Optimization | R4DS Chp. 23 | Quiz, hw3-clustering due | NA
12 | 10/2/2019 | Modeling: Data Prep and Model Validation | Caret Pre-Processing | Quiz | Feedback due
13 | 10/7/2019 | Project presentations | NA | NA | Final project and return feedback due
14 | 10/9/2019 | Shiny Dashboards | Shiny 1st Example | Quiz | Start github repo for project2
15 | 10/14/2019 | Bash and Remote Servers | Navigation Tutorial | Quiz | Project concept README due
16 | 10/16/2019 | Bash Scripts | Bash Scripts | Quiz, hw4-shiny due | NA
17 | 10/21/2019 | Make | Why Use Make | Mid-semester feedback | Project draft due
18 | 10/23/2019 | Docker | Docker for data science | Quiz, hw5-bash-servers due | Feedback due
19 | 10/28/2019 | Docker | NA | Quiz | Final project and return feedback due
20 | 10/30/2019 | Bash, Make, Docker Wrap-up | Enough Docker to be useful | Quiz, hw6-makefiles due | NA
21 | 11/4/2019 | Introduction to Python | NA | Quiz | Start github repo for project3
22 | 11/6/2019 | Pandas 1 | PDSH Chp. 3 | Quiz, hw7-dockerfiles due | NA
23 | 11/11/2019 | Pandas 2 | NA | Quiz | Project concept README due
24 | 11/13/2019 | Scikit-Learn 1 | PDSH Chp. 5: Intro to Scikit Learn | Quiz, hw8-python-base due | NA
25 | 11/18/2019 | Strings and Regex | Python Regex Resource | Quiz | Project draft due
26 | 11/20/2019 | Python Wrap-up | NA | Quiz, hw9-pandas due | NA
27 | 11/25/2019 | Data science ethics | NA | Quiz | Feedback due
NA | 11/27/2019 | Thanksgiving break: No class | NA | NA | NA
28 | 12/2/2019 | Project presentations | NA | NA | Final project and return feedback due
29 | 12/4/2019 | Project presentations | NA | NA | NA

Projects

Each project will progress through the following steps:

  1. Students will submit an initial proposal “README” file describing the project
  2. Students will work individually to produce a first draft and submit it on GitHub
  3. Each student will review a handful of project drafts and provide thoughtful feedback
  4. Students will rate the quality of the feedback received from their peers
  5. Students will submit a final project draft
  6. Graders will review the project for high-level organization and readability
  7. Students will give a short presentation about their project (only projects 1 and 3)

The grade will be based on 1) the quality of feedback provided to peers, 2) the grader’s review, and 3) the presentation.

Project 1 (rubric)

  1. A “complete” analysis in R, demonstrating data wrangling, modeling, visualization and delivery using R Markdown.

Project 2 (rubric)

  1. An interactive dashboard built with Shiny.

Project 3 (rubric)

  1. A polyglot analysis using R, Python, Make and Docker.

Connecting to UNC’s Virtual Computing Lab (VCL)

Instructions to connect to the VCL. The VCL provides transient, remote compute environments with pre-installed software where students can complete assignments. Because VCL resources are transient, students may wish to connect VCL instances to longer-term storage on Longleaf.

Slack

Slack is a communication tool often employed by data science teams in industry. Students will use the course Slack workspace (bios611.slack.com) to ask questions, and anyone (including other students, the professor and the TAs) will be able to post answers publicly, so that everyone can learn together. Slack will also be used for posting relevant and interesting resources related to course content (such as blog posts about data science practices, podcasts, etc.).

Grades

The grade will be based on in-class quizzes (20%), homework assignments (30%), and projects 1, 2 and 3 (50%).

No late quizzes or homework will be accepted. However, to account for the inevitable interruptions of life, each student’s lowest homework grade and two lowest quiz grades will automatically be dropped.
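
As a rough illustration of how the weights and the drop policy above combine, here is a minimal Python sketch. It assumes (the syllabus does not say so) that every score is on a 0–100 scale, that items within a category count equally, and that the three projects are weighted equally. The function names and example scores are hypothetical.

```python
def drop_lowest(scores, n):
    """Return the scores in ascending order with the n lowest removed."""
    if len(scores) <= n:
        raise ValueError("need more scores than the number being dropped")
    return sorted(scores)[n:]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def course_grade(quizzes, homeworks, projects):
    """Weighted course grade: quizzes 20%, homework 30%, projects 50%.

    Assumption: equal weighting within each category (not stated in the syllabus).
    """
    quiz_avg = mean(drop_lowest(quizzes, 2))      # two lowest quizzes dropped
    homework_avg = mean(drop_lowest(homeworks, 1))  # lowest homework dropped
    project_avg = mean(projects)                  # no project scores are dropped
    return 0.20 * quiz_avg + 0.30 * homework_avg + 0.50 * project_avg


# Hypothetical scores for one student.
example = course_grade(
    quizzes=[100, 80, 0, 90, 60],   # the 0 and the 60 are dropped
    homeworks=[70, 90, 100],        # the 70 is dropped
    projects=[80, 90, 100],
)
print(round(example, 2))
```

In this made-up example, a missed quiz (the 0) has no effect on the final grade because it falls among the two dropped quiz scores.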

For graduate students, grades will be assigned as follows:

For undergraduates, final grades will be assigned as follows (where “90–95” includes anything from 90 to 94.99999…):


Course Policies

Because we will be actively practicing coding in class, students and their peers will benefit greatly from consistent attendance.

No late work will be accepted. Final decisions in exceptional circumstances will be made by the professor.

The course final exam is given in compliance with UNC final exam regulations and according to the UNC Final Exam calendar.

Syllabus Changes

The professor reserves the right to make changes to the syllabus, including project due dates and test dates (hurricanes happen…). These changes will be announced as early as possible.

Honor Code

Students are expected to abide by the University Honor Pledge. Students can expect the professor to communicate what help is authorized for each assignment.

Specifically, students are encouraged to work together on in-class programming exercises, and to help each other with technical aspects of the homework and projects. However, the work turned in for homework and projects must be the student’s own: the project must be conceived by the student, the code written by the student, and the results communicated by the student.