The goals of this class will be to:
Topics will include gaining proficiency with R, data wrangling, data quality control and cleaning, data visualization, exploratory data analysis, modeling, collaboration, reproducible research and excellent communication.
The course will also develop familiarity with another programming language—Python—and several software tools for data science best practices, such as Git, Docker, Jupyter, and Make.
The course will emphasize “learning by doing”, with the bulk of the grade coming from several creative data science projects.
By the end of the semester, students will have produced a portfolio of work hosted on GitHub. The three projects that will serve as the foundation for the portfolio include:
In Fall 2019, class will be held on Mondays and Wednesdays from 3:35–4:50 PM. An additional lab period, run by the teaching assistant(s), will be held on Tuesday afternoons from 2:00–3:00 PM in Rosenau Rm 0235. The lab schedule can be found here.
Before each class, students are expected to read or listen to the assigned material (for example, from the textbook R for Data Science). Class time will be spent discussing and practicing new material, and will generally include a quiz.
Lab time will be spent working on the assigned data science projects and reviewing material from class (as needed).
The final exam block for the course is scheduled for Saturday, December 7 at 4:00 PM. We will likely use this time for final project presentations (yes, we have to use that time).
Class Num | Date | Material | Pre-class Prep | Homework Due | Portfolio |
---|---|---|---|---|---|
1 | 8/21/2019 | Intro to course, syllabus, purpose of course, RStudio | NA | Quiz | NA |
2 | 8/26/2019 | GitHub, project demo | NA | Quiz | NA |
3 | 8/28/2019 | ggplot | R4DS Chp. 3 | Quiz | NA |
NA | 9/2/2019 | Labor Day: no class | NA | NA | NA |
4 | 9/4/2019 | ggplot, Rmd | R4DS Chp. 27–28 | Quiz | NA |
5 | 9/9/2019 | Project organization | Cookiecutter post | Quiz, hw1-github, ggplot due | Start GitHub repo for project1 with correct directory structure |
6 | 9/11/2019 | Wrangling 1 | R4DS Chp. 5 | Quiz | NA |
7 | 9/16/2019 | Wrangling 2 | R4DS Chps. 12, 13 & 18 | Quiz | NA |
8 | 9/18/2019 | Agile Data Science | Agile Manifesto | Quiz | Project concept README due |
9 | 9/23/2019 | Modeling: clustering | Clustering Post | Quiz, hw2-wrangling due | NA |
10 | 9/25/2019 | Modeling: classification | TDS Classification | Quiz, feedback practice | Project draft due |
11 | 9/30/2019 | Modeling: Parameter Fitting and Optimization | R4DS Chp. 23 | Quiz, hw3-clustering due | NA |
12 | 10/2/2019 | Modeling: Data Prep and Model Validation | Caret Pre-Processing | Quiz | Feedback due |
13 | 10/7/2019 | Project presentations | NA | NA | Final project and return feedback due |
14 | 10/9/2019 | Shiny Dashboards | Shiny 1st Example | Quiz | Start GitHub repo for project2 |
15 | 10/14/2019 | Bash and Remote Servers | Navigation Tutorial | Quiz | Project concept README due |
16 | 10/16/2019 | Bash Scripts | Bash Scripts | Quiz, hw4-shiny due | NA |
17 | 10/21/2019 | Make | Why Use Make | Mid-semester feedback | Project draft due |
18 | 10/23/2019 | Docker | Docker for data science | Quiz, hw5-bash-servers due | Feedback due |
19 | 10/28/2019 | Docker | NA | Quiz | Final project and return feedback due |
20 | 10/30/2019 | Bash, Make, Docker Wrap-up | Enough Docker to be useful | Quiz, hw6-makefiles due | NA |
21 | 11/4/2019 | Introduction to Python | NA | Quiz | Start GitHub repo for project3 |
22 | 11/6/2019 | Pandas 1 | PDSH Chp. 3 | Quiz, hw7-dockerfiles due | NA |
23 | 11/11/2019 | Pandas 2 | NA | Quiz | Project concept README due |
24 | 11/13/2019 | Scikit-Learn 1 | PDSH Chp. 5: Intro to Scikit Learn | Quiz, hw8-python-base due | NA |
25 | 11/18/2019 | Strings and Regex | Python Regex Resource | Quiz | Project draft due |
26 | 11/20/2019 | Python Wrap-up | NA | Quiz, hw9-pandas due | NA |
27 | 11/25/2019 | Data science ethics | NA | Quiz | Feedback due |
NA | 11/27/2019 | Thanksgiving break: No class | NA | NA | NA |
28 | 12/2/2019 | Project presentations | NA | NA | Final project and return feedback due |
29 | 12/4/2019 | Project presentations | NA | NA | NA |
Each project will progress through the following steps:
The grade will be based on 1) the quality of feedback provided to peers, 2) the grader’s review, and 3) the presentation.
Instructions to connect to the VCL. The VCL provides transient, remote compute environments with pre-installed software where students can complete assignments. Because VCL resources are transient, students may wish to connect VCL instances to longer-term storage on Longleaf.
Slack is a communication tool often employed by data science teams in industry. Students will use the course Slack workspace (bios611.slack.com) to ask questions, and anyone (including other students, the professor and the TAs) will be able to post answers publicly, so that everyone can learn together. Slack will also be used for posting relevant and interesting resources related to course content (such as blog posts about data science practices, podcasts, etc.).
The grade will be based on in-class quizzes (20%), homework assignments (30%), and projects 1, 2 and 3 (50%).
No late quizzes or homework will be accepted. However, to account for the inevitable interruptions of life, each student’s lowest homework grade and two lowest quiz grades will automatically be dropped.
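As a rough illustration of how the weights and the drop policy combine, here is a minimal Python sketch. It rests on assumptions the syllabus does not state: every score is on a 0–100 scale, items within a category count equally, and the three projects are averaged with equal weight. The function names (`drop_lowest`, `course_grade`) and the example scores are hypothetical and are not part of any official grading tool.

```python
def drop_lowest(scores, n):
    """Return the scores with the n lowest removed, always keeping at least one."""
    n = min(n, max(len(scores) - 1, 0))
    return sorted(scores)[n:]


def mean(scores):
    """Plain arithmetic mean (assumes equal weight per item)."""
    return sum(scores) / len(scores)


def course_grade(quizzes, homeworks, projects):
    """Weighted course grade: quizzes 20%, homework 30%, projects 50%."""
    quiz_avg = mean(drop_lowest(quizzes, 2))    # two lowest quizzes dropped
    hw_avg = mean(drop_lowest(homeworks, 1))    # lowest homework dropped
    project_avg = mean(projects)                # assumed equal project weights
    return 0.20 * quiz_avg + 0.30 * hw_avg + 0.50 * project_avg


# Made-up scores for illustration only.
example = course_grade(
    quizzes=[100, 80, 0, 60, 90],
    homeworks=[70, 90, 100],
    projects=[85, 90, 95],
)
print(round(example, 2))  # 91.5
```

In this example the missed quiz (0) and the 60 are dropped, so the quiz average is 90; the lowest homework (70) is dropped, so the homework average is 95; and the project average is 90.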
For graduate students, grades will be assigned as follows:
For undergraduates, final grades will be assigned as follows (where “90–95” includes anything from 90 to 94.99999…):
Because we will be actively practicing coding in class, students and their peers will benefit greatly from consistent attendance.
No late work will be accepted. Final decisions in exceptional circumstances will be made by the professor.
The course final exam is given in compliance with UNC final exam regulations and according to the UNC Final Exam calendar.
The professor reserves the right to make changes to the syllabus, including project due dates and test dates (hurricanes happen…). These changes will be announced as early as possible.
Students are expected to abide by the University Honor Pledge. Students can expect the professor to communicate what help is authorized for each assignment.
Specifically, students are encouraged to work together on in-class programming exercises, and to help each other with technical aspects of the homework and projects. However, the work turned in for homework and projects must be the student’s own: the project must be conceived by the student, the code must be written by the student, and the results must be communicated by the student.