Teaches important concepts and skills for statistical software development using case studies. After this course, students will have an understanding of the process of statistical software development, knowledge of existing resources for software development, and the ability to produce reliable and efficient statistical software.
Dr. Naim Rashid
Associate Professor
Department of
Office: 20-020 Lineberger
email: naim[at]unc.edu
web: https://naimurashid.github.io/
Amber Young (ayoung31 [at] live.unc.edu)
Xuejun Sun (xuejun_sun [at] unc.edu)
Dr. Rashid’s office hours: (just see me after class)
office hours: TBD
1:25-3:10 MW, RO 228
This class teaches important concepts and skills for statistical computing, numerical optimization, and machine learning using case studies. After this course, students will have a good understanding of the process of producing high-quality and sharable statistical programs (module 1, 3 weeks), algorithms for optimization and numerical integration (module 2, 6 weeks), and will be able to implement and apply some common and powerful machine learning methods (module 3, 3 weeks). Modules 1 and 3 were originally developed by Dr. Michael Love.
The course consists of twice-weekly lectures with optional readings (specified in the lecture notes). Lectures are supplemented with in-class exercises, group activities, weekly homework, and projects.
There are no required texts for this course. Any texts referenced in class lecture are either published online as open-access material or are available as free E-books through UNC libraries.
The instructor reserves the right to make changes to the syllabus, including topics, readings, assignments, and due dates. In particular, availability of guest speakers may alter the course schedule. Any changes will be announced as early as possible. For the most recent session-by-session schedule and assignments, please see the course schedule at the following link: https://biodatascience.github.io/statcomp/
A general overview of topics is provided below.
The class will be taught through three modules. The final grade for the course is made up of one final project (30%), an initial proposal for the final project (10%), a final presentation (20%), class participation (10%), and weekly homework assignments (30%). The initial proposal (due March 10th) can be resubmitted for a regrade as many times as desired up until March 31st.
At the beginning of the course, students will learn to create an R package, which they will update as the class progresses by implementing the methods they learn in each module. This R package will be applied to each project and homework assignment. Each student’s R package will be hosted and iteratively updated on GitHub. Homework and projects will likewise be submitted to course instructors through GitHub Classroom.
Each student will also be assigned a Virtual Machine (VM) that contains all necessary software to run the course notes, examples, homework assignments, and class projects. Students are also encouraged to install the required course software locally on their own computers; the VM provides an alternative if installation issues arise. Instructions for how to log into the VM will be sent out during the first week of classes. The VM instances are hosted on bios department servers, and any issues should be forwarded to bios IT.
The School of Public Health grading system is designed so that the mode of the grading distribution is P. The last graded assignment will be due in the last week of regular classes.
Homework policy: students are allowed and encouraged to discuss ideas and approaches to homework problems in groups, but each student should write the code for the assignments independently. Copying someone else’s work is always an honor code violation. Expulsion from the university is possible if the honor code is violated, and receiving 0% on the assignment in question is a certainty. Late work will be penalized 10% per week late.
