Viewing: ST 563 : Introduction to Statistical Learning

Last approved: Mon, 10 Oct 2016 14:30:56 GMT

Last edit: Mon, 10 Oct 2016 14:30:52 GMT

Change Type
Major
ST (Statistics)
563
032379
Dual-Level Course
No
Cross-listed Course
No
Introduction to Statistical Learning
Statistical Learning
College of Sciences
Statistics (17ST)
Term Offering
Spring and Summer
Offered Every Year
Spring 2016
Previously taught as Special Topics?
Yes
1
 
Course Prefix/Number | Semester/Term Offered | Enrollment
ST 590 | Spring 2016 | 16
Course Delivery
Distance Education (DELTA)

Grading Method
Graded/Audit
3
15
Contact Hours
(Per Week)
Component Type | Contact Hours
Lecture | 3
Course Attribute(s)



Course Is Repeatable for Credit
No
 
 
Howard Bondell
Professor
Full

Delivery Format | Per Semester | Per Section | Multiple Sections? | Comments
LEC | 20 | 20 | No |
Prerequisite: ST 512 or ST 514 or ST 515 or ST 517
Is the course required or an elective for a Curriculum?
No
This course will introduce common statistical learning methods for supervised and unsupervised predictive learning in both the regression and classification settings. Topics covered will include linear and polynomial regression, logistic regression and discriminant analysis, cross-validation and the bootstrap, model selection and regularization methods, splines and generalized additive models, principal components, hierarchical clustering, nearest neighbor, kernel, and tree-based methods, ensemble methods, boosting, and support-vector machines.

Statistical learning methods provide a toolbox from which to pull when faced with applied predictive problems. In today's data-driven society, these methods are critical in the new age of Data Science. The Data Scientist must be able to understand when and how to apply the appropriate method when faced with data. This course is designed to fill that gap by exposing students to the variety of techniques available and how to implement them. This course has two target audiences: 1) students in the newly created certificate in Data Science Foundations, which has just been approved as a joint program of Statistics and Computer Science, and 2) students in the Master of Statistics program who are interested in a Data Science focus.


No

College(s) | Contact Name | Statement Summary
College of Engineering | George Rouskas (CS) | Computer Science faculty who teach CSC 522 (Automated Learning and Data Analysis) have identified significant overlap in the topics covered by ST 563 and CSC 522. However, they also point out that the emphasis is very different: CSC 522 focuses on algorithmic aspects, whereas ST 563 focuses on the statistical properties of the various techniques. Therefore, Computer Science does not have any objections to ST 563.
Since the course is currently being taught as a special topics course, no new resources are required.

Students will gain basic competency in using statistical learning methods for predictive data science. They will be able to use statistical software to perform regression, classification, and unsupervised learning tasks, and to identify the appropriate method for a given task.


Student Learning Outcomes

By the end of the course, the students will be able to:

(i) Apply basic statistical learning methods to build predictive models
(ii) Properly tune and select the appropriate method in a given situation
(iii) Correctly assess model fit and error
(iv) Build an ensemble of statistical learning algorithms
(v) Implement the methods using statistical software

Evaluation Method | Weighting/Points for Each | Details
Homework | 15% |
Midterm | 30% |
Final Exam | 35% |
Project | 15% |
Discussion | 5% | Online
Topic | Time Devoted to Each Topic | Activity
Overview of Statistical Learning | 1 Week | Prediction vs. Interpretation, Test Error, Bias-Variance Tradeoff
Review of Linear Regression | 1 Week | Simple Linear Regression, Multiple Linear Regression, Diagnostics
Logistic Regression | 1 Week | Estimation and Prediction, Multi-Class Logistic Regression
Classification Methods | 1 Week | Bayes Rule and the Bayes Classifier, K-Nearest Neighbors Classification, Evaluating Classification Error via Confusion Matrix and ROC Curve
Discriminant Analysis | 1 Week | Linear Discriminant Analysis, Quadratic Discriminant Analysis, Naive Bayes
Resampling Methods | 1 Week | Cross-Validation, Bootstrap
Model Selection Methods | 1 Week | Subset Selection, Forward and Backward Selection, Penalization Methods: Ridge Regression and Lasso
Variable Selection and Dimension Reduction | 1 Week | Penalization Methods (Continued), Principal Components Regression and Partial Least Squares
Midterm Exam | 1 Week |
Moving Beyond Linearity | 2 Weeks | Polynomial Regression, Basis Functions, Regression Splines, Smoothing Splines, Local Regression, Generalized Additive Models
Decision Tree-Based Methods | 1 Week | Regression Trees, Classification Trees
Ensemble Tree Methods | 1 Week | Bagging, Random Forests, Boosting
Support Vector Machines | 1 Week | Maximal Margin Classifier, Linear SVM, Kernel SVM
Unsupervised Learning | 1 Week | Principal Components Analysis, Association Analysis; Clustering: K-Means Clustering, Hierarchical Clustering
mlnosbis 8/4/2016: Consultation provided above. No further consultation required.

ghodge 8/9/2016 Ready for ABGS reviewers

ABGS Reviewer Comments:
-No concerns
allloyd (Thu, 04 Aug 2016 14:35:17 GMT): Passed college committee
Key: 9979