
Viewing: ST 442/CSC 442 : Introduction to Data Science

Last approved: Thu, 08 Dec 2016 09:02:04 GMT

Last edit: Thu, 08 Dec 2016 09:02:04 GMT

Change Type
Major
ST (Statistics)
442
032402
Dual-Level Course
No
Cross-listed Course
Yes
Course Prefix:
CSC
Introduction to Data Science
Introduction to Data Science
College of Sciences
Statistics (17ST)
Term Offering
Fall Only
Offered Every Year
Fall 2016
Previously taught as Special Topics?
Yes
2
 
Course Prefix/Number | Semester/Term Offered | Enrollment
ST 495-002/CSC 495-002 | Fall 2014 | 49
ST 495-001/CSC 495-001 | Fall 2015 | 39
Course Delivery
Face-to-Face (On Campus)

Grading Method
Graded with S/U option
3
16
Contact Hours
(Per Week)
Component Type | Contact Hours
Lecture | 3
Course Attribute(s)


If your course includes any of the following competencies, check all that apply.
University Competencies

Course Is Repeatable for Credit
No
 
 
Alyson Wilson
Professor

Enrollment Component | Per Semester | Per Section | Multiple Sections? | Comments
Lecture | 45 | 45 | No | Enrollment is totaled across the ST and CSC listings
P: (MA 305 or MA 405) and (ST 305 or ST 312 or ST 370 or ST 372) and (CSC 111 or CSC 112 or CSC 113 or CSC 116 or ST 114 or ST 445)
Is the course required or an elective for a Curriculum?
No
Overview of data structures, data lifecycle, statistical inference. Data management, queries, data cleaning, data wrangling. Classification and prediction methods to include linear regression, logistic regression, k-nearest neighbors, classification and regression trees. Association analysis. Clustering methods. Emphasis on analyzing data, use and development of software tools, and comparing methods.

Data science has become increasingly important in nearly every industry sector and academic field. It combines techniques from several fields, including computer science, statistics, and mathematics, to extract knowledge from data, and it has gained significant national attention and interest. This course provides an overview of several foundational topics in data science.


No

Is this a GEP Course?
No
This will be taught as part of the regular course load for Dr. Wilson or other faculty in the department (including Laber, Zhou, and Chi). The course requires use of the Virtual Computing Lab to provide in-class computing resources.

  • Students will represent, query, and prepare data for analysis.

  • Students will use and adapt software for classification, association analysis, and clustering.

  • Students will evaluate and compare the results of different analysis algorithms.


Student Learning Outcomes

Our goal is to help you gain skills in handling and analyzing data from “end to end.” To mirror data science working environments, some of your assignments and in-class work will be done in multidisciplinary teams. By the end of this course, we want you to be able to:



  • Use software tools to do data cleaning and data wrangling

  • Use software tools to view and query stored data in a variety of formats

  • Use software tools to implement k-nearest-neighbors, tree-based methods, multiple regression, and logistic regression for classification and prediction

  • Develop association rules using the Apriori algorithm

  • Use software tools to implement k-means and hierarchical clustering

  • Evaluate and compare the performance of algorithms


Evaluation Method | Weighting/Points for Each | Details
Homework | 20 | Assignments completed individually (4 total)
Homework | 20 | Assignments completed in teams (4 total)
Midterm | 20 | In-class written exam
Project | 30 | Team project. Three short assignments leading to final poster, presentation, and short written document.
Quizzes | 10 | 5 total
Topic | Time Devoted to Each Topic | Activity
R Programming Bootcamp | Week 1 | Students install and become familiar with the R programming language through a series of partner programming exercises.
Statistical Inference or Data Management Bootcamps | Weeks 2-3 | Students self-select into statistical inference or data management bootcamps. The statistical inference bootcamp will provide a review of confidence intervals, hypothesis testing, and p-values. The data management bootcamp will provide an overview of basic data structures, data life cycles, and relevant computing architectures.
Queries and Data Wrangling | Weeks 4-6 | Finding data, cleaning data, representing data, data queries (MongoDB, SQL).
Visualization | Week 7 | Graphics appropriate for exploring large and streaming data.
Classification and Prediction | Weeks 8-10 | Linear and logistic regression, k-nearest neighbor, decision and classification trees. Metrics for evaluation of results and comparison of methods.
Association Analysis | Week 11 | Development of association rules using the Apriori algorithm.
Clustering | Weeks 12-13 | K-means and hierarchical clustering.
Geospatial Analytics | Week 14 | Hands-on exercises using geospatial analytics tools to explore spatially structured data.
Guest Lectures | Week 15 | At various points in the course, guest lectures from local industry and government leaders will be included to show students applications of data science.

bahler (Thu, 06 Oct 2016 17:53:00 GMT): From COE-CCC:
  • Include cost of textbook
  • Not sure if the grading scheme proposed is allowed (e.g., "at least an A-")
  • Under requirements for S/U, change "October 17" to "Drop/Revision Deadline per R&R"
  • Under audit, "regularly" is not clear, especially since "attendance is not required" per the attendance policy.
  • Under attendance policy, I would not say that "attendance is not required"
Key: 7261