
ST 562: Data Mining with SAS Enterprise Miner

Last approved: Thu, 23 Jun 2016 20:55:40 GMT

Last edit: Thu, 23 Jun 2016 19:42:35 GMT

Change Type: Major
ST (Statistics)
562
032351
Dual-Level Course: No
Cross-listed Course: No
Data Mining with SAS Enterprise Miner
Data Mining with SAS
College of Sciences
Statistics (17ST)
Term Offering: Spring Only
Offered Every Year
Spring 2017
Previously taught as Special Topics? Yes
5
 
Course Prefix/Number: 610, 610, 610, 590, 590
Semester/Term Offered: spring
Enrollment: 18, 27, 25, 49, 44
Course Delivery: Face-to-Face (On Campus); Distance Education (DELTA)

Grading Method: Graded/Audit
3
15
Contact Hours (Per Week)
Component Type | Contact Hours
Lecture | 3
Course Attribute(s)


If your course includes any of the following competencies, check all that apply.
University Competencies

Course Is Repeatable for Credit
No
 
 
David Dickey
Professor
Full

Enrollment, on-campus delivery (campus, blended, or flip)
Enrollment Component | Per Semester | Per Section | Multiple Sections? | Comments
Lecture | 50 | 50 | No |
Enrollment, distance delivery (distance, online, or remote)
Delivery Format | Per Semester | Per Section | Multiple Sections? | Comments
LEC | 30 | 30 | No |
Prerequisite: ST 512 or ST 514 or ST 515 or ST 517
Is the course required or an elective for a Curriculum?
No
This is a hands-on course using modeling techniques designed mostly for large observational studies. Estimation topics include recursive splitting, ordinary and logistic regression, neural networks, and discriminant analysis. Clustering and association analysis are covered under the topic "unsupervised learning," and the use of training and validation data sets is emphasized. Model evaluation alternatives to statistical significance include lift charts and receiver operating characteristic curves. SAS Enterprise Miner is used in the demonstrations, and some knowledge of basic SAS programming is helpful.

We are in an era in which large amounts of data are being collected, sometimes without a particular goal in mind, and then later used for decision making. These data are typically observational rather than from controlled studies, and some variables may contain outliers and large chunks of missing values. Such data call for additional tools in the modern analyst's toolbox. Fast methods that accommodate missing values and outliers, such as recursive splitting, have arisen, and computational methods for speeding up traditional analyses like logistic regression have been incorporated into software such as SAS Institute's Enterprise Miner package. Flexible models like neural networks have developed a following among analysts. Loyalty cards scanned at a store provide data for association analysis. Learning which items are purchased together and segmenting customers by clustering have also found applications in business. The demand for graduates with the skills to analyze such data far exceeds the supply, and demand is growing. Hands-on experience with an industrial-strength data mining package that has all of the above capabilities, such as the SAS Enterprise Miner package used in this course, empowers our students at NC State to be competitive in the workforce.


No

Is this a GEP Course?
Requisites and Scheduling
 
a. If seats are restricted, describe the restrictions being applied.
 

 
b. Is this restriction listed in the course catalog description for the course?
 

 
List all course prerequisites, corequisites, and restrictive statements (e.g., junior standing; Chemistry majors only). If none, state "none."
 

 
List any discipline-specific background or skills that a student is expected to have before taking this course (e.g., ability to analyze historical text; ability to prepare a lesson plan). If none, state "none."
 

Additional Information
Complete the following three questions or attach a syllabus that includes this information. A syllabus is required for 400-level and dual-level courses.
 
Title and author of any required text or publications.
 

 
Major topics to be covered and required readings including laboratory and studio topics.
 

 
List any required field trips, out of class activities, and/or guest speakers.
 

Since the course has been taught for 5 years as a special topics course, there will be no need for additional resources.

The goal of this course is to introduce the basic elements of data mining techniques to students whose background is equivalent to that provided by the department's statistical methodology sequence. Students will gain hands-on experience with the SAS Enterprise Miner product, as well as with SAS programming, through in-class demonstrations and practice with homework data sets.


Student Learning Outcomes

By the end of this course, students will be able to:

    Use SAS Enterprise Miner to run analyses

    Check for problem data and mitigate the problems

    Use classification and regression trees to perform recursive splitting

    Perform and interpret logistic regression

    Evaluate and compare models with modern tools like lift charts

    Run discriminant analysis and compare it to modern methods

    Fit neural network models to data

    Perform cluster analyses for large data sets

    Use association and sequence analysis on large data sets

 


Evaluation Method | Weighting/Points for Each | Details
Homework | 20% |
Exam | 20% |
Exam | 20% |
Exam | 20% |
Final Exam | 20% |
Topic | Time Devoted to Each Topic | Activity
Overview, diagrams, ordinary regression | 2 weeks | Summary of upcoming topics with examples; Creating data mining diagrams; Setting up the environment for running SAS Enterprise Miner; Linking data sets to be used in examples
Classification trees, regression trees | 4 weeks | Use of chi-square tests in recursive splitting; Interpretation of decision trees using the famous Framingham Heart Study data; Splitting algorithms for decision trees; Treatment of missing values; Simplifying decision trees using validation data; Show that building trees for estimates, decisions, or ranking gives different results; Build several trees on an example data set; Compare decision trees to regression trees and give a regression tree example; Compute lift charts for trees
Discriminant analysis | 2 weeks | Review the multivariate normal distribution; Develop discriminant functions from the normal distribution definition; Discuss the role of priors in discriminant analysis; Interpret posterior probabilities and error rates; Compare quadratic discriminants to linear ones
Ordinary and logistic regression | 3 weeks | Explain the need for new regression methods when the response is categorical (focus on binary); Show the logistic function; Develop maximum likelihood estimators, graph the likelihood function, and discuss Gauss-Newton estimation; Show a logistic example within SAS (space shuttle O-ring data); Review additional data cleaning steps needed here but not in tree-based methods; Develop logistic regressions within Enterprise Miner; Interpret logistic output, including a discussion of concordance
Neural networks | 1 week | Relate hyperbolic tangent functions to familiar logistic functions; Demystify neural nets somewhat by showing them as compositions of hyperbolic tangents; Explore the flexibility of neural networks; Control neural network complexity by using logistic regression model building as a preliminary variable selection tool
Evaluation methods | 1 week | Develop the ROC curve idea; Show how ROC curves relate to concordance; Compare several of the above models through their ROC curves and lift charts; Pick a winner among models and export the model as C, Java, or SAS code
Clustering | 1 week | Distinguish agglomerative, divisive, and direct clustering; Describe single, average, and complete linkage, Ward's method, and k-means, and give examples; Describe the two-step method used in Enterprise Miner; Cluster some Census Bureau data on U.S. households within Enterprise Miner; Show graphical depictions of cluster compositions
Association analysis and other topics as time permits | 1 week | Relate association analysis to simple conditional probability; Compute lift for association analysis; Show association and sequence analysis on some banking data; Other topics as time permits: multidimensional scaling, bagging and boosting of tree-based models
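Three rows of the topic table rest on short identities that can be checked numerically: the hyperbolic tangent is a shifted, rescaled logistic function, tanh(x) = 2 * logistic(2x) - 1; the area under the ROC curve equals the concordance (the share of event/non-event pairs ranked correctly, with ties counted as one half); and the lift of an association rule A -> B is P(B | A) / P(B). The following is a minimal Python sketch of these ideas under stated assumptions: it is illustrative only, is not SAS Enterprise Miner code or output, and the function names and toy data are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_logistic(x: float) -> float:
    """Hyperbolic tangent written as a shifted, rescaled logistic: 2 * logistic(2x) - 1."""
    return 2.0 * logistic(2.0 * x) - 1.0


def concordance(scores, labels):
    """Share of (event, non-event) pairs in which the event gets the higher score.

    Tied pairs count one half. Labels are 1 for an event and 0 for a non-event.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    credit = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                credit += 1.0
            elif e == n:
                credit += 0.5
    return credit / (len(events) * len(nonevents))


def roc_auc(scores, labels):
    """Trapezoidal area under the ROC curve traced by lowering the cutoff score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def association_lift(baskets, a, b):
    """Lift of the rule a -> b: P(b | a) / P(b), estimated from a list of item sets."""
    n = len(baskets)
    n_a = sum(1 for t in baskets if a in t)
    n_b = sum(1 for t in baskets if b in t)
    n_ab = sum(1 for t in baskets if a in t and b in t)
    confidence = n_ab / n_a  # P(b | a)
    return confidence / (n_b / n)  # divide by P(b)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the course.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(concordance(scores, labels), roc_auc(scores, labels))
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"milk"},
               {"bread", "milk", "butter"}]
    print(association_lift(baskets, "bread", "butter"))
```

On the toy scores, both the pairwise concordance and the trapezoidal ROC area come out to 8/9 (up to floating-point rounding), which illustrates the ROC/concordance link. The rule bread -> butter has lift 4/3: butter appears in bread baskets one third more often than in baskets overall. The pairwise loop is quadratic in the number of observations and is meant only for small examples.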

mlnosbis 4/11/2016: No overlapping courses.

ghodge 4/16/2016: No consultation required, as it does not seem to overlap with any courses. Ready for ABGS reviewers.

ABGS Reviewer Comments:
-Good, but syllabus has no details about grading of assignments.
Key: 10017