MS Data Science Program Structure
For those students who wish to complete the program in one calendar year, the default curriculum is defined in the table below. Both summer courses are available online, so that students may start the program online during the summer and complete it on campus during the following academic year.
A core of six courses, three on modern statistical methods and three on big data programming and management, provides a solid base for future work in data science. Four other courses, chosen from a wide variety of graduate statistics and mathematics courses, build additional expertise in specific areas of data science.
| STAT 700 Statistical Programming* | STAT 701 Modern Applied Statistics I* | STAT 702 Modern Applied Statistics II* |
| INFS 774 Big Data Analytics* | INFS 762 Data Warehousing/Mining* | INFS 772 Programming-Data Analytics* |
| | STAT 560 Time Series Analysis** | STAT 545 Nonparametric Statistics** |
| | MATH 575 Operations Research I** | STAT 551 Predictive Analytics I** |
* Denotes core course, required of all MS Data Science students.
** Denotes default choice for non-core course. Students with the necessary prerequisites may take other graduate courses in place of one or more of these.
See the Complete Graduate Course Rotation and Course Descriptions page for information on all Mathematics and Statistics graduate courses. Descriptions of the courses listed in the table above appear below.
- STAT 700 Statistical Programming: Fundamentals of statistical programming languages including descriptive and visual analytics in R and SAS, and programming fundamentals in R and SAS including logic, loops, macros, and functions.
- STAT 701 Modern Applied Statistics I: Topics include statistical graphics, modern statistical computing languages, nonparametric and semiparametric statistical methods, longitudinal and repeated measures, meta-analysis, and large-scale inference. Prerequisite: STAT 541 or equivalent, and STAT 700.
- STAT 702 Modern Applied Statistics II: Topics include data mining techniques for multivariate data, including principal component analysis, multidimensional scaling, and cluster analysis; supervised learning methods and pattern recognition; and an overview of statistical prediction analysis relevant to business intelligence and analytics. Prerequisite: STAT 701.
- STAT 545 Nonparametric Statistics: Covers many standard nonparametric methods of analysis. Methods will be compared with one another and with parametric methods where applicable. Attention will be given to: (1) analogies with regression and ANOVA; (2) construction of tests tailored to specific problems; and (3) logistic analysis. Prerequisites: STAT 281, MATH 381 or STAT 381.
- STAT 551 Predictive Analytics I: Introduction to predictive analytics. This course will examine the fundamental methodologies of predictive modeling used in financial applications such as credit scoring. Topics covered will include logistic regression, tree algorithms, customer segmentation, cluster analysis, model evaluation, and credit scoring. Prerequisite: STAT 482 or STAT 786 (or equivalent).
- STAT 560 Time Series Analysis: Statistical methods for analyzing data collected sequentially in time where successive observations are dependent. Includes smoothing techniques, decomposition, trends and seasonal variation, forecasting methods, and models for time series: stationarity, autocorrelation, linear filters, ARMA processes, nonstationary processes, model building, forecast errors, and confidence intervals. Prerequisite: STAT 582.
- MATH 575 Operations Research I: Philosophy and techniques of operations research, including game theory; linear programming, simplex method, and duality; transportation and assignment problems; introduction to dynamic programming; and queuing theory. Applications to business and industrial problems. Prerequisite: MATH 315, or (MATH 281 and MATH 125), or instructor consent.
- INFS 762 Data Warehousing and Data Mining: The main concepts, components, and various architectures of a data warehouse. Advanced data analysis and optimization of data warehouse design. Data warehousing and OLAP tools. Applying data mining algorithms to retrieve highly specialized information or knowledge about the data stored in the data warehouse. Prerequisites: INFS 605 (or equivalent programming) and INFS 760.
- INFS 772 Programming for Data Analytics: This course will provide an introduction to programming for data analysis with an emphasis on the analysis of large datasets. The programming language we will use is Python. Python is a general-purpose programming language that's powerful, easy to learn, and fast to code. It has a mature and growing ecosystem of open source tools for mathematics and data analysis, and is rapidly becoming the language of choice for scientists and researchers of all stripes. In the first half of the course, students will learn the core ideas of programming – flow control, input and output, data structures (e.g., arrays, lists, trees, and hash tables), iteration and recursion, classes and object-oriented programming – through writing code to deal with Big Data generated by social media sites such as Twitter. In the second half of the course, students will learn how to use Python for effective data analysis. Specific topics addressed include: vector computation and mathematics with NumPy, statistical computation with SciPy, working with tabular data with Pandas, and implementing analytics algorithms using Python.
- INFS 774 Big Data Analytics: This course provides a broad understanding of the principles underlying Big Data analytics and its applications in different domains using a hands-on approach with a state-of-the-art Big Data platform. It provides a combination of essential business and technical skills related to Big Data analytics. Business aspects of the course emphasized include (a) understanding the scope and role of Big Data in today's organizations, (b) representative example scenarios and case studies of industry-specific applications highlighting Big Data issues – volume, variety, velocity, and veracity, (c) when to consider a Big Data solution, (d) the integration of Big Data initiatives as part of the overall business strategy to achieve "return on data" and competitive differentiation, and (e) information governance issues. Technical aspects of the course emphasized include (a) lifecycle of a Big Data analytics solution with multiple entry points, (b) essential components of a Big Data solution and technology platform, (c) key features of Hadoop and related technologies (e.g., MapReduce, HDFS, NoSQL), (d) performing analytics with predictive models, text analytics, and streaming data, and (e) data visualization and communication of analytical findings. State-of-the-art tools are integrated throughout the course to provide hands-on exercises with relevant techniques.