COURSE OBJECTIVE:
• Immediately participate and contribute as a Data Science Team Member on big data and other analytics projects by:
  • Deploying the Data Analytics Lifecycle to address big data analytics projects
  • Reframing a business challenge as an analytics challenge
  • Applying appropriate analytic techniques and tools to analyze big data, create statistical models, and identify insights that can lead to actionable results
  • Selecting appropriate data visualizations to clearly communicate analytic insights to business sponsors and analytic audiences
  • Using tools such as R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions
• Explain how advanced analytics can be leveraged to create competitive advantage and how the data scientist role and skills differ from those of a traditional business intelligence analyst
TARGET AUDIENCE:
This course is intended for individuals seeking to develop an understanding of Data Science from the perspective of a practicing Data Scientist.
COURSE PREREQUISITES:
To complete this course successfully and gain the maximum benefit from it, a student should have the following knowledge and skills:
• A strong quantitative background with a solid understanding of basic statistics, as would be found in a statistics 101 level course
• Experience with a scripting language, such as Java, Perl, or Python (or R). Many of the lab examples taught in the course use R (with the RStudio GUI), which is an open source statistical tool and programming language
• Experience with SQL (some course examples use
Treat the list above as specific prerequisite (or refresher) training and reading to complete before enrolling in or attending this course. Having this background will help ensure a positive experience in the class and enable students to build on their expertise as they learn the more advanced tools and analytical methods taught in the course.
COURSE CONTENT:
• Introduction and Course Agenda
• Introduction to Big Data Analytics
  • Big Data Overview
  • State of the Practice in Analytics
  • The Data Scientist
  • Big Data Analytics in Industry Verticals
• Data Analytics Lifecycle
  • Discovery
  • Data Preparation
  • Model Planning
  • Model Building
  • Communicating Results
  • Operationalizing
• Review of Basic Data Analytic Methods Using R
  • Using R to Look at Data – Introduction to R
  • Analyzing and Exploring the Data
  • Statistics for Model Building and Evaluation
• Advanced Analytics – Theory and Methods
  • K-means Clustering
  • Association Rules
  • Linear Regression
  • Logistic Regression
  • Naïve Bayesian Classifier
  • Decision Trees
  • Time Series Analysis
  • Text Analysis
• Advanced Analytics – Technologies and Tools
  • Analytics for Unstructured Data – MapReduce and Hadoop
  • The Hadoop Ecosystem
  • In-database Analytics – SQL Essentials
  • Advanced SQL and MADlib for In-database Analytics
• The Endgame, or Putting It All Together
  • Operationalizing an Analytics Project
  • Creating the Final Deliverables
  • Data Visualization Techniques
  • Final Lab Exercise on Big Data Analytics
FOLLOW ON COURSES:
Not available. Please contact.