This open source course gives participants a comprehensive understanding of the steps needed to install, configure, operate, and maintain Hadoop. The course begins with an overview of the Big Data landscape, then moves on to a system administrator's working view of running Hadoop.
TARGET AUDIENCE:
This course is intended for system administrators, DevOps engineers, and software developers responsible for managing and maintaining Hadoop clusters.
COURSE PREREQUISITES:
Not available. Please contact.
COURSE CONTENT:
• The content of this course is designed to support the course objectives.
Hadoop Introduction
• A Brief History of Hadoop
• Core Hadoop Components
• Fundamental Concepts
Planning Your Hadoop Cluster
• General Planning Considerations
• Choosing Hardware
• Network Considerations
• Configuring Nodes
• Planning for Cluster Management
HDFS
• HDFS Features
• Writing and Reading Files
• NameNode Considerations
• HDFS Security
• NameNode Web UI
• Hadoop File Shell
Getting Data into HDFS
• Pulling Data from External Sources with Flume
• Importing Data from Relational Databases with Sqoop
• REST Interfaces
• Best Practices
MapReduce
• MapReduce Overview
• Features of MapReduce
• Architectural Overview
• YARN MapReduce Version 2
• Failure Recovery
• The JobTracker Web UI
Hadoop Installation & Initial Configuration
• Configuration & Deployment Types
• Installing Hadoop
• Specifying the Hadoop Configuration
• Initial HDFS & MapReduce Configuration
• Log Files
Installing/Configuring Hive, Impala, and Pig
• Hive
• Impala
• Pig
Hadoop Clients
• What is a Hadoop Client?
• Installing and Configuring Hadoop Clients
• Installing and Configuring Hue
• Hue Authentication and Configuration
Advanced Cluster Configuration
• Advanced Configuration Parameters
• Configuring Hadoop Ports
• Explicitly Including and Excluding Hosts
• Configuring HDFS for Rack Awareness & HDFS High Availability
Hadoop Security
• Why Hadoop Security Is Important
• Hadoop's Security System Concepts
• What Kerberos Is and How It Works
• Securing a Hadoop Cluster with Kerberos
Managing and Scheduling Jobs
• Managing Running Jobs
• Scheduling Hadoop Jobs
• Configuring the FairScheduler
Cluster Maintenance
• Checking HDFS Status
• Copying Data Between Clusters
• Adding/Removing Cluster Nodes
• Rebalancing the Cluster
• NameNode Metadata Backup
• Cluster Upgrades
Cluster Monitoring and Troubleshooting
• General System Monitoring
• Managing Hadoop's Log Files
• Monitoring the Clusters
• Common Troubleshooting Issues
COURSE OBJECTIVES:
Upon successful completion of this course, participants should be able to:
• Describe the fundamental concepts of using Big Data
• Identify where Hadoop fits into a Big Data strategy
• Plan a Hadoop cluster
• Describe HDFS features
• Get data into HDFS
• Work with MapReduce
• Install and configure Hadoop
• Perform cluster maintenance
FOLLOW ON COURSES:
Not available. Please contact.