HDP Developer Apache Pig and Hive

Our classes are always live and instructor led from our Exton, PA or EPIC Partner locations. Springhouse AnywhereLive options require Internet Access. Select classes are Guaranteed to Run (GTR). View our complete schedule policies.

 

 

 

 

Scheduled session: 2018-10-08, 10:00 AM to 6:00 PM, AnywhereLive. Listed price: 2800.

Overview

This course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL.

Intended Audience

Software developers who need to understand and develop applications for Hadoop.


At Completion

  • Describe Hadoop, YARN and use cases for Hadoop
  • Describe Hadoop ecosystem tools and frameworks
  • Describe the HDFS architecture
  • Use the Hadoop client to input data into HDFS
  • Transfer data between Hadoop and a relational database
  • Explain YARN and MapReduce architectures
  • Run a MapReduce job on YARN
  • Use Pig to explore and transform data in HDFS
  • Understand how Hive tables are defined and implemented
  • Use Hive to explore and analyze data sets
  • Use the new Hive windowing functions
  • Explain and use the various Hive file formats
  • Create and populate a Hive table that uses ORC file formats
  • Use Hive to run SQL-like queries to perform data analysis
  • Use Hive to join datasets using a variety of techniques
  • Write efficient Hive queries
  • Create ngrams and context ngrams using Hive
  • Perform data analytics using the DataFu Pig library
  • Explain the uses and purpose of HCatalog
  • Use HCatalog with Pig and Hive
  • Define and schedule an Oozie workflow
  • Present the Spark ecosystem and high-level architecture
  • Perform data analysis with Spark's Resilient Distributed Dataset API
  • Explore Spark SQL and the DataFrame API

Prerequisites

Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.


Exams & Certifications


Materials

  • 50% Lecture/Discussion
  • 50% Hands-on Labs

Course Outline

DAY 1 - AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM


OBJECTIVES

  • Understanding Hadoop
  • The Hadoop Distributed File System
  • Ingesting Data into HDFS
  • The MapReduce Framework

 

LABS

  • Starting an HDP Cluster
  • Demonstration: Understanding Block Storage
  • Using HDFS Commands
  • Importing RDBMS Data into HDFS
  • Exporting HDFS Data to an RDBMS
  • Importing Log Data into HDFS Using Flume
  • Demonstration: Understanding MapReduce
  • Running a MapReduce Job

 

DAY 2 - AN INTRODUCTION TO APACHE PIG


OBJECTIVES

  • Introduction to Apache Pig
  • Advanced Apache Pig Programming

 

LABS

  • Demonstration: Understanding Apache Pig
  • Getting Started with Apache Pig
  • Exploring Data with Apache Pig
  • Splitting a Dataset
  • Joining Datasets with Apache Pig
  • Preparing Data for Apache Hive
  • Demonstration: Computing Page Rank
  • Analyzing Clickstream Data
  • Analyzing Stock Market Data Using Quantiles

 

DAY 3 - AN INTRODUCTION TO APACHE HIVE


OBJECTIVES

  • Apache Hive Programming
  • Using HCatalog
  • Advanced Apache Hive Programming

 

LABS

  • Understanding Hive Tables
  • Understanding Partition and Skew
  • Analyzing Big Data with Apache Hive
  • Demonstration: Computing NGrams
  • Joining Datasets in Apache Hive
  • Computing NGrams of Emails in Avro Format
  • Using HCatalog with Apache Pig
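
As background for the n-gram labs, an n-gram is a run of n consecutive tokens, and the usual analysis is to find the most frequent ones. The following is a minimal plain-Python sketch of that idea under simple assumptions (whitespace tokenization, exact counts); it is not the Hive function used in the labs, and the names `ngrams` and `top_ngrams` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(sentences, n, k):
    """Count n-grams across all sentences and return the k most frequent."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: lowercase and split on whitespace; real text needs better tokenization.
        counts.update(ngrams(sentence.lower().split(), n))
    return counts.most_common(k)


if __name__ == "__main__":
    emails = ["big data with hive", "big data with pig"]
    print(top_ngrams(emails, n=2, k=2))
```

N-grams are built per sentence so that no n-gram spans a sentence boundary.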

 

DAY 4 - WORKING WITH SPARK CORE, SPARK SQL AND OOZIE


OBJECTIVES

  • Advanced Apache Hive Programming (Continued)
  • Hadoop 2 and YARN
  • Introduction to Spark Core and Spark SQL
  • Defining Workflow with Oozie

 

LABS

  • Advanced Apache Hive Programming
  • Running a YARN Application
  • Getting Started with Apache Spark
  • Exploring Apache Spark SQL
  • Defining an Apache Oozie Workflow

 

 



Springhouse Education & Consulting Services

Corporate HQ:Eagleview Corporate Park
707 Eagleview Boulevard
Suite 207
Exton, PA 19341

610-321-3500 - info@springhouse.com