HDP Developer Enterprise Apache Spark I

Our classes are always live and instructor-led from our Exton, PA or EPIC Partner locations. Springhouse AnywhereLive options require Internet access. Select classes are Guaranteed to Run (GTR).

Overview

This course is designed as an entry point for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark.

Topics include:

  • An overview of the Hortonworks Data Platform (HDP), including HDFS and YARN
  • Using Spark Core APIs for interactive data exploration
  • Spark SQL and DataFrame operations
  • Spark Streaming and DStream operations
  • Data visualization, reporting, and collaboration
  • Performance monitoring and tuning
  • Building and deploying Spark applications
  • Introduction to the Spark Machine Learning Library

Intended Audience

Software engineers who want to develop in-memory applications for time-sensitive, highly iterative use cases in an Enterprise HDP environment.


At Completion

  • Describe Hadoop, HDFS, YARN, and the HDP ecosystem
  • Describe Spark use cases
  • Explore and manipulate data using Zeppelin
  • Explore and manipulate data using a Spark REPL
  • Explain the purpose and function of RDDs
  • Employ functional programming practices
  • Perform Spark transformations and actions
  • Work with Pair RDDs
  • Perform Spark queries using Spark SQL and DataFrames
  • Use Spark Streaming stateless and window transformations
  • Visualize data, generate reports, and collaborate using Zeppelin
  • Monitor Spark applications using the Spark History Server
  • Apply general guidelines and tips for optimizing applications
  • Use data caching to increase performance of applications
  • Build and package Spark applications
  • Deploy applications to the cluster using YARN
  • Understand the purpose of Spark MLlib

Prerequisites

Students should be familiar with programming principles and have previous software development experience in either Python or Scala. Previous experience with data streaming, SQL, and HDP is helpful but not required.


Materials

  • 50% Lecture/Discussion
  • 50% Hands-on Labs

Course Outline

Labs can be performed in either Python or Scala.

  • Use common HDFS commands
  • Use a REPL to program in Spark
  • Use Zeppelin to program in Spark
  • Perform RDD transformations and actions
  • Perform Pair RDD transformations and actions
  • Use Spark SQL
  • Perform stateless transformations using Spark Streaming
  • Perform window-based transformations
  • Use Zeppelin for data visualization and reporting
  • Monitor applications using the Spark History Server
  • Cache and persist data
  • Configure checkpointing, broadcast variables, and executors
  • Build and submit a Spark application to YARN
  • Run Spark MLlib applications

Microsoft Gold Partner

PMI R.E.P.

AXELOS Limited

The Microsoft Gold CPLS logo is a mark of Microsoft, Inc.

The PMI R.E.P. logo is a mark of the Project Management Institute, Inc.

ITIL® is a registered trade mark of AXELOS Limited.
IT Infrastructure Library® is a registered trade mark of AXELOS Limited.
The Swirl logo™ is a registered trade mark of AXELOS Limited.
Accredited course material is property of ITSM Academy.

Springhouse Education & Consulting Services

Corporate HQ: Eagleview Corporate Park
707 Eagleview Boulevard
Suite 207
Exton, PA 19341

610-321-3500 - info@springhouse.com