Apache Spark Programming

Our classes are always live and instructor-led, delivered from our Exton, PA or EPIC Partner locations. Springhouse AnywhereLive options require internet access. Select classes are Guaranteed to Run (GTR).

Schedule

  • 2019-02-25, 10:00 AM to 6:00 PM, AnywhereLive, Price: 2500.00
  • 2019-03-25, 10:00 AM to 6:00 PM, AnywhereLive, Price: 2500.00
  • 2019-04-29, 10:00 AM to 6:00 PM, AnywhereLive, Price: 2500.00
  • 2019-06-17, 10:00 AM to 6:00 PM, AnywhereLive, Price: 2500.00

Overview

This 3-day course is equally applicable to data engineers, data scientists, analysts, architects, software engineers, and technical managers interested in a thorough, hands-on overview of Apache Spark.

The course covers the fundamentals of Apache Spark including Spark's architecture and internals, the core APIs for using Spark, SQL and other high-level data access tools, as well as Spark's streaming capabilities and machine learning APIs. The class is a mixture of lecture and hands-on labs.

Each topic includes lecture content along with hands-on labs in the Databricks notebook environment. Students may keep the notebooks and continue to use them with the free Databricks Community Edition offering after the class ends; all examples are guaranteed to run in that environment.

Intended Audience

Data scientists, analysts, architects, software engineers, and technical managers with experience in machine learning who want to adapt traditional machine learning tasks to run at scale using Apache Spark.


At Completion

After taking this class, students will be able to:

  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines and query large data sets using Spark SQL and DataFrames
  • Analyze Spark jobs using the administration UIs inside Databricks
  • Create Structured Streaming jobs
  • Work with graph-structured (relationship) data using the GraphFrames APIs
  • Understand how a Machine Learning pipeline works
  • Understand the basics of Spark's internals

Prerequisites

  • Some familiarity with Apache Spark is helpful but not required.
  • Some familiarity with Machine Learning and Data Science concepts is highly recommended but not required.
  • Basic programming experience in an object-oriented or functional language is required. The class can be taught concurrently in Python and Scala.

Exams & Certifications


Materials


Course Outline

Module 1: Spark Overview

Lecture

  • Databricks Overview
  • Spark Capabilities
  • Spark Ecosystem
  • Basic Spark Components

Hands-On

  • Databricks Lab Environment
  • Working with Notebooks
  • Spark Clusters and Files


Module 2: Spark SQL and DataFrames

Lecture

  • Use of Spark SQL
  • Use of DataFrames / Datasets
  • Reading from CSV, JSON, JDBC, Parquet Files & more
  • Writing Data
  • DataFrame, Dataset and SQL APIs
  • Aggregations
  • SQL Joins with DataFrames
  • Broadcasting
  • Catalyst Query Optimization
  • Tungsten
  • ETL

Hands-On

  • Creating DataFrames
  • Querying with DataFrames and SQL
  • ETL with DataFrames
  • Caching
  • Visualization


Module 3: Spark Internals

Lecture

  • Jobs, Stages and Tasks
  • Partitions and Shuffling
  • Job Performance

Hands-On

  • Visualizing SQL Queries
  • Observing Task Execution
  • Understanding Performance
  • Measuring Memory Use


Module 4: Structured Streaming

Lecture

  • Streaming Sources and Sinks
  • Structured Streaming APIs
  • Windowing and Aggregation
  • Checkpointing
  • Watermarking
  • Reliability and Fault Tolerance

Hands-On

  • Reading from TCP
  • Reading from Kafka
  • Continuous Visualization


Module 5: Machine Learning

Lecture

  • Spark ML Pipeline API
  • Built-in Featurizing and Algorithms

Hands-On

  • Featurization
  • Building a Machine Learning Pipeline


Module 6: Graph Processing with GraphFrames

Lecture

  • Basic Graph Analysis
  • GraphFrames API

Hands-On

  • GraphFrames ETL
  • PageRank and Label Propagation with GraphFrames

Apache Spark Programming: http://springhouse.com/course-catalog/DB_105

Springhouse Education & Consulting Services

Corporate HQ: Eagleview Corporate Park
707 Eagleview Boulevard
Suite 207
Exton, PA 19341

610-321-3500 - info@springhouse.com