
Big Data Analytics with Apache Spark

Hands-on practical training with real-world business use cases, taught by experienced Big Data professionals.


Pairview Training

Summary

Price
£1,428 inc VAT
Study method
Online + live classes
Duration
2 days · Full-time
Qualification
No formal qualification
Certificates
  • Certificate of completion - Free
Additional info
  • Tutor is available to students

Overview

About This Course

This course will enable delegates to build complete, unified big data applications that combine batch, streaming, and interactive analytics across different data types. Delegates will learn how to write sophisticated parallel applications that support faster, better decisions and real-time actions across a wide variety of use cases, architectures, and industries.

Benefits to Learners

At the end of this course, you will be able to:

  • Build unified big data applications combining batch, streaming, and interactive analytics.
  • Understand how to write sophisticated parallel applications.
  • Understand the different APIs that Spark offers, such as Spark Streaming, MLlib, Spark SQL, and GraphX.

Description

Course Curriculum

1. Introduction to Spark

  • Defining Big Data and Big Computation
  • What is Spark?
  • What is its purpose?
  • What are the benefits of Spark?
  • Components of the Spark unified stack
  • Resilient Distributed Dataset (RDD)
  • Downloading and installing Spark standalone

2. Resilient Distributed Dataset

  • Creating RDDs
  • RDD Transformations
  • RDD Actions
  • Programming with RDDs
  • Numeric RDD operations
  • Pair RDDs
  • Transformations and Actions on Pair RDDs
  • Joins and Reduction Operations

3. Structured data: SQL, DataFrames, and Datasets

  • Spark SQL
  • DataFrames and Datasets
  • Using Datasets instead of RDDs
  • Combining RDDs with the powerful automatic optimizations behind Spark SQL
  • Connecting to databases with JDBC

4. Defining the Spark architecture

  • Partitioning data across the cluster using Resilient Distributed Datasets (RDDs) and DataFrames
  • Apportioning task execution across multiple nodes
  • Running applications with the Spark execution model
  • Creating resilient and fault-tolerant clusters
  • Achieving scalable distributed storage
  • Monitoring and administering Spark applications

5. Performing machine learning with Spark

  • MLlib: Spark's Machine Learning Library
  • Predicting outcomes with supervised learning
  • Building a decision tree classifier
  • Grouping data using unsupervised learning
  • Clustering with the k-means method

6. Streaming Data in Spark

  • Implementing sliding window operations
  • Determining state from continuous data
  • Processing simultaneous streams
  • Improving performance and reliability
  • Streaming from common sources (e.g., log files, sockets, Twitter, Kinesis, Kafka)
  • Processing with the streaming API and Spark SQL

7. GraphX Library

  • Introduction to Graphs
  • Imports
  • Building the graph
  • Creating GraphFrames
  • Standard Graph Algorithms
    • Breadth-First Search
    • Connected Components
    • Strongly Connected Components
    • PageRank
    • Shortest Paths
  • Basic graph and DataFrame queries

Who is this course for?

This course is suitable for data architects, database developers, and database administrators who are looking to move into big data engineering.

Delegates without experience in database design and development should meet the following entry requirements:

  • A completed degree with a minimum of a 2:2 (essential).
  • An academic background in technology, IT, programming, engineering, or technical science (advisable).
  • An enthusiasm for technical processes and development, and an ambition to build a successful career on the technical, infrastructure side of business analytics.

Requirements

  • Knowledge of a programming language (e.g. Java).

Career path

Big data engineering

Data architecture

Big data solution development

Certificates

Certificate of completion

Digital certificate - Included

FAQs

Study method describes the format in which the course will be delivered. At Reed Courses, courses are delivered in a number of ways, including online courses, where the course content can be accessed online remotely, and classroom courses, where courses are delivered in person at a classroom venue.

CPD stands for Continuing Professional Development. If you work in certain professions or for certain companies, your employer may require you to complete a number of CPD hours or points per year. You can find a range of CPD courses on Reed Courses, many of which can be completed online.

A regulated qualification is delivered by a learning institution which is regulated by a government body. In England, the government body which regulates courses is Ofqual. Ofqual regulated qualifications sit on the Regulated Qualifications Framework (RQF), which can help students understand how different qualifications in different fields compare to each other. The framework also helps students to understand what qualifications they need to progress towards a higher learning goal, such as a university degree or equivalent higher education award.

An endorsed course is a skills-based course that has been checked and approved by an independent awarding body. Endorsed courses are not regulated, so they do not result in a qualification; however, the student can usually purchase a certificate showing the awarding body's logo if they wish. Certain awarding bodies, such as Quality Licence Scheme and TQUK, have developed endorsement schemes to help students select the best skills-based courses for them.