Big Data Analytics with Apache Spark
Hands-on practical training with real-world business use cases, taught by experienced Big Data professionals.
Pairview Training
Summary
- Certificate of completion - Free
- Tutor is available to students
Overview
About This Course
This course will enable delegates to build complete, unified big data applications combining batch, streaming, and interactive analytics using different data types. Delegates will learn how to write sophisticated parallel applications to implement faster and better decisions and real-time actions that can be applied to a wide variety of use cases, architectures, and industries.
Benefits to Learners
At the end of this course, you will be able to:
- Build unified big data applications combining batch, streaming, and interactive analytics.
- Understand how to write sophisticated parallel applications.
- Understand the different APIs that Spark offers, such as Spark Streaming, MLlib, Spark SQL, and GraphX.
Description
Course Curriculum
1. Introduction to Spark
- Defining Big Data and Big Computation
- What is Spark?
- What is its purpose?
- What are the benefits of Spark?
- Components of the Spark unified stack
- Resilient Distributed Dataset (RDD)
- Downloading and installing Spark standalone
2. Resilient Distributed Dataset
- Creating RDDs
- RDD Transformations
- RDD Actions
- Programming with RDDs
- Numeric RDD operations
- Pair RDDs
- Transformations and Actions on Pair RDDs
- Joins and Reduction Operations
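As a taste of the pair-RDD material in this module, the sketch below models in plain Python what a reduce-by-key and an inner join compute over key/value pairs. It is not Spark code and runs on a single machine; the helper names reduce_by_key and inner_join and the sample data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Merge all values that share a key using func (single-machine model)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def inner_join(left, right):
    """Pair up values from both inputs for every key present in both."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in right_by_key.get(key, [])]


# Hypothetical sample data: (product, units sold) and (product, unit price).
sales = [("apple", 3), ("pear", 2), ("apple", 4)]
prices = [("apple", 0.5), ("pear", 0.8)]

# Sum the units per product, then attach each product's price.
totals = reduce_by_key(sales, lambda a, b: a + b)
joined = inner_join(list(totals.items()), prices)
```

In Spark the same two steps would run in parallel across partitions of a cluster; the single-process version above only shows the result they are meant to produce.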
3. Structured data: SQL, DataFrame, and Datasets
- Spark SQL
- DataFrames and Datasets
- Using Datasets instead of RDDs
- Combining RDDs with the powerful automatic optimizations behind Spark SQL.
- Connecting to databases with JDBC
4. Defining the Spark architecture
- Partitioning data across the cluster using Resilient Distributed Datasets (RDDs) and DataFrames
- Apportioning task execution across multiple nodes
- Running applications with the Spark execution model
- Creating resilient and fault-tolerant clusters
- Achieving scalable distributed storage
- Monitoring and administering Spark applications
5. Performing machine learning with Spark
- MLlib: Machine Learning Library
- Predicting outcomes with supervised learning
- Building a decision tree classifier
- Grouping data using unsupervised learning
- Clustering with the k-means method
6. Streaming Data in Spark
- Implementing sliding window operations
- Determining state from continuous data
- Processing simultaneous streams
- Improving performance and reliability
- Streaming from built-in sources (e.g., log files, Twitter, sockets, Kinesis, Kafka)
- Processing with the streaming API and Spark SQL
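To illustrate the sliding window idea covered in this module, the sketch below computes windowed sums in plain Python for a given window length and slide interval. It is a single-machine model, not the Spark streaming API; the function name sliding_window_sums, the convention that each window covers the interval (end - window, end], and the sample events are assumptions made for illustration.

```python
def sliding_window_sums(events, window, slide):
    """Return (window_end, sum) pairs for overlapping time windows.

    events: iterable of (timestamp, value) pairs.
    window: length of each window in time units.
    slide:  how far the window advances between results.
    Each window covers timestamps t with end - window < t <= end.
    """
    events = sorted(events)
    results = []
    if not events:
        return results
    last_timestamp = events[-1][0]
    end = slide
    while True:
        total = sum(v for t, v in events if end - window < t <= end)
        results.append((end, total))
        if end >= last_timestamp:
            break
        end += slide
    return results


# Hypothetical events: (timestamp in seconds, value).
events = [(1, 10), (2, 20), (4, 5), (5, 1), (6, 2)]

# A 4-second window that advances every 2 seconds.
sums = sliding_window_sums(events, window=4, slide=2)
```

Because the window is longer than the slide, consecutive windows overlap, so the same event can contribute to more than one result.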
7. GraphX Library
- Introduction to Graphs
- Imports
- Building the graph
- Creating GraphFrames
- Standard Graph Algorithms
- Breadth First Search
- Connected Components
- Strongly connected components
- PageRank
- Shortest Paths
- Basic graph and DataFrame queries
Who is this course for?
This course is suitable for data architects, database developers, and administrators who are looking to advance their careers into big data engineering.
If delegates have not had such experience with database design and development, they should meet the following entry requirements:
- A completed graduate degree with a minimum of a 2:2 is essential.
- An academic background related to technology, IT, programming, engineering or technical science is advisable.
- An enthusiasm for working with technical processes and development, and an aspiration to build a successful career in the technical, infrastructural side of business analytics.
Requirements
- Knowledge of a programming language (e.g. Java).
Career path
Big data engineering
Data architecture
Big data solution development
Certificates
Certificate of completion
Digital certificate - Included
Legal information
This course is advertised on reed.co.uk by the Course Provider, whose terms and conditions apply. Purchases are made directly from the Course Provider, and as such, content and materials are supplied by the Course Provider directly. Reed is acting as agent and not reseller in relation to this course. Reed's only responsibility is to facilitate your payment for the course. It is your responsibility to review and agree to the Course Provider's terms and conditions and satisfy yourself as to the suitability of the course you intend to purchase. Reed will not have any responsibility for the content of the course and/or associated materials.