About This Course
This course enables delegates to build complete, unified big data applications that combine batch, streaming, and interactive analytics across different data types. Delegates will learn how to write sophisticated parallel applications that support faster, better decisions and real-time actions, and that can be applied to a wide variety of use cases, architectures, and industries.
Benefits to Learners
At the end of this course, you will be able to:
- Build unified big data applications combining batch, streaming, and interactive analytics.
- Understand how to write sophisticated parallel applications.
- Understand the different APIs that Spark offers, such as Spark Streaming, MLlib, Spark SQL, and GraphX.
Part 1: Scala (Days 1 & 2)
1. Introduction to Scala
- Why is Scala used
- Advantages of Scala for data science
- Installing Scala
2. Functional programming vs. object-oriented programming
- What is functional programming
- What is object-oriented programming
- Advantages of functional programming
3. Basic object-oriented programming
4. Scala basics
- Key concepts of Scala
- Scala syntax
- Comments in Scala
- Keywords in Scala
- Scala Identifiers
- Scala packages
- Scala worksheet
- Scala REPL session
5. Programming in Scala
- Creating variables
- Data types
- Type inference
- Conditional statements
- Decision statements
6. Functions in Scala
- Defining functions
- Higher-order functions
- Using special functions
7. Collections
- Sequence – Indexed and linear sequences
- Set – Hash set, tree set, list set
- Maps – Hash map, tree map, list map
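As a taste of the functions and collections topics above, here is a minimal sketch in plain Scala. The object name, the sample data, and the `applyTwice` helper are all hypothetical and chosen only for illustration; they are not taken from the course material.

```scala
object CollectionsSketch {
  // Hypothetical sample data, for illustration only.
  val scores: Seq[Int] = Vector(72, 85, 91, 64)           // Vector is an indexed sequence
  val tags: Set[String] = Set("spark", "scala", "spark")  // duplicate elements collapse
  val stock: Map[String, Int] = Map("apples" -> 3, "pears" -> 0)

  // A higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // filter and foldLeft are higher-order methods on collections.
  val passed: Seq[Int] = scores.filter(_ >= 70)
  val total: Int = scores.foldLeft(0)(_ + _)

  // collect combines filtering and mapping using a partial function.
  val inStock: Set[String] =
    stock.collect { case (item, count) if count > 0 => item }.toSet
}
```

Pasting the object into a Scala REPL session or worksheet and evaluating, for example, `CollectionsSketch.applyTwice(_ + 3, 1)` is one way to experiment with it.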
8. Idiomatic Scala
- Pattern matching
- Handling exceptions
- Try and Catch
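The idiomatic Scala topics above might look roughly like the following sketch. It assumes only the Scala standard library; the object and method names (`describe`, `parseOrZero`, `report`) are hypothetical examples, not part of the course material.

```scala
import scala.util.{Failure, Success, Try}

object ErrorHandlingSketch {
  // Pattern matching on a literal, on types, and with a guard.
  def describe(x: Any): String = x match {
    case 0               => "zero"
    case n: Int if n < 0 => "negative integer"
    case _: Int          => "positive integer"
    case s: String       => s"string of length ${s.length}"
    case _               => "something else"
  }

  // Classic try/catch: the catch handler is itself a pattern match.
  def parseOrZero(s: String): Int =
    try {
      s.trim.toInt
    } catch {
      case _: NumberFormatException => 0
    }

  // Try captures the same failure as a value instead of throwing.
  def parse(s: String): Try[Int] = Try(s.trim.toInt)

  def report(s: String): String = parse(s) match {
    case Success(n) => s"parsed $n"
    case Failure(e) => s"failed: ${e.getClass.getSimpleName}"
  }
}
```

Returning a `Try` rather than throwing lets the caller decide how to handle the failure, which is often considered the more functional style.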
9. Object-oriented Scala
- Classes, fields, and methods
- Singleton objects
- Case classes
10. File I/O
- Reading files in Scala
- Writing to files in Scala
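A minimal sketch of reading and writing text files, assuming Scala 2.13 or later (for `scala.util.Using`) and the default platform character encoding. The object and method names are hypothetical and serve only as an illustration.

```scala
import java.io.PrintWriter
import scala.io.Source
import scala.util.Using

object FileIoSketch {
  // Writes each element of `lines` as one line of text.
  // Using.resource closes the writer even if an exception is thrown.
  def writeLines(path: String, lines: Seq[String]): Unit =
    Using.resource(new PrintWriter(path)) { out =>
      lines.foreach(line => out.println(line))
    }

  // Reads the whole file into memory as a list of lines.
  // toList forces the lazy iterator before the source is closed.
  def readLines(path: String): List[String] =
    Using.resource(Source.fromFile(path)) { src =>
      src.getLines().toList
    }
}
```

On older Scala versions the same effect can be had with an explicit `try`/`finally` that calls `close()`.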
11. Parallel Processing in Scala
- Parallel processing in Scala
- Advantages of parallel collections
- Mapping functions over parallel collections
- Filtering parallel collections
- When to use parallel collections
Part 2: Apache Spark (Days 3 & 4)
1. Introduction to Spark
- Defining Big Data and Big Computation
- What is Spark?
- What is its purpose?
- What are the benefits of Spark?
- Components of the Spark unified stack
- Resilient Distributed Dataset (RDD)
- Downloading and installing Spark standalone
2. Resilient Distributed Dataset
- Creating RDDs
- RDD Transformations
- RDD Actions
- Programming with RDDs
- Numeric RDD operations
- Pair RDDs
- Transformations and Actions on Pair RDDs
- Joins and Reduction Operations
3. Structured data: SQL, DataFrame, and Datasets
- Spark SQL
- DataFrames and Datasets
- Using Datasets instead of RDDs
- Combining RDDs with the powerful automatic optimizations behind Spark SQL
- Connecting to databases with JDBC
4. Defining the Spark architecture
- Partitioning data across the cluster using Resilient Distributed Datasets (RDDs) and DataFrames
- Apportioning task execution across multiple nodes
- Running applications with the Spark execution model
- Creating resilient and fault-tolerant clusters
- Achieving scalable distributed storage
- Monitoring and administering Spark applications
5. Performing machine learning with Spark
- MLlib: Machine Learning Library
- Predicting outcomes with supervised learning
- Building a decision tree classifier
- Grouping data using unsupervised learning
- Clustering with the k-means method
6. Streaming Data in Spark
- Implementing sliding window operations
- Determining state from continuous data
- Processing simultaneous streams
- Improving performance and reliability
- Streaming from built-in sources (e.g., log files, Twitter sockets, Kinesis, Kafka)
- Processing with the streaming API and Spark SQL
7. GraphX Library
- Introduction to Graphs
- Building the graph
- Creating GraphFrames
- Standard Graph Algorithms
- Breadth First Search
- Connected Components
- Strongly connected components
- Shortest Paths
- Basic graph and DataFrame queries
Who is this course for?
This course is suitable for data architects, database developers, and database administrators who are looking to advance their careers into Big Data engineering.
Delegates without such experience in database design and development should meet the following entry requirements:
- A completed graduate degree with a minimum of a 2:2 is essential.
- An academic background related to technology, IT, programming, engineering or technical science is advisable.
- An enthusiasm for technical processes and development, and an aspiration to build a successful career in the technical, infrastructural side of Business Analytics.
- Knowledge of a programming language (e.g. Java).
Big data engineering
Big data solution development