Apache Hadoop Architecture Development and Administration

Pairview Training

Price

£2,376 inc VAT

Study method

Onsite

Duration

4 days

Qualification

No formal qualification

Certificates

Certificate of completion - Free

Additional info

Tutor is available to students

Overview

This course equips you with the knowledge and skills to become an Apache Hadoop developer. You will work through industry use-case scenarios and learn the core concepts of Hadoop, namely the Hadoop Distributed File System (HDFS) and MapReduce, along with how to implement Hadoop, write MapReduce code, and develop robust data processing applications. You will also learn best practices for Hadoop development, debugging, and implementing workflows.

Certificates

Certificate of completion

Digital certificate - Included

Description

Understand the concepts of HDFS and the MapReduce framework
Develop robust data processing applications
Write Hadoop code
Learn best practices in a Hadoop development environment

Introduction

The Motivation for Hadoop

Problems with Traditional Large-Scale Systems
Introducing Hadoop
Hadoopable Problems

Hadoop: Basic Concepts and HDFS

The Hadoop Project and Hadoop Components
The Hadoop Distributed File System

Introduction to MapReduce

MapReduce Overview
Example: WordCount
Mappers
Reducers

Hadoop Clusters and the Hadoop Ecosystem

Hadoop Cluster Overview
Hadoop Jobs and Tasks
Other Hadoop Ecosystem Components

Writing a MapReduce Program in Java

Basic MapReduce API Concepts
Writing MapReduce Drivers, Mappers, and Reducers in Java
Speeding Up Hadoop Development by Using Eclipse
Differences between the Old and New MapReduce APIs

Writing a MapReduce Program Using Streaming

Writing Mappers and Reducers with the Streaming API

Unit Testing MapReduce Programs

Unit Testing
The JUnit and MRUnit Testing Frameworks
Writing Unit Tests with MRUnit
Running Unit Tests

Delving Deeper into the Hadoop API

Using the ToolRunner Class
Setting Up and Tearing Down Mappers and Reducers
Decreasing the Amount of Intermediate Data with Combiners
Accessing HDFS Programmatically
Using the Distributed Cache
Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners

Practical Development Tips and Techniques

Strategies for Debugging MapReduce Code
Testing MapReduce Code Locally by Using LocalJobRunner
Writing and Viewing Log Files
Retrieving Job Information with Counters
Reusing Objects
Creating Map-Only MapReduce Jobs

Partitioners and Reducers

How Partitioners and Reducers Work Together
Determining the Optimal Number of Reducers for a Job
Writing Custom Partitioners

Data Input and Output

Creating Custom Writable and WritableComparable Implementations
Saving Binary Data Using SequenceFile and Avro Data Files
Issues to Consider When Using File Compression
Implementing Custom InputFormats and OutputFormats

Common MapReduce Algorithms

Sorting and Searching Large Data Sets
Indexing Data
Computing Term Frequency-Inverse Document Frequency (TF-IDF)
Calculating Word Co-Occurrence
Performing Secondary Sort

Joining Data Sets in MapReduce Jobs

Writing a Map-Side Join
Writing a Reduce-Side Join

Integrating Hadoop into the Enterprise Workflow

Integrating Hadoop into an Existing Enterprise
Loading Data from an RDBMS into HDFS by Using Sqoop
Managing Real-Time Data Using Flume
Accessing HDFS from Legacy Systems with FuseDFS and HttpFS

An Introduction to Hive, Impala, and Pig

The Motivation for Hive, Impala, and Pig
Hive Overview
Impala Overview
Pig Overview
Choosing Between Hive, Impala, and Pig

An Introduction to Oozie

Introduction to Oozie
Creating Oozie Workflows

This course is advertised on reed.co.uk by the Course Provider, whose terms and conditions apply. Purchases are made directly from the Course Provider, and as such, content and materials are supplied by the Course Provider directly. Reed is acting as agent and not reseller in relation to this course. Reed's only responsibility is to facilitate your payment for the course. It is your responsibility to review and agree to the Course Provider's terms and conditions and satisfy yourself as to the suitability of the course you intend to purchase. Reed will not have any responsibility for the content of the course and/or associated materials.

FAQs

What does study method mean?

Study method describes the format in which the course will be delivered. At Reed Courses, courses are delivered in a number of ways, including online courses, where the course content can be accessed online remotely, and classroom courses, where courses are delivered in person at a classroom venue.

What are CPD hours/points?

CPD stands for Continuing Professional Development. If you work in certain professions or for certain companies, your employer may require you to complete a number of CPD hours or points, per year. You can find a range of CPD courses on Reed Courses, many of which can be completed online.

What is a 'regulated qualification'?

A regulated qualification is delivered by a learning institution which is regulated by a government body. In England, the government body which regulates courses is Ofqual. Ofqual regulated qualifications sit on the Regulated Qualifications Framework (RQF), which can help students understand how different qualifications in different fields compare to each other. The framework also helps students to understand what qualifications they need to progress towards a higher learning goal, such as a university degree or equivalent higher education award.

What is an 'endorsed' course?

An endorsed course is a skills based course which has been checked over and approved by an independent awarding body. Endorsed courses are not regulated so do not result in a qualification - however, the student can usually purchase a certificate showing the awarding body's logo if they wish. Certain awarding bodies - such as Quality Licence Scheme and TQUK - have developed endorsement schemes as a way to help students select the best skills based courses for them.
