Big Data / Hadoop Development

Duration: 1 month
Fee type: Free

Apache Hadoop is an open-source platform for storing and processing Big Data. It is designed to make distributed computing reliable and scalable, which streamlines and speeds up the processing of large data sets. Hadoop is written in Java, so it runs on any platform that provides a Java Virtual Machine. It makes it possible to run applications on clusters of thousands of commodity machines. Hadoop's basic components are the Hadoop Distributed File System (HDFS) and MapReduce. The Hadoop ecosystem also includes related projects such as Sqoop, Hive, Pig, ZooKeeper and Flume, which complement and extend Hadoop's basic functionality. In this course we will learn Hadoop's basic components as well as the supporting technologies of the Hadoop ecosystem, with hands-on experience.


1. Big Data and Hadoop Introduction

Duration: 1 week

·         What is Big Data?

·         Types of Data

·         Sources of Big Data

·         Limitations and Solutions of existing Data Analytics Architecture

·         Introduction to Hadoop

·         Features of Hadoop

·         Hadoop Components

·         Hadoop Storage: HDFS

·         Hadoop Processing: MapReduce Framework

2. Hadoop Environment Setup

Duration: 1 week

·         Types of Deployment

·         Install and Configure Hadoop

·         Configure HDFS

·         Configure YARN

·         Configure MapReduce

·         Hadoop Logging

·         Install and Configure Hive and Pig

3. Hadoop Concept and Architecture

Duration: 1 week

·         Hadoop Components

·         HDFS

·         MapReduce

·         HDFS Basic Concepts

·         Write Operation in HDFS

·         Read Operation in HDFS

·         NameNode, DataNode and Secondary NameNode

·         JobTracker

·         TaskTracker

·         Accessing HDFS

4. Hadoop MapReduce Framework

Duration: 1 week

·         What is MapReduce

·         Features of MapReduce

·         Use cases of MapReduce

·         MapReduce Framework

·         Read/Write Operation in MapReduce

·         Mapper/Reducer Class, Driver code

·         Combiner and Partitioner
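
As an illustration of how the Mapper, Combiner, Partitioner and Reducer in this module fit together, here is a minimal word-count sketch. It is a plain-Python simulation and not the Hadoop Java API taught in the course; the function names (mapper, combiner, partitioner, reducer, run_job) and the sample input are assumptions chosen for illustration.

```python
from collections import defaultdict
import zlib

def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def combiner(pairs):
    """Pre-aggregate one mapper's output locally, before the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def partitioner(key, num_reducers):
    """Pick a reducer for a key; crc32 is used because it is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer(word, counts):
    """Sum all partial counts that arrived for one word."""
    return word, sum(counts)

def run_job(lines, num_reducers=2):
    # One bucket per reducer, mapping key -> list of values (the "shuffle").
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line stands in for one map task
        for word, count in combiner(mapper(line)):
            buckets[partitioner(word, num_reducers)][word].append(count)
    result = {}
    for bucket in buckets:  # each bucket stands in for one reduce task
        for word in sorted(bucket):  # a reducer sees its keys in sorted order
            key, total = reducer(word, bucket[word])
            result[key] = total
    return result

counts = run_job(["big data big cluster", "data node data"])
print(counts)
```

In this sketch the combiner reduces the number of pairs sent through the shuffle, and the partitioner guarantees that every occurrence of a word reaches the same reducer.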

5. Programming with the Hadoop Core API

Duration: 1 week

·         A Sample MapReduce Program

·         List Processing

·         Data Flow

·         API Concept

·         Mapper and Reducer together in MapReduce

·         MapReduce Application in Java

·         The Driver Class

·         Job Invocation

·         Mapper Complete Code

·         Reducer Complete Code

·         Processing the Values

·         Using Eclipse for MapReduce Program

6. Sorting and Searching large data sets using MapReduce

Duration: 1 week

·         Sorting

·         Searching

·         Secondary Sort
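
The Secondary Sort topic can be pictured with the following minimal sketch, again as a plain-Python simulation and not Hadoop code. It assumes the common composite-key approach: the value is folded into the key for sorting, while grouping uses only the natural key. The function name secondary_sort and the sample readings are hypothetical.

```python
from itertools import groupby

def secondary_sort(records):
    """records: iterable of (natural_key, value).

    Returns {natural_key: [values in ascending order]}.
    """
    # 1. The mapper would emit a composite key (natural_key, value).
    composite = [((key, value), value) for key, value in records]
    # 2. The shuffle sorts on the full composite key.
    composite.sort(key=lambda kv: kv[0])
    # 3. The grouping step compares the natural key only, so each reduce
    #    call receives its values already in sorted order.
    grouped = {}
    for natural_key, group in groupby(composite, key=lambda kv: kv[0][0]):
        grouped[natural_key] = [value for _, value in group]
    return grouped

readings = [("2017", 31), ("2018", 12), ("2017", 5), ("2018", 40), ("2017", 18)]
result = secondary_sort(readings)
print(result)
```

The design point is that the framework's sort does the ordering work, so the reducer never has to buffer and sort the values itself.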

7. Indexing data and inverted Index

Duration: 1 week

·         Indexing

·         Inverted Index Algorithm

·         Data flow

·         Term Frequency-Inverse Document Frequency (TF-IDF)

·         Calculating Word co-occurrences
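
To make the inverted index and TF-IDF topics concrete, here is a minimal single-machine sketch in plain Python, not Hadoop code. It assumes one common TF-IDF variant (term count divided by document length, multiplied by the natural log of total documents over documents containing the term); the function names and the three sample documents are hypothetical.

```python
import math
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {term: {doc_id: term_count}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # map step: emit (term, doc_id)
            # reduce step: count occurrences of the term per document
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def tf_idf(index, docs, term, doc_id):
    """tf = count / document length; idf = ln(N / docs containing the term)."""
    postings = index.get(term, {})
    if doc_id not in postings:
        return 0.0
    tf = postings[doc_id] / len(docs[doc_id].split())
    idf = math.log(len(docs) / len(postings))
    return tf * idf

docs = {
    "d1": "hadoop stores data",
    "d2": "hadoop processes data fast",
    "d3": "pig scripts",
}
index = build_inverted_index(docs)
print(index["hadoop"])
print(tf_idf(index, docs, "pig", "d3"))
```

A term that appears in only one document, such as "pig" here, scores higher than a term shared by several documents, which is the behaviour TF-IDF is meant to capture.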

8. Hadoop Ecosystem Component: Sqoop

Duration: 1 week

·         RDBMS strengths & weaknesses

·         Augment Existing Databases using Hadoop

·         Advantages of Hadoop

·         Hadoop Tradeoffs

9. Sqoop Introduction

Duration: 1 week

·         Importing data from RDBMS to HDFS

·         SQL to Hadoop

·         Custom Sqoop Connectors

·         Basic Syntax

·         Connecting to database server

·         Import Data

·         Free-form Query Imports

·         Examples

10. HIVE

Duration: 1 week

·         Hive and Pig Introduction

·         Hive Features

·         Hive Data Model

·         Hive Data Types

·         Timestamp Data Type

·         The Hive Metastore

·         Physical Layout

·         Creating Table

·         Loading data into Hive

·         Using Sqoop to import data into HIVE Tables

·         Basic Select Query

·         Joining Tables

·         Sorting output

·         Creating User-defined functions

·         Limitations of Hive

11. PIG

Duration: 1 week

·         Pig Introduction

·         Pig Latin

·         Pig Concept and Features

·         A Simple Pig Script

·         More Pig Latin: FOREACH

·         Pig Vs SQL

12. ZooKeeper

Duration: 1 week

·         Configure ZooKeeper

·         About ZooKeeper

·         Components in ZooKeeper

13. Flume

Duration: 1 week

·         Flume Basics

·         Flume Architecture

·         Flow in Flume

·         Features of Flume

·         Flume Agent Characteristics

·         Flume Usage Pattern

An Academic Initiative of CEBS © Adept 2017-18. All Rights Reserved.