Big Data and Hadoop Training | writeabc

Big Data and Hadoop

  • Hadoop and Hadoop Ecosystem
  • MapReduce – Concepts of Map, Reduce, Ordering, Concurrency, Shuffle
  • Hadoop Distributed File System (HDFS) Concepts and its Importance
  • Deep Dive into MapReduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
  • HDFS Deep Dive – Architecture, Data Replication, NameNode, DataNode, Data Flow
  • Parallel Copying with DistCp, Hadoop Archives

Hadoop Development Training Course Content

A) Big Data – Motivation & Basics.

B) Hadoop Administration – Architecture, Setup, Manipulation & Maintenance (prerequisite: Linux).

C) Hadoop Development – a) MapReduce (Basics) b) Real-World MapReduce (Advanced) (prerequisite: Java).

D) Corporate Technologies – Hive, Pig, HBase, Oozie, Flume, Sqoop, ZooKeeper, Mahout (prerequisite: SQL).

E) Cloud Computing – Concepts & deploying Hadoop on the cloud (AWS: EC2, S3, EMR, and others as per project requirements).

F) Beyond Hadoop – Storm, Spark, Mesos, etc., and the future scope of Hadoop alongside these emerging technologies.


A) Big Data

What is big data?
Challenges in big data
Challenges in traditional Applications
New Requirements
Introducing Hadoop
Brief History of Hadoop
Features of Hadoop
Overview of the Hadoop Ecosystem
Overview of MapReduce

B) Hadoop Administration

1) Linux

Basic architecture
Important commands
File permission and ownership
Pipes, etc.

2) Setting Up a Single-Node (Pseudo-Distributed) Cluster

Important Directories
Configuring HDFS & Important Configuration Properties

3) Interacting with HDFS

Common Example Operations
HDFS Command Reference
DFSAdmin Command Reference
Using HDFS For MapReduce
HDFS Web Interface
Setting Up a Multi-Node Cluster
Hands-on Exercises and Assignment

4) Additional HDFS Tasks

Rebalancing Blocks
Copying Large Sets of Files
Decommissioning Nodes
Verifying File System Health
Rack Awareness
Cluster Configuration
Small Clusters: 2-10 Nodes
Medium Clusters: 10-40 Nodes
Large Clusters: Multiple Racks
Performance Monitoring
Hands-on Exercises and Assignment

C) Hadoop Development

1) MapReduce – 1

Java – basic OOP concepts, Serialization, I/O, Collections, Sorting, etc.
Configuring the Eclipse environment for MapReduce development & running the first program.
Hands-on Exercises and Assignment.

2) MapReduce – 2

Detailed explanation of the first program, describing the Mapper, Reducer, and Driver.
MapReduce algorithms and the whole process flow – map, partition, sort, shuffle, reduce.
Related terms – Input Formats, Input Splits, Speculative Execution, etc.
Other related components – Combiner, Partitioner.
Hands-on Exercises and Assignment.
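
The process flow named in this module (map, partition, sort, shuffle, reduce, with an optional combiner) can be illustrated with a small word-count sketch. This is a plain-Python, single-process simulation for intuition only; it does not use the Hadoop API, and every function name here (map_phase, combine, partition, reduce_phase, run_job) is a hypothetical name chosen for illustration. The CRC32-based partitioner is likewise an assumption standing in for a hash partitioner.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum counts per key on the map side to shrink shuffle data."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: deterministic hash of the key, modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Simulate one job: each input split is processed by one map task."""
    # Shuffle target: one bucket of (key, value) pairs per reducer.
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in map_phase(line)]
        for key, value in combine(mapped):
            buckets[partition(key, num_reducers)].append((key, value))

    results = {}
    for bucket in buckets:
        # Sort so that all values for the same key are adjacent, then group.
        bucket.sort(key=itemgetter(0))
        for key, group in groupby(bucket, key=itemgetter(0)):
            word, total = reduce_phase(key, (value for _, value in group))
            results[word] = total
    return results
```

Calling run_job with a list of splits (each split being a list of text lines) returns a dictionary of word counts. Changing num_reducers changes how keys are distributed across buckets but not the final totals, which is the property a partitioner is expected to preserve.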

3) MapReduce – 3

Discussion and solutions of various programs and their real-world use cases.
Local Job Runner and usage of ToolRunner.
setup/cleanup methods in the Mapper/Reducer.
Passing parameters to the Mapper and Reducer.
Searching algorithms.
Distributed cache.