Big Data Advanced

Course Features

Skill level:



 3 Months



Practical Ratio: 30 : 70





Course Details

This course is designed to equip participants with the skills needed to advance in the world of Big Data. It combines classroom lectures with hands-on practical sessions on the Cloudera Hadoop distribution, covering advanced development frameworks from current Big Data technologies, including Apache Spark, Apache Kafka, Apache Flume, advanced Hive, and HBase. Each framework is explained in detail, with self-assessments and extensive discussion of industry use cases. The course aims to give you a strong edge in the Big Data field and open up a range of career opportunities.


Basic knowledge of any programming language.
Familiarity with basic Big Data frameworks such as HDFS, Hive, and MapReduce.

Course Outcome

At the end of the course, you should be able to:

  • Work with Big Data frameworks and relate them to industry use cases.
  • Clear Big Data certifications such as CCA 175.
  • Develop complex programs in Big Data frameworks such as Apache Spark and Kafka.
  • Build machine learning models using Spark ML.
  • Write code for real-time streaming use cases and analytics.


  • What is RDBMS & SQL
  • Overview of Big Data & Hadoop
  • Why HDFS & key concepts
  • MapReduce
  • YARN Architecture
  • Sqoop, Hive & Impala
  • Quick Insight into Hive Fundamentals
  • Handling Complex Data Types
  • Using Regular Expressions
  • Hive UDF (User Defined Functions)
  • Developing Hive Scripts
  • Introduction to Apache Kafka
  • Kafka Fundamentals
  • Kafka Integrations
  • Kafka API
  • Kafka Administration
  • Kafka Streams
  • What is Apache Spark?
  • Using the Spark Shell
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark Using Python
  • RDD Operations
  • Key-Value Pair RDDs
  • MapReduce & Pair RDD Operations
  • Overview
  • A Spark Standalone Cluster
  • The Spark Standalone Web UI
  • RDD Partitions & HDFS Data Locality
  • Working with Partitions
  • Executing Parallel Operations
  • RDD Lineage
  • Caching Overview
  • Distributed Persistence
  • Spark Streaming Overview
  • Example: Streaming Word Count
  • Other Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications
  • Iterative Algorithms
  • Graph Analysis
  • Machine Learning
  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Common Performance Issues
  • Summarization Design Pattern (Discovering Association Rules)
  • Filtering & Joining
  • Performance Optimization Techniques
  • Flume Fundamentals
  • Flume Sources
  • Flume Sinks
  • Flume Configuration
  • HBase Architecture
  • HBase Components
  • HBase vs. RDBMS
  • HBase Shell
  • Filters in HBase
  • Use Case Discussions & Industry Best Practices