Workshop on Big Data Analytics with Hadoop and Spark

Event by:

Kumaraguru School of Innovation

Organised by:

Department of Information Science and Engineering


Date: 4th October 2021 – 8th October 2021

Big Data technologies are changing the world with their advanced analytical capabilities. Day-to-day Internet usage is making data grow dramatically, and we need the right technologies to analyse it. That is why emerging technologies such as Hadoop and Spark are gaining popularity and helping industry make use of this data. Big Data technologies play a major role in the data science and Artificial Intelligence domains. They also provide business insights that help make a business profitable.

The benefits of Big Data technologies include:

  • Business decisions
  • Healthcare advancements
  • Market insights
  • Risk management
  • Customer targeting

Who is this course for?

This course is designed for students who have basic programming knowledge and can write simple programs in Python. It will help students understand Big Data technologies such as Hadoop and Spark. We will discuss real-world problems and how and where these Big Data technologies fit in.

Outcome of this workshop:

Once students complete this workshop, they should be able to create a Hadoop cluster and set up HDFS by themselves. They should be able to perform ETL operations using Spark. In addition, they will know how to process live streaming data using Spark.

Course Content

Timeline

Slot 1 – Theory (2:00 – 3:15 pm)

Slot 2 – Hands-on (3:30 – 5:00 pm)

Day 1

  • Theory: Big Data Fundamentals
  • Hands-on: Hadoop Basics and Hadoop Installation

Day 2

  • Theory: Hadoop Architecture (HDFS, MapReduce)
  • Hands-on: HDFS Operations; MapReduce – word count problem

Day 3

  • Theory: Hadoop Architecture (YARN, Hive)
  • Hands-on: Hive Operations

Day 4

  • Theory: Apache Spark Introduction (Spark RDD and DataFrames, Transformations and Actions)
  • Hands-on: Spark Installation; PySpark Architecture; RDD and DataFrame operations using PySpark

Day 5

  • Theory: Spark SQL and Spark Functions; Spark Streaming
  • Hands-on: Access Hive tables using PySpark; Spark Streaming – Stock data analysis
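To give a feel for the Day 2 hands-on topic, the word count problem can be sketched in plain Python before moving to a real cluster. This is only an illustration of the map, shuffle and reduce steps under simplifying assumptions: it runs in a single process, does not use the Hadoop API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for this sketch, not part of the workshop material.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted counts by their word (the key)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Illustrative input only; any list of strings works.
    sample = ["big data with hadoop", "big data with spark"]
    print(word_count(sample))
```

On a Hadoop cluster the map and reduce steps would run on different nodes and the framework would perform the shuffle; here all three run locally so that the data flow is easy to follow.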

Learn Hadoop with Python

  • Install Hadoop / Spark
  • HDFS Operations
  • MapReduce Problems
  • Hive Operations using Spark (PySpark)
  • Spark Streaming

Industry Expert – Ambika Santhosh
Software Architect with 17 years of experience in software development and strong technical expertise.
Has worked with various technologies, including Python, PySpark, Machine Learning in R and Python, Natural Language Processing (NLP), Django, C/C++ and Java on Linux, and on several automation projects using shell scripting and Python. Skilled in agile development, DevOps delivery and data analytics.

Registration fee for external participants: Rs 1000/-

Registration fee for internal participants: Rs 500/-

Kindly pay your registration fee through NEFT to the bank account below and enter the payment details in the registration form given below.

Payment Details:

Institution’s Bank A/c Details

Name of the Beneficiary – The Correspondent, Kumaraguru College of Technology
Bank A/c No. – 1245115000000014
Type of Bank A/c – Current A/c
Bank Name – The Karur Vysya Bank Limited
Bank Branch – KCT Campus Branch
Bank Address – The Karur Vysya Bank Limited KCT Campus Coimbatore
IFS Code – KVBL0001245
MICR Code – 641053005