Data Engineering Online Training in Noida, AWS Online Training Noida, Azure Online Training Delhi NCR

Curriculum:

Module 1: Cloud (Azure and AWS Database Services)

AWS RDS
Creating a database server on RDS and connecting to it
Azure Database
Creating a database server on Azure and connecting to it

Module 2: Using SQL on cloud

Install and configure MySQL Server on the cloud
History of DBMS

Module 3: SQL Commands

Module 4: SQL Functions and Group By Clause

Functions in MySQL Expressions
Using Functions
String Functions
Date and time functions
Numeric Functions
Aggregate Functions
Spaces in Function Names
Impact of functions on Query Performance
How to optimally use functions
Group By Clause
Having Clause
Where vs. Having Clause
Group By with Rollup and Cube
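
The Where vs. Having distinction listed above can be sketched as follows: WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of a MySQL server so that it runs anywhere, and the orders table and its values are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("asha", 50), ("asha", 200), ("ravi", 30), ("ravi", 40), ("meena", 500)],
)

# WHERE removes individual rows before they are grouped.
where_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount >= 40 GROUP BY customer ORDER BY customer"
).fetchall()

# HAVING removes whole groups after the aggregate has been computed.
having_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer HAVING SUM(amount) >= 100 ORDER BY customer"
).fetchall()

print(where_rows)   # ravi survives, but only his 40 order is summed
print(having_rows)  # ravi's group (total 70) is dropped entirely
```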

Module 5: SQL Multi-table Queries

Table Name Aliases
Joins
Types of joins:
Inner Join
Outer Join
Left Outer Join
Right Outer Join
Full Outer Join
Equi Join vs. Non-Equi Join
Natural Join
Join with More than 2 Tables
Self-Join with Example
Impact of Joins and Self-Join on query performance
Optimal use of joins
Importance of the order of tables in joins
Subqueries
Independent and Correlated Subqueries
Derived Tables/Temporary Tables
Derived Columns
Using IN and NOT IN
Using EXISTS and NOT EXISTS
JOINS vs. Subqueries
Using Subqueries with DML Statements
Set Operations using
UNION
EXCEPT
UNION vs. UNION ALL
UNION vs. JOIN
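
The set operations listed above can be sketched in a few lines: UNION removes duplicate rows, UNION ALL keeps them, and EXCEPT returns rows from the first query that are absent from the second. This example assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the two customer tables are hypothetical.

```python
import sqlite3

# In-memory SQLite database used in place of a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online_customers (name TEXT)")  # hypothetical table
conn.execute("CREATE TABLE store_customers (name TEXT)")   # hypothetical table
conn.executemany("INSERT INTO online_customers VALUES (?)", [("asha",), ("ravi",)])
conn.executemany("INSERT INTO store_customers VALUES (?)", [("ravi",), ("meena",)])

# UNION: combined rows with duplicates removed.
union_rows = conn.execute(
    "SELECT name FROM online_customers "
    "UNION SELECT name FROM store_customers ORDER BY name"
).fetchall()

# UNION ALL: combined rows with duplicates kept.
union_all_rows = conn.execute(
    "SELECT name FROM online_customers "
    "UNION ALL SELECT name FROM store_customers ORDER BY name"
).fetchall()

# EXCEPT: rows in the first result that do not appear in the second.
except_rows = conn.execute(
    "SELECT name FROM online_customers "
    "EXCEPT SELECT name FROM store_customers ORDER BY name"
).fetchall()

print(union_rows)
print(union_all_rows)
print(except_rows)
```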

Module 6: SQL Window Functions & CTE

Window Functions
Window Aggregate Functions
Window Ranking Functions:
Over Clause:
CTE Syntax
CTE Query with Example
CTEs vs. Subqueries
Recursive CTEs
Natural Sorting
Drawbacks of CTEs
Impact of CTEs and Window Functions on Query Performance
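
As a rough sketch of the topics above, the example below combines a CTE with a window ranking function and then shows a small recursive CTE. It assumes Python's built-in sqlite3 module (linked against SQLite 3.25 or later, which is needed for window functions) rather than MySQL, and the sales table is hypothetical.

```python
import sqlite3

# In-memory SQLite database used in place of a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "asha", 300), ("north", "ravi", 500),
     ("south", "meena", 400), ("south", "john", 100)],
)

# The CTE computes per-rep totals; RANK() OVER then ranks reps inside each region.
ranked = conn.execute("""
    WITH rep_totals AS (
        SELECT region, rep, SUM(amount) AS total
        FROM sales
        GROUP BY region, rep
    )
    SELECT region, rep, total,
           RANK() OVER (PARTITION BY region ORDER BY total DESC) AS region_rank
    FROM rep_totals
    ORDER BY region, region_rank
""").fetchall()

# A recursive CTE: the anchor row (1) is extended until n reaches 5.
numbers = conn.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter ORDER BY n
""").fetchall()

print(ranked)
print(numbers)
```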

Module 7: SQL Advanced Concepts

Views
Introduction to Views
Defining Advantages of Views
Creating, Altering, and Dropping Views
Modifying data through Views
Simple view, Complex Views
User-defined Variables
Use of variables to optimize queries
Stored Routines
Types of Stored Routines
Stored Procedures
Error Handling
Signals
Stored Functions
Triggers
Cursors

Module 8: SQL Indexing

Introduction to Indexes
Types of Indexes
Commands related to Indexes
Create an Index using SSMS
Indexes and Query Optimization

Module 9: SQL Query Pipelining and Optimization

Query Pipeline
Query Optimization Techniques
Spooling
Hash Matching
Key Lookup
Using advisors
Using profilers
Understanding the messages and their reasons generated by profilers
Query Logging
Slow Query Logs
Using EXPLAIN statement
Query execution time

Module 10: SQL Solutioning with Schemas

Pivot Table/Joining Table
Crossover Tables
Creating normalized and optimal schemas as per business requirement

Module 11: Python Introduction & Revision

Input & Output
Introduction to Blocks and Statements
Flow Control in Python
Control Flow Statements
if Statements
elif
Using a Debugger in IntelliJ or PyCharm
More on if, elif and else
if, elif, and else in the Debugger
Using if with strings
Simple condition
Conditional Operators
Using and, or, in Conditions
Simplify Chained Comparison
Boolean Expression True and False
Truthy Values
in and not in
for loops
Stepping through a for loop
for loops Extracting Values from User Input
Extracting capitals
Iterating Over a Range
For loop
More About Ranges
For loop with step
Nested for loops
continue
Initialising Variables and None
More on while loops
The Random Module and Import

Module 12: Python Function Introduction

Module 13: Python Collections, Tuples, Dictionary

Module 14: Python Exception handling

Exception Handling
Input-Output Handling
File Handling

Module 15: Python Object-Oriented Programming

Module 16: Python For Data Analysis – NumPy

Module 17: Pandas Part-1, Pandas DataFrame Fundamentals

Module 18: Pandas Part-2

Module 19: Connecting Python with Relational Databases

Module 20: ETL with Python

Module 21: Python for Data Visualization - Seaborn & Pandas Built-in Data Visualization

Module 22: Python with Cloud and Automation Testing With Pytest

Module 23: Unix

Unix Introduction
Processing & Listing
Basic Commands
Working With Files
Working With Directories
Word Count
Filter & Date
Grep/Egrep/Fgrep
Sed
Permissions
VI Editor
Shell Scripting
CLA/if-else
DB Connectivity

Module 24: BigData Overview

What is Big data?
Distributed computing
Overview of Big Data
Characteristics of Big Data
Types of data
Sources of Big Data
Big Data examples
What is streaming data?
Batch vs Streaming data processing
Big data Hadoop opportunities

Module 25: Hadoop Overview

Why we need Hadoop
Data centers and Hadoop Cluster overview
Hadoop Cluster and Racks
Hadoop ecosystem tools overview
Understanding the Hadoop configurations and Installation.

Module 26: HDFS Overview

HDFS Architecture
Namenode, Datanode, Secondary Namenode
Hadoop FS and Processing Environment’s UIs
Fault Tolerance
High Availability
Block Replication
Hadoop FS shell commands
Rack Awareness.

Module 27: Map Reduce Overview

The introduction of MapReduce
MapReduce Architecture
Data flow in MapReduce
Understanding the Difference Between Block and InputSplit
Role of RecordReader
Basic Configuration of MapReduce
MapReduce life cycle
How MapReduce Works

Module 28: YARN Overview

YARN (Hadoop Processing Framework)
YARN Daemons
Resource Manager, NodeManager etc.
Job assignment & Execution flow

Module 29: Hive Overview

What is Hive?
Why Hive?
Hive Characteristics
Hive Architecture
Hive Components
Shell, Driver, Parser, Metastore, Execution Engine
Hive Features
Hive Limitations

Module 30: Hive Setup / Configuration

Hive Command Line Interfaces - Hive Shell, Beeline
Hive Commands - Interactive and Batch Mode
Hive Configuration Files

Module 31: Hive Concepts

Hive Data Model
Managed Tables
External Tables
Partitioning, Clustering
Partitioned Tables
Bucketed/Clustered Tables
Views, Joins, Indexes

Module 32: Hive QL – Basics

Keywords
Data Types
Operators
Functions

Module 33: Hive QL - DDL Operations

Create
Drop
Alter
Truncate
Schema
Table
View
Index
Show
Describe

Module 34: Hive QL - DML Operations

Load, Insert, Update, Delete
Import, Export from/into files or tables

Module 35: Hive QL – Queries

Select queries
Conditional queries
Partition queries
Grouping and Aggregation
Sorting, Ordering, Clustering, Distributing
Union, Joins, Subqueries, Views

Module 36: Working with Different File Formats

Text
CSV
JSON
XML
Avro
Parquet
ORC

Module 37: Hive QL - UDF Overview

Built-in function
Built-in Aggregate Functions (UDAF)
Built-in Table Generating Functions (UDTF)
UDF internals
Writing custom UDF

Module 38: Hive QL - Advanced Concepts Overview

Sampling, Virtual Columns, Lateral View
Windowing, OVER and Analytics, Common Table Expressions
Transactions, Counters
Indexes, Statistics/Analyse, Locks
Explain Plan, Authorisation, Archiving

Module 39: Sqoop Overview

What is Sqoop
Sqoop Architecture
Import/Export
Incremental Imports
Sqoop Job

Module 40: Spark Overview

What is Spark?
Why Spark?
Spark vs MR
Features of Apache Spark
Limitations of Apache Spark
Spark Ecosystem:
Spark Core
Spark SQL
Spark Streaming
MLlib
GraphX
SparkR

Module 41: Spark Setup / Configuration

Modes of Operation
Spark Environment Setup
Spark Configuration Files
Spark Shell
Spark UI

Module 42: Spark Architecture

Spark Architecture
Standalone
Spark on YARN
Spark on Mesos
Spark on K8s
Spark Components
Driver
Master Node
Worker Nodes
Executors

Module 43: Spark Concepts

Spark Configuration
Spark Context
Spark Session
Resilient Distributed Dataset (RDD)
Partitions, Transformations, Actions
DAG (Directed Acyclic Graph)

Module 44: PySpark Overview

PySpark Overview
Environment Setup
PySpark Basics
PySpark programming

Module 44: PySpark Overview

What is RDD
Different Ways to create RDD
Parallelized Collections
External Datasets

Module 45: PySpark Transformations & Actions

a) Transformations
Wide Transformation
Narrow Transformation
b) Actions
map/flatMap/filter/mapPartitions
intersection/distinct/groupByKey
reduceByKey/sortByKey/join/coalesce
countByValue/countByKey/reduce
aggregateByKey
union/foreach
Explore all transformations in the PySpark Web UI
Understand how jobs, stages, and tasks are created in the PySpark Web UI

Module 46: RDD Persistence

RDD Persistence using Cache
RDD Persistence using the persist function
RDD different storage levels
MEMORY_ONLY
MEMORY_AND_DISK
DISK_ONLY
MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
How to choose the persistence level

Module 47: PySpark Shared Variables

Broadcast Variable
Accumulator Variable
Use cases on Broadcast variable
Use cases on Accumulator

Module 48: Spark SQL Overview

What is Spark SQL?
Spark SQL Architecture
Writing SQL Queries
Converting DF to Tables
Catalyst Optimizer
Spark Joins
Apache Spark Application Execution Life Cycle and Spark UI
Solving problems with PySpark DataFrame-style APIs

Module 49: DataFrames Overview

What are DataFrames?
RDD vs DataFrame
DataFrame vs DataSet
RDD to DF
Read DF with Different options
Save Options

Module 50: DataFrames API Transformations

Introduction to Transformations and Extractions
DataFrame APIs Introduction
DataFrame APIs Selection
DataFrame APIs Filter or Where
DataFrame APIs Sorting
DataFrame APIs Set
DataFrame APIs Join
DataFrame APIs Aggregation
DataFrame APIs GroupBy
DataFrame APIs Window
Repartition Vs Coalesce Method of a DataFrame
DataFrame APIs Sampling Functions
DataFrame Built-in Functions Introduction
DataFrame Built-in Functions_New Column Functions
DataFrame Built-in Functions_String Functions
DataFrame Built-in Functions_RegExp Functions
DataFrame Built-in Functions_Date Functions
DataFrame Built-in Functions_Null Functions
DataFrame Built-in Functions_Collection Functions
DataFrame Built-in Functions_na Functions
DataFrame Built-in Functions_Math and Statistics Functions
DataFrame Built-in Functions_Explode and Flatten Function
DataFrame Built-in Functions_Formatting Functions.
DataFrame Built-in Functions_Json Functions
Register DataFrame as Temp Table
Perform various operations using SQL queries

Module 51: DataFrames API Extractions

DataFrame Extraction Introduction
Spark Session Read/Write API
DataFrame Extractions - csv
DataFrame Extractions APIs - text
DataFrame Extractions - parquet
DataFrame Extractions - orc, json
DataFrame Extractions - hive
DataFrame Extractions - jdbc

Module 52: Metastore and Hive Integration

Intro to Spark SQL Hive Integration
Understanding Warehouse Directory
Managing Spark Metastore Databases
Managing Spark Metastore Tables
Retrieve Metadata of Tables
Role of Spark Metastore or Hive Metastore

Module 53: PySpark Streaming Overview

Intro to Stream Processing
Batch vs Real-time Processing
Storm vs Spark
Stream Processing Use Cases

Module 54: DStreams Overview

Intro to DStreams
Micro Batch Windowing Concept
Sliding Window
Tumbling Window
DStreams Transformations
Integration with different sources
Socket Source, File Source, etc.

Module 55: Structured Streaming Overview

Intro to Structured Streaming
DataFrame API for Streaming
Connecting to Streaming Source
Reading Data from Streaming Source
Writing Streaming Output
Output Modes - Complete, Append, Update
DataFrame Transformations for Streaming
Stream Processing Example with DataFrame API

Module 56: Structured Streaming Advanced

Windowing Concept
Sliding Window
Tumbling Window
Aggregation in a time window
Adding timestamp to streaming data
Tumbling window and Sliding window
Event Time Windows
Processing Time Windows
Handling late data
Watermarking
Integration with different sources
Socket Source, File Source, Kafka, etc.
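
The difference between tumbling and sliding windows listed above can be sketched in plain Python. This is not the Structured Streaming API: the two helper functions are hypothetical, timestamps are assumed to be non-negative integer seconds, and windows with a negative start time are ignored for simplicity.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Count events per fixed, non-overlapping window [start, start + size)."""
    counts = defaultdict(int)
    for ts, _ in events:
        start = (ts // size) * size  # each event falls in exactly one window
        counts[(start, start + size)] += 1
    return dict(sorted(counts.items()))


def sliding_windows(events, size, slide):
    """Count events per overlapping window of length `size` starting every `slide`."""
    counts = defaultdict(int)
    for ts, _ in events:
        start = (ts // slide) * slide  # latest window start at or before ts
        # Walk back through every earlier window that still covers ts.
        while start > ts - size and start >= 0:
            counts[(start, start + size)] += 1
            start -= slide
    return dict(sorted(counts.items()))


# Hypothetical (timestamp_in_seconds, payload) events.
events = [(1, "a"), (4, "b"), (7, "c"), (12, "d")]

tumbling = tumbling_windows(events, size=10)
sliding = sliding_windows(events, size=10, slide=5)

print(tumbling)  # every event is counted once
print(sliding)   # an event can be counted in more than one window
```

With a slide equal to the window size, the sliding version reduces to the tumbling one, which is one way to see tumbling windows as a special case.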

Module 57: Kafka Overview

Introduction
Terminology
Architecture
Replication
Producer
Consumer
Broker
Internals of Producer
Internals of Consumer
Single Node Implementation
Multi Node Implementation
Offsets
Brokers/Partitions
Topics
Consumer Groups
Rebalancing Groups

Module 58: Producer / Consumer API

Producer Configurations
Produce Message into Kafka Topic
Consumer Configurations
Consume Message from Kafka Topic

Module 59: Structured Streaming Kafka Integration

Connecting to Kafka Streaming Source
Reading Data from Kafka Topic using DataFrame API
Processing the Data using DataFrame API
Writing Streaming Output to Kafka Topic using DataFrame API

Module 60: Airflow Overview

Introduction to Airflow
Workflows as Code
Features of Airflow
Airflow Architecture
Components of Airflow
Setup and Configuration of Airflow

Module 61: Airflow Concepts

DAGs
Declaring, Loading, Running DAGs
Tasks
Operators
Dynamic Task Mapping
Data Aware Scheduling
Triggers
Task Flow
Executor
Scheduler
DAG File Processing
Airflow UI / CLI

Module 62: Creating Workflows with Airflow

Create workflows to automate Data Engineering pipeline
Automate Data Ingestion, Cleansing, Validation and Transformation jobs
Monitor and Manage the workflows

Module 63: Azure Databricks Overview

Introduction to Azure Databricks
Azure Databricks Architecture
Creating Azure Databricks Service

Module 64: Azure Databricks Clusters

Azure Databricks Cluster Types
Creating Azure Databricks Cluster
Azure Databricks Cluster Pool

Module 65: Azure Databricks Notebooks

Intro to Azure Databricks Notebooks
Working with Notebooks
Magic Commands
Databricks Utilities

Module 66: Azure Databricks Storage

Intro to Azure BLOB storage
Intro to Databricks File System (DBFS)
Intro to Azure Data Lake Storage
Databricks Mount overview
Creating Azure Data Lake Storage
Mounting Azure Data Lake Storage

Module 67: PySpark Recap

Recap PySpark Programming
PySpark SQL
DataFrame API

Module 68: Azure Databricks Data Ingestion

Intro to Data Ingestion
Loading Data into Azure BLOB storage
Loading Data from BLOB to Databricks pipeline
Data Ingestion - CSV
Data Ingestion - JSON
Data Ingestion - Other Files
Handling Bulk / Incremental Load

Module 69: Azure Databricks Data Processing

Databricks Workflows
Databricks Jobs
Filter & Join Transformations
Aggregations
Grouping
Window Functions
Using Spark SQL
Performing Transformations, Joins and Aggregations
Data Analysis

Module 70: Delta Lake

Intro to Data Lakehouse Architecture
Delta Lake Overview
Handling incremental Load
Performing CRUD operations with Delta Lake
Delta Lake Transaction Logs
Data Ingestion
Data Transformation
Key features and usage

Module 71: Azure Data Factory (ADF) Overview

Intro to Azure Data Factory
Azure Data Factory Architecture
Azure Data Factory Components
Creating ADF Service
Key Features and Use Cases

Module 72: Data Engineering Pipeline with ADF

Data Ingestion from Azure Blob
Data Ingestion from HTTP, other sources
Creating ADF pipeline
Pipelines with Ingestion, Transformations and Processing

Module 73: Synapse / Power BI Integration

Synapse Overview
Power BI Overview
Integrating in the pipeline
Visualizing the metrics using Power BI

Module 74: Deployment and Monitoring

Deployment of Data Engineering Pipelines
CI/CD for Data Factory Pipelines
Monitoring and Managing the Deployments

Module 75: Data Warehousing Concepts

Data Warehousing Overview
ETL (Extract Transform Load) Overview
Fact Tables
Dimension Tables
Star Schema
Snowflake Schema
Slowly Changing Dimensions (SCD)
Data Mart Overview
Business Intelligence (BI) Overview

Module 76: Cloud Concepts

Intro to Cloud Computing
Cloud Providers Overview
Cloud Services Overview
Features and Benefits of Cloud

Module 77: Snowflake Overview

What is Snowflake?
The Snowflake Story
Signup for Snowflake
Using Snowflake UI
Understanding Database objects
Container hierarchy
Creating our first Database, Schema & Table
Load Data into our first table
Setting up command line interface – SnowSQL

Module 78: Snowflake Warehouse for Compute

Creating our first Virtual Warehouse
Virtual Warehouse sizes & Scalability
Warehouse Credits
Warehouse Suspend & Resume
Warehouse - Maximized vs Auto Scale
Multi-Cluster Warehouse Scaling Policies
Multi-cluster virtual warehouse or Scaling Out
Working with Resource Monitors

Module 79: Architecture, Features & Pricing

Key Concepts & Features
Shared disk vs Shared Nothing Architecture
Columnar storage & Micro Partitions
Multi Cluster Shared Data Architecture
Snowflake 3-layer Architecture
Supported Cloud Platforms & Regions
Snowflake Editions
Snowflake Releases
Snowflake Pricing
Data Integration configurations
Data Integration with ETL tools
Data Integration with reporting tool (BI)
Integration with Programming Connectors (Python)

Module 80: Loading & Unloading Structured Data

Ingestion / Loading Methods
Steps to Managing Loads
Preparing data for loading
File Format Object
Staging your data
Copy options & ON_ERROR
Loading data from an Internal stage
Working with rejected records
Different types of Internal stages
Unloading of data

Module 81: Loading from AWS

Creating S3 bucket
Upload files in S3
Creating policy
Creating integration object
Loading from S3

Module 82: Continuous Data Loading

What is Snowpipe
Loading data via Snowpipe - high level steps
Storage Integration

Module 83: Stage data transformations

Querying Data in Staged Files
Querying Metadata for Staged Files
Transforming Data During a Load

Module 84: Semi Structured Data

Data Types for Semi Structured data
File Formats
Loading JSON data
Loading Parquet data
Loading XML data
Handling nested data, array data, hierarchy data
Analytics on JSON data

Module 85: Databases, Tables & Views

Working with Temporary, Transient & Permanent Tables
Working with External Tables
Overview of Views
Working with Materialized Views
Table Design Considerations

Module 86: Time Travel, Failsafe & Zero Copy Clones

How Time Travel works
Travel to a specific time or before through a query
Assignment: Time Travel to a point in time
Undrop databases, schema & tables
Assignment: Un-drop tables, schemas & databases
Assignment: Test drive the time travel
Failsafe in Snowflake
Lab: View storage used by Fail-safe
Assignment: Understanding storage used by Fail-safe
Zero Copy Cloning
How cloning is different from copying
Cloning with Time Travel

Module 87: Query & Cache Management

Types of Cache
Maximize Cache usage
Lab: Query Caching in Action
Query Profile
Query History
Query profile & Disk Spilling
Information Schema
Account Usage Schema

Module 88: Performance Considerations

Impact of Data distribution in Micro-partitions
Clustering Keys
Clustering Depth
Clustering large tables & improve partition elimination
Lab: Cluster keys for large tables
Search Optimization
Benefits and Cost of Search Optimization

Module 89: Change Data Capture

Table Streams
Versioning & Read Isolation
Stream types & Columns in Stream
Data flow
Scheduling through Tasks
Tree of Tasks
Task History
SCD Implementation

Module 90: Secure Data Sharing

Secure Data Sharing in Snowflake
Sharing - whose compute is used?
Sharing with other Snowflake users
Sharing - Data always up-to-date
Sharing with non-Snowflake users

Module 91: Snowflake Access Management

Snowflake Approach to Access Control & Key Concepts
Role Hierarchy in Snowflake
ACCOUNTADMIN role
SECURITYADMIN role
SYSADMIN role
Custom roles
PUBLIC role

Module 92: ETL Tool / Talend

Understanding Big Data & Talend
Working with Talend Open Studio v8
File Components in Talend - Csv/Json/Xml/Excel

Module 93: Talend

Contextualization - Local/Project/Global etc.
Error & exception handling
Java components (tJava, tJavaRow etc.)
Database components in Talend
Transformations & Executing Jobs remotely in Talend
Joblets, Looping & Pivot components
Parallelization
HDFS & Hive in Talend
Spark in Talend
Talend with Kafka
Dynamic Data Ingestion
Orchestration | TAC

Course Highlights

No. of hours: 100 hrs.

Star Rating: (5)

Trainer:

Lokesh

Mr. Lokesh has more than 17 years of experience and has been associated with reputed organizations such as Torque IT (South Africa), First National Bank (South Africa), UST Global (Trivandrum), Reliance Corporate (Mumbai), National Training Institute-Muscat (Oman), CGI (Bangalore), and Microsoft. He is a Microsoft Certified Trainer (MCT), Microsoft Certified Azure Data Engineer, Microsoft Certified Azure Developer, and Microsoft Certified Azure Administrator, and holds an MCSD in .NET.

Fee:

Online Virtual Class Room

Enroll in any of the above batches and attend live classes at the scheduled time.

62498

Upcoming Batches:

FAQ:

Who can learn this course?

Anyone who is interested and has prior computer knowledge can learn Data Engineering. Get yourself enrolled today with Wifi Learning.

What are the prerequisites for this course?

There are no prerequisites as such; if you have computer knowledge, you can learn Data Engineering.

What career opportunities does this course offer?

You can apply for jobs like Data Engineer.

How long will this course take?

It will take 4 months.

Will you provide soft copy material?

Yes, we will share the PowerPoint soft copy material and we will provide a recording of our live classes.

Will you start this course from scratch?

Yes. We will start this course from scratch.

Do you provide a certificate for Data Engineering?

Yes, we will provide a Training Completion certificate.

Do you provide projects to work on?

Yes, after completion of the course you will work on 2 or 3 projects. This experience will help you clear interviews confidently.

Reviews:

Jun-03-2024

Recommended Courses:

Java Programming

Data Science with Python

AWS Solutions Architect Certification Course

Advanced Python

Data Analyst

Interview Skills Development

Azure Data Engineering
