
Data Engineering

Data Engineering is the discipline of building and maintaining the systems that collect, store, and process data at scale. With Wifi Learning you can gain expertise in AWS, Azure, SQL, PySpark, Talend, Hive, Tableau, Power BI, and more.


Curriculum:

  • AWS RDS
  • Creating a database server on RDS and connecting to it
  • Azure Database
  • Creating a database server on azure and connecting to it
  • Install and configure MySQL Server on the cloud
  • History of DBMS
  • DDL
  • DML
  • DQL
  • DCL
  • TCL
  • Functions in MySQL
  • Expressions
  • Using Functions
  • String Functions
  • Date and time functions
  • Numeric Functions
  • Aggregate Functions
  • Spaces in Function Names
  • Impact of functions on Query Performance
  • How to optimally use functions
  • Group By Clause
  • Having Clause
  • Where vs. Having Clause
  • Group By with Rollup and Cube
  • Table Name Aliases
  • Joins
  • Types of joins:
  • Inner Join
  • Outer Join
  • Left Outer Join
  • Right Outer Join
  • Full Outer Join
  • Equi Join vs. Non-Equi Join
  • Natural Join
  • Join with More than 2 Tables
  • Self-Join with Example
  • Impact of Joins and Self-Join on query performance
  • Optimal use of joins
  • Importance of the order of tables in joins
  • Subqueries
  • Independent and Co-Related Sub Queries
  • Derived Tables/Temporary Tables
  • Derived Columns
  • Using IN and NOT IN
  • Using EXISTS and NOT EXISTS
  • JOINS vs. Subqueries
  • Using Sub Query with DML Statement
  • Set Operations:
  • UNION
  • EXCEPT
  • UNION vs. UNION ALL
  • UNION vs. JOIN
  • Window Functions
  • Window Aggregate Functions
  • Window Ranking Functions:
  • Over Clause:
  • CTE Syntax
  • CTE Query with Example
  • CTEs vs. Subqueries
  • Recursive CTEs
  • Natural Sorting
  • Drawbacks of CTEs
  • Impact of CTEs and Window Functions on Query Performance
  • Views
  • Introduction to Views
  • Defining Advantages of Views
  • Creating, Altering, and Dropping Views
  • Modifying data through Views
  • Simple view, Complex Views
  • User-defined Variables
  • Use of variables to optimize queries
  • Stored Routines
  • Types of Stored Routines
  • Stored Procedures
  • Error Handling
  • Signals
  • Stored Functions
  • Triggers
  • Cursors
  • Introduction to Indexes
  • Types Of Index
  • Commands related to Indexes
  • Create an Index using SSMS
  • Indexes and Query Optimization
  • Query Pipeline
  • Query Optimization Techniques
  • Spooling
  • Hash Matching
  • Key Lookup
  • Using advisors
  • Using profilers
  • Understanding the messages and their reasons generated by profilers
  • Query Logging
  • Slow Query Logs
  • Using EXPLAIN statement
  • Query execution time
  • Pivot Table/Joining Table
  • Crossover Tables
  • Creating normalized and optimal schemas as per business requirement
  • Input & Output
  • Introduction to Blocks and Statements
  • Flow Control in Python
  • Control Flow Statements
  • if Statements
  • elif
  • Using a Debugger in IntelliJ or Pycharm
  • More on if, elif and else
  • if, elif, and else in the Debugger
  • Using if with strings
  • Simple condition
  • Conditional Operators
  • Using and, or, in Conditions
  • Simplify Chained Comparison
  • Boolean Expression True and False
  • Truthy Values
  • in and not in
  • for loops
  • Stepping through a for loop
  • for loops Extracting Values from User Input
  • Extracting capitals
  • Iterating Over a Range
  • For loop
  • More About Ranges
  • For loop with step
  • Nested for loops
  • continue
  • Initialising Variables and None
  • More on while loops
  • The Random Module and Import
  • Exception Handling
  • Input-Output Handling
  • File Handling
  • Unix Introduction
  • Processing & Listing
  • Basic Commands
  • Working With Files
  • Working With Directories
  • Word Count
  • Filter & Date
  • Grep/Egrep/Fgrep
  • Sed
  • Permissions
  • VI Editor
  • Shell Scripting
  • CLA/if-else
  • DB Connectivity
  • What is Big data?
  • Distributed computing
  • Overview of Big Data
  • Characteristics of Big Data
  • Types of data
  • Sources of Big Data
  • Big Data examples
  • What is streaming data?
  • Batch vs Streaming data processing
  • Big data Hadoop opportunities
  • Why we need Hadoop
  • Data centers and Hadoop Cluster overview
  • Hadoop Cluster and Racks
  • Hadoop ecosystem tools overview
  • Understanding the Hadoop configurations and Installation.
  • HDFS Architecture
  • Namenode, Datanode, Secondary Namenode
  • Hadoop FS and Processing Environment’s UIs
  • Fault Tolerance
  • High Availability
  • Block Replication
  • Hadoop FS shell commands
  • Rack Awareness.
  • Introduction to MapReduce
  • MapReduce Architecture
  • Data flow in MapReduce
  • Understand Difference Between Block and InputSplit
  • Role of RecordReader
  • Basic Configuration of MapReduce
  • MapReduce life cycle
  • How MapReduce Works
  • YARN (Hadoop Processing Framework)
  • YARN Daemons
  • Resource Manager, NodeManager etc.
  • Job assignment & Execution flow
  • What is Hive?
  • Why Hive?
  • Hive Characteristics
  • Hive Architecture
  • Hive Components
  • Shell, Driver, Parser, Metastore, Execution Engine
  • Hive Features
  • Hive Limitations
  • Hive Command Line Interfaces - Hive Shell, Beeline
  • Hive Commands - Interactive and Batch Mode
  • Hive Configuration Files
  • Hive Data Model
  • Managed Tables
  • External Tables
  • Partitioning, Clustering
  • Partitioned Tables
  • Bucketed/Clustered Tables
  • Views, Joins, Indexes
  • Keywords
  • Data Types
  • Operators
  • Functions
  • Create
  • Drop
  • Alter
  • Truncate
  • Schema
  • Table
  • View
  • Index
  • Show
  • Describe
  • Load, Insert, Update, Delete
  • Import, Export from/into files or tables
  • Select queries
  • Conditional queries
  • Partition queries
  • Grouping and Aggregation
  • Sorting, Ordering, Clustering, Distributing
  • Union, Joins, Subqueries, Views
  • Text
  • CSV
  • JSON
  • XML
  • Avro
  • Parquet
  • ORC
  • Built-in function
  • Built-in Aggregate Functions (UDAF)
  • Built-in Table Generating Functions (UDTF)
  • UDF internals
  • Writing custom UDF
  • Sampling, Virtual Columns, Lateral View
  • Windowing, OVER and Analytics, Common Table Expressions
  • Transactions, Counters
  • Indexes, Statistics/Analyse, Locks
  • Explain Plan, Authorisation, Archiving
  • What is Sqoop
  • Sqoop Architecture
  • Import/Export
  • Incremental Imports
  • Sqoop Job
  • What is Spark?
  • Why Spark?
  • Spark vs MR
  • Features of Apache Spark
  • Limitations of Apache Spark
  • Spark Ecosystem:
  • Spark Core
  • Spark SQL
  • Spark Streaming
  • MLlib
  • GraphX
  • SparkR
  • Modes of Operation
  • Spark Environment Setup
  • Spark Configuration Files
  • Spark Shell
  • Spark UI
  • Spark Architecture
  • Standalone
  • Spark on YARN
  • Spark on Mesos
  • Spark on K8s
  • Spark Components
  • Driver
  • Master Node
  • Worker Nodes
  • Executors
  • Spark Configuration
  • Spark Context
  • Spark Session
  • Resilient Distributed Dataset (RDD)
  • Partitions, Transformations, Actions
  • DAG (Directed Acyclic Graph)
  • PySpark Overview
  • Environment Setup
  • PySpark Basics
  • PySpark programming
  • What is RDD
  • Different Ways to create RDD
  • Parallelized Collections
  • External Datasets
  • a) Transformations
  • Wide Transformation
  • Narrow Transformation
  • b) Actions
  • map/flatMap/filter/mapPartitions
  • intersection/distinct/groupByKey
  • reduceByKey/sortByKey/join/coalesce
  • countByValue/countByKey/reduce
  • aggregateByKey
  • union/foreach
  • Explore all transformations in the PySpark Web UI
  • Understand how jobs, stages, and tasks are created in the PySpark Web UI
  • RDD Persistence using Cache
  • RDD persistence using the persist function
  • RDD different storage levels
  • MEMORY_ONLY
  • MEMORY_AND_DISK
  • DISK_ONLY
  • MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
  • How to choose the persistence level
  • Broadcast Variable
  • Accumulator Variable
  • Use cases on Broadcast variable
  • Use cases on Accumulator
  • What is Spark SQL?
  • Spark SQL Architecture
  • Writing SQL Queries
  • Converting DF to Tables
  • Catalyst Optimizer
  • Spark Joins
  • Apache Spark Application Execution Life Cycle and Spark UI
  • Solving problems with PySpark DataFrame-style APIs
  • What are DataFrames?
  • RDD vs DataFrame
  • DataFrame vs DataSet
  • RDD to DF
  • Read DF with Different options
  • Save Options
  • Introduction to Transformations and Extractions
  • DataFrame APIs Introduction
  • DataFrame APIs Selection
  • DataFrame APIs Filter or Where
  • DataFrame APIs Sorting
  • DataFrame APIs Set
  • DataFrame APIs Join
  • DataFrame APIs Aggregation
  • DataFrame APIs GroupBy
  • DataFrame APIs Window
  • Repartition Vs Coalesce Method of a DataFrame
  • DataFrame APIs Sampling Functions
  • DataFrame Built-in Functions Introduction
  • DataFrame Built-in Functions - New Column Functions
  • DataFrame Built-in Functions - String Functions
  • DataFrame Built-in Functions - RegExp Functions
  • DataFrame Built-in Functions - Date Functions
  • DataFrame Built-in Functions - Null Functions
  • DataFrame Built-in Functions - Collection Functions
  • DataFrame Built-in Functions - na Functions
  • DataFrame Built-in Functions - Math and Statistics Functions
  • DataFrame Built-in Functions - Explode and Flatten Functions
  • DataFrame Built-in Functions - Formatting Functions
  • DataFrame Built-in Functions - JSON Functions
  • Register DataFrame as Temp Table
  • Perform various operations using SQL queries
  • DataFrame Extraction Introduction
  • Spark Session Read/Write API
  • DataFrame Extractions - csv
  • DataFrame Extractions APIs - text
  • DataFrame Extractions - parquet
  • DataFrame Extractions - orc
  • DataFrame Extractions - json
  • DataFrame Extractions - hive
  • DataFrame Extractions - jdbc
  • Intro to Spark SQL Hive Integration
  • Understanding Warehouse Directory
  • Managing Spark Metastore Databases
  • Managing Spark Metastore Tables
  • Retrieve Metadata of Tables
  • Role of Spark Metastore or Hive Metastore
  • Intro to Stream Processing
  • Batch vs Real-time Processing
  • Storm vs Spark
  • Stream Processing Use Cases
  • Intro to DStreams
  • Micro Batch Windowing Concept
  • Sliding Window
  • Tumbling Window
  • DStreams Transformations
  • Integration with different sources
  • Socket Source, File Source, etc.
  • Intro to Structured Streaming
  • DataFrame API for Streaming
  • Connecting to Streaming Source
  • Reading Data from Streaming Source
  • Writing Streaming Output
  • Output Modes - Complete, Append, Update
  • DataFrame Transformations for Streaming
  • Stream Processing Example with DataFrame API
  • Windowing Concept
  • Sliding Window
  • Tumbling Window
  • Aggregation in a time window
  • Adding timestamp to streaming data
  • Tumbling window and Sliding window
  • Event Time Windows
  • Processing Time Windows
  • Handling late data
  • Watermarking
  • Integration with different sources
  • Socket Source, File Source, Kafka, etc.
  • Introduction
  • Terminology
  • Architecture
  • Replication
  • Producer
  • Consumer
  • Broker
  • Internals of Producer
  • Internals of Consumer
  • Single Node Implementation
  • Multi Node Implementation
  • Offsets
  • Brokers/Partitions
  • Topics
  • Consumer Groups
  • Rebalancing Groups
  • Producer Configurations
  • Produce Message into Kafka Topic
  • Consumer Configurations
  • Consume Message from Kafka Topic
  • Connecting to Kafka Streaming Source
  • Reading Data from Kafka Topic using DataFrame API
  • Processing the Data using DataFrame API
  • Writing Streaming Output to Kafka Topic using DataFrame API
  • Introduction to Airflow
  • Workflows as Code
  • Features of Airflow
  • Airflow Architecture
  • Components of Airflow
  • Setup and Configuration of Airflow
  • DAGs
  • Declaring, Loading, Running DAGs
  • Tasks
  • Operators
  • Dynamic Task Mapping
  • Data Aware Scheduling
  • Triggers
  • Task Flow
  • Executor
  • Scheduler
  • DAG File Processing
  • Airflow UI / CLI
  • Create workflows to automate Data Engineering pipeline
  • Automate Data Ingestion, Cleansing, Validation and Transformation jobs
  • Monitor and Manage the workflows
  • Introduction to Azure Databricks
  • Azure Databricks Architecture
  • Creating Azure Databricks Service
  • Azure Databricks Cluster Types
  • Creating Azure Databricks Cluster
  • Azure Databricks Cluster Pool
  • Intro to Azure Databricks Notebooks
  • Working with Notebooks
  • Magic Commands
  • Databricks Utilities
  • Intro to Azure BLOB storage
  • Intro to Databricks File System (DBFS)
  • Intro to Azure Data Lake Storage
  • Databricks Mount overview
  • Creating Azure Data Lake Storage
  • Mounting Azure Data Lake Storage
  • Recap PySpark Programming
  • PySpark SQL
  • DataFrame API
  • Intro to Data Ingestion
  • Loading Data into Azure BLOB storage
  • Loading Data from BLOB to Databricks pipeline
  • Data Ingestion - CSV
  • Data Ingestion - JSON
  • Data Ingestion - Other Files
  • Handling Bulk / Incremental Load
  • Databricks Workflows
  • Databricks Jobs
  • Filter & Join Transformations
  • Aggregations
  • Grouping
  • Window Functions
  • Using Spark SQL
  • Performing Transformations, Joins and Aggregations
  • Data Analysis
    • Intro to Data Lakehouse Architecture
    • Delta Lake Overview
    • Handling incremental Load
    • Performing CRUD operations with Delta Lake
    • Delta Lake Transaction Logs
    • Data Ingestion
    • Data Transformation
    • Key features and usage
    • Intro to Azure Data Factory
    • Azure Data Factory Architecture
    • Azure Data Factory Components
    • Creating ADF Service
    • Key Features and Use Cases
    • Data Ingestion from Azure Blob
    • Data Ingestion from HTTP, other sources
    • Creating ADF pipeline
    • Pipelines with Ingestion, Transformations and Processing
    • Synapse Overview
    • Power BI Overview
    • Integrating in the pipeline
    • Visualizing the metrics using Power BI
    • Deployment of Data Engineering Pipelines
    • CI/CD for Data Factory Pipelines
    • Monitoring and Managing the Deployments
    • Data Warehousing Overview
    • ETL (Extract Transform Load) Overview
    • Fact Tables
    • Dimension Tables
    • Star Schema
    • Snowflake Schema
    • Slowly Changing Dimensions (SCD)
    • Data Mart Overview
    • Business Intelligence (BI) Overview
    • Intro to Cloud Computing
    • Cloud Providers Overview
    • Cloud Services Overview
    • Features and Benefits of Cloud
    • What is Snowflake?
    • The Snowflake Story
    • Signup for Snowflake
    • Using Snowflake UI
    • Understanding Database objects
    • Container hierarchy
    • Creating our first Database, Schema & Table
    • Load Data into our first table
    • Setting up command line interface – SnowSQL
    • Creating our first Virtual Warehouse
    • Virtual Warehouse sizes & Scalability
    • Warehouse Credits
    • Warehouse Suspend & Resume
    • Warehouse - Maximized vs Auto Scale
    • Multi-Cluster Warehouse Scaling Policies
    • Multi-cluster virtual warehouse or Scaling Out
    • Working with Resource Monitors
    • Key Concepts & Features
    • Shared disk vs Shared Nothing Architecture
    • Columnar storage & Micro Partitions
    • Multi Cluster Shared Data Architecture
    • Snowflake 3 layer Architecture
    • Supported Cloud Platforms & Regions
    • Snowflake Editions
    • Snowflake Releases
    • Snowflake Pricing
    • Data Integration configurations
    • Data Integration with ETL tools
    • Data Integration with reporting tool (BI)
    • Integration with Programming Connectors (Python)
    • Ingestion / Loading Methods
    • Steps to Managing Loads
    • Preparing data for loading
    • File Format Object
    • Staging your data
    • Copy options & ON_ERROR
    • Loading data from an Internal stage
    • Working with rejected records
    • Different types of Internal stages
    • Unloading of data
    • Creating S3 bucket
    • Upload files in S3
    • Creating policy
    • Creating integration object
    • Loading from S3
    • What is Snowpipe
    • Loading data via Snowpipe - high level steps
    • Storage Integration
    • Querying Data in Staged Files
    • Querying Metadata for Staged Files
    • Transforming Data During a Load
    • Data Types for Semi Structured data
    • File Formats
    • Loading JSON data
    • Loading Parquet data
    • Loading XML data
    • Handling nested data, array data, hierarchy data
    • Analytics on JSON data
    • Working with Temporary, Transient & Permanent Tables
    • Working with External Tables
    • Overview of Views
    • Working with Materialized Views
    • Table Design Considerations
    • How Time Travel works
    • Travel to a specific time or before through a query
    • Assignment: Time Travel to a point in time
    • Undrop databases, schema & tables
    • Assignment: Un-drop tables, schemas & databases
    • Assignment: Test drive the time travel
    • Failsafe in Snowflake
    • Lab: View storage used by Fail-safe
    • Assignment: Understanding storage used by Fail-safe
    • Zero Copy Cloning
    • How cloning is different from copying
    • Cloning with Time Travel
    • Types of Cache
    • Maximize Cache usage
    • Lab: Query Caching in Action
    • Query Profile
    • Query History
    • Query profile & Disk Spilling
    • Information Schema
    • Account Usage Schema
    • Impact of Data distribution in Micro-partitions
    • Clustering Keys
    • Clustering Depth
    • Clustering large tables & improve partition elimination
    • Lab: Cluster keys for large tables
    • Search Optimization
    • Benefits and Cost of Search Optimization
    • Table Streams
    • Versioning & Read Isolation
    • Stream types & Columns in Stream
    • Data flow
    • Scheduling through Tasks
    • Tree of Tasks
    • Task History
    • SCD Implementation
    • Secure Data Sharing in Snowflake
    • Sharing - whose compute is used?
    • Sharing with other Snowflake users
    • Sharing - Data always up to date
    • Sharing with non-Snowflake Users
    • Snowflake Approach to Access Control & Key Concepts
    • Role Hierarchy in Snowflake
    • ACCOUNTADMIN role
    • SECURITYADMIN role
    • SYSADMIN role
    • Custom roles
    • PUBLIC role
    • Understanding Big Data & Talend
    • Working with Talend Open Studio v8
    • File Components in Talend - Csv/Json/Xml/Excel
    • Contextualization - Local/Project/Global etc.
    • Error & exception handling
    • Java components (tJava, tJavaRow etc.)
    • Database components in Talend
    • Transformations & Executing Jobs remotely in Talend
    • Joblets, Looping & Pivot components
    • Parallelization
    • HDFS & Hive in Talend
    • Spark in Talend
    • Talend with Kafka
    • Dynamic Data Ingestion
    • Orchestration | TAC


    Fee:

    Online Virtual Class Room

    Enroll in any of the above batches and attend the live class at the scheduled time

    62498      




    Upcoming Batches:



    FAQ:

    Who can learn this course?
    Anyone who is interested and has prior computer knowledge can learn Data Engineering. Enroll today with Wifi Learning.
    What are the prerequisites for this course?
    There are no prerequisites as such; if you have computer knowledge, you can learn Data Engineering.
    What are career opportunities from this course?
    You can apply for roles such as Data Engineer.
    How long will this course take?
    It will take 4 months.
    Will you provide soft copy material?
    Yes, we will share the PowerPoint material as soft copies and provide recordings of our live classes.
    Will you start this course from scratch?
    Yes. We will start this course from scratch.
    Do you provide a certificate for Data Engineering?
    Yes, we will provide a Training Completion certificate.
    Do you provide projects to work on?
    Yes, after completing the course you will work on 2 or 3 projects. This experience will help you clear interviews confidently.


    Reviews:



    Sep-14-2023

    R.Ranjan

    I learned not only the technical aspects of data analytics but also the ethical considerations.

    Sep-04-2023

    Alok Mishra

    Curriculum is as per the industry requirement, with a well-experienced, industry-expert trainer. I got promoted as a Data Engineer with a good hike. It became possible due to dedicated training. I would like to say, if you are looking for Data Engineering, please go with Wifi Learning. Thank you.

    Aug-19-2023

    Ishan rathi

    Thanks to the knowledgeable & approachable trainers of Wifi Learning, the course helped me get promoted to Data Engineer from Quality Analyst with a 50% hike in salary!

    Aug-19-2023

    N. Saujanya

    The certification took me from a Project Manager to a Senior Position with a good hike. All credit to the informative and easy-to-understand online study material!

    Contact us

    Phone: +91-9999468662, +91-9999468661

    Email: info@wifilearning.com
