Introduction to Data Engineering with Azure
Understanding Data Flows in Azure Data Factory
Data Flows in Azure Data Factory (ADF) let you design data transformations visually and run them at scale without writing code. For instance, imagine a scenario where you need to clean, enrich, and aggregate sales data from multiple regions. Instead of writing extensive SQL or Python scripts, you can use a Data Flow to visually map these transformations and execute them seamlessly within ADF.
Key Components of Data Flows
- Source Transformation: defines where the data originates, such as Blob Storage or a SQL Database;
- Transformations: include tools like filtering, joining, aggregating, or deriving new columns to manipulate the data;
- Sink Transformation: specifies the destination for the processed data, such as another SQL Database, a data lake, or file storage.
We will start by creating a simple data flow with only a source and a sink transformation.
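Data flows are normally assembled on the visual canvas in Azure Data Factory Studio, but it can help to see what the resulting definition looks like in code. The sketch below uses the azure-mgmt-datafactory Python SDK to publish a minimal flow; the subscription, resource group, factory, and dataset names (SalesBlobDataset, SalesSqlDataset) are placeholders chosen for this illustration, not values from the lesson.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DataFlowResource,
    DataFlowSink,
    DataFlowSource,
    DatasetReference,
    MappingDataFlow,
)

# Placeholder identifiers -- substitute your own values.
subscription_id = "<subscription-id>"
resource_group = "rg-data-engineering"
factory_name = "adf-sales-demo"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# A mapping data flow with one source and one sink and no intermediate
# transformations. Both referenced datasets are assumed to exist already.
simple_flow = DataFlowResource(
    properties=MappingDataFlow(
        description="Move raw sales data from Blob Storage to Azure SQL",
        sources=[
            DataFlowSource(
                name="SalesSource",
                dataset=DatasetReference(
                    type="DatasetReference", reference_name="SalesBlobDataset"
                ),
            )
        ],
        sinks=[
            DataFlowSink(
                name="SalesSink",
                dataset=DatasetReference(
                    type="DatasetReference", reference_name="SalesSqlDataset"
                ),
            )
        ],
        # The data flow script describes how the steps connect; ADF Studio
        # generates it automatically as you design the flow on the canvas.
        script=(
            "source(allowSchemaDrift: true, validateSchema: false) ~> SalesSource\n"
            "SalesSource sink(allowSchemaDrift: true, validateSchema: false) ~> SalesSink"
        ),
    )
)

client.data_flows.create_or_update(
    resource_group_name=resource_group,
    factory_name=factory_name,
    data_flow_name="SimpleSalesDataFlow",
    data_flow=simple_flow,
)
```

Publishing through the SDK produces the same kind of definition that the Studio saves when you build the flow on the canvas, so the two approaches are interchangeable.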
How to Set Up a Source Transformation
- Add a new Data Flow in the Author section of Azure Data Factory Studio;
- Drag a Source Transformation from the toolbox onto the Data Flow canvas;
- In the Source Transformation settings, select a Linked Service, such as Azure SQL Database or Azure Blob Storage, to connect to your data source;
- Choose an existing Dataset or create a new Dataset that represents the data to be ingested (a code sketch of this dataset setup follows these steps);
- If connecting to Blob Storage, configure file format options; for database sources, provide a SQL query to filter or structure the incoming data;
- Validate the configuration and preview the data to ensure the source is correctly set up.
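To make the dataset step concrete, here is a minimal sketch of how the CSV dataset referenced by the source transformation could be created with the same Python SDK. The linked service name (AzureBlobStorageLS), container, and folder path are assumptions for this example, and the client, resource_group, and factory_name variables come from the earlier sketch.

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
)

# Placeholder linked service pointing at the storage account (assumed to exist).
blob_linked_service = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="AzureBlobStorageLS"
)

# A delimited-text (CSV) dataset describing the raw sales files; the format
# options mirror what you would set in the Studio UI.
sales_dataset = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=blob_linked_service,
        location=AzureBlobStorageLocation(
            container="sales-data",
            folder_path="raw/2024",
        ),
        column_delimiter=",",
        first_row_as_header=True,
    )
)

# `client`, `resource_group`, and `factory_name` are defined in the first sketch.
client.datasets.create_or_update(
    resource_group_name=resource_group,
    factory_name=factory_name,
    dataset_name="SalesBlobDataset",
    dataset=sales_dataset,
)
```

For a database source, you would create an Azure SQL table dataset instead and supply the filtering query in the source transformation's settings.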
Sink Transformation for Processed Data
After defining transformations, use a Sink Transformation to specify where the transformed data will be stored. For example, you might save aggregated data back to the SQL database or export it as a CSV file to Blob Storage.
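As a rough illustration of the CSV export case, the sketch below extends the earlier flow with an aggregate step and a sink that writes to a Blob Storage dataset. AggregatedSalesCsvDataset, the region and amount columns, and the simplified script are all assumptions made for this example; ADF Studio would generate the real script for you.

```python
from azure.mgmt.datafactory.models import (
    DataFlowResource,
    DataFlowSink,
    DataFlowSource,
    DatasetReference,
    MappingDataFlow,
    Transformation,
)

# Placeholder datasets; both are assumed to exist in the factory already.
aggregated_flow = DataFlowResource(
    properties=MappingDataFlow(
        description="Aggregate sales by region and export the result as CSV",
        sources=[
            DataFlowSource(
                name="SalesSource",
                dataset=DatasetReference(
                    type="DatasetReference", reference_name="SalesBlobDataset"
                ),
            )
        ],
        transformations=[Transformation(name="AggregateSales")],
        sinks=[
            DataFlowSink(
                name="AggregatedSalesSink",
                dataset=DatasetReference(
                    type="DatasetReference", reference_name="AggregatedSalesCsvDataset"
                ),
            )
        ],
        # Simplified script: group the sales rows by region, sum the amounts,
        # and write the aggregated result to the CSV sink.
        script=(
            "source(allowSchemaDrift: true) ~> SalesSource\n"
            "SalesSource aggregate(groupBy(region),\n"
            "    totalSales = sum(amount)) ~> AggregateSales\n"
            "AggregateSales sink(allowSchemaDrift: true) ~> AggregatedSalesSink"
        ),
    )
)

# `client`, `resource_group`, and `factory_name` are defined in the first sketch.
client.data_flows.create_or_update(
    resource_group_name=resource_group,
    factory_name=factory_name,
    data_flow_name="AggregatedSalesDataFlow",
    data_flow=aggregated_flow,
)
```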