Introduction to Data Engineering with Azure
2. Foundations of Azure Data Factory
Understanding Data Flows in Azure Data Factory
Data Flows let you design data transformations visually in Azure Data Factory (ADF). For instance, imagine that you need to clean, enrich, and aggregate sales data from multiple regions. Instead of writing extensive SQL or Python scripts, you can map these transformations visually in a Data Flow and run them within ADF.
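To make the clean, enrich, and aggregate steps concrete, here is a minimal Python sketch of the kind of logic a Data Flow would express visually. It is only an illustration: the sample rows, the column names, and the flat `tax_rate` are assumptions, and nothing here calls ADF.

```python
from collections import defaultdict

# Hypothetical sample rows; a real Data Flow would read these from a source.
raw_sales = [
    {"region": "north", "amount": "120.50"},
    {"region": "North ", "amount": "79.50"},
    {"region": "south", "amount": None},  # incomplete row, dropped by clean()
    {"region": "south", "amount": "200.00"},
]


def clean(rows):
    """Drop rows without an amount and normalise the region names."""
    return [
        {"region": r["region"].strip().title(), "amount": float(r["amount"])}
        for r in rows
        if r["amount"] is not None
    ]


def enrich(rows, tax_rate=0.2):
    """Add a derived column; the flat tax rate is an assumed example value."""
    return [
        {**r, "amount_with_tax": round(r["amount"] * (1 + tax_rate), 2)}
        for r in rows
    ]


def aggregate(rows):
    """Sum the amounts per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


totals_by_region = aggregate(enrich(clean(raw_sales)))
print(totals_by_region)
```

In a Data Flow, each of these functions would roughly correspond to one transformation step on the canvas.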
Key Components of Data Flows
- Source Transformation: defines where the data originates, such as Blob Storage or a SQL Database;
- Transformations: operations such as filtering, joining, aggregating, or deriving new columns that reshape the data;
- Sink Transformation: specifies the destination for the processed data, such as another SQL Database, a data lake, or file storage.
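The three components above can be pictured as a small pipeline: a source produces rows, a list of transformations reshapes them, and a sink receives the result. The sketch below is a plain-Python analogy under that assumption; `run_flow`, `sample_source`, and the other names are hypothetical and are not part of any ADF SDK.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def run_flow(
    source: Callable[[], List[Row]],
    steps: List[Step],
    sink: Callable[[List[Row]], None],
) -> None:
    """Read rows from the source, apply each step in order, hand them to the sink."""
    rows = source()
    for step in steps:
        rows = step(rows)
    sink(rows)


def sample_source() -> List[Row]:
    # Stand-in for a source such as Blob Storage or a SQL Database.
    return [
        {"region": "North", "amount": 120.5},
        {"region": "South", "amount": 0.0},
        {"region": "South", "amount": 200.0},
    ]


def keep_positive(rows: List[Row]) -> List[Row]:
    # Filter step: drop rows with a non-positive amount.
    return [r for r in rows if r["amount"] > 0]


def add_large_flag(rows: List[Row]) -> List[Row]:
    # Derived-column step: flag sales of 150 or more.
    return [{**r, "is_large": r["amount"] >= 150} for r in rows]


written: List[Row] = []  # stand-in for a sink destination
run_flow(sample_source, [keep_positive, add_large_flag], written.extend)
print(written)
```

Keeping the source, the steps, and the sink separate mirrors how a Data Flow lets you swap the destination or add a transformation without touching the rest.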
We will start by creating a simple Data Flow with source and sink transformations.
How to Set Up a Source Transformation
- Add a new Data Flow in the Author section of Azure Data Factory Studio;
- Drag a Source Transformation from the toolbox onto the Data Flow canvas;
- In the Source Transformation settings, select a Linked Service, such as Azure SQL Database or Azure Blob Storage, to connect to your data source;
- Choose an existing Dataset or create a new Dataset that represents the data to be ingested;
- Configure file format options if connecting to Blob Storage, or provide a SQL query to filter or structure the incoming data for databases;
- Validate the configuration and preview the data to ensure the source is correctly set up.
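Behind the Studio UI, a Dataset is stored as a JSON definition that points to a Linked Service. The sketch below builds such a definition as a Python dictionary for a delimited text file in Blob Storage. The names `SalesCsvDataset` and `SalesBlobStorage`, the container, and the file name are hypothetical, and the overall shape is an approximation of the delimited text dataset format that should be checked against the official documentation before use.

```python
import json

# Hypothetical dataset definition for a CSV file in Azure Blob Storage.
# All names and paths below are placeholders, not values from a real factory.
dataset_definition = {
    "name": "SalesCsvDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "SalesBlobStorage",  # assumed Linked Service name
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "sales",
                "fileName": "sales_2024.csv",
            },
            # File format options, as configured in the source settings.
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(dataset_definition, indent=2))
```

Studio generates comparable JSON for you when you create the Dataset through the UI, so this is mainly useful for reading or reviewing definitions kept in source control.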
Sink Transformation for Processed Data
After defining transformations, use a Sink Transformation to specify where the transformed data will be stored. For example, you might save aggregated data back to the SQL database or export it as a CSV file to Blob Storage.
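As a local analogy for a sink that exports CSV, the following sketch writes aggregated rows to an in-memory buffer with Python's standard `csv` module. The rows and the `write_csv` helper are assumptions for illustration; in ADF the sink would write to Blob Storage or a database instead.

```python
import csv
import io

# Hypothetical aggregated output from earlier transformations.
aggregated = [
    {"region": "North", "total_amount": 200.0},
    {"region": "South", "total_amount": 200.0},
]


def write_csv(rows, target):
    """Write the rows with a header line to any text file-like object."""
    writer = csv.DictWriter(target, fieldnames=["region", "total_amount"])
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()  # stand-in for a file in Blob Storage
write_csv(aggregated, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```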