Introduction to Data Engineering with Azure
Using Conditional Split and Filtering in ADF
This chapter explores how to use Conditional Split and Filter Transformations in Azure Data Factory (ADF) Data Flows to organize and refine data for downstream processes. You'll learn how to divide data based on conditions and filter out unwanted records efficiently.
In Azure Data Factory, a Conditional Split transformation routes records based on conditions, such as splitting data into "High" and "Low" categories by sales amount. For example, if the sales amount is greater than 1000, the record is sent to the "High Sales" output; otherwise, it goes to the "Low Sales" output for further processing.
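To make the routing logic concrete, here is a minimal pandas sketch of the same split. The DataFrame and its SalesAmount column are hypothetical; in ADF the condition is configured in the Conditional Split settings rather than written as Python:

```python
import pandas as pd

# Hypothetical sales records; in ADF these would come from a Source transformation.
df = pd.DataFrame({"OrderId": [1, 2, 3], "SalesAmount": [1500, 400, 980]})

# Equivalent of a Conditional Split: route rows into two output streams.
high_sales = df[df["SalesAmount"] > 1000]   # "High Sales" output
low_sales = df[df["SalesAmount"] <= 1000]   # "Low Sales" (default) output

print(high_sales)
print(low_sales)
```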
A Filter transformation, by contrast, keeps only the records that satisfy a condition and discards the rest. For example, if you want to filter out records with null or invalid email addresses, you can apply a filter that removes records where the email is either null or does not match a valid email format.
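A minimal pandas sketch of such a filter follows. The Email column and the regex are illustrative assumptions; in ADF you would express the condition in the expression language of the Filter settings:

```python
import pandas as pd

# Hypothetical records containing null and malformed email addresses.
df = pd.DataFrame({"Email": ["a@example.com", None, "not-an-email", "b@test.org"]})

# Simple illustrative pattern: something@something.tld
email_pattern = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

# Equivalent of a Filter transformation: keep only rows with a valid email.
# na=False treats null emails as non-matching, so they are filtered out too.
valid = df[df["Email"].str.match(email_pattern, na=False)]
print(valid)
```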
How to Use Conditional Split and Filter Transformations in ADF
- Create a new Data Flow or use an existing one in the Author section of Azure Data Factory Studio;
- Drag a Source Transformation onto the Data Flow canvas and configure it to ingest data, such as from SQL tables or Blob Storage;
- Add a Conditional Split Transformation from the toolbox and connect it to your data source;
- In the Conditional Split settings, define conditions to split the data into multiple streams (see the sketch after this list). In our case, we used the following:
  - LowRisk: DeathRate < 5;
  - HighRisk: DeathRate > 10;
  - MediumRisk: DeathRate >= 5 && DeathRate <= 10;
- Add a Filter Transformation and connect it to the data stream that you want to filter;
- In the Filter settings, define the filter condition to keep only the necessary records. In our case, we kept only records where the WeekEndingDate is after '2021-09-01';
- Connect each output stream to a separate Sink Transformation to store the split data in different destinations (e.g., one for low risk, one for high risk, and one for medium risk);
- Validate the Data Flow configuration to ensure everything works correctly.
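Putting the steps together, here is a minimal pandas sketch of the whole flow under the assumptions above (hypothetical DeathRate and WeekEndingDate columns). For brevity the filter is applied before the split here; in the Data Flow it can equally be attached to any individual stream, and each branch would end in its own Sink transformation rather than a CSV file:

```python
import pandas as pd

# Hypothetical source data; in ADF this comes from the Source transformation.
df = pd.DataFrame({
    "Region": ["A", "B", "C", "D"],
    "DeathRate": [3.2, 12.5, 7.1, 4.8],
    "WeekEndingDate": pd.to_datetime(
        ["2021-08-15", "2021-09-12", "2021-10-03", "2021-11-21"]
    ),
})

# Filter transformation: keep only records after 2021-09-01.
df = df[df["WeekEndingDate"] > "2021-09-01"]

# Conditional Split transformation: route rows into three streams.
low_risk = df[df["DeathRate"] < 5]
high_risk = df[df["DeathRate"] > 10]
medium_risk = df[(df["DeathRate"] >= 5) & (df["DeathRate"] <= 10)]

# Sink transformations: write each stream to its own destination.
low_risk.to_csv("low_risk.csv", index=False)
high_risk.to_csv("high_risk.csv", index=False)
medium_risk.to_csv("medium_risk.csv", index=False)
```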