Introduction to Data Engineering with Azure
Challenge: Loading Data to Several Tables
In this challenge, you will work with bank card data. The main goal is to load it into Azure and store each card type in its own table.
Imagine you are working for a bank that handles a significant volume of credit card data. Your team has been tasked with organizing this data into a format that can be easily analyzed based on card types. You've been provided with a dataset containing information about various cards, including both debit and credit cards. Your job is to load this dataset into an Azure SQL database and ensure that the card information is stored in separate tables for each card type: one table for credit cards, another for debit cards, and so on.
The main dataset looks as follows:
This task involves:
- Loading data from CSV files to the cloud;
- Separating card data into distinct tables based on the card type;
- Ensuring the data is properly formatted for future analysis.
The resulting tables will look as follows.
Credit Cards Table
Debit Cards Table
Debit Cards (Prepaid) Table
Please note that the dataset may contain more than three card types, so you need to create separate tables for each of them!
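Because the set of card types is not known in advance, table names are best derived from the data rather than hard-coded. The snippet below is a minimal sketch of one way to do that in Python. The helper `table_name_for` and the `_cards` suffix are hypothetical, and the labels "Credit", "Debit" and "Debit (Prepaid)" are assumed from the table titles above.

```python
import re


def table_name_for(card_type: str) -> str:
    """Turn a card type label into a safe SQL table name (hypothetical naming scheme)."""
    # Lowercase the label, then collapse every run of other characters into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", card_type.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive a table name from {card_type!r}")
    return f"{slug}_cards"


# Labels assumed from the table titles in this task.
for label in ["Credit", "Debit", "Debit (Prepaid)"]:
    print(label, "->", table_name_for(label))
```

A rule like this keeps working when a fourth or fifth card type appears, since no name is written by hand.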
Hint
To solve this task, you can use the materials from the second section. Here's a step-by-step approach to tackle this:
- First, you need to load the raw data into the database. This involves reading the CSV file and populating the target table with all card data;
- Once the data is in the database, ensure the correct data types are applied to each column (e.g., ensuring numeric fields like `credit_limit` are recognized as numeric, date fields like `acct_open_date` are formatted correctly, etc.);
- After the data is loaded and formatted, you can perform a Lookup activity in Azure Data Factory (ADF) to identify all distinct card types. This will give you a list of unique card types present in the dataset;
- Use a ForEach activity to process each unique card type separately. Inside the loop, you can filter the data by card type, ensuring that each card type has its own table;
- For each card type, create a separate table in your database and insert the relevant records from the dataset.
By following these steps, you will be able to correctly segregate and load the data into different tables based on the card type.
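To see the logic of these steps without any cloud resources, here is a minimal local sketch in Python. It is not the ADF pipeline itself: an in-memory SQLite database stands in for Azure SQL, a `SELECT DISTINCT` query plays the role of the Lookup activity, and a plain loop plays the role of ForEach. The sample rows, the `card_type` column name, the ISO date format and the table naming rule are all assumptions made for illustration. Only `credit_limit` and `acct_open_date` come from the task description.

```python
import csv
import io
import re
import sqlite3
from datetime import date

# Hypothetical sample; the real CSV has more columns and possibly more card types.
RAW_CSV = """card_type,credit_limit,acct_open_date
Credit,12400,2008-09-01
Debit,24295,2002-04-01
Debit (Prepaid),55,2010-05-01
Credit,9100,2015-11-01
"""

COLUMNS = "card_type TEXT, credit_limit REAL, acct_open_date TEXT"


def split_by_card_type(conn, csv_text):
    """Load csv_text into a staging table, then copy rows into one table per card type."""
    # Load and format: parse the CSV and cast each field before inserting it.
    conn.execute(f"CREATE TABLE all_cards ({COLUMNS})")
    rows = [
        (
            r["card_type"],
            float(r["credit_limit"]),  # numeric field
            date.fromisoformat(r["acct_open_date"]).isoformat(),  # validated date
        )
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO all_cards VALUES (?, ?, ?)", rows)

    # Lookup stand-in: list the distinct card types present in the data.
    card_types = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT card_type FROM all_cards ORDER BY card_type"
        )
    ]

    # ForEach stand-in: create one table per card type and insert its rows.
    counts = {}
    for card_type in card_types:
        # Assumed naming rule: "Debit (Prepaid)" becomes debit_prepaid_cards.
        table = re.sub(r"[^a-z0-9]+", "_", card_type.lower()).strip("_") + "_cards"
        conn.execute(f'CREATE TABLE "{table}" ({COLUMNS})')
        conn.execute(
            f'INSERT INTO "{table}" SELECT * FROM all_cards WHERE card_type = ?',
            (card_type,),
        )
        counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    conn.commit()
    return counts


conn = sqlite3.connect(":memory:")
print(split_by_card_type(conn, RAW_CSV))
```

The function returns a mapping from each created table to its row count, which makes it easy to check that no card type was skipped and that the per-type counts add up to the number of rows loaded.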