Contenido del Curso
Introduction to Data Engineering with Azure
Introduction to Data Engineering with Azure
Challenge: Conducting Transaction Analysis
Financial institutions frequently assess the creditworthiness of their customers to enhance risk management and make better-informed decisions. By analyzing debt levels and financial behavior through user transactions, institutions can categorize customers into risk groups.
For example, individuals with a high debt-to-income ratio, indicating they owe significantly more than they earn, might be flagged as Risk Users due to their potential financial instability. In contrast, those with a healthier balance between their income and debt are classified as Normal Users, reflecting a lower likelihood of financial risk.
In the previous challenge, we loaded the cards data, and now the goal is to classify credit card users as either Risk Users or Normal Users based on their custom debt-to-income ratio.
To complete this task, you will need additional data:
- Transactions data to calculate key metrics such as the average transaction amount, the sum of all transactions, and the chip usage rate (the ratio of transactions made using chips compared to the total number of transactions);
- Users data to calculate the custom debt-to-income ratio, which is determined by dividing the total debt by the yearly income.
By using these datasets, you can evaluate users' financial behavior and classify them into the appropriate risk category.
Users Table
Transactions Table
As the result of this task you will have two tables - one for risk users and one for normal users. They will look like this.
Risk Users
Normal Users
Algorithm Description
To solve this task, you can use the materials from the third section. Here's a step-by-step guide on how to do the task:
- First, load the raw data into the database. This involves reading the CSV files for users, transactions, and cards and populating the respective tables. To do it you should use Script and Copy activities just like in the previous challenge;
- After loading, ensure the correct data types are applied by using another Script activity;
- Filter the users table to include only users with credit cards. This can be done by joining the users and cards tables on
client_id
. You should use that cards table that stores only credit cards data (you have created this table in the previous challenge); - Calculate the Credit Score Ratio for the filtered users by dividing
total_debt
byyearly_income
. Based on this ratio:- Classify users with a ratio > 50% as Risk Users;
- Classify users with a ratio ≤ 50% as Normal Users;
- Aggregate the transactions data for each user group (Risk Users and Normal Users) to compute the following metrics:
- Total Transaction Amount: Sum of all transactions;
- Average Transaction Amount;
- Chip Usage Ratio: Proportion of chip transactions to total transactions;
- Create separate tables for Risk Users and Normal Users in the database and populate them with the aggregated metrics, ensuring all required fields (e.g.,
client_id
, metrics) are included.
By following these steps, you can efficiently process the data and create the required outputs using only dataflow activities. Good luck!
¡Gracias por tus comentarios!