Learn Aggregating and Enriching Data | Data Transformation and Loading

import pandas as pd

# Sample sales data
sales_data = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "salesperson": ["Alice", "Bob", "Alice", "Charlie", "David"],
    "units_sold": [10, 20, 15, 10, 5],
    "revenue": [100, 200, 150, 120, 50]
})

# Aggregating total units sold and revenue per region
aggregated = sales_data.groupby("region").agg({
    "units_sold": "sum",
    "revenue": "sum"
}).reset_index()

print("Aggregated by region:")
print(aggregated)

# Enrichment dataset: region targets
region_targets = pd.DataFrame({
    "region": ["East", "West"],
    "target_revenue": [300, 350]
})

# Joining aggregated data with enrichment (targets)
enriched = pd.merge(aggregated, region_targets, on="region", how="left")

print("\nEnriched with targets:")
print(enriched)

Aggregating and enriching data are essential steps in preparing datasets for analytics and reporting. Aggregation summarizes data, for example by calculating totals or averages, to give a higher-level view. In pandas, you call groupby to split the rows into groups and then apply an aggregation: either a single method such as sum, mean, or count, or several at once through agg. You can group by one or more columns and compute summary statistics for each group, which makes it easier to analyze trends, compare categories, or prepare data for dashboards.
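
The sketch below reuses the lesson's sample sales data to compute several statistics per region in one agg call, using pandas "named aggregation" (each keyword becomes an output column built from a (column, function) pair). It also groups by two columns at once. The output names (summary, by_person, total_revenue, avg_revenue, n_sales, n_salespeople) are illustrative choices, not part of the lesson.

```python
import pandas as pd

# Same sample sales data as in the lesson
sales_data = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "salesperson": ["Alice", "Bob", "Alice", "Charlie", "David"],
    "units_sold": [10, 20, 15, 10, 5],
    "revenue": [100, 200, 150, 120, 50],
})

# Named aggregation: output column name = (source column, aggregation function)
summary = sales_data.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    avg_revenue=("revenue", "mean"),
    n_sales=("revenue", "count"),
    n_salespeople=("salesperson", "nunique"),
).reset_index()
print(summary)

# Grouping by more than one column yields one row per (region, salesperson) pair
by_person = sales_data.groupby(["region", "salesperson"])["revenue"].sum()
print(by_person)
```

Here East totals 300 in revenue across 3 sales by 2 distinct salespeople, and West totals 320 with an average of 160 per sale.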

Enrichment is the process of enhancing your primary dataset with additional context or features from external sources. This usually means joining your main data with auxiliary tables, such as reference data, targets, or lookup tables. In pandas, the merge function joins datasets on shared columns, much like a SQL join; its how parameter selects the join type (for example, how="left" keeps every row of the primary dataset even when no match exists). By enriching your data, you can add new attributes, targets, or classifications that improve downstream analytics and decision-making.
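
A minimal sketch of how the join type affects the result, assuming the aggregated data contains a hypothetical "North" region with no target and the reference table contains a hypothetical "South" region with no sales. The indicator=True option of merge adds a _merge column, which makes unmatched rows easy to find.

```python
import pandas as pd

# Aggregated revenue per region; "North" is hypothetical and has no target
aggregated = pd.DataFrame({
    "region": ["East", "West", "North"],
    "revenue": [300, 320, 90],
})

# Reference table of targets; "South" is hypothetical and has no sales
region_targets = pd.DataFrame({
    "region": ["East", "West", "South"],
    "target_revenue": [300, 350, 200],
})

# Left join: keep every aggregated row; unmatched regions get NaN targets.
# indicator=True adds a "_merge" column ("both", "left_only", "right_only").
left = pd.merge(aggregated, region_targets, on="region", how="left", indicator=True)
print(left)

# Inner join: keep only regions present in both tables
inner = pd.merge(aggregated, region_targets, on="region", how="inner")
print(inner)

# Rows of the primary data that found no match in the reference table
unmatched = left[left["_merge"] == "left_only"]
print(unmatched["region"].tolist())
```

The left join keeps North with a missing target (which also turns target_revenue into a float column), while the inner join silently drops it. Checking for left_only rows after a left join is a cheap way to catch gaps in reference data.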

Common enrichment strategies include:

Adding targets or benchmarks from reference tables;
Merging in descriptive labels for codes or IDs;
Incorporating external metrics, such as market data or demographic information;
Calculating new features based on aggregated or joined data.
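
Several of these strategies can be chained in one expression. The sketch below assumes the aggregated data is keyed by a hypothetical region code ("E", "W"); the code-to-label dictionary, the targets table, and the derived columns attainment and gap are illustrative, not taken from the lesson.

```python
import pandas as pd

# Aggregated data keyed by a hypothetical region code
aggregated = pd.DataFrame({
    "region_code": ["E", "W"],
    "revenue": [300, 320],
})

# Hypothetical lookup table: codes -> descriptive labels
region_labels = {"E": "East", "W": "West"}

# Hypothetical reference table of targets keyed by the same code
targets = pd.DataFrame({
    "region_code": ["E", "W"],
    "target_revenue": [300, 350],
})

enriched = (
    aggregated
    # Descriptive labels: map each code through the lookup dictionary
    .assign(region_name=lambda df: df["region_code"].map(region_labels))
    # Targets/benchmarks: join the reference table on the shared key
    .merge(targets, on="region_code", how="left")
    # New features computed from the aggregated and joined columns
    .assign(
        attainment=lambda df: df["revenue"] / df["target_revenue"],
        gap=lambda df: df["target_revenue"] - df["revenue"],
    )
)
print(enriched)
```

For a small, fixed mapping, Series.map with a dictionary is a lightweight alternative to merge; a full merge is more convenient when the lookup table has several columns to bring in.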

When building data pipelines, combining aggregation and enrichment ensures your data is both concise and context-rich, ready for loading into analytics systems or visualizations.
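
One way to keep such a pipeline readable is to wrap each step in a small function and chain them with DataFrame.pipe. In this sketch the function names aggregate_by_region and enrich_with_targets are hypothetical. The merge uses validate="one_to_one", which makes pandas raise a MergeError if the reference table unexpectedly contains duplicate keys instead of silently multiplying rows.

```python
import pandas as pd

def aggregate_by_region(sales: pd.DataFrame) -> pd.DataFrame:
    """Aggregation step: total units and revenue per region."""
    return sales.groupby("region").agg(
        units_sold=("units_sold", "sum"),
        revenue=("revenue", "sum"),
    ).reset_index()

def enrich_with_targets(summary: pd.DataFrame, targets: pd.DataFrame) -> pd.DataFrame:
    """Enrichment step: attach targets, requiring unique keys on both sides."""
    return summary.merge(targets, on="region", how="left", validate="one_to_one")

# Same sample data and targets as in the lesson
sales_data = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "salesperson": ["Alice", "Bob", "Alice", "Charlie", "David"],
    "units_sold": [10, 20, 15, 10, 5],
    "revenue": [100, 200, 150, 120, 50],
})
region_targets = pd.DataFrame({
    "region": ["East", "West"],
    "target_revenue": [300, 350],
})

# Chain the steps: aggregate first, then enrich
report = sales_data.pipe(aggregate_by_region).pipe(enrich_with_targets, region_targets)
print(report)

# A duplicated target row violates the one-to-one expectation and is rejected
dup_targets = pd.concat([region_targets, region_targets.iloc[[0]]], ignore_index=True)
rejected = False
try:
    sales_data.pipe(aggregate_by_region).pipe(enrich_with_targets, dup_targets)
except pd.errors.MergeError as exc:
    rejected = True
    print("Merge rejected:", exc)
```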
