Learn Aggregating and Enriching Data | Data Transformation and Loading
Data Pipelines with Python

Aggregating and Enriching Data

    import pandas as pd

    # Sample sales data
    sales_data = pd.DataFrame({
        "region": ["East", "West", "East", "West", "East"],
        "salesperson": ["Alice", "Bob", "Alice", "Charlie", "David"],
        "units_sold": [10, 20, 15, 10, 5],
        "revenue": [100, 200, 150, 120, 50]
    })

    # Aggregating total units sold and revenue per region
    aggregated = sales_data.groupby("region").agg({
        "units_sold": "sum",
        "revenue": "sum"
    }).reset_index()

    print("Aggregated by region:")
    print(aggregated)

    # Enrichment dataset: region targets
    region_targets = pd.DataFrame({
        "region": ["East", "West"],
        "target_revenue": [300, 350]
    })

    # Joining aggregated data with enrichment (targets)
    enriched = pd.merge(aggregated, region_targets, on="region", how="left")

    print("\nEnriched with targets:")
    print(enriched)

Aggregating and enriching data are essential steps in preparing datasets for analytics and reporting. Aggregation involves summarizing data, such as calculating totals or averages, to provide a higher-level view. In pandas, you use the groupby method followed by aggregation functions like sum, mean, count, and others. This allows you to group data by one or more columns and compute summary statistics for each group, making it easier to analyze trends, compare categories, or prepare data for dashboards.
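To illustrate computing several summary statistics at once and grouping by more than one column, here is a minimal sketch that reuses the lesson's sample sales data. The output column names (total_revenue, avg_revenue, n_sales) and the variable names are illustrative choices, not part of the lesson.

```python
import pandas as pd

# Same sample sales data as in the lesson example
sales_data = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "salesperson": ["Alice", "Bob", "Alice", "Charlie", "David"],
    "units_sold": [10, 20, 15, 10, 5],
    "revenue": [100, 200, 150, 120, 50],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
region_summary = (
    sales_data.groupby("region")
    .agg(
        total_revenue=("revenue", "sum"),
        avg_revenue=("revenue", "mean"),
        n_sales=("revenue", "count"),
    )
    .reset_index()
)
print(region_summary)

# Grouping by more than one column: units sold per salesperson within each region
units_by_person = (
    sales_data.groupby(["region", "salesperson"], as_index=False)["units_sold"].sum()
)
print(units_by_person)
```

Named aggregation keeps the result columns flat and self-describing, which tends to be convenient when the summary is later joined to other tables.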

Enrichment is the process of enhancing your primary dataset with additional context or features from external sources. This often involves joining your main data with auxiliary tables, such as reference data, targets, or lookup tables. In pandas, the merge function enables you to join datasets on shared columns, similar to SQL joins. By enriching your data, you can add new attributes, targets, or classifications that improve downstream analytics and decision-making.
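As with a SQL LEFT JOIN, a left merge keeps every row of the primary table and fills in missing values where the auxiliary table has no match. The sketch below shows one way to detect such gaps. The "North" row is a hypothetical addition used only to demonstrate an unmatched key; validate and indicator are standard pd.merge parameters.

```python
import pandas as pd

# Aggregated revenue per region; "North" is a hypothetical extra region
aggregated = pd.DataFrame({
    "region": ["East", "West", "North"],
    "revenue": [300, 320, 90],
})

# Reference table with targets; it has no entry for "North"
region_targets = pd.DataFrame({
    "region": ["East", "West"],
    "target_revenue": [300, 350],
})

enriched = pd.merge(
    aggregated,
    region_targets,
    on="region",
    how="left",             # keep every row of the primary table
    validate="one_to_one",  # raise if a region appears more than once on either side
    indicator=True,         # add a "_merge" column describing the match status
)
print(enriched)

# Regions that found no match in the reference table
unmatched = enriched.loc[enriched["_merge"] == "left_only", "region"].tolist()
print("Regions without a target:", unmatched)
```

Checking the match status right after the join can help catch missing reference data before it reaches a report as silent NaN values.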

Common enrichment strategies include:

  • Adding targets or benchmarks from reference tables;
  • Merging in descriptive labels for codes or IDs;
  • Incorporating external metrics, such as market data or demographic information;
  • Calculating new features based on aggregated or joined data.
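A few of the strategies above can be sketched together: joining targets, merging in descriptive labels, and deriving new features from the joined columns. The region_labels table and the derived column names (revenue_per_unit, target_attainment, met_target) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Aggregated totals per region (the result of the lesson's groupby step)
aggregated = pd.DataFrame({
    "region": ["East", "West"],
    "units_sold": [30, 30],
    "revenue": [300, 320],
})

# Targets from the lesson example
region_targets = pd.DataFrame({
    "region": ["East", "West"],
    "target_revenue": [300, 350],
})

# Hypothetical lookup table with descriptive labels for each region key
region_labels = pd.DataFrame({
    "region": ["East", "West"],
    "region_name": ["Eastern Division", "Western Division"],
})

# Chain two left merges to attach targets and labels
enriched = (
    aggregated
    .merge(region_targets, on="region", how="left")
    .merge(region_labels, on="region", how="left")
)

# Derive new features from the aggregated and joined columns
enriched["revenue_per_unit"] = enriched["revenue"] / enriched["units_sold"]
enriched["target_attainment"] = enriched["revenue"] / enriched["target_revenue"]
enriched["met_target"] = enriched["revenue"] >= enriched["target_revenue"]

print(enriched)
```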

When building data pipelines, combining aggregation and enrichment yields data that is both concise and context-rich, ready for loading into analytics systems or visualizations.



Section 3. Chapter 2
