Challenge: Unemployment Rate Summary | Economic Data Analysis with Python

Section 1. Chapter 5

You are now ready to apply your knowledge of pandas and descriptive statistics to a practical economic dataset. Imagine you have unemployment rate data for several countries, collected over a five-year period. Your goal is to summarize this data by calculating key statistics for each country and identifying important trends.

To begin, you will work with a hardcoded pandas DataFrame of annual unemployment rates (in percent) for four countries (the United States, Germany, Japan, and Brazil) from 2018 to 2022. For each country, you need to calculate the mean, median, and standard deviation of its unemployment rates across these years, and find the year in which its unemployment rate was highest. This summary lets you compare labor market conditions across countries at a glance and spot years of particular economic difficulty.
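To see what each statistic measures before any grouping, here is a minimal sketch that computes them for the United States rows alone, stored as a Series indexed by year. It also contrasts pandas' default sample standard deviation (`ddof=1`) with the population version (`ddof=0`); the variable names are illustrative only.

```python
import pandas as pd

# United States rates from the dataset, indexed by year
us = pd.Series([3.9, 3.7, 8.1, 5.4, 3.6],
               index=[2018, 2019, 2020, 2021, 2022],
               name="Unemployment Rate")

mean_rate = us.mean()             # arithmetic average of the five years
median_rate = us.median()         # middle value once the rates are sorted
sample_std = us.std()             # pandas default: sample standard deviation (ddof=1)
population_std = us.std(ddof=0)   # divides by n instead of n - 1
peak_year = us.idxmax()           # index label (here, the year) of the highest rate

print(f"mean={mean_rate:.2f} median={median_rate:.1f} "
      f"std={sample_std:.2f} (population {population_std:.2f}) peak={peak_year}")
# mean=4.94 median=3.9 std=1.91 (population 1.71) peak=2020
```

Grouping by country simply repeats these same calculations once per country.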


import pandas as pd

# Hardcoded unemployment rate data
data = {
    "Country": ["United States", "United States", "United States", "United States", "United States",
                "Germany", "Germany", "Germany", "Germany", "Germany",
                "Japan", "Japan", "Japan", "Japan", "Japan",
                "Brazil", "Brazil", "Brazil", "Brazil", "Brazil"],
    "Year": [2018, 2019, 2020, 2021, 2022]*4,
    "Unemployment Rate": [3.9, 3.7, 8.1, 5.4, 3.6,
                          3.4, 3.2, 4.0, 3.6, 3.0,
                          2.4, 2.4, 2.8, 2.8, 2.6,
                          12.3, 11.9, 13.5, 13.2, 9.3]
}
df = pd.DataFrame(data)

def unemployment_summary(df):
    # Group the rates by country; groupby sorts its keys, so countries come out alphabetically
    grouped = df.groupby("Country")["Unemployment Rate"]
    summary = grouped.agg(["mean", "median", "std"]).reset_index()
    
    # Row label of each country's highest rate (the first occurrence if tied),
    # then the Year stored in that row
    idx = grouped.idxmax()
    max_years = df.loc[idx, ["Country", "Year"]].set_index("Country")
    
    # Merge the summary with the year of highest unemployment
    summary = summary.merge(max_years, left_on="Country", right_index=True)
    summary = summary.rename(columns={"Year": "Year of Max Unemployment"})
    
    return summary

summary_df = unemployment_summary(df)
print(summary_df)

This code defines a function, unemployment_summary, that takes a DataFrame of unemployment data and returns one summary row per country: the mean, median, and standard deviation of its unemployment rates, plus the year in which unemployment was highest. Note that pandas computes the sample standard deviation (ddof=1) by default, and that idxmax returns the first occurrence when the maximum is tied. Japan, for example, reaches 2.8 in both 2020 and 2021, so the reported year is 2020. A summary like this lets economists quickly gauge labor market conditions and identify years of significant change.
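The `idxmax` tie rule matters for Japan, whose rate is 2.8 in both 2020 and 2021. If you would rather report every year tied at the maximum, one option is to broadcast each country's maximum back onto its rows with `transform("max")` and keep the rows that match it. This is a sketch using only Japan's rows, not part of the task's required output.

```python
import pandas as pd

# Japan's rows from the dataset: the maximum (2.8) occurs in both 2020 and 2021
japan = pd.DataFrame({
    "Country": ["Japan"] * 5,
    "Year": [2018, 2019, 2020, 2021, 2022],
    "Unemployment Rate": [2.4, 2.4, 2.8, 2.8, 2.6],
})
rates = japan.groupby("Country")["Unemployment Rate"]

# idxmax keeps only the first row that attains the maximum
first_max_years = japan.loc[rates.idxmax(), "Year"].tolist()

# transform("max") repeats each country's maximum on every row of that country,
# so an equality test flags all tied years
is_max = japan["Unemployment Rate"] == rates.transform("max")
all_max_years = (japan.loc[is_max]
                 .groupby("Country")["Year"]
                 .apply(lambda years: years.tolist()))

print(first_max_years)          # [2020]
print(all_max_years["Japan"])   # [2020, 2021]
```

The exact equality test is safe here because the maximum is one of the column's own values, not the result of arithmetic.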

Now, it's your turn to practice and reinforce these concepts by writing your own function to generate this summary.

Task

Write a function called unemployment_summary that takes a pandas DataFrame with columns "Country", "Year", and "Unemployment Rate". The function should:

Calculate the mean, median, and standard deviation of unemployment rates for each country.
Identify the year with the highest unemployment rate for each country.
Return a DataFrame with columns: Country, mean, median, std, Year of Max Unemployment.

The result should be sorted alphabetically by country name.
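Before submitting, a small self-check can catch the most common mismatches with the requirements above. The sketch below uses a hypothetical helper, `check_summary`, and assumes the grader compares column names in the order listed; the toy frame uses placeholder countries and values, not real data.

```python
import pandas as pd

# Column layout listed in the task, in that order
EXPECTED_COLUMNS = ["Country", "mean", "median", "std", "Year of Max Unemployment"]

def check_summary(summary: pd.DataFrame) -> None:
    """Hypothetical self-check: raise AssertionError if `summary` breaks a task requirement."""
    assert list(summary.columns) == EXPECTED_COLUMNS, f"unexpected columns: {list(summary.columns)}"
    assert summary["Country"].is_unique, "each country should appear exactly once"
    assert summary["Country"].is_monotonic_increasing, "rows should be sorted alphabetically by Country"

# Toy summary with placeholder countries, only to exercise the checks
good = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "mean": [5.0, 6.0],
    "median": [5.0, 6.0],
    "std": [1.0, 1.0],
    "Year of Max Unemployment": [2021, 2020],
})
check_summary(good)  # passes silently

try:
    check_summary(good.iloc[::-1])  # same rows in reverse order
except AssertionError as err:
    print(err)  # rows should be sorted alphabetically by Country
```

On your own output, the call would be `check_summary(unemployment_summary(df))`.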

The input DataFrame will look like this:

Country        Year  Unemployment Rate
United States  2018  3.9
United States  2019  3.7
...            ...   ...

Your function will be tested with similar data.
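Since the function is tested with similar data, it helps to consider inputs that differ from the example in harmless-looking ways. One pitfall: `idxmax` returns row labels, so if a test DataFrame carries duplicate index labels (for instance, one built with `pd.concat` without `ignore_index=True`), `df.loc[...]` can return several rows per label. The sketch below is one possible defensive variant, not the course's reference solution: it resets the index first, maps the peak year onto the summary instead of merging, and sorts explicitly. The two input frames use placeholder countries and made-up values purely for testing.

```python
import pandas as pd

def unemployment_summary(df: pd.DataFrame) -> pd.DataFrame:
    # A fresh 0..n-1 index makes every row label unique, so idxmax labels are unambiguous
    df = df.reset_index(drop=True)
    grouped = df.groupby("Country")["Unemployment Rate"]
    summary = grouped.agg(["mean", "median", "std"]).reset_index()

    # Look up the Year in the row holding each country's maximum (first one if tied)
    peak_rows = df.loc[grouped.idxmax(), ["Country", "Year"]]
    year_of_max = peak_rows.set_index("Country")["Year"]
    summary["Year of Max Unemployment"] = summary["Country"].map(year_of_max)

    # groupby already sorts its keys; sorting again states the requirement explicitly
    return summary.sort_values("Country").reset_index(drop=True)

# Placeholder test data: rows out of order, and pd.concat keeps both 0..2 indexes,
# so the combined frame has duplicate index labels
part_a = pd.DataFrame({"Country": ["Country B", "Country A", "Country B"],
                       "Year": [2021, 2020, 2020],
                       "Unemployment Rate": [6.0, 5.0, 7.0]})
part_b = pd.DataFrame({"Country": ["Country A", "Country B", "Country A"],
                       "Year": [2021, 2022, 2022],
                       "Unemployment Rate": [6.0, 5.0, 4.0]})
combined = pd.concat([part_a, part_b])

print(unemployment_summary(combined))
```

With this input, the result has one row per placeholder country: Country A peaks in 2021 and Country B in 2020, each with a standard deviation of 1.0.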
