Challenge: Unemployment Rate Summary
You are now ready to apply your knowledge of pandas and descriptive statistics to a practical economic dataset. Imagine you have unemployment rate data for several countries, collected over a five-year period. Your goal is to summarize this data by calculating key statistics for each country and identifying important trends.
To begin, you will work with a hardcoded pandas DataFrame that contains unemployment rates for countries such as the United States, Germany, Japan, and Brazil from 2018 to 2022. For each country, you need to calculate the mean, median, and standard deviation of the unemployment rates across these years. In addition, you should find out which year had the highest unemployment rate for each country. This summary will help you quickly compare the labor market situation across countries and spot years of particular economic difficulty.
```python
import pandas as pd

# Hardcoded unemployment rate data
data = {
    "Country": [
        "United States", "United States", "United States", "United States", "United States",
        "Germany", "Germany", "Germany", "Germany", "Germany",
        "Japan", "Japan", "Japan", "Japan", "Japan",
        "Brazil", "Brazil", "Brazil", "Brazil", "Brazil"
    ],
    "Year": [2018, 2019, 2020, 2021, 2022]*4,
    "Unemployment Rate": [
        3.9, 3.7, 8.1, 5.4, 3.6,
        3.4, 3.2, 4.0, 3.6, 3.0,
        2.4, 2.4, 2.8, 2.8, 2.6,
        12.3, 11.9, 13.5, 13.2, 9.3
    ]
}

df = pd.DataFrame(data)

def unemployment_summary(df):
    # Group by country and calculate statistics
    grouped = df.groupby("Country")["Unemployment Rate"]
    summary = grouped.agg(["mean", "median", "std"]).reset_index()

    # Find the year with the highest unemployment rate for each country
    idx = df.groupby("Country")["Unemployment Rate"].idxmax()
    max_years = df.loc[idx, ["Country", "Year"]].set_index("Country")

    # Merge the summary with the year of highest unemployment
    summary = summary.merge(max_years, left_on="Country", right_index=True)
    summary = summary.rename(columns={"Year": "Year of Max Unemployment"})
    return summary

summary_df = unemployment_summary(df)
print(summary_df)
```
This code creates a function called unemployment_summary that takes a DataFrame with unemployment data and returns a summary DataFrame. For each country, you will see the mean, median, and standard deviation of unemployment rates, along with the year when unemployment was at its highest. This kind of summary is valuable for economists who want to quickly understand labor market trends and identify years of significant change.
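Two pandas defaults affect how the summary should be read. `std` computes the sample standard deviation (`ddof=1`), and `idxmax` returns the first label when the maximum is tied. The following is a minimal sketch of both behaviors, using the United States and Japan values from the data above. The variable names are illustrative only.

```python
import pandas as pd

years = [2018, 2019, 2020, 2021, 2022]

# Rates taken from the lesson's dataset, indexed by year
us = pd.Series([3.9, 3.7, 8.1, 5.4, 3.6], index=years)
japan = pd.Series([2.4, 2.4, 2.8, 2.8, 2.6], index=years)

# pandas defaults to the sample standard deviation (divides by n - 1)
sample_std = us.std()
# Passing ddof=0 gives the population standard deviation (divides by n)
population_std = us.std(ddof=0)

# Japan's rate is 2.8 in both 2020 and 2021; idxmax keeps the first one
tied_max_year = japan.idxmax()

print(f"sample std: {sample_std:.4f}, population std: {population_std:.4f}")
print(f"year reported for Japan's maximum: {tied_max_year}")
```

With only five observations per country, the gap between the two standard deviations is noticeable, so it helps to state which one a summary reports.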
Now, it's your turn to practice and reinforce these concepts by writing your own function to generate this summary.
Write a function called unemployment_summary that takes a pandas DataFrame with columns "Country", "Year", and "Unemployment Rate". The function should:
- Calculate the mean, median, and standard deviation of unemployment rates for each country.
- Identify the year with the highest unemployment rate for each country.
- Return a DataFrame with the columns "Country", "mean", "median", "std", and "Year of Max Unemployment".
The result should be sorted alphabetically by country name.
The input DataFrame will look like this:
| Country | Year | Unemployment Rate |
|---|---|---|
| United States | 2018 | 3.9 |
| United States | 2019 | 3.7 |
| ... | ... | ... |
Your function will be tested with similar data.
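Before submitting, it can help to run a solution against a small, deliberately unordered input and check the requirements listed above. The sketch below is one possible approach, not the grader's actual test. The helper `check_summary` and the constant `EXPECTED_COLUMNS` are hypothetical names introduced here, and the implementation shown is a compact variant that sorts explicitly instead of relying on the default ordering of `groupby`.

```python
import pandas as pd

# Hypothetical constant: the column order the task description asks for
EXPECTED_COLUMNS = ["Country", "mean", "median", "std", "Year of Max Unemployment"]

def unemployment_summary(df):
    rates = df.groupby("Country")["Unemployment Rate"]
    summary = rates.agg(["mean", "median", "std"])
    # idxmax gives the row label of each country's peak; look up that row's year
    peak_rows = df.loc[rates.idxmax()].set_index("Country")
    summary["Year of Max Unemployment"] = peak_rows["Year"]
    # Sort explicitly so the ordering requirement does not depend on groupby defaults
    return summary.reset_index().sort_values("Country", ignore_index=True)

def check_summary(summary):
    # Hypothetical helper: verifies column names and alphabetical country order
    assert list(summary.columns) == EXPECTED_COLUMNS
    assert summary["Country"].is_monotonic_increasing

# Rows are intentionally out of order; values come from the lesson's dataset
sample = pd.DataFrame({
    "Country": ["Japan", "Brazil", "Japan", "Brazil"],
    "Year": [2018, 2018, 2019, 2019],
    "Unemployment Rate": [2.4, 12.3, 2.4, 11.9],
})

result = unemployment_summary(sample)
check_summary(result)
print(result)
```

Feeding the rows in a shuffled order is a cheap way to confirm that the sorting requirement is met by the function itself and not by the order of the input.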