Learn Challenge: Wage Determinants Regression | Economic Modeling and Regression
Python for Economists

Challenge: Wage Determinants Regression

In this challenge, you will analyze a dataset containing information about individuals' wages and their potential determinants, such as years of education, years of experience, and gender. Your goal is to write a function that fits a linear regression model using this data and then interprets which factor most strongly influences wages. This exercise will help you practice applying regression techniques to real-world economic questions and interpreting the results to inform policy or business decisions.

Start by considering a small dataset that is already structured as a pandas DataFrame. The DataFrame includes the following columns:

  • "wage": annual wage in thousands of dollars;
  • "education": years of education completed;
  • "experience": years of work experience;
  • "gender": 1 if male, 0 if female.

You will use the scikit-learn library's LinearRegression class to fit the model, and then examine the estimated coefficients to determine which variable has the largest effect on wage.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hardcoded sample data
    data = {
        "wage": [45, 55, 60, 38, 70, 52, 48, 62, 49, 58],
        "education": [12, 16, 18, 12, 20, 14, 13, 17, 12, 16],
        "experience": [5, 7, 10, 3, 12, 6, 4, 9, 5, 8],
        "gender": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]
    }
    df = pd.DataFrame(data)

    def fit_wage_regression(df):
        X = df[["education", "experience", "gender"]]
        y = df["wage"]
        model = LinearRegression()
        model.fit(X, y)
        coefficients = dict(zip(X.columns, model.coef_))
        return coefficients

    coeffs = fit_wage_regression(df)
    print("Estimated coefficients:", coeffs)

After running the function, you will see the estimated coefficient for each variable. The coefficient for "education" shows how much the wage (in thousands of dollars) is expected to change for each additional year of education, holding the other variables constant. The "experience" coefficient reflects the effect of an extra year of work experience, and the "gender" coefficient captures the estimated wage difference between males and females, assuming other factors are equal.
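One way to check the "holding other variables constant" reading is to predict the wage for two people who differ only by one year of education. The sketch below refits the same model on the sample data; the two individuals (`base` and `plus_one`) are hypothetical and chosen only for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same hardcoded sample data as in the example above
df = pd.DataFrame({
    "wage": [45, 55, 60, 38, 70, 52, 48, 62, 49, 58],
    "education": [12, 16, 18, 12, 20, 14, 13, 17, 12, 16],
    "experience": [5, 7, 10, 3, 12, 6, 4, 9, 5, 8],
    "gender": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})
features = ["education", "experience", "gender"]
model = LinearRegression().fit(df[features], df["wage"])

# Hypothetical person, and the same person with one more year of education
base = pd.DataFrame([{"education": 14, "experience": 6, "gender": 0}])[features]
plus_one = base.assign(education=base["education"] + 1)

# The gap between the two predictions should match the education coefficient
gap = model.predict(plus_one)[0] - model.predict(base)[0]
education_coef = model.coef_[features.index("education")]
print("Predicted wage gap:", gap)
print("Education coefficient:", education_coef)
```

Because the model is linear, the two printed numbers should agree up to floating-point rounding, whatever values are chosen for the hypothetical person.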

To identify which determinant has the largest impact, compare the absolute values of the coefficients. The variable with the largest coefficient in absolute terms has the largest estimated effect on wage per unit change in this dataset. Keep in mind that the variables are measured in different units (years versus a 0/1 indicator), so this comparison is a rough guide rather than a definitive ranking. Findings like this can help economists and policymakers design interventions to improve earnings or reduce inequality.
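The comparison by absolute value can be sketched with Python's built-in `max` and a `key` function. The numbers in `example_coeffs` are made up for illustration and are not the output of the regression on the sample data.

```python
# Illustrative coefficients only (hypothetical values, not model output)
example_coeffs = {"education": 1.5, "experience": 0.8, "gender": -2.0}

# Pick the variable whose coefficient is largest in absolute value,
# so a large negative coefficient counts as a strong effect too
strongest = max(example_coeffs, key=lambda name: abs(example_coeffs[name]))
print(strongest)  # gender
```

Using `abs` matters here: without it, `max` would return "education", because -2.0 is the smallest value even though it has the largest magnitude.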

Task

Write a function called analyze_wage_determinants that:

  • Takes a pandas DataFrame with columns "wage", "education", "experience", and "gender".
  • Fits a linear regression model with "wage" as the dependent variable and the other columns as independent variables.
  • Returns a tuple containing:
    • A dictionary mapping each independent variable to its estimated coefficient.
    • The name of the variable with the largest absolute coefficient (i.e., the strongest impact on wage).

Use the hardcoded DataFrame from the example above to test your function.
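A minimal sketch of one possible approach is shown here, under the assumption that every column other than "wage" is an independent variable. It is not the course's reference solution, and the names `features` and `strongest` are illustrative choices.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def analyze_wage_determinants(df):
    # Assumption: all columns except "wage" are independent variables
    features = [col for col in df.columns if col != "wage"]
    model = LinearRegression()
    model.fit(df[features], df["wage"])

    # Map each variable name to its estimated coefficient
    coefficients = {name: float(c) for name, c in zip(features, model.coef_)}

    # Variable with the largest coefficient in absolute value
    strongest = max(coefficients, key=lambda name: abs(coefficients[name]))
    return coefficients, strongest

# Test with the hardcoded sample data
df = pd.DataFrame({
    "wage": [45, 55, 60, 38, 70, 52, 48, 62, 49, 58],
    "education": [12, 16, 18, 12, 20, 14, 13, 17, 12, 16],
    "experience": [5, 7, 10, 3, 12, 6, 4, 9, 5, 8],
    "gender": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})
coeffs, strongest = analyze_wage_determinants(df)
print("Estimated coefficients:", coeffs)
print("Largest absolute coefficient:", strongest)
```

Converting each coefficient with `float` keeps the dictionary free of NumPy scalar types, which makes the printed output easier to read.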

Section 2. Chapter 5