Challenge: Predict Employee Attrition | Predictive People Analytics
Python for People Analytics
Section 3. Chapter 5
Challenge: Predict Employee Attrition


Before diving into a hands-on challenge, it helps to recap the typical steps of predictive modeling for employee attrition. You usually start by preparing your data: collecting relevant employee features such as age, tenure, satisfaction, and department, encoding categorical features such as department as numeric columns (for example, with one-hot encoding), and making sure the target column (attrition: 1 for left, 0 for stayed) is correctly formatted. Next, you select and train an appropriate model; logistic regression is commonly used for binary classification problems like attrition prediction. After fitting the model, you evaluate its performance with metrics such as accuracy (the proportion of correct predictions) and recall (the proportion of actual attrition cases the model correctly identifies). Ideally, evaluate on data the model was not trained on: scoring on the training data, as the example below does for simplicity, tends to overstate performance. Visualizations such as confusion matrices and predicted-probability plots help you interpret the model's predictions and see where it performs well or needs improvement.
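Both metrics can be computed by hand from the four confusion-matrix counts. The sketch below uses hypothetical labels (not the lesson's data) and checks the hand-computed values against scikit-learn's `accuracy_score` and `recall_score`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels for eight employees (1 = left, 0 = stayed); illustrative only.
y_true = [0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1, 1, 0]

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Accuracy: share of all predictions that are correct.
accuracy_manual = (tp + tn) / (tp + tn + fp + fn)
# Recall: share of employees who actually left that the model flagged.
recall_manual = tp / (tp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print("Accuracy:", accuracy_manual, "| sklearn:", accuracy_score(y_true, y_pred))
print("Recall:", recall_manual, "| sklearn:", recall_score(y_true, y_pred))
```

Here 6 of the 8 predictions are correct (accuracy 0.75), and 2 of the 3 employees who left are caught (recall 2/3), so the hand-computed values match scikit-learn's.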

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Example DataFrame
data = {
    'age': [25, 45, 30, 41, 36, 28, 50, 29],
    'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
    'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
    'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
    'attrition': [0, 1, 0, 1, 0, 0, 1, 0]
}
df = pd.DataFrame(data)

# One-hot encode categorical variables (drop_first avoids a redundant column)
df_encoded = pd.get_dummies(df, columns=['department'], drop_first=True)

# Features and target
X = df_encoded.drop('attrition', axis=1)
y = df_encoded['attrition']

# Model setup (a higher iteration limit guards against convergence warnings on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Predict on the same data used for training, so these metrics are optimistic
y_pred = model.predict(X)
accuracy = accuracy_score(y, y_pred)
recall = recall_score(y, y_pred)

# Confusion matrix: rows are actual classes, columns are predicted classes
cm = confusion_matrix(y, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Stayed', 'Left'], yticklabels=['Stayed', 'Left'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Predicted probabilities of attrition (column 1 corresponds to class 1 = left)
probs = model.predict_proba(X)[:, 1]
plt.bar(range(len(probs)), probs)
plt.xlabel('Employee')
plt.ylabel('Predicted Probability of Attrition')
plt.title('Attrition Probability by Employee')
plt.show()

# Print metrics
print("Accuracy:", accuracy)
print("Recall:", recall)

# Summary:
# The logistic regression model predicts employee attrition using age, tenure,
# satisfaction, and department.
# Accuracy shows the proportion of correct predictions, while recall indicates how
# well the model identifies employees who left.
# Both metrics are computed on the training data, so they likely overstate
# performance on new employees.
# The confusion matrix and probability plot help visualize model performance and
# individual risk.
```
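Because the example above scores the model on the employees it was trained on, its accuracy and recall are optimistic. A minimal sketch of a less biased check, assuming you are free to swap in stratified cross-validation (scikit-learn's `cross_val_predict` with `StratifiedKFold`) and a `StandardScaler` pipeline, neither of which is part of the lesson's own solution, produces an out-of-fold prediction for every employee:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as the lesson example.
data = {
    'age': [25, 45, 30, 41, 36, 28, 50, 29],
    'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
    'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
    'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
    'attrition': [0, 1, 0, 1, 0, 0, 1, 0],
}
df = pd.DataFrame(data)
df_encoded = pd.get_dummies(df, columns=['department'], drop_first=True, dtype=int)
X = df_encoded.drop('attrition', axis=1)
y = df_encoded['attrition']

# The scaler lives inside the pipeline, so it is refit on each training fold only.
model = make_pipeline(StandardScaler(), LogisticRegression())

# Three folds is the most this data allows: only 3 employees have attrition = 1.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Each employee is predicted by a model that never saw that employee during training.
y_pred_oof = cross_val_predict(model, X, y, cv=cv)

oof_accuracy = accuracy_score(y, y_pred_oof)
oof_recall = recall_score(y, y_pred_oof)
print("Out-of-fold accuracy:", oof_accuracy)
print("Out-of-fold recall:", oof_recall)
```

With only eight employees these numbers will be noisy; the point is the workflow, which scales to a real HR dataset, where a simple `train_test_split` hold-out also becomes practical.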

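When interpreting your attrition model results, remember that accuracy alone may not capture the value of your predictions. In the example data, five of the eight employees stayed, so a model that predicted "stayed" for everyone would reach 62.5% accuracy while catching none of the leavers (recall 0). Recall is especially important if HR wants to minimize missed cases of likely attrition. Use the confusion matrix to count false positives (employees flagged as likely to leave who actually stayed) and false negatives (employees who left but were not flagged), and review predicted probabilities to spot employees at high risk. When communicating findings to HR, focus on actionable insights: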

  • Which features are most associated with attrition;
  • Which employees may benefit from targeted retention strategies based on their predicted risk.
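Both questions can be answered from a fitted logistic regression. The sketch below makes some assumptions: features are standardized first (so coefficient sizes are roughly comparable across age, tenure, and satisfaction), probabilities are in-sample like the lesson's example, and the 0.5 shortlist cut-off is a hypothetical value you would agree on with HR.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as the lesson example.
data = {
    'age': [25, 45, 30, 41, 36, 28, 50, 29],
    'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
    'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
    'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
    'attrition': [0, 1, 0, 1, 0, 0, 1, 0],
}
df = pd.DataFrame(data)
df_encoded = pd.get_dummies(df, columns=['department'], drop_first=True, dtype=int)
X = df_encoded.drop('attrition', axis=1)
y = df_encoded['attrition']

# Standardize features so coefficient magnitudes are easier to compare
# (the one-hot columns are scaled too, which is a simplification).
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Positive coefficients raise the predicted log-odds of leaving; negative ones lower them.
coefs = pd.Series(model[-1].coef_[0], index=X.columns)
print(coefs.sort_values(key=abs, ascending=False))

# Rank employees by predicted probability of attrition, highest risk first.
risk = df.assign(attrition_prob=model.predict_proba(X)[:, 1])
risk = risk.sort_values('attrition_prob', ascending=False)

# Hypothetical cut-off for a retention shortlist; agree on the value with HR.
RISK_THRESHOLD = 0.5
shortlist = risk[risk['attrition_prob'] >= RISK_THRESHOLD]
print(shortlist[['age', 'tenure', 'satisfaction', 'department', 'attrition_prob']])
```

Coefficients describe association within this data, not causation, so present them to HR as signals to investigate rather than as proven drivers of attrition.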
Task


Build a Python script that predicts employee attrition using logistic regression. Use the provided DataFrame with employee features and attrition labels. Your script must:

  • Train a logistic regression model to predict attrition based on age, tenure, satisfaction, and one-hot encoded department.
  • Predict attrition for all employees in the DataFrame.
  • Calculate and store the accuracy and recall of the predictions.
  • Visualize the confusion matrix of actual vs. predicted attrition.
  • Visualize the predicted probability of attrition for each employee.
  • Summarize your findings in comments at the end of your script.
