Python for People Analytics
Section 3 (Predictive People Analytics), Chapter 5
Challenge: Predict Employee Attrition

Before diving into the hands-on challenge, it helps to recap the typical steps of predictive modeling for employee attrition. You start by preparing the data: collect relevant employee features such as age, tenure, satisfaction, and department, encode categorical features such as department as numeric columns, and make sure the target column (attrition: 1 for left, 0 for stayed) is correctly formatted. Next, select and train an appropriate model; logistic regression is a common choice for binary classification problems like attrition prediction. After fitting the model, evaluate its performance with metrics such as accuracy (the proportion of correct predictions) and recall (the proportion of actual attrition cases the model correctly identifies). Visualizations such as confusion matrices and probability plots help you interpret the model's predictions and see where it performs well or needs improvement.
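To make the accuracy and recall definitions concrete, the sketch below computes both by hand from confusion-matrix counts and checks them against scikit-learn's `accuracy_score` and `recall_score`. It uses the attrition labels from the example data; the all-"stayed" prediction vector is a hypothetical naive baseline chosen for illustration, not a trained model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Actual labels from the example DataFrame (1 = left, 0 = stayed)
y_true = [0, 1, 0, 1, 0, 0, 1, 0]
# Hypothetical naive baseline: predict "stayed" for every employee
y_baseline = [0] * len(y_true)

# For binary 0/1 labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_baseline).ravel()

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all predictions
recall_manual = tp / (tp + fn)                      # leavers caught / all actual leavers

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=5 FP=0 FN=3 TP=0
print(f"accuracy={accuracy_manual:.3f}, sklearn={accuracy_score(y_true, y_baseline):.3f}")
print(f"recall={recall_manual:.3f}, sklearn={recall_score(y_true, y_baseline):.3f}")
```

The baseline reaches 62.5% accuracy (5 of 8 correct) yet has a recall of 0, because it never flags any of the three employees who left.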

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Example DataFrame
data = {
    'age': [25, 45, 30, 41, 36, 28, 50, 29],
    'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
    'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
    'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
    'attrition': [0, 1, 0, 1, 0, 0, 1, 0]
}
df = pd.DataFrame(data)

# One-hot encode categorical variables
df_encoded = pd.get_dummies(df, columns=['department'], drop_first=True)

# Features and target
X = df_encoded.drop('attrition', axis=1)
y = df_encoded['attrition']

# Model setup
model = LogisticRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)
accuracy = accuracy_score(y, y_pred)
recall = recall_score(y, y_pred)

# Confusion matrix
cm = confusion_matrix(y, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Predicted probabilities
probs = model.predict_proba(X)[:, 1]
plt.bar(range(len(probs)), probs)
plt.xlabel('Employee')
plt.ylabel('Predicted Probability of Attrition')
plt.title('Attrition Probability by Employee')
plt.show()

# Print metrics
print("Accuracy:", accuracy)
print("Recall:", recall)

# Summary:
# The logistic regression model predicts employee attrition using age, tenure, satisfaction, and department.
# Accuracy shows the proportion of correct predictions, while recall indicates how well the model identifies employees who left.
# The confusion matrix and probability plot help visualize model performance and individual risk.
```
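The script above computes accuracy and recall on the same eight employees the model was trained on, so those numbers describe how well the model fits the data rather than how it would perform on new employees. One way to get an out-of-sample estimate, sketched here as an optional extension rather than part of the challenge, is to generate out-of-fold predictions with scikit-learn's `cross_val_predict` and a `StratifiedKFold` splitter. With only eight rows the resulting scores are very noisy, so treat this as a pattern to reuse on real HR data rather than a meaningful result.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Same example data and encoding as in the script above
data = {
    'age': [25, 45, 30, 41, 36, 28, 50, 29],
    'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
    'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
    'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
    'attrition': [0, 1, 0, 1, 0, 0, 1, 0]
}
df = pd.DataFrame(data)
df_encoded = pd.get_dummies(df, columns=['department'], drop_first=True)
X = df_encoded.drop('attrition', axis=1)
y = df_encoded['attrition']

# With 3 leavers, 3 stratified folds place exactly one leaver in each test fold
cv = StratifiedKFold(n_splits=3)

# Each employee is predicted by a model fitted without that employee.
# max_iter is raised only to reduce convergence warnings on unscaled features.
oof_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=cv)

oof_accuracy = accuracy_score(y, oof_pred)
oof_recall = recall_score(y, oof_pred)
print("Out-of-fold accuracy:", oof_accuracy)
print("Out-of-fold recall:", oof_recall)
```

If the out-of-fold scores are much lower than the in-sample ones, the model is likely memorizing the training employees rather than learning a general pattern.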

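When interpreting your attrition model results, remember that accuracy alone may not capture the value of your predictions. Because employees who leave are typically a minority, a model can score high accuracy while missing most of them, so recall is especially important if HR wants to minimize missed cases of likely attrition. Use the confusion matrix to identify false positives (employees flagged as at risk who actually stayed) and false negatives (employees who left but were not flagged), and review predicted probabilities to spot employees at high risk. When communicating findings to HR, focus on actionable insights: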

  • Which features are most associated with attrition;
  • Which employees may benefit from targeted retention strategies based on their predicted risk.
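Both questions can be approached with a fitted logistic regression model. The sketch below is one possible approach, not the method prescribed by the challenge: it standardizes the features with `StandardScaler` (an addition not present in the original script) so that coefficients sit on a comparable per-standard-deviation scale, ranks features by coefficient magnitude, and then sorts employees by predicted probability of leaving. `RISK_THRESHOLD = 0.5` is a hypothetical cut-off that HR would need to tune.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same example data and encoding as in the main script
data = {
    'age': [25, 45, 30, 41, 36, 28, 50, 29],
    'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
    'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
    'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
    'attrition': [0, 1, 0, 1, 0, 0, 1, 0]
}
df = pd.DataFrame(data)
df_encoded = pd.get_dummies(df, columns=['department'], drop_first=True)
X = df_encoded.drop('attrition', axis=1)
y = df_encoded['attrition']

# Standardize first so coefficient magnitudes are comparable across features
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)

# Feature associations, ordered by absolute coefficient size
coefs = pd.Series(pipe.named_steps['logisticregression'].coef_[0], index=X.columns)
coefs_ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(coefs_ranked)

# Employees ranked by predicted probability of leaving
RISK_THRESHOLD = 0.5  # hypothetical cut-off, not from the original lesson
ranked = df.assign(risk=pipe.predict_proba(X)[:, 1]).sort_values('risk', ascending=False)
ranked['flag_for_retention'] = ranked['risk'] >= RISK_THRESHOLD
print(ranked[['age', 'tenure', 'satisfaction', 'department', 'risk', 'flag_for_retention']])
```

A positive coefficient means that higher values of a feature raise the predicted probability of leaving, holding the other features fixed; for the department dummies the comparison is against the dropped reference category (here HR). Lowering the threshold flags more employees, which can raise recall at the cost of more false positives.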
Task

Build a Python script that predicts employee attrition using logistic regression. Use the provided DataFrame with employee features and attrition labels. Your script must:

  • Train a logistic regression model to predict attrition based on age, tenure, satisfaction, and one-hot encoded department.
  • Predict attrition for all employees in the DataFrame.
  • Calculate and store the accuracy and recall of the predictions.
  • Visualize the confusion matrix of actual vs. predicted attrition.
  • Visualize the predicted probability of attrition for each employee.
  • Summarize your findings in comments at the end of your script.
