Challenge: Predict Employee Attrition
Before diving into a hands-on challenge, it is helpful to recap the typical steps involved in predictive modeling for employee attrition. You usually start by preparing your data, which includes collecting relevant employee features such as age, tenure, satisfaction, and department, and ensuring that the target column (attrition: 1 for left, 0 for stayed) is correctly formatted. The next step is to select and train an appropriate model; logistic regression is commonly used for binary classification problems like attrition prediction. After fitting the model to your data, you evaluate its performance using metrics such as accuracy (the proportion of correct predictions) and recall (the proportion of actual attrition cases correctly identified). Visualizations like confusion matrices and probability plots help you interpret the model’s predictions and understand where it performs well or needs improvement.
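To make the two metrics concrete, here is a minimal sketch that computes accuracy and recall by hand from confusion-matrix counts and compares them with scikit-learn's own functions. The labels `y_true` and `y_pred` are hypothetical values chosen for illustration; they are not taken from the lesson's data.

```python
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

# Hypothetical labels for ten employees (1 = left, 0 = stayed); illustrative only.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# For binary 0/1 labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy: share of all predictions that are correct.
accuracy_by_hand = (tp + tn) / (tp + tn + fp + fn)

# Recall: share of actual leavers that the model flagged.
recall_by_hand = tp / (tp + fn)

print("Accuracy:", accuracy_by_hand, "sklearn:", accuracy_score(y_true, y_pred))
print("Recall:", recall_by_hand, "sklearn:", recall_score(y_true, y_pred))
```

In this made-up case the model is right for 7 of 10 employees but finds only 2 of the 4 who left, which is why the two numbers can tell different stories.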
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, recall_score, confusion_matrix
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Example DataFrame
    data = {
        'age': [25, 45, 30, 41, 36, 28, 50, 29],
        'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
        'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
        'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
        'attrition': [0, 1, 0, 1, 0, 0, 1, 0]
    }
    df = pd.DataFrame(data)

    # One-hot encode categorical variables
    df_encoded = pd.get_dummies(df, columns=['department'], drop_first=True)

    # Features and target
    X = df_encoded.drop('attrition', axis=1)
    y = df_encoded['attrition']

    # Model setup
    model = LogisticRegression()
    model.fit(X, y)

    # Predict
    y_pred = model.predict(X)
    accuracy = accuracy_score(y, y_pred)
    recall = recall_score(y, y_pred)

    # Confusion matrix
    cm = confusion_matrix(y, y_pred)
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    plt.show()

    # Predicted probabilities
    probs = model.predict_proba(X)[:, 1]
    plt.bar(range(len(probs)), probs)
    plt.xlabel('Employee')
    plt.ylabel('Predicted Probability of Attrition')
    plt.title('Attrition Probability by Employee')
    plt.show()

    # Print metrics
    print("Accuracy:", accuracy)
    print("Recall:", recall)

    # Summary:
    # The logistic regression model predicts employee attrition using age, tenure, satisfaction, and department.
    # Accuracy shows the proportion of correct predictions, while recall indicates how well the model identifies employees who left.
    # The confusion matrix and probability plot help visualize model performance and individual risk.
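When interpreting your attrition model results, remember that accuracy alone may not capture the value of your predictions. If only a small share of employees leave, a model that predicts "stayed" for everyone can still score high accuracy while identifying no leavers, so recall is especially important if HR wants to minimize missed cases of likely attrition. Use the confusion matrix to identify false positives (employees flagged as leaving who stayed) and false negatives (employees who left but were not flagged), and review predicted probabilities to spot employees at high risk. In communicating findings to HR, focus on actionable insights: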
- Which features are most associated with attrition;
- Which employees may benefit from targeted retention strategies based on their predicted risk.
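One possible way to turn these points into something reportable is sketched below. It refits the model on the lesson's example data, lists the coefficients, ranks employees by predicted risk, and separates false negatives from false positives. The names `coefficients`, `report`, `missed`, and `false_alarms` are illustrative, and `max_iter=1000` and `dtype=int` are precautionary assumptions rather than part of the lesson's code.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data copied from the lesson's example DataFrame.
df = pd.DataFrame({
    'age': [25, 45, 30, 41, 36, 28, 50, 29],
    'tenure': [2, 10, 4, 8, 6, 3, 15, 1],
    'satisfaction': [0.9, 0.4, 0.7, 0.5, 0.6, 0.8, 0.3, 0.95],
    'department': ['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'Sales', 'IT'],
    'attrition': [0, 1, 0, 1, 0, 0, 1, 0],
})

# One-hot encode department; dtype=int keeps every feature column numeric.
X = pd.get_dummies(df.drop(columns='attrition'),
                   columns=['department'], drop_first=True, dtype=int)
y = df['attrition']

# max_iter is raised as a precaution (assumption, not in the lesson's code).
model = LogisticRegression(max_iter=1000).fit(X, y)

# Coefficient signs show the direction of association with attrition.
# Magnitudes are not directly comparable here because features are unscaled.
coefficients = pd.Series(model.coef_[0], index=X.columns).sort_values()
print(coefficients)

# Attach predicted risk and predicted class, then rank from highest risk down.
report = df.assign(
    risk=model.predict_proba(X)[:, 1],
    predicted=model.predict(X),
).sort_values('risk', ascending=False)
print(report[['age', 'tenure', 'satisfaction', 'department', 'risk']])

# False negatives: employees who left but were not flagged.
missed = report[(report['attrition'] == 1) & (report['predicted'] == 0)]
# False positives: employees flagged as leaving who actually stayed.
false_alarms = report[(report['attrition'] == 0) & (report['predicted'] == 1)]
print("Missed leavers:", len(missed), "False alarms:", len(false_alarms))
```

With only eight rows and predictions made on the training data itself, the output is a demonstration of the workflow, not evidence about real attrition drivers.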
Build a Python script that predicts employee attrition using logistic regression. Use the provided DataFrame with employee features and attrition labels. Your script must:
- Train a logistic regression model to predict attrition based on age, tenure, satisfaction, and one-hot encoded department.
- Predict attrition for all employees in the DataFrame.
- Calculate and store the accuracy and recall of the predictions.
- Visualize the confusion matrix of actual vs. predicted attrition.
- Visualize the predicted probability of attrition for each employee.
- Summarize your findings in comments at the end of your script.