Building a Simple Churn Prediction Model | Customer Health and Churn Prediction

Machine learning offers a powerful way to predict customer churn, enabling you to proactively address at-risk accounts. In churn prediction, you typically want to answer a binary question: will a customer churn (leave) or not? Logistic regression is a popular machine learning algorithm for this task because it is specifically designed for binary classification problems. It estimates the probability that a customer belongs to one of two classes—churned or retained—based on their characteristics. This makes logistic regression a practical and interpretable choice for customer success managers aiming to identify which customers are most likely to churn.
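To make the idea of a probability estimate concrete, here is a minimal sketch of how logistic regression turns a weighted sum of features into a churn probability. The weights and intercept are made-up values chosen for illustration, not learned from data, and the helper name churn_probability is hypothetical.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def churn_probability(features, weights, intercept):
    """Hypothetical helper: weighted sum of the features passed through the sigmoid."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Assumed, illustrative weights for
# [usage_frequency, support_tickets, satisfaction_score]
weights = [-0.4, 0.5, -0.3]
intercept = 1.0

engaged = churn_probability([10, 1, 9], weights, intercept)
struggling = churn_probability([2, 8, 3], weights, intercept)

print(f"Engaged customer churn probability: {engaged:.3f}")
print(f"Struggling customer churn probability: {struggling:.3f}")
```

A trained model does the same computation, except that the weights and intercept are estimated from labeled customer data rather than set by hand.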


import numpy as np
from sklearn.linear_model import LogisticRegression

# Example customer data: [usage_frequency, support_tickets, satisfaction_score]
X = np.array([
    [10, 1, 9],   # Active user, few tickets, high satisfaction
    [4, 5, 6],    # Less active, more tickets, medium satisfaction
    [2, 8, 3],    # Rarely active, many tickets, low satisfaction
    [7, 2, 8],    # Moderately active, few tickets, high satisfaction
    [3, 7, 4],    # Rarely active, many tickets, low satisfaction
    [8, 1, 8],    # Active, few tickets, high satisfaction
])

# Churn labels: 1 = churned, 0 = retained
y = np.array([0, 1, 1, 0, 1, 0])

# Create and train the logistic regression model
model = LogisticRegression()
model.fit(X, y)

When training a churn prediction model, you start by selecting features that are likely to influence churn, such as how often a customer uses your product (usage_frequency), the number of support tickets they have submitted (support_tickets), and their satisfaction score (satisfaction_score). These features are organized into a matrix X, where each row represents a customer and each column a feature. The labels (y) indicate whether each customer churned (1) or stayed (0). The logistic regression model learns the relationship between these features and the likelihood of churn.

After fitting the model, you can inspect its coefficients to see how each feature affects the prediction. A positive coefficient means that an increase in that feature raises the predicted probability of churn, while a negative coefficient means it lowers it. Coefficient magnitudes are only directly comparable when the features are on similar scales, so standardize the features before ranking them by influence. This interpretability helps you understand which customer behaviors or attributes are most predictive of churn.
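As a rough sketch of that inspection step, the snippet below refits the same toy model and prints each coefficient next to its feature name. The feature_names list is an assumption added here for readability. It is not something the model stores when trained on a plain NumPy array.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Labels for the three columns of X (assumed here for display purposes)
feature_names = ["usage_frequency", "support_tickets", "satisfaction_score"]

# Same toy data as in the lesson
X = np.array([
    [10, 1, 9],
    [4, 5, 6],
    [2, 8, 3],
    [7, 2, 8],
    [3, 7, 4],
    [8, 1, 8],
])
y = np.array([0, 1, 1, 0, 1, 0])  # 1 = churned, 0 = retained

model = LogisticRegression()
model.fit(X, y)

# For a binary problem, coef_ has shape (1, n_features); take the single row
coefficients = model.coef_[0]

for name, coef in zip(feature_names, coefficients):
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: {coef:+.3f} ({direction} predicted churn risk)")

print(f"intercept: {model.intercept_[0]:+.3f}")
```

With only six customers, the exact values are illustrative at best. The signs are the part worth reading: more support tickets should push the churn probability up, and higher usage should push it down.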


# Predict churn for new customers
new_customers = np.array([
    [5, 3, 7],   # Moderate usage, some tickets, good satisfaction
    [2, 6, 4],   # Low usage, many tickets, low satisfaction
])

predictions = model.predict(new_customers)
probabilities = model.predict_proba(new_customers)

print("Predicted churn labels:", predictions)
print("Churn probabilities:", probabilities[:, 1])
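The indexing probabilities[:, 1] above works because predict_proba returns one column per class, ordered as in model.classes_. The following sketch, which refits the same toy model so that it runs on its own, makes that mapping explicit and then ranks the new customers by churn probability. The ranking step is an illustrative addition, not part of the lesson's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data and model as in the lesson
X = np.array([
    [10, 1, 9], [4, 5, 6], [2, 8, 3],
    [7, 2, 8], [3, 7, 4], [8, 1, 8],
])
y = np.array([0, 1, 1, 0, 1, 0])  # 1 = churned, 0 = retained
model = LogisticRegression().fit(X, y)

new_customers = np.array([
    [5, 3, 7],
    [2, 6, 4],
])
probabilities = model.predict_proba(new_customers)

# Columns of predict_proba follow the order of model.classes_
print("Class order:", model.classes_)

# Look up the churn column by label instead of hard-coding its position
churn_column = list(model.classes_).index(1)
churn_prob = probabilities[:, churn_column]

# Each row holds P(retained) and P(churned), so it sums to 1
print("Row sums:", probabilities.sum(axis=1))

# Rank customers from highest to lowest churn probability
ranking = np.argsort(-churn_prob)
for idx in ranking:
    print(f"customer {idx}: churn probability {churn_prob[idx]:.2f}")
```

Sorting by probability rather than relying only on the 0/1 labels lets you prioritize outreach when you cannot contact every at-risk account.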

1. What type of problem is churn prediction (classification or regression)?

2. Which scikit-learn function is used to fit a logistic regression model?

3. How can model coefficients help interpret feature importance?

Section 2. Chapter 4
