The Dirty Secret of Churn Prediction
Every data science team has a churn model. Ours was solid — a well-tuned CatBoost classifier with strong AUC. But here's what nobody talks about: predicting who will churn doesn't tell you how to stop them.
A customer moving to another country will churn regardless of what you do. A customer frustrated by a billing error needs a completely different intervention than one who found a cheaper competitor. Standard churn models lump these together. Causal inference pulls them apart.
From Prediction to Intervention
We adopted the Rubin Causal Model (potential outcomes framework) to answer a different question: "What is the causal effect of intervention X on churn probability for customer segment Y?"
This reframing changed everything about how we operated.
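In standard potential-outcomes notation, every customer has two potential churn outcomes: Y(1) if they receive an intervention and Y(0) if they don't. The estimand behind our new question is the conditional average treatment effect (CATE) for a segment with covariates x:

```latex
\tau(x) = \mathbb{E}\big[\, Y(1) - Y(0) \mid X = x \,\big]
```

A classifier estimates Pr(churn | X = x); τ(x) is a different quantity, and the customers at highest risk are not necessarily the ones an intervention can save.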
Building the Causal Graph
Working with customer service teams, we constructed a DAG mapping relationships between tenure, engagement patterns, service interactions, pricing changes, and product usage.
The graph immediately revealed confounders we'd been ignoring. The most instructive one: customers who contact support frequently have higher churn rates. The naive conclusion is "support interactions drive churn." The causal truth is the opposite — they contact support because they're already frustrated. Targeting "high-contact" customers for retention outreach would be intervening on a symptom, not a cause.
Without the DAG, we would have wasted budget on exactly this kind of misguided intervention.
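To make the structure concrete, here is a hedged sketch of the support-contact fragment of such a DAG in networkx; the variable names are illustrative stand-ins, not our production schema:

```python
import networkx as nx

# Fragment of the churn DAG around the support-contact confounder.
dag = nx.DiGraph()
dag.add_edges_from([
    ("frustration", "support_contacts"),  # frustrated customers call support...
    ("frustration", "churn"),             # ...and are also more likely to leave
    ("billing_error", "frustration"),
    ("pricing_change", "frustration"),
    ("tenure", "churn"),
    ("engagement", "churn"),
])
assert nx.is_directed_acyclic_graph(dag)

# The support_contacts -> churn correlation is confounded: both nodes
# share "frustration" as a common cause, opening a backdoor path.
print(sorted(dag.predecessors("support_contacts")))  # ['frustration']
print(sorted(dag.predecessors("churn")))             # ['engagement', 'frustration', 'tenure']
```

Intervening on support_contacts does nothing to frustration, which is the node that actually drives churn.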
Identifying What We Can Actually Change
Not every variable in a causal graph is something the business can change. We narrowed to three actionable levers:
- Proactive outreach timing — reaching customers before frustration peaks, not after
- Offer personalization — matching the right retention offer to the right segment
- Service recovery speed — resolving issues faster for at-risk customers
Estimating Causal Effects
We used three complementary techniques:
- Propensity score matching for estimating treatment effects from observational data
- Instrumental variables where we had natural experiments (pricing changes rolled out by region created exogenous variation)
- Double machine learning (CatBoost as the first-stage learner) for heterogeneous treatment effect estimation — understanding which interventions work for which customer segments (sketched below)
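The pipeline itself isn't published here, so the following is a minimal sketch of the double-ML step under stated assumptions: econml's LinearDML as the estimator, CatBoost for the two nuisance models, and hypothetical column names (got_retention_offer, churned, and the rest):

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, CatBoostRegressor
from econml.dml import LinearDML

# Hypothetical schema; the real feature set is not published.
df = pd.read_csv("customers.csv")
X = df[["tenure", "engagement", "plan_tier"]]   # effect modifiers (plan_tier integer-coded)
W = df[["support_contacts", "billing_errors"]]  # confounders to adjust for
T = df["got_retention_offer"]                   # binary treatment
Y = df["churned"]                               # binary outcome

# Double ML: CatBoost learns the nuisances E[Y|X,W] and E[T|X,W] with
# cross-fitting; the final stage regresses outcome residuals on treatment
# residuals, letting the effect vary linearly with X.
est = LinearDML(
    model_y=CatBoostRegressor(verbose=0),
    model_t=CatBoostClassifier(verbose=0),
    discrete_treatment=True,
    cv=3,
)
est.fit(Y, T, X=X, W=W)

# Per-customer estimated change in churn probability, aggregated by segment.
cate = pd.Series(np.ravel(est.effect(X)), index=df.index)
print(cate.groupby(df["plan_tier"]).mean())
```

LinearDML is one reasonable choice here; a nonparametric final stage such as econml's CausalForestDML is the usual upgrade when effects are unlikely to be linear in X.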
Validating with A/B Tests
Observational causal estimates are only as good as your assumptions. We ran three targeted A/B tests:
| Treatment | Control | Churn Reduction |
|---|---|---|
| Proactive outreach at predicted frustration peak | Standard timing | 80 bps |
| Personalized retention offers | Generic discount | 50 bps |
| Expedited service recovery for at-risk segment | Standard SLA | 30 bps |
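For a sense of the validation arithmetic (the counts below are hypothetical; only the deltas come from our tests), a two-proportion z-test with statsmodels checks whether a reduction like the 80 bps one is distinguishable from noise:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical test sizes: 100k customers per arm, 4.2% vs. 5.0% churn,
# i.e. an 80 bps reduction as in the outreach-timing test.
churners = np.array([4_200, 5_000])      # treatment, control
exposed = np.array([100_000, 100_000])

# One-sided test: is churn in treatment lower than in control?
z, p = proportions_ztest(churners, exposed, alternative="smaller")
reduction_bps = (churners[1] / exposed[1] - churners[0] / exposed[0]) * 10_000
print(f"reduction: {reduction_bps:.0f} bps, z = {z:.2f}, p = {p:.2g}")
```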
The Impact
Combined: 160 basis points of churn reduction, translating to roughly $1M in annual retained revenue.
More importantly, we built a reusable framework. When business conditions shift, we re-estimate effects and re-prioritize interventions. No starting from scratch.
What This Taught Me
Causal graphs are the best meeting you'll ever have. The DAG-building session with business stakeholders forced everyone to make their assumptions explicit. Arguments that had been circular for months resolved in an afternoon once the causal structure was drawn on a whiteboard.
Average effects hide the truth. The intervention that works for premium customers may backfire for budget-conscious ones. Heterogeneous treatment effects are where the real actionable signal lives.
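A toy example with made-up numbers shows how an average conceals this:

```python
import pandas as pd

# Invented per-customer effects (change in churn probability) where the
# same retention offer helps premium customers but backfires for budget ones.
effects = pd.DataFrame({
    "segment": ["premium"] * 3 + ["budget"] * 3,
    "cate":    [-0.030, -0.025, -0.035, 0.010, 0.015, 0.005],
})

print(effects["cate"].mean())                     # ATE: -0.010, looks mildly helpful
print(effects.groupby("segment")["cate"].mean())  # premium: -0.030, budget: +0.010
```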
Use observational methods to decide which experiments to run. We estimated effects for 12 potential interventions observationally, then only ran 3 A/B tests — the ones with the highest expected ROI. A/B tests are expensive. Spend them wisely.