Machine Learning · June 18, 2025 · 7 min read

How CatBoost Sales Propensity Routing Drove $15M in Value

Building a real-time propensity model that matches incoming customers to the right sales agent — a 12% conversion lift that changed how our entire call center operates.

CatBoost · Feature Engineering · A/B Testing · Real-time ML

The Matching Problem

Picture this: 1,000 incoming customer calls per hour, 200 available agents. Some agents are natural upsellers; others excel at converting hesitant first-time buyers. Some customers are ready to purchase; others need convincing.

Random routing ignores all of this. We set out to build something better: match customer propensity with agent capability in real time, before the call connects.

Why CatBoost Won the Bake-Off

We benchmarked XGBoost, LightGBM, and CatBoost head-to-head. CatBoost won for practical reasons, not theoretical ones.

  • Categorical features without leakage. Our feature space was heavy on categoricals: customer segment, product category, call reason, agent skill tier. CatBoost's ordered target encoding avoids the leakage problems we hit with standard target encoding in the other frameworks. Less preprocessing, fewer subtle bugs. (A minimal sketch follows this list.)
  • Automatic interaction discovery. CatBoost found that the interaction between time of day and customer segment was highly predictive, something we hadn't hypothesized and might never have engineered manually.
  • Graceful handling of sparse segments. Some customer-product combinations had very few training examples. CatBoost's symmetric tree structure and ordered boosting handled these thin slices without overfitting, where the alternatives needed explicit regularization tuning per segment.
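To make the first point concrete, here is a minimal sketch of the native categorical handling. Column names and toy data are illustrative stand-ins for our real ~150-feature table:

```python
import pandas as pd
from catboost import CatBoostClassifier, Pool

# Toy stand-in for the real feature table; columns are hypothetical.
df = pd.DataFrame({
    "customer_segment": ["consumer", "smb", "consumer", "enterprise"] * 50,
    "call_reason": ["billing", "upgrade", "support", "upgrade"] * 50,
    "queue_wait_sec": [30, 120, 45, 300] * 50,
    "converted": [0, 1, 0, 1] * 50,
})

# Raw string categoricals go straight in; CatBoost applies ordered target
# statistics internally, avoiding the leakage of naive target encoding.
cat_features = ["customer_segment", "call_reason"]
model = CatBoostClassifier(iterations=200, depth=6, verbose=False)
model.fit(Pool(df.drop(columns="converted"), df["converted"], cat_features=cat_features))
```

The point is what's absent: no encoder object to fit, persist, and keep in sync between training and serving.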

150 Features Across Four Dimensions

Customer

Purchase history (recency, frequency, monetary), service interaction patterns, current product portfolio, tenure, and demographic segment.
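For a sense of how the purchase-history features were shaped, here is a sketch of an RFM computation. The schema (customer_id, order_ts, amount) is illustrative, not our actual tables:

```python
import pandas as pd

def rfm_features(orders: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Recency/frequency/monetary per customer from an order log with
    columns (customer_id, order_ts, amount) -- an illustrative schema."""
    rfm = orders.groupby("customer_id").agg(
        last_order=("order_ts", "max"),
        frequency=("order_ts", "count"),
        monetary=("amount", "sum"),
    )
    rfm["recency_days"] = (as_of - rfm["last_order"]).dt.days
    return rfm[["recency_days", "frequency", "monetary"]]
```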

Context

Time of day, day of week, holiday proximity, recent marketing touchpoints, current promotion eligibility, and queue wait time. That last one surprised us — longer waits meaningfully correlate with lower purchase propensity.

Agent

Historical conversion rate by product type, average handle time, customer satisfaction scores, and current shift duration. Agent fatigue is real and measurable.

Interaction

Customer-agent segment compatibility scores, product affinity signals, and whether the opportunity is cross-sell vs. upsell.
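A simplified version of the compatibility-score idea, not necessarily our exact construction: shrink each segment-agent conversion rate toward the agent's overall rate, so a thin slice with a handful of calls can't produce an extreme score.

```python
import pandas as pd

def compatibility_scores(history: pd.DataFrame, prior_weight: float = 50.0) -> pd.Series:
    """Smoothed conversion rate per (customer_segment, agent_id) pair.
    Assumed columns: customer_segment, agent_id, converted (0/1).
    prior_weight is a tuning knob: higher means more shrinkage."""
    agent_rate = history.groupby("agent_id")["converted"].mean()
    cell = history.groupby(["customer_segment", "agent_id"])["converted"].agg(["sum", "count"])
    # Each cell's prior is its agent's overall conversion rate.
    prior = agent_rate.reindex(cell.index.get_level_values("agent_id")).to_numpy()
    return (cell["sum"] + prior_weight * prior) / (cell["count"] + prior_weight)
```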

The A/B Test

We ran a 50/50 split for 8 weeks:

  • Control: Skills-based routing (match by product type only)
  • Treatment: Propensity-optimized routing (CatBoost score + agent capability matching; see the sketch after this list)
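In stripped-down form, the treatment arm's decision looks something like this. It's a greedy per-call sketch; the production system's load balancing across agents is omitted, and the feature plumbing is hypothetical:

```python
import pandas as pd

def route_call(customer_features: dict, available_agents: list[dict], model) -> str:
    """Send the caller to the available agent with the highest predicted
    conversion probability. `model` is the trained CatBoost classifier;
    each agent dict carries an 'agent_id' and its 'features'."""
    rows = [
        {**customer_features, **agent["features"], "agent_id": agent["agent_id"]}
        for agent in available_agents
    ]
    frame = pd.DataFrame(rows)
    # Score every candidate customer-agent pairing in one batch.
    scores = model.predict_proba(frame.drop(columns="agent_id"))[:, 1]
    return frame.loc[scores.argmax(), "agent_id"]
```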

Metric                  Control    Treatment   Lift
Sales conversion        18.3%      20.5%       +12%
Average order value     $42.10     $43.80      +4%
Customer satisfaction   4.2/5      4.3/5       +2.4%
Handle time             8.2 min    7.8 min     -4.9%

At our call volume, the 12% conversion lift translated to roughly $15M in annualized incremental value. The handle time reduction was a bonus we didn't expect — better customer-agent matching means fewer awkward conversations.

Hard Problems in Production

  • Inference under 100ms. The routing decision happens while the customer is in the IVR queue. We deployed the CatBoost model via ONNX Runtime, achieving p99 latency of 12ms. Plenty of headroom. (A stripped-down scoring hook follows this list.)
  • Feature freshness. Features like "calls in last 24 hours" need real-time updates. We built a lightweight feature store on Redis: batch features refresh hourly, real-time features update on every event. Nothing fancy, but getting the distinction between batch and streaming features right was essential.
  • Fairness monitoring. We tracked conversion rates across customer demographics to ensure the model wasn't systematically routing certain groups away from top-performing agents, and implemented post-processing calibration to keep routing equitable. This wasn't just ethics; it was good business. Biased routing leaves money on the table.
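The serving path itself is short. A stripped-down version of the scoring hook, with a hypothetical model filename; the exact input/output layout depends on how the model was exported, so inspect the session rather than trusting these names:

```python
import numpy as np
import onnxruntime as ort

# One-time export from the trained CatBoost model:
#   model.save_model("propensity.onnx", format="onnx")
sess = ort.InferenceSession("propensity.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

def score(batch: np.ndarray) -> np.ndarray:
    """Return class probabilities for a batch of numeric feature rows.
    Output ordering varies by export; check sess.get_outputs()."""
    outputs = sess.run(None, {input_name: batch.astype(np.float32)})
    return outputs[1]
```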

Three Takeaways

  • Feature engineering is 70% of the value. The single most impactful feature was a 7-day rolling engagement score, a weighted sum of customer interactions that took weeks of iteration to get right. The model itself was straightforward. (A sketch of this kind of score follows the list.)
  • Explain the routing to the agents. Agents who understood why they were getting certain calls performed measurably better. We built a simple dashboard showing each agent their routing rationale. Transparency isn't overhead; it's a multiplier.
  • Instrument the full loop, not just the model. After deployment, we caught agents gaming the system by logging call outcomes differently to influence their routing scores. The model was fine; the feedback loop was corrupted. Always monitor the ecosystem, not just the predictions.
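For the curious, the general shape of a rolling engagement score looks something like the sketch below. The event weights and decay are placeholders; the real values were the part that took weeks of iteration.

```python
import pandas as pd

def engagement_score(events: pd.DataFrame, half_life_days: float = 3.0) -> pd.Series:
    """7-day rolling, recency-weighted engagement per customer.
    Assumed columns: customer_id, event_ts, weight (e.g. call=3.0,
    email_open=1.0). Weights and half-life here are illustrative."""
    now = events["event_ts"].max()
    window = events[events["event_ts"] >= now - pd.Timedelta(days=7)].copy()
    # Exponentially decay each event by its age within the window.
    age_days = (now - window["event_ts"]).dt.total_seconds() / 86400
    window["decayed"] = window["weight"] * 0.5 ** (age_days / half_life_days)
    return window.groupby("customer_id")["decayed"].sum()
```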
Venkata Subramanian Srinivasan
Senior Data Scientist at Asurion | Georgia Tech Alumni