
Training a Churn Predictor on Odoo CRM Data Without a Data Warehouse


Your Odoo database already contains the signals that predict customer churn. This tutorial shows how to extract them, train a classifier, and surface risk scores directly in the CRM kanban view — no data warehouse required.

Key Takeaways:

  • You don’t need a data warehouse or a separate analytics stack to build a churn predictor — crm.lead and sale.order contain enough signal to train a usable model on day one.
  • RFM features (recency, frequency, monetary value) derived directly from Odoo data outperform most hand-crafted rule sets.
  • Business-relevant evaluation metrics — precision at the top decile, revenue at risk — matter more than AUC when you’re explaining the model to a sales manager.
  • A custom field on crm.lead and a server action is all you need to push predictions back into the kanban view without touching Odoo’s core code.

Most churn prediction projects die on the infrastructure question. Someone decides the data needs to go into Snowflake first, a BI pipeline needs to be built, then maybe a feature store, then the model — and by the time a proof of concept exists, three months have passed and the team has moved on.

Skip all of that. If you’re running Odoo, the data you need is already there. This tutorial walks through an end-to-end pipeline: extract features from crm.lead and sale.order, train a gradient-boosted classifier, evaluate it honestly, and write predictions back to the kanban view. The whole thing runs from a single Python script against your Odoo PostgreSQL database.

What “Churn” Means in a CRM Context

Before any code: be precise about the label. Churn in a B2B CRM context usually means one of these:

  • No new opportunity in the last N months — the customer has gone quiet
  • Opportunity stage regression — deals that stall or get archived after reaching a late stage
  • Revenue drop — invoiced amount in the last year is significantly below the customer’s historical average

Pick one definition and stick to it. Mixed labels produce a confused model. For this tutorial we use the first definition: a customer is “churned” if they had at least one closed-won deal but have had no new crm.lead opened in the last 12 months. Adjust the threshold to match your sales cycle — for enterprise SaaS it might be 18 months; for consumables it might be 90 days.
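As a sanity check, the label definition can be sketched as a small pandas function (illustrative only; the column names are assumptions, not Odoo fields):

```python
import pandas as pd

def label_churned(customers: pd.DataFrame, months: int = 12) -> pd.Series:
    """Label a customer as churned (1) if they have at least one
    closed-won deal but no new lead in the last `months` months."""
    cutoff = pd.Timestamp.now() - pd.DateOffset(months=months)
    has_won = customers["won_leads"] > 0
    gone_quiet = customers["last_lead_date"] < cutoff
    return (has_won & gone_quiet).astype(int)

customers = pd.DataFrame({
    "won_leads": [3, 0, 1],
    "last_lead_date": pd.to_datetime([
        "2020-01-15",                     # quiet for years -> churned
        "2020-02-01",                     # never won -> not churned here
        pd.Timestamp.now().normalize(),   # active today -> not churned
    ]),
})
print(label_churned(customers).tolist())  # -> [1, 0, 0]
```

Encoding the definition as a function like this makes the threshold (12, 18, or 3 months) a single parameter you can tune per sales cycle.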

Step 1: Extract Features from Odoo

Connect directly to the PostgreSQL database. You could use the JSON-RPC API, but for a batch training job, a direct DB connection is faster and simpler.

import psycopg2
import pandas as pd

conn = psycopg2.connect(
    dbname="your_odoo_db",
    user="odoo",
    host="localhost",
    port=5432,
)

query = """
WITH partner_orders AS (
    SELECT
        so.partner_id,
        COUNT(so.id)                                        AS order_count,
        SUM(so.amount_total)                               AS total_revenue,
        MAX(so.date_order)                                 AS last_order_date,
        MIN(so.date_order)                                 AS first_order_date,
        AVG(sol.product_uom_qty * sol.price_unit)          AS avg_line_value,
        COUNT(DISTINCT sol.product_id)                     AS distinct_products
    FROM sale_order so
    JOIN sale_order_line sol ON sol.order_id = so.id
    WHERE so.state IN ('sale', 'done')  -- 'done' covers locked orders on older Odoo versions
    GROUP BY so.partner_id
),
partner_leads AS (
    SELECT
        cl.partner_id,
        COUNT(cl.id)                                       AS total_leads,
        SUM(CASE WHEN cl.stage_id IN (
            SELECT id FROM crm_stage WHERE is_won = true
        ) THEN 1 ELSE 0 END)                              AS won_leads,
        MAX(cl.create_date)                                AS last_lead_date,
        AVG(cl.probability)                                AS avg_probability
    FROM crm_lead cl
    WHERE cl.partner_id IS NOT NULL
    GROUP BY cl.partner_id
)
SELECT
    po.partner_id,
    po.order_count,
    po.total_revenue,
    po.avg_line_value,
    po.distinct_products,
    EXTRACT(DAY FROM NOW() - po.last_order_date)           AS days_since_last_order,
    EXTRACT(DAY FROM po.last_order_date - po.first_order_date)
                                                           AS customer_tenure_days,
    pl.total_leads,
    pl.won_leads,
    COALESCE(pl.avg_probability, 0)                        AS avg_opportunity_probability,
    CASE WHEN pl.last_lead_date < NOW() - INTERVAL '12 months'
         THEN 1 ELSE 0 END                                 AS churned
FROM partner_orders po
LEFT JOIN partner_leads pl ON pl.partner_id = po.partner_id
WHERE po.order_count >= 1
"""

df = pd.read_sql(query, conn)
conn.close()

Why this shape? We want one row per customer. Features come from two sources: transaction history (sale_order, sale_order_line) and CRM activity (crm_lead). The churned column is our label — derived from whether the customer has been active in the last 12 months.

Gotcha: crm_lead.partner_id is often null, especially for inbound leads that were never qualified. The WHERE cl.partner_id IS NOT NULL filter excludes these. If your team logs a lot of anonymous inquiries, you may want to match by email or company instead — but that’s a data cleaning exercise before model training, not during.

Step 2: Engineer a Few More Features

The raw query gives you a good base. Add a couple of derived features before training:

import numpy as np

# Win rate — how often does this customer convert?
df["win_rate"] = df["won_leads"] / df["total_leads"].replace(0, np.nan)
df["win_rate"] = df["win_rate"].fillna(0)

# Revenue per order — a proxy for deal size
df["revenue_per_order"] = df["total_revenue"] / df["order_count"]

# Product breadth ratio — are they buying across the catalogue or just one line?
# Normalise by total distinct products sold (you'd compute this separately)
# For now just keep distinct_products as-is

# Log-transform skewed money features
for col in ["total_revenue", "avg_line_value", "revenue_per_order"]:
    df[f"log_{col}"] = np.log1p(df[col])

feature_cols = [
    "order_count",
    "log_total_revenue",
    "log_avg_line_value",
    "log_revenue_per_order",
    "distinct_products",
    "days_since_last_order",
    "customer_tenure_days",
    "total_leads",
    "won_leads",
    "win_rate",
    "avg_opportunity_probability",
]

X = df[feature_cols].fillna(0)
y = df["churned"]

Log-transforming revenue features matters. A few large accounts will otherwise dominate gradient-boosted trees’ split decisions, and you’ll end up with a model that’s very good at identifying whales but useless for mid-market customers.
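As a toy illustration of the skew problem, compare the spread of raw and log-transformed revenue:

```python
import numpy as np

# Revenue across four accounts: three mid-market, one whale.
revenue = np.array([8_000.0, 12_000.0, 15_000.0, 4_000_000.0])

# Raw scale: the whale is ~300x the median, so it dominates
# the feature's dynamic range.
print(revenue.max() / np.median(revenue))   # ~296

# log1p compresses the heavy right tail while preserving order,
# so mid-market differences remain visible next to the whale.
log_rev = np.log1p(revenue)
print(log_rev)                               # roughly [8.99, 9.39, 9.62, 15.20]
print(log_rev.max() / np.median(log_rev))    # ~1.6
```

np.log1p (log(1 + x)) rather than np.log also keeps zero-revenue rows well-defined instead of producing -inf.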

Step 3: Train the Classifier

XGBoost handles class imbalance more directly than plain logistic regression out of the box (via scale_pos_weight), and it’s fast enough on datasets up to a few hundred thousand customers without GPUs.

from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    classification_report,
    roc_auc_score,
    precision_score,
)
import xgboost as xgb

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# scale_pos_weight handles imbalance: ratio of negatives to positives
neg, pos = (y_train == 0).sum(), (y_train == 1).sum()

model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    scale_pos_weight=neg / pos,
    eval_metric="logloss",
    # In XGBoost >= 2.0 early_stopping_rounds is a constructor argument
    # (older 1.x versions took it in fit(); use_label_encoder is gone).
    early_stopping_rounds=20,
    random_state=42,
)

model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    verbose=False,
)

df["churn_probability"] = model.predict_proba(X)[:, 1]

Gotcha: scale_pos_weight is your main lever for controlling false negatives. Increase it (beyond the natural class ratio) if you want the model to catch more churning customers at the cost of more false alarms. In a sales context, false negatives (missed at-risk customers) are usually more expensive than false positives (unnecessary outreach), so it’s worth biasing toward recall.
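To make that tradeoff concrete, here is an illustrative sweep on synthetic data. It uses the decision threshold on predict_proba as the knob, which plays the same role at scoring time that scale_pos_weight plays at training time:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic imbalanced data standing in for the churn set (~10% positives).
X_syn, y_syn = make_classification(
    n_samples=2000, weights=[0.9, 0.1], random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
proba = clf.predict_proba(X_syn)[:, 1]

# Lowering the threshold catches more churners (recall up)
# at the cost of more false alarms (precision down).
for threshold in (0.5, 0.3, 0.1):
    pred = (proba >= threshold).astype(int)
    print(
        f"threshold={threshold:.1f}  "
        f"recall={recall_score(y_syn, pred):.2f}  "
        f"precision={precision_score(y_syn, pred):.2f}"
    )
```

The same pattern applies to the churn model: pick the operating point by asking how many false alarms the sales team can absorb per week, not by optimizing a single abstract metric.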

Step 4: Evaluate on Business Metrics, Not Just AUC

AUC is a fine global metric. But a sales manager doesn’t care about AUC — they want to know: “If I work through the top 50 customers flagged as at risk, how many of them are actually churning?”

# Sort by predicted probability descending
df_eval = df.assign(y_true=y.values).sort_values(
    "churn_probability", ascending=False
).reset_index(drop=True)

# Precision in the top N
def precision_at_k(df, k):
    top_k = df.head(k)
    return top_k["y_true"].sum() / k

print(f"Precision at top 50:  {precision_at_k(df_eval, 50):.1%}")
print(f"Precision at top 100: {precision_at_k(df_eval, 100):.1%}")
print(f"AUC:                  {roc_auc_score(df_eval['y_true'], df_eval['churn_probability']):.3f}")

# Revenue at risk in the top decile
top_10pct = df_eval.head(int(len(df_eval) * 0.1))
revenue_at_risk = top_10pct.loc[top_10pct["y_true"] == 1, "total_revenue"].sum()
print(f"Revenue at risk (top 10%): ${revenue_at_risk:,.0f}")

A model with AUC 0.78 that flags $2.4M in revenue at risk in the top decile is a better conversation starter than one with AUC 0.82 that points at low-value accounts. Frame the evaluation in business terms from the beginning — it determines whether anyone acts on the output.

Feature importance check — always do this before deploying:

import matplotlib.pyplot as plt

feat_imp = pd.Series(
    model.feature_importances_, index=feature_cols
).sort_values(ascending=False)

print(feat_imp.head(10))

If days_since_last_order dominates, that’s expected — recency is nearly always the strongest churn signal. If a leaky feature appears (like won_leads that somehow encodes future information), you need to fix the data pipeline before shipping.
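A quick, heuristic smoke test for leakage: score each feature alone and treat any single column with near-perfect AUC as suspect (toy sketch with made-up data):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def single_feature_auc(X: pd.DataFrame, y) -> dict:
    """AUC of each feature used alone as a ranking score."""
    out = {}
    for col in X.columns:
        auc = roc_auc_score(y, X[col])
        out[col] = max(auc, 1 - auc)  # direction-agnostic
    return out

# Toy data: 'clean' is weakly informative, 'leaky' encodes the label.
rng = np.random.default_rng(0)
y_toy = rng.integers(0, 2, 500)
X_toy = pd.DataFrame({
    "clean": y_toy * 0.5 + rng.normal(size=500),
    "leaky": y_toy + rng.normal(scale=0.01, size=500),
})
aucs = single_feature_auc(X_toy, y_toy)
print(aucs)  # 'leaky' close to 1.0, 'clean' well below
```

No single honest CRM feature should rank churners almost perfectly on its own; if one does, trace how it’s computed before trusting the model.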

Step 5: Write Predictions Back to Odoo

The cleanest approach is a custom field on res.partner (or crm.lead) that stores the churn probability, plus a server action or scheduled action that refreshes it on a cadence.

Create the field via a small custom module. Add this to your module’s models/res_partner.py:

from odoo import api, fields, models

class ResPartner(models.Model):
    _inherit = "res.partner"

    churn_probability = fields.Float(
        string="Churn Risk Score",
        digits=(3, 2),
        help="Predicted probability of customer churn (0.0–1.0). "
             "Refreshed weekly by the churn prediction job.",
    )
    churn_risk_label = fields.Selection(
        selection=[
            ("low", "Low"),
            ("medium", "Medium"),
            ("high", "High"),
        ],
        string="Churn Risk",
        compute="_compute_churn_risk_label",
        store=True,
    )

    @api.depends("churn_probability")
    def _compute_churn_risk_label(self):
        for partner in self:
            p = partner.churn_probability
            if p >= 0.65:
                partner.churn_risk_label = "high"
            elif p >= 0.35:
                partner.churn_risk_label = "medium"
            else:
                partner.churn_risk_label = "low"

Then write predictions back with a simple update script:

import xmlrpc.client

url = "https://your-odoo-instance.com"
db = "your_odoo_db"
username = "[email protected]"
password = "api_key_here"

common = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/common")
uid = common.authenticate(db, username, password, {})
models_proxy = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/object")

# df here is the full scored dataframe with partner_id and churn_probability
for _, row in df.iterrows():
    models_proxy.execute_kw(
        db, uid, password,
        "res.partner", "write",
        [[int(row["partner_id"])], {"churn_probability": float(row["churn_probability"])}],
    )

Gotcha: Per-record writes through XML-RPC are slow for large partner lists, since write sets one set of values per call and a distinct score per partner means one round trip each. Round the scores into buckets and write each bucket’s ids in a single call, or switch to a direct SQL UPDATE in a scheduled Odoo action for anything over 5,000 records.
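One way to cut round trips, sketched under the assumption that scores rounded to two decimals are precise enough for the sales team: bucket partner ids by rounded score and issue one write per bucket.

```python
from collections import defaultdict

def bucket_by_score(rows, ndigits=2):
    """Group partner ids by rounded churn score so each distinct
    value needs only one write call instead of one per record."""
    buckets = defaultdict(list)
    for partner_id, score in rows:
        buckets[round(score, ndigits)].append(partner_id)
    return dict(buckets)

rows = [(1, 0.8712), (2, 0.8749), (3, 0.12), (4, 0.124)]
buckets = bucket_by_score(rows)
print(buckets)  # {0.87: [1, 2], 0.12: [3, 4]}

# Then, per bucket (not run here -- needs the XML-RPC session from above):
# for score, ids in buckets.items():
#     models_proxy.execute_kw(db, uid, password, "res.partner", "write",
#                             [ids, {"churn_probability": score}])
```

With two-decimal rounding there are at most 101 distinct values, so even an 80,000-partner refresh becomes at most ~101 write calls.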

Surfacing Risk in the Kanban View

Once the field is on res.partner, you can filter or group the CRM pipeline by churn_risk_label. In Odoo’s CRM module, add a filter for “High Churn Risk” customers and pin it to the Favorites bar — sales managers can now open their pipeline and see at a glance which existing customers haven’t had a new opportunity in months.

For reps, a color badge on the kanban card is more useful than a number. Add this to your custom module’s view XML:

<record id="view_crm_lead_kanban_churn" model="ir.ui.view">
    <field name="name">crm.lead.kanban.churn</field>
    <field name="model">crm.lead</field>
    <!-- Inherit the CRM kanban view; verify the external ID for your Odoo version -->
    <field name="inherit_id" ref="crm.crm_case_kanban_view_leads"/>
    <field name="arch" type="xml">
        <xpath expr="//kanban" position="inside">
            <field name="partner_id"/>
        </xpath>
    </field>
</record>

The badge itself is a statusbar widget or a color-coded many2one label — depends on how your kanban template is structured. The key point: reps don’t look at dashboards, they look at their pipeline. Putting the signal where they already work is what drives action.

What This Won’t Catch

Honest caveat: this model predicts churn based on historical patterns. It misses:

  • New customers with fewer than 2–3 orders — not enough history to score reliably
  • Relationship-level signals — if the main contact left the client’s company, no amount of feature engineering picks that up from sale.order data
  • Seasonal businesses — days_since_last_order will flag a seasonal buyer as churned every off-season unless you engineer a seasonality-adjusted recency feature

For the third case: look at crm.lead creation date seasonality per customer and adjust the churn threshold dynamically. For the second case, you need CRM hygiene — ensuring reps log contact changes in Odoo. That’s a process problem, not a modeling problem.
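For the seasonal case, one simple adjustment (a sketch, not part of the pipeline above) is to compare each customer’s recency to their own historical cadence rather than a global 12-month cutoff:

```python
import pandas as pd

def adjusted_churn_flag(order_dates: pd.Series,
                        as_of: pd.Timestamp,
                        k: float = 2.0) -> bool:
    """Flag churn when the gap since the last order exceeds k times
    the customer's own median inter-order gap."""
    dates = order_dates.sort_values()
    gaps = dates.diff().dropna()
    if gaps.empty:
        return False  # single order: not enough history to judge
    typical_gap = gaps.median()
    return bool((as_of - dates.iloc[-1]) > k * typical_gap)

# A buyer ordering every December looks quiet in June of the next year,
# but their own cadence says that's normal.
seasonal = pd.Series(pd.to_datetime(["2021-12-05", "2022-12-10", "2023-12-02"]))
print(adjusted_churn_flag(seasonal, pd.Timestamp("2024-06-01")))  # False
print(adjusted_churn_flag(seasonal, pd.Timestamp("2026-06-01")))  # True
```

The multiplier k is the tunable tolerance: k=2 means a customer is flagged only after going quiet for twice their usual gap between orders.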


At Trobz, we’ve run this pipeline on Odoo databases ranging from 500 to 80,000 customers. The feature extraction query above scales well; the prediction refresh cadence and write-back strategy depend on your instance size. If you’re setting this up and hitting performance walls, get in touch — happy to walk through the options.

Ready to put AI to work?

Let's explore how Trobz AI can automate your processes, enhance your ERP, and help your team make better decisions — faster.