Key Takeaways: Cash flow forecasting with machine learning doesn’t require a separate data warehouse or BI stack — your Odoo database already has everything you need. The key is extracting the right signals: receivables aging, payables schedules, confirmed order revenue, and seasonal patterns. A gradient boosting model trained on account.move.line, sale.order, and purchase.order data can generate a reliable 6-week rolling forecast. The hard part isn’t the model — it’s data quality and the discipline of reviewing output against actuals each week.
Most businesses running Odoo have everything needed for cash flow forecasting sitting in their database. The data is in account.move, sale.order, purchase.order, and account.move.line. The problem is that most finance teams use aging reports and pivot tables to answer what the balance will look like in four weeks — which works, until a big customer pays late or an unexpected purchase order lands.
An ML-based forecast doesn’t eliminate those surprises. It builds a probabilistic picture of the next six weeks based on historical patterns, known commitments, and receivables behavior. It also tells you where the uncertainty is concentrated, which is often more useful than the point estimate itself.
This post covers the full pipeline: data extraction, feature engineering, model training, rolling forecast generation, and displaying results inside Odoo.
Step 1: Extract the Raw Data from Odoo
What to do
Connect to your Odoo PostgreSQL database directly — the ORM adds too much overhead for the volume of historical data you’ll need. Pull from three tables: account.move.line (payment history), sale.order (confirmed revenue), and purchase.order (committed spend).
import pandas as pd
import psycopg2
conn = psycopg2.connect(
    host="localhost",
    dbname="your_odoo_db",
    user="odoo",
    password="your_password"
)
# Receivables and payables from posted journal entries
receivables_query = """
SELECT
aml.id,
aml.date,
aml.date_maturity,
aml.balance,
am.move_type,
am.payment_state,
rp.id AS partner_id,
rp.name AS partner_name
FROM account_move_line aml
JOIN account_move am ON aml.move_id = am.id
JOIN account_account aa ON aml.account_id = aa.id
JOIN res_partner rp ON aml.partner_id = rp.id
WHERE aa.account_type IN ('asset_receivable', 'liability_payable')
AND am.state = 'posted'
AND aml.date >= NOW() - INTERVAL '18 months'
ORDER BY aml.date
"""
# Confirmed sales orders not yet fully invoiced — future cash in
sales_query = """
SELECT
so.id,
so.date_order,
so.amount_total,
so.partner_id,
so.invoice_status,
sol.price_subtotal,
sol.qty_to_invoice
FROM sale_order so
JOIN sale_order_line sol ON sol.order_id = so.id
WHERE so.state = 'sale'
AND so.invoice_status IN ('to invoice', 'invoiced')
AND so.date_order >= NOW() - INTERVAL '18 months'
"""
# Purchase orders confirmed — committed spend
purchase_query = """
SELECT
po.id,
po.date_order,
po.date_approve,
po.amount_total,
po.partner_id,
po.invoice_status,
po.state
FROM purchase_order po
WHERE po.state IN ('purchase', 'done')
AND po.date_order >= NOW() - INTERVAL '18 months'
"""
df_ar = pd.read_sql(receivables_query, conn)
df_so = pd.read_sql(sales_query, conn)
df_po = pd.read_sql(purchase_query, conn)
Why
The ORM layers security rules, computed field recalculation, and access control on every read. For analytics on 18 months of transaction history, you want the raw columns. Direct SQL is also 10–100x faster for full-table scans.
Gotchas
- Filter by account_type — asset_receivable for AR, liability_payable for AP. Don’t filter on move_type alone: credit notes and refunds flip the sign on the balance column.
- date_maturity is the right field for forecasting, not date. Payment terms set the maturity; that’s when cash is expected to move.
- Sales orders with invoice_status = 'to invoice' represent committed revenue not yet billed. Include them — they’re real future inflows, just earlier in the billing cycle.
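To make the sign convention concrete, here is a toy example (synthetic rows, not pulled from Odoo) showing why credit notes must be kept with their native sign rather than filtered out:

```python
import pandas as pd

# Toy journal lines illustrating the balance sign convention:
# customer invoices post positive receivable balances,
# credit notes post negative ones, vendor bills post negative payables.
lines = pd.DataFrame([
    {"move_type": "out_invoice", "balance": 1000.0},  # customer invoice
    {"move_type": "out_refund",  "balance": -200.0},  # credit note flips the sign
    {"move_type": "in_invoice",  "balance": -500.0},  # vendor bill (payable side)
])

# Summing balance over all customer-side move types nets refunds correctly
ar_net = lines[lines["move_type"].isin(["out_invoice", "out_refund"])]["balance"].sum()
print(ar_net)  # 800.0
```

Dropping the out_refund row would overstate expected inflows by the refund amount.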
Step 2: Build the Feature Matrix
What to do
Transform raw transaction data into a weekly time series. Each row in the feature matrix is one week, with features derived from the state of the business at the start of that week.
import numpy as np
from datetime import timedelta
def build_weekly_features(df_ar, df_so, reference_date):
    weeks = pd.date_range(
        start=reference_date - pd.DateOffset(months=15),
        end=reference_date,
        freq='W-MON'
    )
    rows = []
    for week_start in weeks:
        week_end = week_start + timedelta(days=7)
        # Receivables due this week
        ar_due = df_ar[
            (df_ar['date_maturity'] >= week_start) &
            (df_ar['date_maturity'] < week_end) &
            (df_ar['move_type'].isin(['out_invoice', 'out_refund']))
        ]['balance'].sum()
        # Payables due this week
        ap_due = df_ar[
            (df_ar['date_maturity'] >= week_start) &
            (df_ar['date_maturity'] < week_end) &
            (df_ar['move_type'].isin(['in_invoice', 'in_refund']))
        ]['balance'].sum()
        # Overdue AR from past 30 days (collection behavior signal)
        ar_overdue_30 = df_ar[
            (df_ar['date_maturity'] < week_start) &
            (df_ar['date_maturity'] >= week_start - timedelta(days=30)) &
            (df_ar['payment_state'] != 'paid')
        ]['balance'].sum()
        # Confirmed SO value not yet invoiced (committed inflow pipeline)
        so_pipeline = df_so[
            (df_so['date_order'] < week_end) &
            (df_so['invoice_status'] == 'to invoice')
        ]['price_subtotal'].sum()
        # Seasonal features
        week_of_year = week_start.isocalendar()[1]
        month = week_start.month
        rows.append({
            'week_start': week_start,
            'ar_due': ar_due,
            'ap_due': ap_due,
            'ar_overdue_30': ar_overdue_30,
            'so_pipeline': so_pipeline,
            'quarter': (month - 1) // 3 + 1,
            # Sin/cos encoding avoids the artificial jump from week 52 to week 1
            'sin_week': np.sin(2 * np.pi * week_of_year / 52),
            'cos_week': np.cos(2 * np.pi * week_of_year / 52),
        })
    return pd.DataFrame(rows)
Why
The ar_overdue_30 feature is one of the better predictors of actual collection timing. Customers who paid late last month tend to pay late this month — the model picks this up quickly. This signal carries more information than date_maturity alone, which assumes everyone pays on time.
Sin/cos encoding for week-of-year (rather than the raw integer) means week 52 and week 1 are adjacent in feature space, just as they are in the real calendar. Without it, the model sees a large numerical gap between the last week of December and the first week of January.
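The wrap-around effect is easy to check numerically. A minimal sketch (synthetic week numbers, no Odoo data involved):

```python
import numpy as np

def week_to_xy(week):
    """Map an ISO week number onto the unit circle."""
    angle = 2 * np.pi * week / 52
    return np.array([np.sin(angle), np.cos(angle)])

# Distance between week 52 and week 1 in encoded space vs raw integers
encoded_gap = np.linalg.norm(week_to_xy(52) - week_to_xy(1))
raw_gap = abs(52 - 1)
print(round(encoded_gap, 3), raw_gap)  # encoded weeks are near neighbours; raw ints are 51 apart
```

In the encoded space the late-December and early-January weeks sit about one week apart, matching the calendar.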
Gotchas
- Keep ar_due and so_pipeline as separate features. They represent different stages of the revenue cycle with different reliability as signals. Summing them into a single “expected inflows” feature loses that distinction.
- Purchase order data tells you what’s committed, not when the invoice will arrive. Use date_approve plus your vendor payment terms to estimate AP timing, or accept that the AP feature is a lower-bound estimate.
- If you’re running multi-company, build the feature matrix per legal entity. Aggregating across companies in different currencies before model training produces noise the model can’t usefully learn from.
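Estimating AP timing from date_approve plus terms can be sketched as below; the vendor_terms mapping and the partner IDs are stand-ins for whatever you extract from the vendor’s payment terms in Odoo:

```python
import pandas as pd

# Assumed vendor payment terms in days, keyed by partner_id.
# In practice this would be derived from the vendor's payment term record.
vendor_terms = {7: 30, 12: 45}
DEFAULT_TERMS = 30  # fallback when a vendor has no recorded terms

po = pd.DataFrame({
    "partner_id": [7, 12, 99],
    "date_approve": pd.to_datetime(["2024-03-01", "2024-03-05", "2024-03-08"]),
    "amount_total": [5000.0, 1200.0, 800.0],
})

# Expected cash-out date = approval date + vendor payment terms
po["expected_payment_date"] = po.apply(
    lambda r: r["date_approve"]
    + pd.Timedelta(days=vendor_terms.get(r["partner_id"], DEFAULT_TERMS)),
    axis=1,
)
print(po["expected_payment_date"].dt.strftime("%Y-%m-%d").tolist())
# ['2024-03-31', '2024-04-19', '2024-04-07']
```

This stays a lower-bound estimate: the clock really starts when the vendor invoice lands, which may lag the approval date.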
Step 3: Train the Forecasting Model
What to do
For a 6-week rolling forecast, gradient boosting with lag features works better than a pure time-series model like ARIMA. The reason: your cash position depends on business state (pipeline, overdue AR, committed POs), not just the historical cash pattern alone. Gradient boosting handles mixed feature types without requiring stationarity.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
import joblib
def train_forecast_model(features_df, target_col='net_cash_flow'):
    # Add lag features
    for lag in [1, 2, 4, 8, 13]:
        features_df[f'cash_lag_{lag}w'] = features_df[target_col].shift(lag)
    # Rolling average — smooths one-off spikes
    features_df['cash_rolling_4w'] = (
        features_df[target_col].shift(1).rolling(4).mean()
    )
    features_df = features_df.dropna()
    feature_cols = [
        'ar_due', 'ap_due', 'ar_overdue_30', 'so_pipeline',
        'sin_week', 'cos_week', 'quarter',
        'cash_lag_1w', 'cash_lag_2w', 'cash_lag_4w',
        'cash_lag_8w', 'cash_lag_13w', 'cash_rolling_4w'
    ]
    X = features_df[feature_cols]
    y = features_df[target_col]
    # Time series cross-validation — never shuffle time-series data
    tscv = TimeSeriesSplit(n_splits=5)
    model = GradientBoostingRegressor(
        n_estimators=200,
        max_depth=4,
        learning_rate=0.05,
        subsample=0.8,
        random_state=42
    )
    scores = []
    for train_idx, val_idx in tscv.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        mae = np.mean(np.abs(pred - y_val))
        scores.append(mae)
    print(f"Cross-validated MAE: {np.mean(scores):,.0f}")
    # Final fit on all available data
    model.fit(X, y)
    joblib.dump(model, 'cash_flow_model.pkl')
    return model, feature_cols
Why
TimeSeriesSplit is non-negotiable. Using random train/test splits on time-series data lets future information leak into training, which inflates validation metrics and produces a model that looks good in evaluation and fails in production. The temporal fold structure forces the model to predict forward, which is the actual task.
The 13-week lag (quarterly) matters for businesses with regular seasonal patterns — a Q4 revenue spike or a Q1 tax payment cycle. Without it, the model has no way to connect this week to the same week last year.
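Lag features are plain column shifts. A toy illustration (synthetic series) of how shift() aligns each week with its own history, and why the earliest rows drop out after dropna():

```python
import pandas as pd

# Five weeks of a synthetic target series
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# shift(2) aligns each week with its value two weeks earlier;
# the first two rows have no history and become NaN, which is
# why adding a 13-week lag costs you the first 13 rows of data.
lagged = s.shift(2)
print(lagged.tolist())  # [nan, nan, 10.0, 20.0, 30.0]
```

This is the mechanism behind the row count shrinking after the lag loop in train_forecast_model.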
Gotchas
- 18 months of weekly data is roughly 75 rows after adding lag features. That’s thin. If validation MAE is substantially higher than training MAE, reduce max_depth to 3 or extend the historical window.
- Net cash flow (ar_due - ap_due) is the right target. Forecasting inflows and outflows as separate models and then subtracting introduces correlated errors that amplify over the 6-week horizon.
- Gradient boosting doesn’t extrapolate beyond its training range. An unusually large contract win or a one-off capital purchase will be under-predicted. Build in a manual override for known exceptional events.
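One lightweight way to handle such known one-offs is a post-model adjustment layer, applied after prediction. A sketch with fabricated dates and amounts:

```python
import pandas as pd

# Model output for the next two weeks (synthetic numbers)
forecast = pd.DataFrame({
    "week_start": pd.to_datetime(["2024-04-01", "2024-04-08"]),
    "forecast_net_cash_flow": [12000.0, 9000.0],
})

# Known exceptional events the model cannot see, entered manually:
# here, a one-off equipment purchase landing in the second week.
overrides = {pd.Timestamp("2024-04-08"): -25000.0}

forecast["adjusted"] = forecast.apply(
    lambda r: r["forecast_net_cash_flow"] + overrides.get(r["week_start"], 0.0),
    axis=1,
)
print(forecast["adjusted"].tolist())  # [12000.0, -16000.0]
```

Keeping the raw model output and the adjustment as separate columns preserves an audit trail of what the model said versus what finance knew.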
Step 4: Generate the 6-Week Rolling Forecast
What to do
At the end of each week, re-run the pipeline against the latest Odoo data and project six weeks forward. Each forward step uses the previous step’s prediction as its lag input.
def generate_rolling_forecast(model, features_df, feature_cols, horizon=6):
    forecast_rows = []
    working_df = features_df.copy()
    for week_offset in range(1, horizon + 1):
        next_date = features_df['week_start'].max() + timedelta(weeks=week_offset)
        # Build forward features for this week:
        # AR/AP due comes from maturity dates already in the database,
        # so_pipeline comes from open sales orders.
        next_features = build_future_week_features(working_df, next_date)
        pred = model.predict(next_features[feature_cols])[0]
        forecast_rows.append({
            'week_start': next_date,
            'forecast_net_cash_flow': pred,
            'week_offset': week_offset
        })
        # Append prediction as synthetic actuals so subsequent lags can use it
        working_df = pd.concat([
            working_df,
            pd.DataFrame([{
                'week_start': next_date,
                'net_cash_flow': pred,
                **next_features.iloc[0].to_dict()
            }])
        ], ignore_index=True)
    return pd.DataFrame(forecast_rows)
Week 1–2 forecasts are typically accurate within ±10–15%. By week 5–6, expect ±25–30%. This is normal. A 6-week cash forecast is a directional tool, not a balance sheet. Track accuracy weekly and display it next to the forecast so the finance team has calibrated expectations.
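Accuracy tracking can be as simple as logging each forecast alongside its horizon and computing error per horizon step once actuals land. A sketch with fabricated numbers, showing the expected widening from week 1 to week 6:

```python
import pandas as pd

# Forecasts logged each week with their horizon, joined to actuals later
log = pd.DataFrame({
    "week_offset": [1, 1, 6, 6],
    "forecast": [100.0, 110.0, 100.0, 100.0],
    "actual":   [105.0,  99.0, 130.0,  75.0],
})

# Mean absolute percentage error per horizon step
log["ape"] = (log["forecast"] - log["actual"]).abs() / log["actual"].abs()
mape_by_horizon = log.groupby("week_offset")["ape"].mean()
print(mape_by_horizon.round(2).to_dict())  # {1: 0.08, 6: 0.28}
```

Publishing this table next to the forecast is what keeps the finance team’s expectations calibrated to the horizon.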
Step 5: Store and Display Results in Odoo
What to do
Create a lightweight custom model to hold forecast results and display them in the Accounting dashboard.
# models/cash_forecast.py
from odoo import api, fields, models

class CashFlowForecast(models.Model):
    _name = 'cash.flow.forecast'
    _description = 'Cash Flow Forecast'
    _order = 'week_start asc'

    week_start = fields.Date(string='Week Start', required=True)
    forecast_net_cash_flow = fields.Float(string='Forecast Net Cash Flow')
    actual_net_cash_flow = fields.Float(string='Actual Net Cash Flow')
    forecast_error = fields.Float(
        string='Forecast Error',
        compute='_compute_error',
        store=True
    )
    forecast_type = fields.Selection([
        ('model', 'Model'),
        ('manual_override', 'Manual Override'),
    ], string='Type', default='model')
    notes = fields.Text(string='Notes')

    @api.depends('forecast_net_cash_flow', 'actual_net_cash_flow')
    def _compute_error(self):
        for record in self:
            if record.actual_net_cash_flow:
                record.forecast_error = (
                    record.forecast_net_cash_flow - record.actual_net_cash_flow
                )
            else:
                record.forecast_error = 0.0
Wire a bar graph view to show forecast vs. actual side by side:
<record id="view_cash_forecast_graph" model="ir.ui.view">
    <field name="name">cash.flow.forecast.graph</field>
    <field name="model">cash.flow.forecast</field>
    <field name="arch" type="xml">
        <graph type="bar" stacked="0">
            <field name="week_start" type="row"/>
            <field name="forecast_net_cash_flow" type="measure"/>
            <field name="actual_net_cash_flow" type="measure"/>
        </graph>
    </field>
</record>
Schedule the pipeline with an ir.cron entry that runs Sunday evening, so the output is ready for Monday’s cash review meeting.
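If the forecasting pipeline runs outside the Odoo worker (a separate Python process on the same host, say), results can be written back through Odoo’s external XML-RPC API. A sketch; the URL, database name, and credentials are placeholders:

```python
import xmlrpc.client
import pandas as pd

# Placeholder connection details — adapt to your instance
URL, DB, USER, API_KEY = "http://localhost:8069", "your_odoo_db", "odoo", "your_api_key"

def to_payload(forecast_df):
    """Convert forecast rows into create() values for cash.flow.forecast."""
    return [
        {
            "week_start": row["week_start"].strftime("%Y-%m-%d"),
            "forecast_net_cash_flow": float(row["forecast_net_cash_flow"]),
            "forecast_type": "model",
        }
        for _, row in forecast_df.iterrows()
    ]

def push_forecast(forecast_df):
    """Write forecast rows through Odoo's external XML-RPC API."""
    common = xmlrpc.client.ServerProxy(f"{URL}/xmlrpc/2/common")
    uid = common.authenticate(DB, USER, API_KEY, {})
    models = xmlrpc.client.ServerProxy(f"{URL}/xmlrpc/2/object")
    for vals in to_payload(forecast_df):
        models.execute_kw(DB, uid, API_KEY, "cash.flow.forecast", "create", [vals])

# Payload construction can be checked without a live server
df = pd.DataFrame({
    "week_start": pd.to_datetime(["2024-04-01"]),
    "forecast_net_cash_flow": [12345.6],
})
print(to_payload(df))
```

Alternatively, an ir.cron method on the model itself can shell out to the pipeline and create the records in-process; the external-API route keeps the ML dependencies out of the Odoo environment.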
Gotchas
- The manual_override field exists for a reason. Models drift when business conditions change — a large contract, a pricing revision, a supplier rate change. The override gives finance teams a clean way to intervene with an audit trail.
- Don’t surface model internals in the dashboard. Finance teams want one number per week and a confidence band. Feature importances and residual plots belong in a separate analyst view, not the CFO’s screen.
- Populate actual_net_cash_flow weekly from the posted journal entries (the same account.move.line query from step 1, but filtered to the prior week). The model’s accuracy tracking depends on it.
Key Takeaways
- account.move.line, sale.order, and purchase.order give you the raw material for a 6-week cash forecast with no additional infrastructure.
- date_maturity is the forecasting field. date is when the document was created.
- Overdue AR is a stronger collection timing signal than maturity dates alone — customers who pay late once tend to do it again.
- TimeSeriesSplit is non-negotiable. Random splits on time-series data produce misleading validation scores.
- Weeks 1–3 are where the model adds real value. Weeks 4–6 are directional. Track accuracy weekly — a forecast nobody reviews is just noise with a confidence interval.
At Trobz, we’ve built this pipeline as part of our Predictive Analytics service for Odoo Accounting teams. If you want to pilot it on your own data, reach out.