Cash Flow Forecasting in Odoo: A Practical Guide to the ML Pipeline

Your Odoo database already has everything needed for a 6-week cash flow forecast. Here's how to build the ML pipeline from raw data to a live dashboard widget.

Key Takeaways: Cash flow forecasting with machine learning doesn't require a separate data warehouse or BI stack; your Odoo database already has everything you need. The key is extracting the right signals: receivables aging, payables schedules, confirmed order revenue, and seasonal patterns. A gradient boosting model trained on account.move.line, sale.order, and purchase.order data can generate a reliable 6-week rolling forecast. The hard part isn't the model. It's data quality and the discipline of reviewing output against actuals each week.


Most businesses running Odoo have everything needed for cash flow forecasting sitting in their database. The data is in account.move, sale.order, purchase.order, and account.move.line. The problem is that most finance teams use aging reports and pivot tables to answer what the balance will look like in four weeks — which works, until a big customer pays late or an unexpected purchase order lands.

An ML-based forecast doesn't eliminate those surprises. It builds a probabilistic picture of the next six weeks based on historical patterns, known commitments, and receivables behavior. It also tells you where the uncertainty is concentrated, which is often more useful than the point estimate itself.

This post covers the full pipeline: data extraction, feature engineering, model training, rolling forecast generation, and displaying results inside Odoo.

Step 1: Extract the Raw Data from Odoo

What to do

Connect to your Odoo PostgreSQL database directly — the ORM adds too much overhead for the volume of historical data you’ll need. Pull from three tables: account.move.line (payment history), sale.order (confirmed revenue), and purchase.order (committed spend).

import pandas as pd
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    dbname="your_odoo_db",
    user="odoo",
    password="your_password"
)

# Receivables and payables from posted journal entries
receivables_query = """
SELECT
    aml.id,
    aml.date,
    aml.date_maturity,
    aml.balance,
    am.move_type,
    am.payment_state,
    rp.id AS partner_id,
    rp.name AS partner_name
FROM account_move_line aml
JOIN account_move am ON aml.move_id = am.id
JOIN account_account aa ON aml.account_id = aa.id
JOIN res_partner rp ON aml.partner_id = rp.id
WHERE aa.account_type IN ('asset_receivable', 'liability_payable')
  AND am.state = 'posted'
  AND aml.date >= NOW() - INTERVAL '18 months'
ORDER BY aml.date
"""

# Confirmed sales orders not yet fully invoiced — future cash in
sales_query = """
SELECT
    so.id,
    so.date_order,
    so.amount_total,
    so.partner_id,
    so.invoice_status,
    sol.price_subtotal,
    sol.qty_to_invoice
FROM sale_order so
JOIN sale_order_line sol ON sol.order_id = so.id
WHERE so.state = 'sale'
  AND so.invoice_status IN ('to invoice', 'invoiced')
  AND so.date_order >= NOW() - INTERVAL '18 months'
"""

# Purchase orders confirmed — committed spend
purchase_query = """
SELECT
    po.id,
    po.date_order,
    po.date_approve,
    po.amount_total,
    po.partner_id,
    po.invoice_status,
    po.state
FROM purchase_order po
WHERE po.state IN ('purchase', 'done')
  AND po.date_order >= NOW() - INTERVAL '18 months'
"""

df_ar = pd.read_sql(receivables_query, conn)
df_so = pd.read_sql(sales_query, conn)
df_po = pd.read_sql(purchase_query, conn)

Why

The ORM layers security rules, computed field recalculation, and access control on every read. For analytics on 18 months of transaction history, you want the raw columns. Direct SQL is also 10–100x faster for full-table scans.

Gotchas

  • Filter by account_type: asset_receivable for AR, liability_payable for AP. Don't filter on move_type alone: credit notes and refunds flip the sign on the balance column.
  • date_maturity is the right field for forecasting, not date. Payment terms set the maturity; that’s when cash is expected to move.
  • Sales orders with invoice_status = 'to invoice' represent committed revenue not yet billed. Include them — they’re real future inflows, just earlier in the billing cycle.
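A toy illustration of why the account_type filter plus the native sign on balance is enough (fabricated rows, not real query output):

```python
import pandas as pd

# Toy journal lines: a refund carries the opposite sign on `balance`,
# so it nets against the invoice without any special handling.
lines = pd.DataFrame({
    "move_type": ["out_invoice", "out_refund", "in_invoice"],
    "account_type": ["asset_receivable", "asset_receivable", "liability_payable"],
    "balance": [1000.0, -200.0, -450.0],
})

# Summing by account_type yields net AR and net AP directly.
ar_net = lines.loc[lines["account_type"] == "asset_receivable", "balance"].sum()
ap_net = lines.loc[lines["account_type"] == "liability_payable", "balance"].sum()
```

Filtering on move_type alone would require remembering which types flip the sign; filtering on account_type lets the stored sign do the work.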

Step 2: Build the Feature Matrix

What to do

Transform raw transaction data into a weekly time series. Each row in the feature matrix is one week, with features derived from the state of the business at the start of that week.

import numpy as np
from datetime import timedelta

def build_weekly_features(df_ar, df_so, reference_date):
    weeks = pd.date_range(
        start=reference_date - pd.DateOffset(months=15),
        end=reference_date,
        freq='W-MON'
    )

    rows = []
    for week_start in weeks:
        week_end = week_start + timedelta(days=7)

        # Receivables due this week
        ar_due = df_ar[
            (df_ar['date_maturity'] >= week_start) &
            (df_ar['date_maturity'] < week_end) &
            (df_ar['move_type'].isin(['out_invoice', 'out_refund']))
        ]['balance'].sum()

        # Payables due this week
        ap_due = df_ar[
            (df_ar['date_maturity'] >= week_start) &
            (df_ar['date_maturity'] < week_end) &
            (df_ar['move_type'].isin(['in_invoice', 'in_refund']))
        ]['balance'].sum()

        # Overdue AR from past 30 days (collection behavior signal)
        ar_overdue_30 = df_ar[
            (df_ar['date_maturity'] < week_start) &
            (df_ar['date_maturity'] >= week_start - timedelta(days=30)) &
            (df_ar['payment_state'] != 'paid')
        ]['balance'].sum()

        # Confirmed SO value not yet invoiced (committed inflow pipeline)
        so_pipeline = df_so[
            (df_so['date_order'] < week_end) &
            (df_so['invoice_status'] == 'to invoice')
        ]['price_subtotal'].sum()

        # Seasonal features
        week_of_year = week_start.isocalendar()[1]
        month = week_start.month

        rows.append({
            'week_start': week_start,
            'ar_due': ar_due,
            'ap_due': ap_due,
            'ar_overdue_30': ar_overdue_30,
            'so_pipeline': so_pipeline,
            'quarter': (month - 1) // 3 + 1,
            # Sin/cos encoding avoids the artificial jump from week 52 to week 1
            'sin_week': np.sin(2 * np.pi * week_of_year / 52),
            'cos_week': np.cos(2 * np.pi * week_of_year / 52),
        })

    return pd.DataFrame(rows)

Why

The ar_overdue_30 feature is one of the better predictors of actual collection timing. Customers who paid late last month tend to pay late this month, and the model picks this up quickly. This signal carries more information than date_maturity alone, which assumes everyone pays on time.

Sin/cos encoding for week-of-year (rather than the raw integer) means week 52 and week 1 are adjacent in feature space, just as they are in the real calendar. Without it, the model sees a large numerical gap between the last week of December and the first week of January.
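To see the effect, compare distances in the encoded space (a toy check, not part of the pipeline):

```python
import numpy as np

def week_to_cyclic(week, period=52):
    """Map a week-of-year integer onto the unit circle."""
    angle = 2 * np.pi * week / period
    return np.array([np.sin(angle), np.cos(angle)])

# Raw integers put week 52 and week 1 fifty-one units apart.
# On the circle, they are nearly coincident, while week 26
# (mid-year) sits at the maximum possible distance.
dist_adjacent = np.linalg.norm(week_to_cyclic(52) - week_to_cyclic(1))
dist_opposite = np.linalg.norm(week_to_cyclic(52) - week_to_cyclic(26))
```

dist_adjacent comes out near zero while dist_opposite is the circle's diameter, which is exactly the neighborhood structure a tree-based model can exploit.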

Gotchas

  • Keep ar_due and so_pipeline as separate features. They represent different stages of the revenue cycle with different reliability as signals. Summing them into a single “expected inflows” feature loses that distinction.
  • Purchase order data tells you what’s committed, not when the invoice will arrive. Use date_approve plus your vendor payment terms to estimate AP timing, or accept that the AP feature is a lower-bound estimate.
  • If you’re running multi-company, build the feature matrix per legal entity. Aggregating across companies in different currencies before model training produces noise the model can’t usefully learn from.
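A sketch of that AP timing estimate. vendor_terms_days is a hypothetical lookup; in a real setup you would derive it per vendor from account.payment.term:

```python
import pandas as pd

# Hypothetical mapping: days from PO approval to expected payment.
# In practice, derive this from each vendor's payment terms in Odoo.
vendor_terms_days = {101: 30, 102: 45}

po = pd.DataFrame({
    "partner_id": [101, 102],
    "date_approve": pd.to_datetime(["2024-03-04", "2024-03-06"]),
    "amount_total": [5000.0, 1200.0],
})

# Shift each approval date forward by that vendor's terms,
# falling back to 30 days for unknown vendors.
po["est_payment_date"] = po.apply(
    lambda r: r["date_approve"]
    + pd.Timedelta(days=vendor_terms_days.get(r["partner_id"], 30)),
    axis=1,
)
```

The estimate is still a lower bound on timing uncertainty: vendors invoice late, and some POs are invoiced in tranches.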

Step 3: Train the Forecasting Model

What to do

For a 6-week rolling forecast, gradient boosting with lag features works better than a pure time-series model like ARIMA. The reason: your cash position depends on business state (pipeline, overdue AR, committed POs), not just the historical cash pattern alone. Gradient boosting handles mixed feature types without requiring stationarity.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
import joblib

def train_forecast_model(features_df, target_col='net_cash_flow'):
    # Add lag features
    for lag in [1, 2, 4, 8, 13]:
        features_df[f'cash_lag_{lag}w'] = features_df[target_col].shift(lag)

    # Rolling average — smooths one-off spikes
    features_df['cash_rolling_4w'] = (
        features_df[target_col].shift(1).rolling(4).mean()
    )

    features_df = features_df.dropna()

    feature_cols = [
        'ar_due', 'ap_due', 'ar_overdue_30', 'so_pipeline',
        'sin_week', 'cos_week', 'quarter',
        'cash_lag_1w', 'cash_lag_2w', 'cash_lag_4w',
        'cash_lag_8w', 'cash_lag_13w', 'cash_rolling_4w'
    ]

    X = features_df[feature_cols]
    y = features_df[target_col]

    # Time series cross-validation — never shuffle time-series data
    tscv = TimeSeriesSplit(n_splits=5)

    model = GradientBoostingRegressor(
        n_estimators=200,
        max_depth=4,
        learning_rate=0.05,
        subsample=0.8,
        random_state=42
    )

    scores = []
    for train_idx, val_idx in tscv.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        mae = np.mean(np.abs(pred - y_val))
        scores.append(mae)

    print(f"Cross-validated MAE: {np.mean(scores):,.0f}")

    # Final fit on all available data
    model.fit(X, y)
    joblib.dump(model, 'cash_flow_model.pkl')
    return model, feature_cols

Why

TimeSeriesSplit is non-negotiable. Using random train/test splits on time-series data lets future information leak into training, which inflates validation metrics and produces a model that looks good in evaluation and fails in production. The temporal fold structure forces the model to predict forward, which is the actual task.

The 13-week lag (quarterly) matters for businesses with regular seasonal patterns, such as a Q4 revenue spike or a Q1 tax payment cycle. Without it, the model has no way to connect this week to the same week last year.

Gotchas

  • 18 months of weekly data is roughly 75 rows after adding lag features. That’s thin. If validation MAE is substantially higher than training MAE, reduce max_depth to 3 or extend the historical window.
  • Net cash flow (ar_due - ap_due) is the right target. Forecasting inflows and outflows as separate models and then subtracting introduces correlated errors that amplify over the 6-week horizon.
  • Gradient boosting doesn’t extrapolate beyond its training range. An unusually large contract win or a one-off capital purchase will be under-predicted. Build in a manual override for known exceptional events.
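The target column itself is never constructed in the snippets above, so for clarity, a minimal sketch. This assumes ap_due has been sign-normalized to a positive magnitude; if you kept Odoo's raw balance sign, where payables are negative, use ar_due + ap_due instead:

```python
import pandas as pd

def add_target(features_df):
    """Attach the net_cash_flow target assumed by train_forecast_model.

    Assumes ap_due is a positive magnitude (inflows minus outflows)."""
    out = features_df.copy()
    out["net_cash_flow"] = out["ar_due"] - out["ap_due"]
    return out

weekly = pd.DataFrame({"ar_due": [1000.0, 800.0], "ap_due": [400.0, 900.0]})
weekly = add_target(weekly)
```

Building the target once, in one place, also makes the sign convention auditable when someone questions a forecast.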

Step 4: Generate the 6-Week Rolling Forecast

What to do

At the end of each week, re-run the pipeline against the latest Odoo data and project six weeks forward. Each forward step uses the previous step’s prediction as its lag input.

def generate_rolling_forecast(model, features_df, feature_cols, horizon=6):
    forecast_rows = []
    working_df = features_df.copy()

    for week_offset in range(1, horizon + 1):
        next_date = features_df['week_start'].max() + timedelta(weeks=week_offset)

        # Build forward features for this week
        # AR/AP due comes from maturity dates already in the database
        # so_pipeline comes from open sales orders
        next_features = build_future_week_features(working_df, next_date)

        pred = model.predict(next_features[feature_cols])[0]

        forecast_rows.append({
            'week_start': next_date,
            'forecast_net_cash_flow': pred,
            'week_offset': week_offset
        })

        # Append prediction as synthetic actuals so subsequent lags can use it
        working_df = pd.concat([
            working_df,
            pd.DataFrame([{
                'week_start': next_date,
                'net_cash_flow': pred,
                **next_features.iloc[0].to_dict()
            }])
        ], ignore_index=True)

    return pd.DataFrame(forecast_rows)
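generate_rolling_forecast relies on a build_future_week_features helper that the snippet leaves undefined. A minimal sketch of one possible implementation, under these assumptions: calendar features come from next_date, lag features come from the (partly synthetic) history, and the AR/AP/pipeline features are carried forward from the latest known row. In production those last four would come from fresh Odoo queries over maturity dates and open orders:

```python
import numpy as np
import pandas as pd

def build_future_week_features(working_df, next_date):
    """Build a one-row feature frame for the week starting next_date."""
    last = working_df.iloc[-1]
    hist = working_df["net_cash_flow"]
    week_of_year = next_date.isocalendar()[1]

    row = {
        "week_start": next_date,
        # Carried forward from the latest known state; replace with
        # live Odoo queries in production.
        "ar_due": last["ar_due"],
        "ap_due": last["ap_due"],
        "ar_overdue_30": last["ar_overdue_30"],
        "so_pipeline": last["so_pipeline"],
        # Calendar features derived from the forecast date itself.
        "quarter": (next_date.month - 1) // 3 + 1,
        "sin_week": np.sin(2 * np.pi * week_of_year / 52),
        "cos_week": np.cos(2 * np.pi * week_of_year / 52),
        # Lags read back through history, including earlier predictions.
        "cash_rolling_4w": hist.tail(4).mean(),
    }
    for lag in [1, 2, 4, 8, 13]:
        row[f"cash_lag_{lag}w"] = hist.iloc[-lag]
    return pd.DataFrame([row])

# Minimal smoke check on fabricated history (14 weeks, values 1..14).
working_df = pd.DataFrame({
    "net_cash_flow": [float(i) for i in range(1, 15)],
    "ar_due": 100.0, "ap_due": -50.0,
    "ar_overdue_30": 10.0, "so_pipeline": 200.0,
})
feats = build_future_week_features(working_df, pd.Timestamp("2024-03-04"))
```

Because each forward step appends its own prediction to working_df, lag errors compound across the horizon, which is the main reason week 5-6 accuracy degrades.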

Week 1–2 forecasts are typically accurate within ±10–15%. By week 5–6, expect ±25–30%. This is normal. A 6-week cash forecast is a directional tool, not a balance sheet. Track accuracy weekly and display it next to the forecast so the finance team has calibrated expectations.
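Tracking that accuracy is mechanical. A sketch, where accuracy_by_horizon is an illustrative helper and the column names match the forecast frame above:

```python
import pandas as pd

def accuracy_by_horizon(forecasts, actuals):
    """Mean absolute percentage error per forecast horizon (week_offset).

    forecasts: week_start, forecast_net_cash_flow, week_offset
    actuals:   week_start, actual_net_cash_flow
    """
    merged = forecasts.merge(actuals, on="week_start", how="inner")
    merged["ape"] = (
        (merged["forecast_net_cash_flow"] - merged["actual_net_cash_flow"]).abs()
        / merged["actual_net_cash_flow"].abs()
    )
    return merged.groupby("week_offset")["ape"].mean()

# Toy example: a 1-week-ahead forecast off by 10%, 2-week-ahead off by 30%.
forecasts = pd.DataFrame({
    "week_start": ["2024-03-04", "2024-03-11"],
    "forecast_net_cash_flow": [110.0, 130.0],
    "week_offset": [1, 2],
})
actuals = pd.DataFrame({
    "week_start": ["2024-03-04", "2024-03-11"],
    "actual_net_cash_flow": [100.0, 100.0],
})
mape = accuracy_by_horizon(forecasts, actuals)
```

Note the caveat: APE blows up when actual cash flow is near zero, so consider MAE alongside it for low-volume weeks.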


Step 5: Store and Display Results in Odoo

What to do

Create a lightweight custom model to hold forecast results and display them in the Accounting dashboard.

# models/cash_forecast.py
from odoo import api, fields, models

class CashFlowForecast(models.Model):
    _name = 'cash.flow.forecast'
    _description = 'Cash Flow Forecast'
    _order = 'week_start asc'

    week_start = fields.Date(string='Week Start', required=True)
    forecast_net_cash_flow = fields.Float(string='Forecast Net Cash Flow')
    actual_net_cash_flow = fields.Float(string='Actual Net Cash Flow')
    forecast_error = fields.Float(
        string='Forecast Error',
        compute='_compute_error',
        store=True
    )
    forecast_type = fields.Selection([
        ('model', 'Model'),
        ('manual_override', 'Manual Override'),
    ], string='Type', default='model')
    notes = fields.Text(string='Notes')

    @api.depends('forecast_net_cash_flow', 'actual_net_cash_flow')
    def _compute_error(self):
        for record in self:
            if record.actual_net_cash_flow:
                record.forecast_error = (
                    record.forecast_net_cash_flow - record.actual_net_cash_flow
                )
            else:
                record.forecast_error = 0.0

Wire a bar graph view to show forecast vs. actual side by side:

<record id="view_cash_forecast_graph" model="ir.ui.view">
    <field name="name">cash.flow.forecast.graph</field>
    <field name="model">cash.flow.forecast</field>
    <field name="arch" type="xml">
        <graph type="bar" stacked="0">
            <field name="week_start" type="row"/>
            <field name="forecast_net_cash_flow" type="measure"/>
            <field name="actual_net_cash_flow" type="measure"/>
        </graph>
    </field>
</record>

Schedule the pipeline with an ir.cron entry that runs Sunday evening, so the output is ready for Monday’s cash review meeting.
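A sketch of that cron record. Field names are per recent Odoo versions (check yours), and run_weekly_forecast is a hypothetical method on cash.flow.forecast that you would implement to invoke the pipeline:

```xml
<record id="ir_cron_cash_flow_forecast" model="ir.cron">
    <field name="name">Cash Flow Forecast: weekly refresh</field>
    <!-- model_cash_flow_forecast is the external ID Odoo generates
         for the cash.flow.forecast model defined above -->
    <field name="model_id" ref="model_cash_flow_forecast"/>
    <field name="state">code</field>
    <!-- run_weekly_forecast is a hypothetical method you implement -->
    <field name="code">model.run_weekly_forecast()</field>
    <field name="interval_number">1</field>
    <field name="interval_type">weeks</field>
</record>
```

Set the first nextcall to a Sunday evening in your server's timezone so subsequent runs land on the same slot.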

Gotchas

  • The manual_override field exists for a reason. Models drift when business conditions change — a large contract, a pricing revision, a supplier rate change. The override gives finance teams a clean way to intervene with an audit trail.
  • Don’t surface model internals in the dashboard. Finance teams want one number per week and a confidence band. Feature importances and residual plots belong in a separate analyst view, not the CFO’s screen.
  • Populate actual_net_cash_flow weekly from the posted journal entries (the same account.move.line query from step 1, but filtered to the prior week). The model’s accuracy tracking depends on it.

Key Takeaways

  • account.move.line, sale.order, and purchase.order give you the raw material for a 6-week cash forecast with no additional infrastructure.
  • date_maturity is the forecasting field. date is when the document was created.
  • Overdue AR is a stronger collection timing signal than maturity dates alone — customers who pay late once tend to do it again.
  • TimeSeriesSplit is non-negotiable. Random splits on time-series data produce misleading validation scores.
  • Weeks 1–3 are where the model adds real value. Weeks 4–6 are directional. Track accuracy weekly; a forecast nobody reviews is just noise with a confidence interval.

At Trobz, we’ve built this pipeline as part of our Predictive Analytics service for Odoo Accounting teams. If you want to pilot it on your own data, reach out.

Ready to put AI to work?

Let's explore how Trobz AI can automate your processes, enhance your ERP, and help your team make better decisions — faster.