Key Takeaways: Cash flow forecasting with machine learning doesn’t require a separate data warehouse or BI stack — your Odoo database already has everything you need. The key is extracting the right signals: receivables aging, payables schedules, confirmed order revenue, and seasonal patterns. A gradient boosting model trained on account.move.line, sale.order, and purchase.order data can generate a reliable 6-week rolling forecast. The hard part isn’t the model — it’s data quality and the discipline of reviewing output against actuals each week.
Most businesses running Odoo have everything needed for cash flow forecasting sitting in their database. The data is in account.move, sale.order, purchase.order, and account.move.line. The problem is that most finance teams use aging reports and pivot tables to answer what the balance will look like in four weeks — which works, until a big customer pays late or an unexpected purchase order lands.
An ML-based forecast doesn’t eliminate those surprises. It builds a probabilistic picture of the next six weeks based on historical patterns, known commitments, and receivables behavior. It also tells you where the uncertainty is concentrated, which is often more useful than the point estimate itself.
This post covers the full pipeline: data extraction, feature engineering, model training, rolling forecast generation, and displaying results inside Odoo.
Step 1: Extract the Raw Data from Odoo
What to do
Connect to your Odoo PostgreSQL database directly — the ORM adds too much overhead for the volume of historical data you’ll need. Pull from three tables: account.move.line (payment history), sale.order (confirmed revenue), and purchase.order (committed spend).
import pandas as pd
import psycopg2
conn = psycopg2.connect(
    host="localhost",
    dbname="your_odoo_db",
    user="odoo",
    password="your_password"
)
# Receivables and payables from posted journal entries
receivables_query = """
SELECT
aml.id,
aml.date,
aml.date_maturity,
aml.balance,
am.move_type,
am.payment_state,
rp.id AS partner_id,
rp.name AS partner_name
FROM account_move_line aml
JOIN account_move am ON aml.move_id = am.id
JOIN account_account aa ON aml.account_id = aa.id
JOIN res_partner rp ON aml.partner_id = rp.id
WHERE aa.account_type IN ('asset_receivable', 'liability_payable')
AND am.state = 'posted'
AND aml.date >= NOW() - INTERVAL '18 months'
ORDER BY aml.date
"""
# Confirmed sales orders not yet fully invoiced — future cash in
sales_query = """
SELECT
so.id,
so.date_order,
so.amount_total,
so.partner_id,
so.invoice_status,
sol.price_subtotal,
sol.qty_to_invoice
FROM sale_order so
JOIN sale_order_line sol ON sol.order_id = so.id
WHERE so.state = 'sale'
AND so.invoice_status IN ('to invoice', 'invoiced')
AND so.date_order >= NOW() - INTERVAL '18 months'
"""
# Purchase orders confirmed — committed spend
purchase_query = """
SELECT
po.id,
po.date_order,
po.date_approve,
po.amount_total,
po.partner_id,
po.invoice_status,
po.state
FROM purchase_order po
WHERE po.state IN ('purchase', 'done')
AND po.date_order >= NOW() - INTERVAL '18 months'
"""
df_ar = pd.read_sql(receivables_query, conn)
df_so = pd.read_sql(sales_query, conn)
df_po = pd.read_sql(purchase_query, conn)
Why
The ORM layers security rules, computed field recalculation, and access control on every read. For analytics on 18 months of transaction history, you want the raw columns. Direct SQL is also 10–100x faster for full-table scans.
Gotchas
- Filter by account_type — asset_receivable for AR, liability_payable for AP. Don’t filter on move_type alone: credit notes and refunds flip the sign on the balance column.
- date_maturity is the right field for forecasting, not date. Payment terms set the maturity; that’s when cash is expected to move.
- Sales orders with invoice_status = 'to invoice' represent committed revenue not yet billed. Include them — they’re real future inflows, just earlier in the billing cycle.
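To make the sign convention concrete, here is a toy example (synthetic rows, not pulled from Odoo) showing why credit notes must be kept with their native sign rather than filtered out:

```python
import pandas as pd

# Toy journal lines illustrating the balance sign convention:
# customer invoices post positive receivable balances,
# credit notes post negative ones, vendor bills post negative payables.
lines = pd.DataFrame([
    {"move_type": "out_invoice", "balance": 1000.0},  # customer invoice
    {"move_type": "out_refund",  "balance": -200.0},  # credit note flips the sign
    {"move_type": "in_invoice",  "balance": -500.0},  # vendor bill (payable side)
])

# Summing balance over all customer-side move types nets refunds correctly
ar_net = lines[lines["move_type"].isin(["out_invoice", "out_refund"])]["balance"].sum()
print(ar_net)  # 800.0
```

Dropping the out_refund row would overstate expected inflows by the refund amount.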
Step 2: Build the Feature Matrix
What to do
Transform raw transaction data into a weekly time series. Each row in the feature matrix is one week, with features derived from the state of the business at the start of that week.
import numpy as np
from datetime import timedelta
def build_weekly_features(df_ar, df_so, reference_date):
    weeks = pd.date_range(
        start=reference_date - pd.DateOffset(months=15),
        end=reference_date,
        freq='W-MON'
    )
    rows = []
    for week_start in weeks:
        week_end = week_start + timedelta(days=7)
        # Receivables due this week
        ar_due = df_ar[
            (df_ar['date_maturity'] >= week_start) &
            (df_ar['date_maturity'] < week_end) &
            (df_ar['move_type'].isin(['out_invoice', 'out_refund']))
        ]['balance'].sum()
        # Payables due this week
        ap_due = df_ar[
            (df_ar['date_maturity'] >= week_start) &
            (df_ar['date_maturity'] < week_end) &
            (df_ar['move_type'].isin(['in_invoice', 'in_refund']))
        ]['balance'].sum()
        # Overdue AR from past 30 days (collection behavior signal)
        ar_overdue_30 = df_ar[
            (df_ar['date_maturity'] < week_start) &
            (df_ar['date_maturity'] >= week_start - timedelta(days=30)) &
            (df_ar['payment_state'] != 'paid')
        ]['balance'].sum()
        # Confirmed SO value not yet invoiced (committed inflow pipeline)
        so_pipeline = df_so[
            (df_so['date_order'] < week_end) &
            (df_so['invoice_status'] == 'to invoice')
        ]['price_subtotal'].sum()
        # Seasonal features
        week_of_year = week_start.isocalendar()[1]
        month = week_start.month
        rows.append({
            'week_start': week_start,
            'ar_due': ar_due,
            'ap_due': ap_due,
            'ar_overdue_30': ar_overdue_30,
            'so_pipeline': so_pipeline,
            'quarter': (month - 1) // 3 + 1,
            # Sin/cos encoding avoids the artificial jump from week 52 to week 1
            'sin_week': np.sin(2 * np.pi * week_of_year / 52),
            'cos_week': np.cos(2 * np.pi * week_of_year / 52),
        })
    return pd.DataFrame(rows)
Why
The ar_overdue_30 feature is one of the better predictors of actual collection timing. Customers who paid late last month tend to pay late this month — the model picks this up quickly. This signal carries more information than date_maturity alone, which assumes everyone pays on time.
Sin/cos encoding for week-of-year (rather than the raw integer) means week 52 and week 1 are adjacent in feature space, just as they are in the real calendar. Without it, the model sees a large numerical gap between the last week of December and the first week of January.
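The wrap-around effect is easy to check numerically. A minimal sketch (synthetic week numbers, no Odoo data involved):

```python
import numpy as np

def week_to_xy(week):
    """Map an ISO week number onto the unit circle."""
    angle = 2 * np.pi * week / 52
    return np.array([np.sin(angle), np.cos(angle)])

# Distance between week 52 and week 1 in encoded space vs raw integers
encoded_gap = np.linalg.norm(week_to_xy(52) - week_to_xy(1))
raw_gap = abs(52 - 1)
print(round(encoded_gap, 3), raw_gap)  # encoded weeks are near neighbours; raw ints are 51 apart
```

In the encoded space the late-December and early-January weeks sit about one week apart, matching the calendar.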
Gotchas
- Keep ar_due and so_pipeline as separate features. They represent different stages of the revenue cycle with different reliability as signals. Summing them into a single “expected inflows” feature loses that distinction.
- Purchase order data tells you what’s committed, not when the invoice will arrive. Use date_approve plus your vendor payment terms to estimate AP timing, or accept that the AP feature is a lower-bound estimate.
- If you’re running multi-company, build the feature matrix per legal entity. Aggregating across companies in different currencies before model training produces noise the model can’t usefully learn from.
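Estimating AP timing from date_approve plus terms can be sketched as below; the vendor_terms mapping and the partner IDs are stand-ins for whatever you extract from the vendor’s payment terms in Odoo:

```python
import pandas as pd

# Assumed vendor payment terms in days, keyed by partner_id.
# In practice this would be derived from the vendor's payment term record.
vendor_terms = {7: 30, 12: 45}
DEFAULT_TERMS = 30  # fallback when a vendor has no recorded terms

po = pd.DataFrame({
    "partner_id": [7, 12, 99],
    "date_approve": pd.to_datetime(["2024-03-01", "2024-03-05", "2024-03-08"]),
    "amount_total": [5000.0, 1200.0, 800.0],
})

# Expected cash-out date = approval date + vendor payment terms
po["expected_payment_date"] = po.apply(
    lambda r: r["date_approve"]
    + pd.Timedelta(days=vendor_terms.get(r["partner_id"], DEFAULT_TERMS)),
    axis=1,
)
print(po["expected_payment_date"].dt.strftime("%Y-%m-%d").tolist())
# ['2024-03-31', '2024-04-19', '2024-04-07']
```

This stays a lower-bound estimate: the clock really starts when the vendor invoice lands, which may lag the approval date.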
Step 3: Train the Forecasting Model
What to do
For a 6-week rolling forecast, gradient boosting with lag features works better than a pure time-series model like ARIMA. The reason: your cash position depends on business state (pipeline, overdue AR, committed POs), not just the historical cash pattern alone. Gradient boosting handles mixed feature types without requiring stationarity.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
import joblib
def train_forecast_model(features_df, target_col='net_cash_flow'):
    # Add lag features
    for lag in [1, 2, 4, 8, 13]:
        features_df[f'cash_lag_{lag}w'] = features_df[target_col].shift(lag)
    # Rolling average — smooths one-off spikes
    features_df['cash_rolling_4w'] = (
        features_df[target_col].shift(1).rolling(4).mean()
    )
    features_df = features_df.dropna()
    feature_cols = [
        'ar_due', 'ap_due', 'ar_overdue_30', 'so_pipeline',
        'sin_week', 'cos_week', 'quarter',
        'cash_lag_1w', 'cash_lag_2w', 'cash_lag_4w',
        'cash_lag_8w', 'cash_lag_13w', 'cash_rolling_4w'
    ]
    X = features_df[feature_cols]
    y = features_df[target_col]
    # Time series cross-validation — never shuffle time-series data
    tscv = TimeSeriesSplit(n_splits=5)
    model = GradientBoostingRegressor(
        n_estimators=200,
        max_depth=4,
        learning_rate=0.05,
        subsample=0.8,
        random_state=42
    )
    scores = []
    for train_idx, val_idx in tscv.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        mae = np.mean(np.abs(pred - y_val))
        scores.append(mae)
    print(f"Cross-validated MAE: {np.mean(scores):,.0f}")
    # Final fit on all available data
    model.fit(X, y)
    joblib.dump(model, 'cash_flow_model.pkl')
    return model, feature_cols
Why
TimeSeriesSplit is non-negotiable. Using random train/test splits on time-series data lets future information leak into training, which inflates validation metrics and produces a model that looks good in evaluation and fails in production. The temporal fold structure forces the model to predict forward, which is the actual task.
The 13-week lag (quarterly) matters for businesses with regular seasonal patterns — a Q4 revenue spike or a Q1 tax payment cycle. Without it, the model has no way to connect this week to the same week last year.
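Lag features are plain column shifts. A toy illustration (synthetic series) of how shift() aligns each week with its own history, and why the earliest rows drop out after dropna():

```python
import pandas as pd

# Five weeks of a synthetic target series
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# shift(2) aligns each week with its value two weeks earlier;
# the first two rows have no history and become NaN, which is
# why adding a 13-week lag costs you the first 13 rows of data.
lagged = s.shift(2)
print(lagged.tolist())  # [nan, nan, 10.0, 20.0, 30.0]
```

This is the mechanism behind the row count shrinking after the lag loop in train_forecast_model.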
Gotchas
- 18 months of weekly data is roughly 75 rows after adding lag features. That’s thin. If validation MAE is substantially higher than training MAE, reduce max_depth to 3 or extend the historical window.
- Net cash flow (ar_due - ap_due) is the right target. Forecasting inflows and outflows as separate models and then subtracting introduces correlated errors that amplify over the 6-week horizon.
- Gradient boosting doesn’t extrapolate beyond its training range. An unusually large contract win or a one-off capital purchase will be under-predicted. Build in a manual override for known exceptional events.
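One lightweight way to handle such known one-offs is a post-model adjustment layer, applied after prediction. A sketch with fabricated dates and amounts:

```python
import pandas as pd

# Model output for the next two weeks (synthetic numbers)
forecast = pd.DataFrame({
    "week_start": pd.to_datetime(["2024-04-01", "2024-04-08"]),
    "forecast_net_cash_flow": [12000.0, 9000.0],
})

# Known exceptional events the model cannot see, entered manually:
# here, a one-off equipment purchase landing in the second week.
overrides = {pd.Timestamp("2024-04-08"): -25000.0}

forecast["adjusted"] = forecast.apply(
    lambda r: r["forecast_net_cash_flow"] + overrides.get(r["week_start"], 0.0),
    axis=1,
)
print(forecast["adjusted"].tolist())  # [12000.0, -16000.0]
```

Keeping the raw model output and the adjustment as separate columns preserves an audit trail of what the model said versus what finance knew.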
Step 4: Generate the 6-Week Rolling Forecast
What to do
At the end of each week, re-run the pipeline against the latest Odoo data and project six weeks forward. Each forward step uses the previous step’s prediction as its lag input.
def generate_rolling_forecast(model, features_df, feature_cols, horizon=6):
    forecast_rows = []
    working_df = features_df.copy()
    for week_offset in range(1, horizon + 1):
        next_date = features_df['week_start'].max() + timedelta(weeks=week_offset)
        # Build forward features for this week:
        # AR/AP due comes from maturity dates already in the database,
        # so_pipeline comes from open sales orders.
        next_features = build_future_week_features(working_df, next_date)
        pred = model.predict(next_features[feature_cols])[0]
        forecast_rows.append({
            'week_start': next_date,
            'forecast_net_cash_flow': pred,
            'week_offset': week_offset
        })
        # Append prediction as synthetic actuals so subsequent lags can use it
        working_df = pd.concat([
            working_df,
            pd.DataFrame([{
                'week_start': next_date,
                'net_cash_flow': pred,
                **next_features.iloc[0].to_dict()
            }])
        ], ignore_index=True)
    return pd.DataFrame(forecast_rows)
Week 1–2 forecasts are typically accurate within ±10–15%. By week 5–6, expect ±25–30%. This is normal. A 6-week cash forecast is a directional tool, not a balance sheet. Track accuracy weekly and display it next to the forecast so the finance team has calibrated expectations.
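Accuracy tracking can be as simple as logging each forecast alongside its horizon and computing error per horizon step once actuals land. A sketch with fabricated numbers, showing the expected widening from week 1 to week 6:

```python
import pandas as pd

# Forecasts logged each week with their horizon, joined to actuals later
log = pd.DataFrame({
    "week_offset": [1, 1, 6, 6],
    "forecast": [100.0, 110.0, 100.0, 100.0],
    "actual":   [105.0,  99.0, 130.0,  75.0],
})

# Mean absolute percentage error per horizon step
log["ape"] = (log["forecast"] - log["actual"]).abs() / log["actual"].abs()
mape_by_horizon = log.groupby("week_offset")["ape"].mean()
print(mape_by_horizon.round(2).to_dict())  # {1: 0.08, 6: 0.28}
```

Publishing this table next to the forecast is what keeps the finance team’s expectations calibrated to the horizon.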
Step 5: Store and Display Results in Odoo
What to do
Create a lightweight custom model to hold forecast results and display them in the Accounting dashboard.
# models/cash_forecast.py
from odoo import api, fields, models

class CashFlowForecast(models.Model):
    _name = 'cash.flow.forecast'
    _description = 'Cash Flow Forecast'
    _order = 'week_start asc'

    week_start = fields.Date(string='Week Start', required=True)
    forecast_net_cash_flow = fields.Float(string='Forecast Net Cash Flow')
    actual_net_cash_flow = fields.Float(string='Actual Net Cash Flow')
    forecast_error = fields.Float(
        string='Forecast Error',
        compute='_compute_error',
        store=True
    )
    forecast_type = fields.Selection([
        ('model', 'Model'),
        ('manual_override', 'Manual Override'),
    ], string='Type', default='model')
    notes = fields.Text(string='Notes')

    @api.depends('forecast_net_cash_flow', 'actual_net_cash_flow')
    def _compute_error(self):
        for record in self:
            if record.actual_net_cash_flow:
                record.forecast_error = (
                    record.forecast_net_cash_flow - record.actual_net_cash_flow
                )
            else:
                record.forecast_error = 0.0
Wire a bar graph view to show forecast vs. actual side by side:
<record id="view_cash_forecast_graph" model="ir.ui.view">
    <field name="name">cash.flow.forecast.graph</field>
    <field name="model">cash.flow.forecast</field>
    <field name="arch" type="xml">
        <graph type="bar" stacked="0">
            <field name="week_start" type="row"/>
            <field name="forecast_net_cash_flow" type="measure"/>
            <field name="actual_net_cash_flow" type="measure"/>
        </graph>
    </field>
</record>
Schedule the pipeline with an ir.cron entry that runs Sunday evening, so the output is ready for Monday’s cash review meeting.
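If the forecasting pipeline runs outside the Odoo worker (a separate Python process on the same host, say), results can be written back through Odoo’s external XML-RPC API. A sketch; the URL, database name, and credentials are placeholders:

```python
import xmlrpc.client
import pandas as pd

# Placeholder connection details — adapt to your instance
URL, DB, USER, API_KEY = "http://localhost:8069", "your_odoo_db", "odoo", "your_api_key"

def to_payload(forecast_df):
    """Convert forecast rows into create() values for cash.flow.forecast."""
    return [
        {
            "week_start": row["week_start"].strftime("%Y-%m-%d"),
            "forecast_net_cash_flow": float(row["forecast_net_cash_flow"]),
            "forecast_type": "model",
        }
        for _, row in forecast_df.iterrows()
    ]

def push_forecast(forecast_df):
    """Write forecast rows through Odoo's external XML-RPC API."""
    common = xmlrpc.client.ServerProxy(f"{URL}/xmlrpc/2/common")
    uid = common.authenticate(DB, USER, API_KEY, {})
    models = xmlrpc.client.ServerProxy(f"{URL}/xmlrpc/2/object")
    for vals in to_payload(forecast_df):
        models.execute_kw(DB, uid, API_KEY, "cash.flow.forecast", "create", [vals])

# Payload construction can be checked without a live server
df = pd.DataFrame({
    "week_start": pd.to_datetime(["2024-04-01"]),
    "forecast_net_cash_flow": [12345.6],
})
print(to_payload(df))
```

Alternatively, an ir.cron method on the model itself can shell out to the pipeline and create the records in-process; the external-API route keeps the ML dependencies out of the Odoo environment.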
Gotchas
- The manual_override field exists for a reason. Models drift when business conditions change — a large contract, a pricing revision, a supplier rate change. The override gives finance teams a clean way to intervene with an audit trail.
- Don’t surface model internals in the dashboard. Finance teams want one number per week and a confidence band. Feature importances and residual plots belong in a separate analyst view, not the CFO’s screen.
- Populate actual_net_cash_flow weekly from the posted journal entries (the same account.move.line query from step 1, but filtered to the prior week). The model’s accuracy tracking depends on it.
Key Takeaways
- account.move.line, sale.order, and purchase.order give you the raw material for a 6-week cash forecast with no additional infrastructure.
- date_maturity is the forecasting field. date is when the document was created.
- Overdue AR is a stronger collection timing signal than maturity dates alone — customers who pay late once tend to do it again.
- TimeSeriesSplit is non-negotiable. Random splits on time-series data produce misleading validation scores.
- Weeks 1–3 are where the model adds real value. Weeks 4–6 are directional. Track accuracy weekly — a forecast nobody reviews is just noise with a confidence interval.
At Trobz, we’ve built this pipeline as part of our Predictive Analytics service for Odoo Accounting teams. If you want to pilot it on your own data, reach out.