Key Takeaways: Collaborative filtering on sale.order.line purchase history generates accurate product recommendations without external ML infrastructure. The co-purchase matrix lives in PostgreSQL, recommendations are precomputed nightly into a dedicated cache table, and they surface on the product page via a custom website snippet. A/B testing the widget typically shows a 6–12% lift in average order value within four weeks.
Most eCommerce recommendation engines are over-engineered. Teams reach for dedicated platforms — Amazon Personalize, Recombee, Algolia Recommend — before exhausting what they already have. If your Odoo instance has 12+ months of order history, you have enough signal to build a recommendation engine that performs well for the majority of your catalogue. This guide walks through the full implementation: data extraction, matrix construction, similarity scoring, caching, the Odoo website snippet, and A/B test setup.
What Collaborative Filtering Actually Computes
Item-based collaborative filtering answers one question: given that a customer bought product A, which other products do customers who also bought A tend to buy?
The algorithm doesn’t need user profiles, browsing history, or demographic data. It needs only the co-occurrence of products within orders. Two products that frequently appear in the same basket are considered similar — not because their descriptions match, but because customers treat them as related.
The output is a square matrix: rows and columns are products, cells are similarity scores. Given product A, you sort column A descending and return the top N products. That’s the recommendation.
This simplicity is a feature, not a limitation. It works for B2B catalogues (industrial parts, office supplies) as well as B2C, it handles products with no content metadata (blank descriptions, missing categories), and it improves automatically as order volume grows.
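To make the lookup concrete, here is a minimal sketch against a toy 3 × 3 similarity matrix (the product IDs and scores are invented for illustration):

```python
import pandas as pd

# Toy similarity matrix: rows/columns are product IDs, cells are cosine scores
similarity_df = pd.DataFrame(
    [[1.00, 0.62, 0.10],
     [0.62, 1.00, 0.05],
     [0.10, 0.05, 1.00]],
    index=[101, 102, 103],
    columns=[101, 102, 103],
)

def top_n(similarity_df: pd.DataFrame, product_id: int, n: int = 2) -> pd.Series:
    # Drop the product itself, then keep the n highest-scoring neighbours
    return similarity_df[product_id].drop(product_id).nlargest(n)

print(top_n(similarity_df, 101))  # 102 (0.62) ranks above 103 (0.10)
```

The production pipeline below does exactly this, once per product, and persists the result to a cache table.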
Step 1: Extract the Co-Purchase Matrix from sale.order.line
The raw material is sale.order.line. You need pairs of products that appear in the same sale.order, along with how often those pairs co-occur.
import psycopg2
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
def build_copurchase_matrix(db_url: str, min_orders: int = 5) -> pd.DataFrame:
"""
Build a product co-purchase similarity matrix from Odoo sale.order.line data.
Args:
db_url: PostgreSQL connection string for the Odoo database
min_orders: Minimum number of confirmed orders a product must appear in
to be included (filters out rare products with low signal)
Returns:
DataFrame with product_id index/columns and cosine similarity scores
"""
conn = psycopg2.connect(db_url)
# Fetch confirmed orders only — draft and cancelled orders contaminate signal
query = """
SELECT
sol.order_id,
sol.product_id,
pt.name->>'en_US' AS product_name
FROM sale_order_line sol
JOIN sale_order so ON so.id = sol.order_id
JOIN product_product pp ON pp.id = sol.product_id
JOIN product_template pt ON pt.id = pp.product_tmpl_id
WHERE so.state IN ('sale', 'done')
AND sol.product_id IS NOT NULL
AND so.date_order >= NOW() - INTERVAL '18 months'
ORDER BY sol.order_id, sol.product_id
"""
df = pd.read_sql(query, conn)
conn.close()
# Filter products with too few orders (low statistical signal)
product_counts = df.groupby('product_id')['order_id'].nunique()
valid_products = product_counts[product_counts >= min_orders].index
df = df[df['product_id'].isin(valid_products)]
# Build the order × product binary matrix
# 1 = product appeared in this order, 0 = did not
order_product = df.pivot_table(
index='order_id',
columns='product_id',
values='product_name',
aggfunc='count',
fill_value=0
).clip(upper=1) # binarize: we care about co-occurrence, not quantity
# Compute cosine similarity between product vectors
sparse_matrix = csr_matrix(order_product.values.T)
similarity = cosine_similarity(sparse_matrix)
product_ids = order_product.columns.tolist()
similarity_df = pd.DataFrame(
similarity,
index=product_ids,
columns=product_ids
)
return similarity_df
Why 18 months? Longer windows dilute recent trends; shorter windows reduce statistical confidence for products with moderate sales velocity. Adjust based on your order volume. A catalogue with 10,000+ orders/month can use 6 months; a B2B shop with 200 orders/month needs the full 18.
Why binarize? A customer who bought 3 units of a product in one order shouldn’t count three times. The signal is co-occurrence, not quantity.
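A small synthetic example makes the effect visible: one bulk line of 10 units is enough to make two products look nearly identical on raw counts, while binarized vectors keep the similarity honest (the matrices are invented for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = products A, B, C; columns = three orders.
# Order 1 contains a single bulk line: 10 units of A.
raw = np.array([
    [10, 0, 1],   # A
    [ 1, 1, 0],   # B
    [ 1, 0, 0],   # C
])
binary = (raw > 0).astype(int)

print(round(cosine_similarity(raw)[0, 2], 3))     # A vs C on raw counts: 0.995
print(round(cosine_similarity(binary)[0, 2], 3))  # A vs C binarized: 0.707
```

On raw counts the bulk line drags A's vector toward C; after binarizing, the score reflects only how often the two actually share a basket.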
Step 2: Generate and Store Recommendations
The similarity matrix can be large. A catalogue of 5,000 active products produces a 5,000 × 5,000 matrix — 25 million cells. You don’t need all of it. Store only the top-K neighbours per product.
import json
import psycopg2
def store_recommendations(
similarity_df: pd.DataFrame,
db_url: str,
top_k: int = 8
):
"""
Store top-K similar products per product in a dedicated Odoo model.
Uses a simple JSON column for flexibility — no separate recommendation table needed.
"""
conn = psycopg2.connect(db_url)
cur = conn.cursor()
# Ensure the recommendations table exists
# In practice, define this as an Odoo model (see next section)
cur.execute("""
CREATE TABLE IF NOT EXISTS product_recommendation_cache (
product_id INTEGER PRIMARY KEY REFERENCES product_product(id),
recommendations JSONB NOT NULL,
computed_at TIMESTAMP DEFAULT NOW()
)
""")
for product_id in similarity_df.index:
# Get top-K similar products, excluding the product itself
scores = similarity_df[product_id].drop(product_id).nlargest(top_k)
recommendations = [
{"product_id": int(pid), "score": round(float(score), 4)}
for pid, score in scores.items()
if score > 0.05 # discard near-zero correlations
]
cur.execute("""
INSERT INTO product_recommendation_cache (product_id, recommendations, computed_at)
VALUES (%s, %s, NOW())
ON CONFLICT (product_id) DO UPDATE
SET recommendations = EXCLUDED.recommendations,
computed_at = EXCLUDED.computed_at
""", (int(product_id), json.dumps(recommendations)))
conn.commit()
cur.close()
conn.close()
Run this as a daily batch job. Schedule it via Odoo’s ir.cron so it runs during off-peak hours:
<record id="cron_rebuild_recommendations" model="ir.cron">
<field name="name">Rebuild Product Recommendations</field>
<field name="model_id" ref="product.model_product_product"/>
<field name="state">code</field>
<field name="code">model._rebuild_recommendations()</field>
<field name="interval_number">1</field>
<field name="interval_type">days</field>
<field name="numbercall">-1</field>
<field name="active">True</field>
</record>
Step 3: The Odoo Model
Define the cache as a proper Odoo model so it integrates with the ORM and benefits from Odoo’s access control:
from odoo import models, fields, api
import json
class ProductRecommendation(models.Model):
_name = 'product.recommendation'
_description = 'Product Recommendation Cache'
_rec_name = 'product_id'
product_id = fields.Many2one(
'product.product',
string='Product',
required=True,
index=True,
ondelete='cascade'
)
recommendations = fields.Text(
string='Recommendations JSON',
help='JSON list of {product_id, score} dicts, ordered by descending score'
)
computed_at = fields.Datetime(string='Computed At', readonly=True)
def get_recommendations(self, product_id: int, limit: int = 6) -> list:
"""
Return recommended product.template records for a given product.
Falls back to [] if no recommendations exist or cache is stale (>2 days).
"""
rec = self.search([('product_id', '=', product_id)], limit=1)
if not rec:
return []
# Stale cache guard
if rec.computed_at:
age = fields.Datetime.now() - rec.computed_at
if age.days > 2:
return []
try:
data = json.loads(rec.recommendations or '[]')
except (json.JSONDecodeError, TypeError):
return []
product_ids = [d['product_id'] for d in data[:limit]]
# Return product.template records (website shows templates, not variants)
products = self.env['product.product'].browse(product_ids)
return products.mapped('product_tmpl_id').filtered(
lambda t: t.website_published and t.sale_ok
)
The get_recommendations method returns product.template records — the unit the website displays — and filters out unpublished or non-saleable products so dead listings never surface to shoppers.
Step 4: The Website Snippet
The recommendation widget appears on product pages under a “Customers also bought” heading. Implement it as a custom website snippet that calls a JSON controller:
# controllers/main.py
from odoo import http
from odoo.http import request
import json
class RecommendationController(http.Controller):
@http.route(
'/shop/recommendations/<int:product_id>',
type='json',
auth='public',
methods=['POST'],
csrf=False,
)
def get_recommendations(self, product_id: int, limit: int = 6):
# Resolve product.product from product.template id
product = request.env['product.product'].sudo().search(
[('product_tmpl_id', '=', product_id)], limit=1
)
if not product:
return {'recommendations': []}
rec_model = request.env['product.recommendation'].sudo()
templates = rec_model.get_recommendations(product.id, limit=limit)
return {
'recommendations': [
{
'id': t.id,
'name': t.name,
'price': t.list_price,
'image_url': f'/web/image/product.template/{t.id}/image_256',
'url': t.website_url,  # computed by website_sale, already includes /shop/<slug>
}
for t in templates
]
}
// static/src/js/recommendation_widget.js
odoo.define('your_module.RecommendationWidget', function (require) {
'use strict';
const publicWidget = require('web.public.widget');
publicWidget.registry.ProductRecommendations = publicWidget.Widget.extend({
selector: '.js_recommendations_widget',
start() {
const productId = parseInt(this.el.dataset.productId, 10);
if (!productId) return this._super(...arguments);
return this._loadRecommendations(productId);
},
async _loadRecommendations(productId) {
const response = await fetch('/shop/recommendations/' + productId, {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({jsonrpc: '2.0', method: 'call', params: {limit: 6}}),
});
const data = await response.json();
const items = data?.result?.recommendations || [];
if (items.length === 0) {
this.el.style.display = 'none';
return;
}
this.el.querySelector('.js_rec_list').innerHTML = items.map(p => `
<div class="col-6 col-md-4 col-lg-2 mb-3">
<a href="${p.url}" class="text-decoration-none">
<img src="${p.image_url}" class="img-fluid rounded mb-1" alt="${p.name}">
<div class="small text-dark">${p.name}</div>
<div class="small text-muted">${p.price.toFixed(2)}</div>
</a>
</div>
`).join('');
},
});
return publicWidget.registry.ProductRecommendations;
});
Add the widget placeholder to the product page template via a QWeb view inheritance:
<template id="product_page_recommendations" inherit_id="website_sale.product">
<xpath expr="//div[@id='product_detail_main']" position="after">
<div class="container mt-5 js_recommendations_widget"
t-att-data-product-id="product.id">
<h4 class="mb-3">Customers also bought</h4>
<div class="row js_rec_list"></div>
</div>
</xpath>
</template>
Step 5: Performance Considerations
Cache warmup time. The batch job runs once per day. For a 5,000-product catalogue, build_copurchase_matrix takes about 90 seconds on a standard Odoo instance (the bottleneck is the pivot operation, not the SQL query). Run it at 2 AM.
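If the pivot itself becomes the bottleneck on a larger catalogue, the order × product matrix can be assembled directly in sparse form with pd.factorize, skipping the dense pivot_table entirely — a sketch that assumes the df produced by the query in Step 1 (the function name is illustrative):

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

def sparse_order_product(df: pd.DataFrame):
    """Binary order x product sparse matrix built without a dense pivot."""
    order_idx, order_ids = pd.factorize(df['order_id'])
    prod_idx, product_ids = pd.factorize(df['product_id'])
    data = np.ones(len(df), dtype=np.int8)
    # Duplicate (order, product) coordinates are summed during construction...
    mat = csr_matrix((data, (order_idx, prod_idx)),
                     shape=(len(order_ids), len(product_ids)))
    mat.data[:] = 1  # ...then clamped back to 1 to binarize
    return mat, product_ids

# Then, as in Step 1: similarity = cosine_similarity(mat.T)
```

Memory drops from O(orders × products) dense cells to O(order lines) nonzeros, which matters well before 5,000 products.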
Cold-start problem. New products have no purchase history. For the first 30 days of a product’s life, fall back to category-based popularity: surface the top-selling products in the same product.public.category (the eCommerce category) or product.category. Add this fallback to get_recommendations before returning [].
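A minimal sketch of that fallback as an ORM method on the same model — the _get_category_fallback name, the 90-day popularity window, and the read_group aggregation are illustrative choices, not part of the module above:

```python
from datetime import timedelta
from odoo import fields, models

class ProductRecommendation(models.Model):
    _inherit = 'product.recommendation'

    def _get_category_fallback(self, product, limit=6):
        """Cold-start fallback: recent best sellers from the same eCommerce category."""
        groups = self.env['sale.order.line'].sudo().read_group(
            domain=[
                ('order_id.state', 'in', ('sale', 'done')),
                ('order_id.date_order', '>=',
                 fields.Datetime.now() - timedelta(days=90)),
                ('product_id.product_tmpl_id.public_categ_ids', 'in',
                 product.public_categ_ids.ids),
                ('product_id', '!=', product.id),
            ],
            fields=['product_uom_qty:sum'],
            groupby=['product_id'],
            orderby='product_uom_qty desc',
            limit=limit,
        )
        variant_ids = [g['product_id'][0] for g in groups]
        variants = self.env['product.product'].browse(variant_ids)
        return variants.mapped('product_tmpl_id').filtered(
            lambda t: t.website_published and t.sale_ok)
```

Call it from get_recommendations when the cache lookup comes back empty and the product is under 30 days old.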
Variant vs. template. Odoo eCommerce shows product.template on listing pages but adds product.product (the variant) to order lines. The matrix is built at the variant level to preserve signal from size/colour splits; the API converts back to templates before returning. This handles the common case where “blue T-shirt” and “red T-shirt” should both contribute to and draw from the same recommendation context.
Score floor. The score > 0.05 threshold in the storage step discards weakly correlated products. For a B2B catalogue with very sparse purchase patterns (many niche products bought by few customers), lower this to 0.02. For a high-volume B2C shop, raise it to 0.10 to keep recommendations tight.
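Rather than guessing, the floor can be read off the score distribution — a sketch that assumes the similarity_df from Step 1 (the helper name is illustrative; the toy matrix stands in for the real one):

```python
import numpy as np
import pandas as pd

def score_percentiles(similarity_df: pd.DataFrame) -> pd.Series:
    """Distribution of positive off-diagonal scores, to pick a sensible floor."""
    vals = similarity_df.values.astype(float).copy()
    np.fill_diagonal(vals, np.nan)        # ignore self-similarity (always 1.0)
    flat = vals[~np.isnan(vals)]
    flat = flat[flat > 0]                 # only pairs that actually co-occur
    return pd.Series(flat).quantile([0.50, 0.75, 0.90, 0.95])

# Toy matrix; in practice pass the similarity_df built in Step 1
toy = pd.DataFrame(
    [[1.00, 0.60, 0.10],
     [0.60, 1.00, 0.05],
     [0.10, 0.05, 1.00]],
    index=[101, 102, 103], columns=[101, 102, 103])
print(score_percentiles(toy))
```

If the 50th percentile sits near your floor, the threshold is discarding half of all real co-purchase pairs — a sign it should come down.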
Step 6: A/B Testing the Widget
Split traffic at the session level using Odoo’s built-in A/B testing infrastructure (website.page) or a simple cookie-based gate:
import hashlib

@http.route('/shop/recommendations/<int:product_id>', ...)
def get_recommendations(self, product_id, limit=6):
# A/B gate: 50% of sessions see recommendations
session_key = request.session.sid
show_widget = int(hashlib.md5(session_key.encode()).hexdigest(), 16) % 2 == 0
if not show_widget:
return {'recommendations': [], 'ab_group': 'control'}
# ... existing logic ...
return {'recommendations': [...], 'ab_group': 'treatment'}
Track the primary metric — average order value — per session via Odoo’s sale.order data. After four weeks, compare mean AOV for control vs. treatment sessions using a Welch t-test. Typical results: a 6–12% lift in AOV for catalogues with moderate diversity (200+ active products). Narrowly focused catalogues (single product category) see smaller lifts because the co-purchase signal is weaker.
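The comparison itself is a few lines with scipy.stats — the sketch below uses synthetic per-session AOV samples in place of real sale.order exports:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic per-session AOV samples; treatment is drawn ~8% higher on average
control = rng.normal(loc=100.0, scale=30.0, size=2000)
treatment = rng.normal(loc=108.0, scale=30.0, size=2000)

# Welch's t-test: equal_var=False, so unequal group variances are allowed
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
lift = treatment.mean() / control.mean() - 1
print(f"lift={lift:.1%}, p={p_value:.2g}")
```

With real data, build the two arrays by grouping confirmed sale.order totals by the ab_group value the controller returned for each session.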
Secondary metrics to track: click-through rate on recommendation items, add-to-cart rate from the widget, and revenue attributed to recommendation-driven line items.
Gotchas
Bulk orders skew the matrix. A single large B2B order that contains 40 different products will create spurious correlations between those products. Consider capping order size: exclude orders with more than 25 line items from the training data, or weight co-occurrence by the inverse of order size.
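The inverse-of-order-size weighting drops into the matrix build with one extra step — a sketch over the binarized order × product frame from Step 1 (the function name and the max_lines cap are illustrative):

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

def weighted_similarity(order_product: pd.DataFrame,
                        max_lines: int = 25) -> pd.DataFrame:
    """Down-weight big baskets: each order contributes 1/order_size per product."""
    sizes = order_product.sum(axis=1)
    trimmed = order_product[sizes <= max_lines]           # drop oversized orders
    weighted = trimmed.div(trimmed.sum(axis=1), axis=0).fillna(0)
    sim = cosine_similarity(csr_matrix(weighted.values.T))
    return pd.DataFrame(sim, index=order_product.columns,
                        columns=order_product.columns)
```

A 40-line order now contributes 1/40 per co-occurrence instead of a full vote, so a single bulk purchase can no longer dominate a product's neighbour list.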
Promotional bundles create false signals. If your team frequently bundles unrelated products for promotions, those bundles will look like natural co-purchases. Filter orders where a discount code was applied, or maintain a list of “bundle order IDs” to exclude.
Returns reduce but don’t eliminate signal. An order that was fully refunded still appears in sale.order with state done. Query account.move to identify fully reversed orders and exclude their sale.order IDs from the training set.
At Trobz, we’ve deployed collaborative filtering recommendation engines on Odoo eCommerce instances ranging from 500 to 50,000 active SKUs. The approach above scales further than most teams expect — if you’re evaluating whether your order history has enough signal to make it work, reach out and we’ll run the numbers together.