Key Takeaways: The field_vector OCA module adds a native vector field to Odoo backed by pgvector in PostgreSQL: no separate vector database, no new infrastructure. You embed product descriptions once, store the vectors on product.template, and query them with a single SQL call. The search endpoint is a standard Odoo JSON-RPC controller; hooking it into the Sales order line takes about 40 lines of XML and Python. The main ongoing cost is keeping embeddings in sync when product descriptions change, so plan for that before you ship.
A product catalogue with 2,000 SKUs is manageable. One with 15,000 is where keyword search quietly starts failing. Sales reps type “waterproof adhesive for outdoor use” and get zero results because your internal name is “Marine Epoxy Sealant 500ml.” Customers describe what they need; your naming conventions describe something else.
The fix isn’t better naming conventions. It’s search that understands meaning rather than matching characters. That’s what vector similarity gives you — and with the field_vector OCA module, you can build it directly on top of the PostgreSQL instance your Odoo already uses.
This post walks through the full implementation: adding a vector field to product.template, generating and storing embeddings, building the search endpoint, and wiring it into the Sales order line UI. If you haven’t read the Introduction to the field_vector OCA Module for Odoo yet, start there — this post assumes you know what embeddings are and why pgvector stores them efficiently.
Prerequisites
Before writing any code, make sure you have:
- pgvector installed in your PostgreSQL instance. On Debian/Ubuntu: apt install postgresql-15-pgvector. Most cloud-managed Postgres providers (RDS, Cloud SQL, Supabase) support it; check your version. On Odoo.sh, you’ll need to request pgvector support via a ticket.
- field_vector from OCA/server-tools installed on your Odoo 19.0 instance. Clone the repo, add it to your addons path, and enable the module in Settings > Apps.
- An embedding model. We’ll use a local sentence-transformers service running all-MiniLM-L6-v2: 384 dimensions, fast inference on CPU, no data leaves your server. If you prefer an API, OpenAI’s text-embedding-3-small is a drop-in replacement on the HTTP side; swap the call and set the field’s dimensions to 1536.
⚠️ The field_vector module’s Odoo 19.0 migration is tracked in OCA/server-tools PR #3430. At time of writing, pin to that branch and test against your Odoo version before deploying.
Step 1 — Add a Vector Field to product.template
Create a custom module (or add to an existing one). Two files matter here: the field definition and a pre-migration script that activates the pgvector extension before Odoo tries to create the column.
# models/product_template.py
from odoo import fields, models


class ProductTemplate(models.Model):
    _inherit = "product.template"

    description_embedding = fields.Vector(
        string="Description Embedding",
        dimensions=384,  # must match your embedding model exactly
        index_method="hnsw",  # faster queries; use "ivfflat" for >100k products
    )
The dimensions parameter is not flexible: it must match your model’s output size. all-MiniLM-L6-v2 outputs 384. OpenAI’s text-embedding-3-small outputs 1536. pgvector rejects vectors whose length differs from the column’s declared size, so a mismatch shows up as failures at insert time, which is annoying to debug if you don’t know to look for it.
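One way to catch a mismatch early is to check the vector length in the indexer before writing it to Odoo. This is a minimal sketch that assumes the 384-dimension model used in this post; EXPECTED_DIM and validate_embedding are hypothetical names, not part of field_vector.

```python
EXPECTED_DIM = 384  # assumption: all-MiniLM-L6-v2; 1536 for text-embedding-3-small


def validate_embedding(vector, expected_dim=EXPECTED_DIM):
    """Return the vector as a list of floats, or raise if its length is wrong."""
    values = [float(v) for v in vector]
    if len(values) != expected_dim:
        raise ValueError(
            f"embedding has {len(values)} dimensions, expected {expected_dim}"
        )
    return values
```

Calling this on the sidecar's response before the write turns a database-level error into a readable message that names both sizes.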
The index_method choice matters at scale. HNSW uses more memory but delivers fast approximate nearest-neighbor queries. IVFFlat is cheaper to build and handles larger catalogues better, but it requires a training step to configure the number of lists (lists parameter). For most product catalogues under 100k records, start with HNSW.
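For reference, this is roughly the DDL pgvector expects for each index type. The sketch only builds the SQL string; whether field_vector emits the same statement is an assumption, and build_index_sql is a hypothetical helper. The lists heuristic (rows / 1000 up to a million rows, the square root of the row count beyond that) is the starting point suggested in the pgvector README.

```python
import math


def build_index_sql(table, column, method, row_count=0):
    """Build a pgvector CREATE INDEX statement for L2 distance (the <-> operator).

    table and column are interpolated directly, so pass trusted identifiers only.
    """
    index_name = f"{table}_{column}_{method}_idx"
    if method == "hnsw":
        return (
            f"CREATE INDEX IF NOT EXISTS {index_name} "
            f"ON {table} USING hnsw ({column} vector_l2_ops)"
        )
    if method == "ivfflat":
        # IVFFlat needs a list count; derive a starting value from the table size.
        if row_count <= 1_000_000:
            lists = max(1, row_count // 1000)
        else:
            lists = int(math.sqrt(row_count))
        return (
            f"CREATE INDEX IF NOT EXISTS {index_name} "
            f"ON {table} USING ivfflat ({column} vector_l2_ops) "
            f"WITH (lists = {lists})"
        )
    raise ValueError(f"unsupported index method: {method}")


print(build_index_sql("product_template", "description_embedding", "ivfflat", 150_000))
```

An IVFFlat index is best created after the table already holds data, since the lists are derived from the rows present at build time.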
The pre-migration script ensures pgvector is available before Odoo creates the column type:
# migrations/19.0.1.0.1/pre-migrate.py
def migrate(cr, version):
    cr.execute("CREATE EXTENSION IF NOT EXISTS vector;")
Without this, the module install will fail with a cryptic Postgres error about an unknown type.
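One caveat: Odoo runs migration scripts when a module is upgraded, not on its first installation, so a fresh install may need the extension created another way. One option is a pre_init_hook. The sketch below assumes the hook receives an environment (as in recent Odoo versions; older ones pass a cursor) and that the database user is allowed to create extensions. The hooks.py file name and ensure_pgvector helper are hypothetical, and if field_vector already creates the extension itself, none of this is needed.

```python
# hooks.py (hypothetical file in your custom module)


def ensure_pgvector(cr):
    """Create the pgvector extension if it is missing; a no-op otherwise."""
    cr.execute("CREATE EXTENSION IF NOT EXISTS vector")


def pre_init_hook(env):
    # Assumption: the hook is called with an Environment, so the cursor is env.cr.
    ensure_pgvector(env.cr)
```

To use it, import the function in the module's __init__.py and reference it in __manifest__.py with "pre_init_hook": "pre_init_hook".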
Step 2 — Index Product Descriptions
Embeddings don’t generate themselves. You need a process that reads product records from Odoo, sends the text to an embedding model, and writes the result back. The cleanest approach for a production setup is an external indexer script that talks to Odoo over XML-RPC and to the embedding model over HTTP.
First, the embedding sidecar, a minimal FastAPI service wrapping sentence-transformers:
# embed_service.py — run with: uvicorn embed_service:app --port 8001
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # loaded once at startup


class TextIn(BaseModel):
    text: str


@app.post("/embed")
def embed(body: TextIn):
    # encode() returns a NumPy array; convert it to a JSON-serializable list
    return {"embedding": model.encode(body.text).tolist()}
Then the indexer:
# index_products.py
import xmlrpc.client

import requests

ODOO_URL = "http://localhost:8069"
ODOO_DB = "your_db"
ODOO_USER = "admin"
ODOO_PASSWORD = "your_password"
EMBED_URL = "http://localhost:8001/embed"


def get_embedding(text: str) -> list[float]:
    resp = requests.post(EMBED_URL, json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["embedding"]


def main():
    common = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/common")
    uid = common.authenticate(ODOO_DB, ODOO_USER, ODOO_PASSWORD, {})
    if not uid:
        # authenticate() returns False on bad credentials instead of raising
        raise SystemExit("Odoo authentication failed")
    models = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/object")

    # Only index products that don't have an embedding yet
    product_ids = models.execute_kw(
        ODOO_DB, uid, ODOO_PASSWORD,
        "product.template", "search",
        [[["description_embedding", "=", False], ["active", "=", True]]],
    )
    products = models.execute_kw(
        ODOO_DB, uid, ODOO_PASSWORD,
        "product.template", "read",
        [product_ids],
        {"fields": ["id", "name", "description_sale"]},
    )

    for product in products:
        # XML-RPC returns False for empty fields, hence the `or ''`
        text = f"{product['name']}. {product.get('description_sale') or ''}".strip()
        embedding = get_embedding(text)
        models.execute_kw(
            ODOO_DB, uid, ODOO_PASSWORD,
            "product.template", "write",
            [[product["id"]], {"description_embedding": embedding}],
        )
        print(f"Indexed: {product['name']}")


if __name__ == "__main__":
    main()
We concatenate name and description_sale because description_sale is what sales teams actually populate — it contains the language that matches customer queries better than internal part names. Products with no description_sale still get indexed on name alone.
Run this script once to build your initial index, then schedule it nightly as a cron job or an Odoo ir.cron action. Plan for the re-indexing problem before you deploy: products get renamed, descriptions get updated, and your index drifts. A nightly full pass is the blunt solution; a write-hook on product.template is more precise.
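One way to make the nightly pass cheaper is to store a hash of the text that produced each embedding and re-embed only when that text changes. A minimal sketch, assuming a hypothetical embedding_source_hash Char field on product.template that the indexer writes alongside the vector (the field and the helper names are illustrative, not part of field_vector):

```python
import hashlib


def embedding_text(product):
    """Build the text to embed, mirroring the indexer's name + description_sale rule."""
    # XML-RPC returns False for empty fields, hence the `or ''`.
    return f"{product['name']}. {product.get('description_sale') or ''}".strip()


def source_hash(text):
    """Stable fingerprint of the embedded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_stale(product):
    """True when the stored hash no longer matches the current text."""
    # `embedding_source_hash` is a hypothetical field, not part of field_vector.
    return product.get("embedding_source_hash") != source_hash(embedding_text(product))
```

With this in place, the nightly job reads name, description_sale, and the stored hash for every active product, and only calls the embedding service for records where is_stale returns True.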
Step 3 — Build the Search Endpoint
The search controller lives in Odoo. It receives a query string, embeds it on the fly, and returns the top matching products via a pgvector similarity query.
# controllers/product_search.py
import requests

from odoo import http
from odoo.http import request


class ProductSemanticSearch(http.Controller):
    EMBED_URL = "http://localhost:8001/embed"

    @http.route(
        "/api/product/semantic-search",
        type="json",
        auth="user",
        methods=["POST"],
        csrf=False,
    )
    def semantic_search(self, query: str = "", limit: int = 10):
        if not query.strip():
            return {"results": []}
        try:
            resp = requests.post(
                self.EMBED_URL, json={"text": query.strip()}, timeout=10
            )
            resp.raise_for_status()
            query_vec = resp.json()["embedding"]
        except Exception as exc:
            return {"error": str(exc), "results": []}

        vec_literal = str(query_vec)  # "[0.1, 0.2, ...]", cast to vector in SQL
        limit = max(1, min(int(limit), 20))  # keep LIMIT between 1 and 20

        # Select only ids and distances. name and description_sale are
        # translatable fields (stored as JSON in recent Odoo versions), so they
        # are read through the ORM below to get plain strings in the user's language.
        cr = request.env.cr
        cr.execute(
            """
            SELECT
                pt.id,
                (pt.description_embedding <-> %s::vector) AS distance
            FROM product_template pt
            WHERE pt.active = true
              AND pt.description_embedding IS NOT NULL
              AND (pt.description_embedding <-> %s::vector) < 0.65
            ORDER BY distance ASC
            LIMIT %s
            """,
            (vec_literal, vec_literal, limit),
        )
        distances = dict(cr.fetchall())  # {product id: L2 distance}, nearest first
        products = request.env["product.template"].browse(list(distances))
        return {
            "results": [
                {
                    "id": product.id,
                    "name": product.name,
                    "description": product.description_sale or "",
                    # 1 - L2 distance: a rough relevance score, not a cosine similarity
                    "score": round(1.0 - float(distances[product.id]), 4),
                }
                for product in products
            ]
        }
The distance threshold < 0.65 is a starting point. Calibrate it against your actual catalogue — run a few hundred queries you know the right answers to, plot the distance distribution, and pick a cutoff that keeps recall high without flooding results with noise. For technical product catalogues with sparse descriptions, you may need to loosen this to < 0.75. For catalogues with rich text descriptions, < 0.55 often gives cleaner results.
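The calibration step can be sketched as a sweep over candidate cutoffs. The code below assumes you have collected, for each test query, the distance of every candidate product and whether a human judged it relevant. The sample pairs and the recall_precision_at helper are illustrative, not measurements.

```python
def recall_precision_at(threshold, judged):
    """judged: list of (distance, is_relevant) pairs across all test queries."""
    kept = [rel for dist, rel in judged if dist < threshold]
    total_relevant = sum(1 for _, rel in judged if rel)
    hits = sum(1 for rel in kept if rel)
    recall = hits / total_relevant if total_relevant else 0.0
    precision = hits / len(kept) if kept else 0.0
    return recall, precision


# Illustrative pairs only; replace with distances from your own catalogue.
judged = [(0.40, True), (0.52, True), (0.60, False), (0.70, True), (0.80, False)]

for cutoff in (0.55, 0.65, 0.75):
    recall, precision = recall_precision_at(cutoff, judged)
    print(f"cutoff={cutoff:.2f} recall={recall:.2f} precision={precision:.2f}")
```

Because <-> is pgvector's Euclidean distance operator, the useful range of cutoffs depends on the scale of your model's vectors, so re-run the sweep whenever you change models.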
The csrf=False on a type="json" route is intentional. Odoo’s JSON controller validates sessions through its own middleware; adding csrf=True breaks calls from OWL components.
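For external callers, a request to this route looks roughly like the following. The sketch assumes Odoo's usual JSON-RPC 2.0 envelope for JSON routes, with the controller's keyword arguments under params. build_search_payload is a hypothetical helper, and obtaining the authenticated session is left to the caller.

```python
import json


def build_search_payload(query, limit=10):
    """Wrap the controller arguments in a JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "method": "call",
        "params": {"query": query, "limit": limit},
    }


payload = build_search_payload("waterproof adhesive for outdoor use", limit=5)
print(json.dumps(payload, indent=2))
```

POST the payload as application/json with a logged-in session cookie (the route uses auth="user"); the controller's return value comes back under the result key of the response.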
Step 4 — Wire It into the Sales Order Line
The goal: when a sales rep is working on a sale.order.line and doesn’t know the product name, they click a “Find by description” button, type what the customer wants, and pick from semantic search results.
Extend the Sales order form with a button next to the product field:
<!-- views/sale_order_views.xml -->
<odoo>
    <record id="view_order_form_inherit_semantic" model="ir.ui.view">
        <field name="name">sale.order.form.semantic.search.inherit</field>
        <field name="model">sale.order</field>
        <field name="inherit_id" ref="sale.view_order_form"/>
        <field name="arch" type="xml">
            <xpath
                expr="//field[@name='order_line']//field[@name='product_id']"
                position="after"
            >
                <button
                    name="action_open_semantic_search"
                    string="Find by description"
                    type="object"
                    class="oe_link"
                />
            </xpath>
        </field>
    </record>
</odoo>
Add the action on sale.order.line:
# models/sale_order_line.py
from odoo import models


class SaleOrderLine(models.Model):
    _inherit = "sale.order.line"

    def action_open_semantic_search(self):
        self.ensure_one()  # the button acts on a single order line
        return {
            "type": "ir.actions.act_window",
            "name": "Find Product by Description",
            "res_model": "product.semantic.search.wizard",
            "view_mode": "form",
            "target": "new",
            "context": {"default_order_line_id": self.id},
        }
The wizard handles the query, calls the endpoint, and writes the selected product back to the line:
# wizards/product_semantic_search_wizard.py
from odoo import fields, models


class ProductSemanticSearchWizard(models.TransientModel):
    _name = "product.semantic.search.wizard"
    _description = "Semantic Product Search"

    order_line_id = fields.Many2one("sale.order.line", required=True)
    query = fields.Char(string="Describe what you're looking for")
    result_ids = fields.Many2many(
        "product.template",
        "wizard_product_rel",
        "wizard_id",
        "product_id",
        string="Results",
    )
    selected_product_id = fields.Many2one(
        "product.template", string="Select Product"
    )

    def action_search(self):
        self.ensure_one()
        if not self.query:
            return
        # Call the controller logic directly; avoid an HTTP round-trip.
        # NOTE: treat this call as a placeholder. As far as we know,
        # `_dispatch_ir_rule_free` is not a documented Odoo API, so verify it
        # exists in your version or use the direct SQL approach described below.
        controller_result = self.env["ir.http"]._dispatch_ir_rule_free(
            "/api/product/semantic-search",
            query=self.query,
            limit=10,
        )
        # Simpler alternative: replicate the SQL here and skip the controller
        product_ids = [r["id"] for r in (controller_result or {}).get("results", [])]
        self.result_ids = [(6, 0, product_ids)]
        if not product_ids:
            return {"type": "ir.actions.act_window_close"}
        # Re-open the same wizard record so the results stay on screen
        return {
            "type": "ir.actions.act_window",
            "res_model": self._name,
            "res_id": self.id,
            "view_mode": "form",
            "target": "new",
        }

    def action_confirm(self):
        self.ensure_one()
        if self.selected_product_id and self.order_line_id:
            # Order lines reference product.product, so take the first variant
            variant = self.selected_product_id.product_variant_ids[:1]
            if variant:
                self.order_line_id.product_id = variant
        return {"type": "ir.actions.act_window_close"}
In practice, calling the internal controller from a wizard is awkward. The simpler path is to copy the similarity SQL into action_search directly — it’s ten lines and avoids the dispatch machinery entirely. Only go through the controller if you need the same endpoint called from external tools or an OWL widget.
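A sketch of what that copied query could look like, written as a helper that only builds the statement and its parameters so the wizard and the controller could share it. It assumes the same table, column, and 0.65 cutoff as the controller above; similarity_query and to_vector_literal are hypothetical names.

```python
def to_vector_literal(vector):
    """Format a list of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in vector) + "]"


def similarity_query(vector, limit=10, max_distance=0.65):
    """Return (sql, params) for a nearest-neighbour lookup on product_template."""
    literal = to_vector_literal(vector)
    limit = max(1, min(int(limit), 20))  # cap the result size at 20
    sql = """
        SELECT pt.id, (pt.description_embedding <-> %s::vector) AS distance
        FROM product_template pt
        WHERE pt.active = true
          AND pt.description_embedding IS NOT NULL
          AND (pt.description_embedding <-> %s::vector) < %s
        ORDER BY distance ASC
        LIMIT %s
    """
    return sql, (literal, literal, max_distance, limit)
```

Inside action_search, the wizard would embed self.query, call self.env.cr.execute(*similarity_query(vec)), and collect the first column of fetchall() as the product ids.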
What Can Go Wrong
Embeddings drift without warning. Product descriptions change and your index doesn’t automatically update. Add an ORM write hook on product.template to queue re-embedding when description_sale or name changes. The nightly job is your safety net, not your primary mechanism.
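The decision part of such a hook is small. A minimal sketch, assuming the two fields the indexer embeds; TRIGGER_FIELDS and needs_reembedding are hypothetical names, and the queueing itself is left out:

```python
TRIGGER_FIELDS = {"name", "description_sale"}  # the fields the indexer concatenates


def needs_reembedding(vals):
    """True when a write() payload touches a field that feeds the embedding."""
    return bool(TRIGGER_FIELDS & set(vals))
```

In a write() override on product.template, you would call super() first and then, if needs_reembedding(vals) is true, either enqueue a job or clear description_embedding so the indexer's "no embedding yet" filter picks the record up on its next run.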
Dimension mismatch if you switch models. Every stored embedding is tied to the model that generated it. Switch from all-MiniLM-L6-v2 to a 1536-dimension model and the existing vectors are useless; the column type won’t even accept the new data. Add an embedding_model_version field to product.template so you know when a full re-index is needed and can invalidate stale records programmatically.
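A sketch of how that version check might look on the indexer side, assuming embedding_model_version is stored as a plain string on each record and CURRENT_MODEL is a constant you bump when switching models (both the constant and the helper name are illustrative):

```python
CURRENT_MODEL = "all-MiniLM-L6-v2"  # assumption: bump this when you change models


def ids_needing_reindex(products, current_model=CURRENT_MODEL):
    """Return ids whose embedding is missing or was produced by another model."""
    # Records never indexed have no version (False over XML-RPC), so they match too.
    return [
        p["id"]
        for p in products
        if p.get("embedding_model_version") != current_model
    ]
```

The indexer would then write the vector and the version string in the same call, so a record is never left with a new vector and an old version.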
No coverage for new products. Records created after the last indexer run have description_embedding = False and don’t appear in semantic search results. Either trigger embedding synchronously on create() (acceptable for small catalogues) or set up a job queue (Celery, OCA queue_job) for larger ones.
Semantic search isn’t a replacement for keyword search. Part numbers, SKU codes, and EAN barcodes have no semantic content: 12345-A-SL doesn’t embed meaningfully. Keep your existing keyword search in place and present both result sets to the user, or route by query type (numbers → keyword, descriptive text → semantic).
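The routing option can start as a simple heuristic. The sketch below treats a query as code-like when it is a single token containing at least one digit; that rule, the regular expression, and the route_query name are assumptions to tune against your own part-number formats.

```python
import re

# One token made of letters, digits, and common separators, with at least one digit.
CODE_LIKE = re.compile(r"^(?=.*\d)[A-Za-z0-9][A-Za-z0-9._/-]*$")


def route_query(query):
    """Return 'keyword' for SKU/EAN-style input, 'semantic' for descriptive text."""
    text = query.strip()
    if CODE_LIKE.match(text):
        return "keyword"
    return "semantic"
```

Queries that fall between the two (a brand name plus a size, for example) are a good reason to show both result sets rather than trust the router alone.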
The latency of the sidecar call is real: 50–200ms per query on CPU-only inference. For a UI search-as-you-type box, that’s too slow. For a modal triggered by an explicit button click, it’s fine. Choose your UX accordingly.
At Trobz, we’ve shipped field_vector-based product search on catalogues ranging from 800 to 40,000 SKUs. If you’re exploring this for your own implementation, the tradeoffs around embedding freshness and model selection are worth talking through before you commit to an architecture. Reach out and mention field_vector.