Adding Semantic Search to Your Odoo Product Catalogue with field_vector

Keyword search breaks down when your sales reps don't know exact product names. Here's how to add semantic natural-language search to your Odoo product catalogue using the field_vector OCA module — no extra database required.

Key Takeaways: The field_vector OCA module adds a native vector field to Odoo backed by pgvector in PostgreSQL: no separate vector database, no new infrastructure. You embed product descriptions once, store the vectors on product.template, and query them with a single SQL call. The search endpoint is a standard Odoo JSON-RPC controller. Hooking it into the Sales order line takes an inherited view, a button action and a small wizard. The main ongoing cost is keeping embeddings in sync when product descriptions change, so plan for that before you ship.


A product catalogue with 2,000 SKUs is manageable. One with 15,000 is where keyword search quietly starts failing. Sales reps type “waterproof adhesive for outdoor use” and get zero results because your internal name is “Marine Epoxy Sealant 500ml.” Customers describe what they need; your naming conventions describe something else.

The fix isn’t better naming conventions. It’s search that understands meaning rather than matching characters. That’s what vector similarity gives you — and with the field_vector OCA module, you can build it directly on top of the PostgreSQL instance your Odoo already uses.

This post walks through the full implementation: adding a vector field to product.template, generating and storing embeddings, building the search endpoint, and wiring it into the Sales order line UI. If you haven’t read the Introduction to the field_vector OCA Module for Odoo yet, start there — this post assumes you know what embeddings are and why pgvector stores them efficiently.

Prerequisites

Before writing any code, make sure you have:

  • pgvector installed in your PostgreSQL instance. On Debian/Ubuntu: apt install postgresql-15-pgvector. Most cloud-managed Postgres providers (RDS, Cloud SQL, Supabase) support it — check your version. On Odoo.sh, you’ll need to request pgvector support via a ticket.
  • field_vector from OCA/server-tools installed on your Odoo 19.0 instance. Clone the repo, add it to your addons path, and enable the module in Settings > Apps.
  • An embedding model. We’ll use a local sentence-transformers service running all-MiniLM-L6-v2: 384 dimensions, fast inference on CPU, and no data leaves your server. If you prefer an API, OpenAI’s text-embedding-3-small works as well. Swap the HTTP call and set the field’s dimensions to 1536 to match its output.

⚠️ The field_vector module’s Odoo 19.0 migration is tracked in OCA/server-tools PR #3430. At the time of writing, pin to that branch and test against your Odoo version before deploying.

Step 1 — Add a Vector Field to product.template

Create a custom module (or add to an existing one). Two files matter here: the field definition and a pre-migration script that activates the pgvector extension before Odoo tries to create the column.

# models/product_template.py
from odoo import fields, models


class ProductTemplate(models.Model):
    _inherit = "product.template"

    description_embedding = fields.Vector(
        string="Description Embedding",
        dimensions=384,       # must match your embedding model exactly
        index_method="hnsw",  # faster queries; use "ivfflat" for >100k products
    )

The dimensions parameter is not flexible. It must match your model’s output size: all-MiniLM-L6-v2 outputs 384 and OpenAI’s text-embedding-3-small outputs 1536. A mismatch is not caught when the module installs. It only shows up as a failed write when the indexer stores its first vector, which is annoying to debug.
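A mismatch is cheaper to catch in the indexer, before write() is ever called. Here is a minimal sketch of such a guard. `check_dimensions` and the `MODEL_DIMENSIONS` table are hypothetical names, and the table only lists the two models mentioned in this post:

```python
# Output sizes of the two models discussed in this post (extend for others)
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
}

# Must equal the dimensions= value declared on description_embedding
FIELD_DIMENSIONS = 384


def check_dimensions(vector: list[float], model_name: str,
                     field_dims: int = FIELD_DIMENSIONS) -> list[float]:
    """Raise a readable error instead of letting the database reject the write."""
    expected = MODEL_DIMENSIONS.get(model_name)
    if expected is not None and expected != field_dims:
        raise ValueError(
            f"{model_name} produces {expected}-d vectors but the field "
            f"is declared with {field_dims} dimensions"
        )
    if len(vector) != field_dims:
        raise ValueError(f"got {len(vector)} values, expected {field_dims}")
    return vector
```

Call it on each vector returned by get_embedding() before the write. A model swap then stops the run on the first product with a message that names the problem.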

The index_method choice matters at scale. HNSW uses more memory and takes longer to build, but it delivers fast approximate nearest-neighbor queries with good recall. IVFFlat is cheaper to build and uses less memory, which helps on very large catalogues. However, it clusters the existing rows into a fixed number of lists (the lists parameter) when the index is built, so it should be created after the data is loaded. For most product catalogues under 100k records, start with HNSW.
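You may need to create or rebuild the index yourself, for instance in a migration script or to switch methods as the catalogue grows. The pgvector DDL for the two methods looks roughly like the sketch below, which only builds the SQL string. The index name and the `vector_cosine_ops` operator class are assumptions. HNSW’s m and ef_construction are shown at pgvector’s defaults, and the lists starting point follows the pgvector README: rows / 1000 up to 1M rows, sqrt(rows) beyond. Whether field_vector already creates an equivalent index depends on the module version.

```python
import math


def ivfflat_lists(row_count: int) -> int:
    """Starting value for IVFFlat's lists parameter (pgvector README heuristic)."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))


def vector_index_ddl(method: str, row_count: int = 0,
                     table: str = "product_template",
                     column: str = "description_embedding",
                     opclass: str = "vector_cosine_ops") -> str:
    """Build a CREATE INDEX statement for an HNSW or IVFFlat pgvector index."""
    name = f"{table}_{column}_{method}_idx"  # hypothetical naming scheme
    if method == "hnsw":
        # pgvector's default graph parameters
        params = "m = 16, ef_construction = 64"
    elif method == "ivfflat":
        # IVFFlat clusters the rows present at build time: load data first
        params = f"lists = {ivfflat_lists(row_count)}"
    else:
        raise ValueError(f"unknown index method: {method}")
    return (f"CREATE INDEX IF NOT EXISTS {name} ON {table} "
            f"USING {method} ({column} {opclass}) WITH ({params})")
```

The operator class has to match the distance operator your queries use (vector_cosine_ops for `<=>`, vector_l2_ops for `<->`). Otherwise the planner cannot use the index for the ORDER BY and falls back to a sequential scan.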

The pre-migration script enables pgvector before Odoo creates the vector column:

# migrations/19.0.1.0.1/pre-migrate.py
def migrate(cr, version):
    cr.execute("CREATE EXTENSION IF NOT EXISTS vector;")

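Without the extension, Odoo cannot create the column and fails with a cryptic Postgres error about an unknown type (type "vector" does not exist). Keep in mind that Odoo only runs migration scripts when an installed module is upgraded to a new version. They are skipped on a fresh install, so the same statement also belongs in a pre_init_hook declared in the module manifest.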
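A pre_init_hook that enables the extension on a fresh install can be very small. This sketch assumes Odoo 17 or later, where install hooks receive an Environment (earlier versions passed a cursor). The function name is your choice and is referenced from the manifest. CREATE EXTENSION usually needs elevated database privileges, so on managed Postgres you may need to run it once as an admin user instead:

```python
# hooks.py
# Referenced from __manifest__.py as: "pre_init_hook": "pre_init_hook"


def pre_init_hook(env):
    """Enable pgvector before Odoo creates this module's columns.

    Assumes Odoo 17+, where install hooks receive an Environment.
    """
    env.cr.execute("CREATE EXTENSION IF NOT EXISTS vector")
```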

Step 2 — Index Product Descriptions

Embeddings don’t generate themselves. You need a process that reads product records from Odoo, sends the text to an embedding model, and writes the result back. The cleanest approach for a production setup is an external indexer script that talks to Odoo over XML-RPC and to the embedding model over HTTP.

First, the embedding sidecar, a minimal FastAPI service wrapping sentence-transformers:

# embed_service.py — run with: uvicorn embed_service:app --port 8001
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")

class TextIn(BaseModel):
    text: str

@app.post("/embed")
def embed(body: TextIn):
    return {"embedding": model.encode(body.text).tolist()}

Then the indexer:

# index_products.py
import xmlrpc.client
import requests

ODOO_URL = "http://localhost:8069"
ODOO_DB = "your_db"
ODOO_USER = "admin"
ODOO_PASSWORD = "your_password"
EMBED_URL = "http://localhost:8001/embed"


def get_embedding(text: str) -> list[float]:
    resp = requests.post(EMBED_URL, json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["embedding"]


def main():
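    common = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/common")
    uid = common.authenticate(ODOO_DB, ODOO_USER, ODOO_PASSWORD, {})
    if not uid:
        # authenticate() returns False on bad credentials instead of raising
        raise SystemExit("Odoo authentication failed; check ODOO_DB/USER/PASSWORD")
    models = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/object")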

    # Only index products that don't have an embedding yet
    product_ids = models.execute_kw(
        ODOO_DB, uid, ODOO_PASSWORD,
        "product.template", "search",
        [[["description_embedding", "=", False], ["active", "=", True]]],
    )
    products = models.execute_kw(
        ODOO_DB, uid, ODOO_PASSWORD,
        "product.template", "read",
        [product_ids],
        {"fields": ["id", "name", "description_sale"]},
    )

    for product in products:
        text = f"{product['name']}. {product.get('description_sale') or ''}".strip()
        embedding = get_embedding(text)
        models.execute_kw(
            ODOO_DB, uid, ODOO_PASSWORD,
            "product.template", "write",
            [[product["id"]], {"description_embedding": embedding}],
        )
        print(f"Indexed: {product['name']}")


if __name__ == "__main__":
    main()

We concatenate name and description_sale because description_sale is what sales teams actually populate — it contains the language that matches customer queries better than internal part names. Products with no description_sale still get indexed on name alone.

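Run this script once to build your initial index, then schedule it nightly as a cron job or an Odoo ir.cron action. Plan for the re-indexing problem before you deploy: products get renamed, descriptions get updated, and your index drifts. As written, the script only picks up products that have no embedding yet, so a nightly run fills gaps but never refreshes a changed description. A nightly full pass (drop the description_embedding filter from the search domain) is the blunt solution. A write-hook on product.template is more precise.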
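One way to make the nightly pass cheaper than re-embedding everything is to fingerprint the exact text that was embedded and only re-embed when it changes. This sketch makes some assumptions: `embedding_text_hash` is a hypothetical Char field you would add next to description_embedding and read together with name and description_sale, and the helper names are illustrative.

```python
import hashlib


def embedding_text(product: dict) -> str:
    """Same text the indexer embeds: name plus sales description."""
    return f"{product['name']}. {product.get('description_sale') or ''}".strip()


def text_hash(text: str) -> str:
    """Stable fingerprint of the embedded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def products_to_reembed(products: list[dict]) -> list[dict]:
    """Return records whose current text no longer matches the stored hash.

    Each dict is a product.template read() result that also includes the
    hypothetical embedding_text_hash field (False when never indexed).
    """
    stale = []
    for product in products:
        current = text_hash(embedding_text(product))
        if product.get("embedding_text_hash") != current:
            stale.append(product)
    return stale
```

Write the new hash in the same write() call as the new vector, so the two cannot drift apart.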

Step 3 — Build the Search Endpoint

The search controller lives in Odoo. It receives a query string, embeds it on the fly, and returns the top matching products via a pgvector similarity query.

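# controllers/product_search.py
import requests

from odoo import http
from odoo.http import request

MAX_DISTANCE = 0.65  # cosine distance cutoff; calibrate per catalogue


class ProductSemanticSearch(http.Controller):

    EMBED_URL = "http://localhost:8001/embed"

    @http.route(
        "/api/product/semantic-search",
        type="json",
        auth="user",
        methods=["POST"],
        csrf=False,
    )
    def semantic_search(self, query: str = "", limit: int = 10):
        query = (query or "").strip()
        if not query:
            return {"results": []}

        try:
            resp = requests.post(self.EMBED_URL, json={"text": query}, timeout=10)
            resp.raise_for_status()
            query_vec = resp.json()["embedding"]
        except (requests.RequestException, KeyError, ValueError) as exc:
            return {"error": str(exc), "results": []}

        # <=> is pgvector's cosine distance, so 1 - distance below is cosine
        # similarity. The HNSW index only accelerates this if it was built with
        # vector_cosine_ops; check which operator class field_vector uses.
        cr = request.env.cr
        cr.execute(
            """
            SELECT pt.id, pt.description_embedding <=> %(vec)s::vector AS distance
            FROM product_template pt
            WHERE pt.active = true
              AND pt.description_embedding IS NOT NULL
              AND (pt.description_embedding <=> %(vec)s::vector) < %(max_dist)s
            ORDER BY distance ASC
            LIMIT %(limit)s
            """,
            {
                "vec": str(query_vec),
                "max_dist": MAX_DISTANCE,
                "limit": max(1, min(int(limit), 20)),
            },
        )
        distances = dict(cr.fetchall())  # {product_id: distance}, nearest first

        # name and description_sale are translatable (stored as jsonb since
        # Odoo 16), so read them through the ORM to get the user's language.
        products = request.env["product.template"].browse(list(distances))
        return {
            "results": [
                {
                    "id": product.id,
                    "name": product.name,
                    "description": product.description_sale or "",
                    "score": round(1.0 - float(distances[product.id]), 4),
                }
                for product in products
            ]
        }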
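The score 1 - distance only behaves like a similarity when distance is cosine distance, which pgvector computes with the `<=>` operator. The `<->` operator is Euclidean (L2) distance. For unit-length vectors, the two are related by d_L2² = 2 · d_cos, so a cutoff tuned for one operator does not carry over to the other. sentence-transformers normalizes all-MiniLM-L6-v2’s output; if you swap models, pass normalize_embeddings=True to encode(). Below is a pure-Python sketch for checking scores offline. The function names are illustrative:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> returns: 1 - cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def l2_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <-> returns: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l2_cutoff_for(cos_cutoff: float) -> float:
    """Equivalent <-> threshold for a <=> threshold; valid for unit vectors only."""
    return math.sqrt(2.0 * cos_cutoff)
```

With unit vectors, a `<=>` cutoff of 0.65 corresponds to a `<->` cutoff of about 1.14.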

The distance threshold < 0.65 is a starting point. Calibrate it against your actual catalogue — run a few hundred queries you know the right answers to, plot the distance distribution, and pick a cutoff that keeps recall high without flooding results with noise. For technical product catalogues with sparse descriptions, you may need to loosen this to < 0.75. For catalogues with rich text descriptions, < 0.55 often gives cleaner results.
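Once the distances are collected, the calibration loop fits in a few lines. This sketch assumes each labelled test query has already been run through the search SQL with the cutoff removed, and that each returned product was recorded as a (distance, is_relevant) pair. `sweep_thresholds` is a hypothetical helper:

```python
def sweep_thresholds(judged: list[tuple[float, bool]],
                     total_relevant: int,
                     cutoffs: list[float]) -> list[tuple[float, float, float]]:
    """For each cutoff, return (cutoff, precision, recall).

    judged: one (distance, is_relevant) pair per returned result, pooled
            across all test queries.
    total_relevant: number of correct answers across all test queries,
            including ones the search never returned.
    """
    rows = []
    for cutoff in cutoffs:
        kept = [relevant for distance, relevant in judged if distance < cutoff]
        hits = sum(kept)
        # Nothing returned: report 0 rather than divide by zero
        precision = hits / len(kept) if kept else 0.0
        recall = hits / total_relevant if total_relevant else 0.0
        rows.append((cutoff, round(precision, 3), round(recall, 3)))
    return rows
```

Recall only rises as the cutoff loosens, while precision tends to fall. Pick the loosest cutoff whose precision your sales team will tolerate.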

The csrf=False flag is harmless but redundant here. Odoo only enforces CSRF tokens on type="http" routes. JSON-RPC routes are authenticated by the session and never CSRF-checked, so OWL components and the web client can call this endpoint either way.

Step 4 — Wire It into the Sales Order Line

The goal: when a sales rep is working on a sale.order.line and doesn’t know the product name, they click a “Find by description” button, type what the customer wants, and pick from semantic search results.

Extend the Sales order form with a button next to the product field:

<!-- views/sale_order_views.xml -->
<odoo>
  <record id="view_order_form_inherit_semantic" model="ir.ui.view">
    <field name="name">sale.order.form.semantic.search.inherit</field>
    <field name="model">sale.order</field>
    <field name="inherit_id" ref="sale.view_order_form"/>
    <field name="arch" type="xml">
      <xpath
        expr="//field[@name='order_line']//field[@name='product_id']"
        position="after"
      >
        <button
          name="action_open_semantic_search"
          string="Find by description"
          type="object"
          class="oe_link"
        />
      </xpath>
    </field>
  </record>
</odoo>

Add the action on sale.order.line:

# models/sale_order_line.py
from odoo import models


class SaleOrderLine(models.Model):
    _inherit = "sale.order.line"

    def action_open_semantic_search(self):
        self.ensure_one()  # the button acts on a single order line
        return {
            "type": "ir.actions.act_window",
            "name": "Find Product by Description",
            "res_model": "product.semantic.search.wizard",
            "view_mode": "form",
            "target": "new",
            "context": {"default_order_line_id": self.id},
        }

The wizard embeds the query, runs the similarity SQL itself, and writes the selected product back to the line:

# wizards/product_semantic_search_wizard.py
import requests

from odoo import fields, models
from odoo.exceptions import UserError

EMBED_URL = "http://localhost:8001/embed"  # same sidecar the indexer uses
MAX_DISTANCE = 0.65  # cosine distance cutoff; calibrate per catalogue


class ProductSemanticSearchWizard(models.TransientModel):
    _name = "product.semantic.search.wizard"
    _description = "Semantic Product Search"

    order_line_id = fields.Many2one("sale.order.line", required=True)
    query = fields.Char(string="Describe what you're looking for")
    result_ids = fields.Many2many(
        "product.template",
        "wizard_product_rel",
        "wizard_id",
        "product_id",
        string="Results",
    )
    selected_product_id = fields.Many2one(
        "product.template", string="Select Product"
    )

    def _reopen(self):
        # Returning the wizard's own action keeps the dialog open after a click
        return {
            "type": "ir.actions.act_window",
            "res_model": self._name,
            "res_id": self.id,
            "view_mode": "form",
            "target": "new",
        }

    def action_search(self):
        self.ensure_one()
        query = (self.query or "").strip()
        if not query:
            return self._reopen()
        try:
            resp = requests.post(EMBED_URL, json={"text": query}, timeout=10)
            resp.raise_for_status()
            query_vec = resp.json()["embedding"]
        except (requests.RequestException, KeyError, ValueError) as exc:
            raise UserError(f"Embedding service unavailable: {exc}") from exc

        # Similarity query run directly on the cursor; <=> is cosine distance
        self.env.cr.execute(
            """
            SELECT id
            FROM product_template
            WHERE active = true
              AND description_embedding IS NOT NULL
              AND (description_embedding <=> %(vec)s::vector) < %(max_dist)s
            ORDER BY description_embedding <=> %(vec)s::vector
            LIMIT 10
            """,
            {"vec": str(query_vec), "max_dist": MAX_DISTANCE},
        )
        product_ids = [row[0] for row in self.env.cr.fetchall()]
        self.result_ids = [fields.Command.set(product_ids)]
        return self._reopen()

    def action_confirm(self):
        self.ensure_one()
        variant = self.selected_product_id.product_variant_ids[:1]
        if variant and self.order_line_id:
            self.order_line_id.product_id = variant
        return {"type": "ir.actions.act_window_close"}

Calling the HTTP controller from inside a wizard is awkward, since Odoo offers no clean way to dispatch a route internally. The wizard therefore repeats the similarity SQL instead. It is about a dozen lines and skips the dispatch machinery entirely. Keep the controller for callers outside the ORM, such as external tools or an OWL widget.

What Can Go Wrong

Embeddings drift without warning. Product descriptions change and your index doesn’t automatically update. Add an ORM write hook on product.template to queue re-embedding when description_sale or name changes. The nightly job is your safety net, not your primary mechanism.

Dimension mismatch if you switch models. Every stored embedding is tied to the model that generated it. Switch from all-MiniLM-L6-v2 to a 1536-dimension model and the existing vectors are useless; the column type won’t even accept the new data. Add an embedding_model_version field to product.template so you know when a full re-index is needed and can invalidate stale records programmatically.
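Here is a sketch of that check. It assumes the embedding_model_version field stores a tag combining model name and output size, and that it is written in the same call as the vector. The helper names and the tag format are illustrative:

```python
CURRENT_MODEL = "all-MiniLM-L6-v2"
CURRENT_DIMENSIONS = 384


def model_tag(model: str = CURRENT_MODEL, dims: int = CURRENT_DIMENSIONS) -> str:
    """Value written to embedding_model_version alongside each vector."""
    return f"{model}@{dims}"


def stale_record_ids(records: list[dict]) -> list[int]:
    """IDs whose stored vector came from another model, or that have none."""
    current = model_tag()
    return [r["id"] for r in records if r.get("embedding_model_version") != current]


def needs_schema_change(new_dims: int) -> bool:
    """A different output size means the vector field itself must change."""
    return new_dims != CURRENT_DIMENSIONS
```

Clearing description_embedding on the stale IDs hands them back to the indexer’s existing no-embedding filter. If needs_schema_change() returns True, the dimensions= value on the field has to change first, and none of the old vectors can be kept.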

No coverage for new products. Records created after the last indexer run have description_embedding = False and don’t appear in semantic search results. Either trigger embedding synchronously on create() (acceptable for small catalogues) or set up a job queue (Celery, OCA queue_job) for larger ones.

Semantic search isn’t a replacement for keyword search. Part numbers, SKU codes, and EAN barcodes have no semantic content: 12345-A-SL doesn’t embed meaningfully. Keep your existing keyword search in place and present both result sets to the user, or route by query type (numbers → keyword, descriptive text → semantic).
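Routing can start as a plain heuristic. This sketch sends two kinds of query to keyword search: a single token containing a digit (part numbers, SKUs) and an 8-, 12-, 13- or 14-digit number (EAN/UPC/GTIN barcodes). Everything else goes to semantic search. The patterns are illustrative and should be tuned to your own code formats:

```python
import re

# Illustrative patterns: adjust to your own SKU and barcode formats
BARCODE = re.compile(r"^\d{8}$|^\d{12,14}$")  # EAN-8, UPC-A, EAN-13, GTIN-14
CODE_LIKE = re.compile(r"^(?=.*\d)[A-Za-z0-9][A-Za-z0-9._/-]*$")  # one token with a digit


def route_query(query: str) -> str:
    """Return "keyword" for codes and barcodes, "semantic" for descriptive text."""
    q = query.strip()
    if BARCODE.match(q) or CODE_LIKE.match(q):
        return "keyword"
    return "semantic"
```

A query like “500ml epoxy” mixes a quantity with descriptive words, so it falls through to semantic search, which is usually what you want.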

The latency of the sidecar call is real: 50–200ms per query on CPU-only inference. For a UI search-as-you-type box, that’s too slow. For a modal triggered by an explicit button click, it’s fine. Choose your UX accordingly.

At Trobz, we’ve shipped field_vector-based product search on catalogues ranging from 800 to 40,000 SKUs. If you’re exploring this for your own implementation, the tradeoffs around embedding freshness and model selection are worth talking through before you commit to an architecture, so reach out and mention field_vector.
