Key Takeaways: A RAG agent over your product catalogue is useful, but only if the retrieval layer is built properly — naive keyword matching misses too much. This tutorial covers indexing product.template data with pgvector, using a cross-encoder for reranking, prompting the model to abstain when confidence is low, and wiring everything to Odoo’s chatbot widget. The hardest part isn’t the AI. It’s deciding what goes into the index and what the fallback path looks like.
A sales rep in the middle of a customer call shouldn’t have to search three different places to answer “does this come in a 5-litre drum, and is there a price break for orders over 50 units?” That question sits at the intersection of product variant data, pricing rules, and possibly archived quote notes. The answer exists in Odoo. Getting it fast is the problem.
A retrieval-augmented agent over your product data solves this — but only if the retrieval layer is tight. This tutorial builds it end-to-end: indexing, retrieval, answer generation, and the Odoo chatbot integration. It also covers the part most tutorials skip — what happens when the agent doesn’t have enough context to answer confidently.
The stack: Python, pgvector running in your existing Odoo PostgreSQL database, sentence-transformers for embeddings, a cross-encoder for reranking, and Odoo’s built-in chatbot for the UI layer. No external vector database required.
Step 1: Index Your Product Data
The index defines what the agent can answer. Build it poorly and you’ll get accurate retrieval on the easy cases and noise on the hard ones.
What to index
Pull from three sources: product.template (base product data), product.pricelist.item (pricing rules), and sale.order.line descriptions from historical quotes. Each needs a different chunking strategy because they have different structures and different staleness profiles.
For product.template, combine the product name, sales description (description_sale), internal notes (description), category path, and variant attribute values into a single text chunk. Variants matter: if “Widget A” comes in three sizes, each variant’s attributes should be in the chunk so a question about a specific size retrieves the right record.
For pricelists, index each product.pricelist.item as a separate chunk that includes the product name, rule type (fixed, percentage, formula), minimum quantity, and the computed price where it’s deterministic. Formula-based rules are a special case — covered in the gotcha below.
For quote history, the useful signal sits in sale.order.line.name — the description field that sales reps edit manually during quoting. These carry institutional knowledge about how products have been positioned in real deals.
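As a sketch of the pricelist chunking strategy above, a small formatter can turn one rule row into indexable text. The dict keys mirror real product.pricelist.item fields (compute_price, min_quantity, fixed_price), but the row shape itself is an assumption of this example, not code from the post’s pipeline:

```python
def pricelist_chunk(row: dict) -> str:
    """Format one product.pricelist.item row as an indexable text chunk.

    `row` is a plain dict (e.g. from a DictCursor); the keys mirror
    product.pricelist.item fields but are an assumption of this sketch.
    """
    lines = [f"Product: {row['product_name']}"]
    # compute_price is one of 'fixed', 'percentage', 'formula' in Odoo.
    lines.append(f"Pricing rule: {row['compute_price']}")
    if row.get("min_quantity"):
        lines.append(f"Minimum quantity: {row['min_quantity']}")
    if row["compute_price"] == "fixed" and row.get("fixed_price") is not None:
        # Only deterministic prices get embedded.
        lines.append(f"Price: {row['fixed_price']}")
    else:
        # Formula/percentage rules stay descriptive; see the gotcha below
        # about stale computed values.
        lines.append("Price: computed at query time (non-fixed rule)")
    return "\n".join(lines)
```

A fixed rule yields a chunk with the concrete price; a formula rule yields only the rule description, which keeps the index from going stale when costs move.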
Generating embeddings
from sentence_transformers import SentenceTransformer
import psycopg2

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def build_product_chunks(conn):
    cur = conn.cursor()
    # NOTE: on Odoo 16+ translated fields (pt.name, the descriptions,
    # pav.name) are stored as JSONB; extract one language with
    # e.g. pt.name->>'en_US'.
    cur.execute("""
        SELECT
            pt.id,
            pt.name,
            pt.description_sale,
            pt.description,
            pc.complete_name AS category,
            STRING_AGG(DISTINCT pav.name, ', ') AS variants
        FROM product_template pt
        LEFT JOIN product_category pc ON pc.id = pt.categ_id
        -- Join through product_template_attribute_value so only the values
        -- actually enabled on this template are listed, not every value
        -- of the attribute.
        LEFT JOIN product_template_attribute_value ptav
            ON ptav.product_tmpl_id = pt.id
        LEFT JOIN product_attribute_value pav
            ON pav.id = ptav.product_attribute_value_id
        WHERE pt.active = true AND pt.sale_ok = true
        GROUP BY pt.id, pt.name, pt.description_sale, pt.description, pc.complete_name
    """)
    chunks = []
    for row in cur.fetchall():
        tmpl_id, name, desc_sale, desc, category, variants = row
        text = f"Product: {name}\nCategory: {category or 'Uncategorized'}\n"
        if desc_sale:
            text += f"Sales description: {desc_sale}\n"
        if desc:
            text += f"Internal notes: {desc}\n"
        if variants:
            text += f"Available variants: {variants}\n"
        chunks.append({"id": tmpl_id, "type": "product", "text": text})
    return chunks

def embed_and_store(conn, chunks):
    cur = conn.cursor()
    texts = [c["text"] for c in chunks]
    embeddings = model.encode(texts, batch_size=64, show_progress_bar=True)
    for chunk, embedding in zip(chunks, embeddings):
        cur.execute("""
            INSERT INTO ai_product_embeddings (source_id, source_type, content, embedding)
            VALUES (%s, %s, %s, %s::vector)
            ON CONFLICT (source_id, source_type) DO UPDATE
            SET content = EXCLUDED.content, embedding = EXCLUDED.embedding
        """, (chunk["id"], chunk["type"], chunk["text"], embedding.tolist()))
    conn.commit()
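Because the three sources have different staleness profiles, you will re-run the indexer regularly, and a naive re-run re-embeds everything. One possible optimization, assuming you persist a content hash alongside each embedding (an extra column on ai_product_embeddings, which this post’s schema does not include), is to compare hashes and skip unchanged chunks:

```python
import hashlib

def chunks_to_reembed(chunks: list[dict], stored_hashes: dict) -> list[dict]:
    """Return only the chunks whose text changed since the last run.

    stored_hashes maps (source_id, source_type) -> sha256 hex digest of the
    previously embedded text. Persisting those digests is an assumption of
    this sketch, not part of the original schema.
    """
    changed = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk["text"].encode("utf-8")).hexdigest()
        if stored_hashes.get((chunk["id"], chunk["type"])) != digest:
            # Carry the new digest along so the caller can store it
            # after embedding succeeds.
            changed.append({**chunk, "hash": digest})
    return changed
```

On a mostly stable catalogue this cuts nightly embedding work to the handful of records that actually changed.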
The table schema
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE ai_product_embeddings (
    id SERIAL PRIMARY KEY,
    source_id INTEGER NOT NULL,
    source_type VARCHAR(50) NOT NULL,
    content TEXT NOT NULL,
    embedding vector(384),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (source_id, source_type)
);

CREATE INDEX ON ai_product_embeddings
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
all-MiniLM-L6-v2 produces 384-dimensional vectors. If you switch to all-mpnet-base-v2 (768 dimensions), update the schema. The IVFFlat index works well up to around 100k products. Beyond that, consider HNSW.
⚠️ Gotcha: Pricelist rules that use formula pricing (e.g., “15% above cost”) are not static. If you embed a computed price, it goes stale when the cost changes. For formula-based rules, embed the rule description, not the computed value. Calculate the actual price at query time using Odoo’s product.pricelist._get_product_price() method.
For an alternative approach using the OCA module to store embeddings directly on Odoo records, see Introduction to the field_vector OCA Module for Odoo.
Step 2: Retrieval with Vector Search and Reranking
Cosine similarity on a single query embedding is fast. It’s also too coarse for product questions, where the user’s phrasing rarely matches the product description precisely. A question like “something for sealing industrial pipes under 10 bar pressure” might embed near “pipe fittings” without surfacing the specific gasket material that actually matches.
The fix is two-stage retrieval: fetch 20 candidates with pgvector, then rerank with a cross-encoder that scores each candidate against the actual question text.
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve(conn, question: str, top_k: int = 5, candidate_pool: int = 20) -> list[dict]:
    query_embedding = bi_encoder.encode(question).tolist()
    cur = conn.cursor()
    cur.execute("""
        SELECT source_id, source_type, content,
               1 - (embedding <=> %s::vector) AS cosine_similarity
        FROM ai_product_embeddings
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_embedding, query_embedding, candidate_pool))
    candidates = [
        {"source_id": row[0], "source_type": row[1], "content": row[2], "score": row[3]}
        for row in cur.fetchall()
    ]
    # Rerank with cross-encoder
    pairs = [(question, c["content"]) for c in candidates]
    rerank_scores = cross_encoder.predict(pairs)
    for candidate, score in zip(candidates, rerank_scores):
        candidate["rerank_score"] = float(score)
    candidates.sort(key=lambda x: x["rerank_score"], reverse=True)
    return candidates[:top_k]
The cross-encoder (ms-marco-MiniLM-L-6-v2) scores each question-passage pair jointly rather than independently. It runs in roughly 20-50ms for a batch of 20 candidates on CPU — slow enough that you don’t want to run it on more candidates than necessary. In testing on a 3,000-product catalogue, reranking moved the correct answer from position 4-7 to position 1-2 on roughly 30% of product-specific queries. That’s the difference between a useful agent and a frustrating one.
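The rerank scores can do one more job: gating generation. A possible pre-check, with illustrative thresholds (ms-marco cross-encoders emit uncalibrated logits, so both values must be tuned on your own logged queries rather than copied from here):

```python
def retrieval_confidence(candidates: list[dict],
                         min_score: float = 0.0,
                         min_margin: float = 0.5) -> str:
    """Classify retrieval quality before spending an LLM call.

    Returns 'abstain' (skip generation, escalate immediately),
    'ambiguous' (generate, but surface multiple sources), or 'ok'.
    Both thresholds are illustrative assumptions, not calibrated values.
    """
    if not candidates:
        return "abstain"
    scores = sorted((c["rerank_score"] for c in candidates), reverse=True)
    if scores[0] < min_score:
        # Nothing scored well: the index likely has a gap here.
        return "abstain"
    if len(scores) > 1 and scores[0] - scores[1] < min_margin:
        # Top two are nearly tied: the ranking is effectively a coin flip.
        return "ambiguous"
    return "ok"
```

Abstaining before generation saves a model call on the clearly hopeless cases, and the 'ambiguous' signal pairs naturally with the conflicting-pricelist check described in the next step.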
Step 3: Answer Generation with Faithful Prompting
The model should only answer from retrieved context. When the context doesn’t support a confident answer, the agent should say so explicitly. A sales agent that confidently confabulates pricing information is worse than one that escalates — because the sales rep trusts it.
import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a product knowledge assistant for a sales team. Answer questions about products using only the context provided below.

Rules:
1. Answer only from the provided context. Do not add information from your training knowledge.
2. If the context doesn't contain enough information to answer confidently, say exactly: "I don't have enough information to answer this. I'll escalate to a product specialist."
3. For pricing, state the exact figure from the context. If it's a formula-based rule, describe the rule — do not calculate a number.
4. Keep answers short. Two to four sentences. Sales reps are usually on a call.
5. If the question involves complex configuration or multiple interacting products, flag it for human review."""

def generate_answer(question: str, context_chunks: list[dict]) -> dict:
    context = "\n\n---\n\n".join(
        f"[{c['source_type'].upper()} #{c['source_id']}]\n{c['content']}"
        for c in context_chunks
    )
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    answer = response.content[0].text
    needs_escalation = (
        "I don't have enough information" in answer
        or "escalate" in answer.lower()
    )
    return {
        "answer": answer,
        "needs_escalation": needs_escalation,
        "sources": [{"id": c["source_id"], "type": c["source_type"]} for c in context_chunks]
    }
The needs_escalation flag is structural, not cosmetic. The next step uses it to route the conversation.
One failure mode worth flagging: when the context contains conflicting information — two pricelist rules that apply to the same product at the same quantity threshold — the model tends to pick one and present it as definitive. The safer design is to surface both rules in the answer and let the sales rep interpret them. You can add a check for this before the generation step by looking for multiple product.pricelist.item chunks for the same product in the top-k results.
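That pre-generation check fits in a few lines. It assumes each pricelist chunk carries a product_id key, which the chunk builder in Step 1 does not store; you would add it when indexing pricelist items:

```python
from collections import Counter

def conflicting_pricelist_products(top_chunks: list[dict]) -> list:
    """Return product ids that appear in more than one pricelist chunk
    among the top-k results: a signal that rules may conflict and the
    answer should surface all of them rather than pick one.

    Assumes each pricelist chunk carries a 'product_id' key, which is
    an addition to the chunk format shown earlier in this post.
    """
    counts = Counter(
        c.get("product_id") for c in top_chunks
        if c["source_type"] == "pricelist" and c.get("product_id") is not None
    )
    return [pid for pid, n in counts.items() if n > 1]
```

If the returned list is non-empty, prepend a note to the prompt asking the model to present every matching rule, or skip generation and hand the rep the raw rules directly.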
Step 4: Wiring to Odoo’s Chatbot
Odoo’s im_livechat module has a built-in chatbot system. It’s the right integration point for an internal agent — the UI is already in Discuss, access control works through existing Odoo users, and message history persists in mail.message.
The minimal module structure:
product_support_agent/
├── __manifest__.py
├── models/
│ └── product_agent.py
└── views/
└── chatbot_script.xml
__manifest__.py
{
    "name": "Product Support Agent",
    "version": "17.0.1.0.0",
    "depends": ["product", "im_livechat", "mail"],
    "data": ["views/chatbot_script.xml"],
    "installable": True,
}
models/product_agent.py
import requests

from odoo import api, models

class ProductAgentStep(models.Model):
    # im_livechat's chatbot step model is named "chatbot.script.step"
    _inherit = "chatbot.script.step"

    @api.model
    def _call_product_agent(self, question: str) -> dict:
        """
        POST the question to an external FastAPI microservice running the
        retrieval pipeline. Returns the structured response dict.
        """
        response = requests.post(
            "http://localhost:8001/ask",
            json={"question": question},
            timeout=15,
        )
        response.raise_for_status()
        return response.json()
⚠️ Gotcha: Don’t call the agent synchronously from within an Odoo RPC request if you can avoid it. The cross-encoder adds latency that can stall Odoo workers. Run the retrieval pipeline as a separate HTTP microservice (FastAPI works well here) and call it asynchronously, or use Odoo’s bus.bus for non-blocking message delivery. The timeout=15 above is a hard ceiling — tune it based on your hardware.
views/chatbot_script.xml
<odoo>
    <record id="product_support_chatbot" model="chatbot.script">
        <field name="title">Product Support Agent</field>
    </record>
    <record id="product_agent_step_greet" model="chatbot.script.step">
        <field name="message">Hi! Ask me anything about our product catalogue — pricing, variants, availability.</field>
        <field name="step_type">free_input_multi</field>
        <field name="sequence">1</field>
        <field name="chatbot_script_id" ref="product_support_chatbot"/>
    </record>
</odoo>
The dynamic response injection — taking the agent’s answer and posting it back into the chat thread — requires a JavaScript override of im_livechat’s frontend chatbot service. That’s a static/src/js/ customization beyond the scope of this post. The hook to override is _sendMessageChatbot in the livechat service.
Step 5: Escalation and the Human Handoff
The escalation path needs to be as deliberate as the retrieval path. When needs_escalation is true, the agent should do three things:
1. Log the failure. Store the question, the retrieved context that wasn’t sufficient, and a timestamp. These logged failures are your index improvement backlog — each one represents a gap in product descriptions or pricing coverage.
2. Send an honest response. Something like: “I found some related products but I’m not confident this fully answers your question. Let me connect you with a product specialist.”
3. Route to a human. Either through im_livechat’s built-in operator escalation, or by creating a project.task assigned to the product team with the question and context attached.
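The first of those three steps can start from a plain log-record builder; where the record is persisted (a dedicated Odoo model, a plain table, a JSONL file) is left open in this sketch:

```python
from datetime import datetime, timezone

def escalation_record(question: str, context_chunks: list[dict]) -> dict:
    """Build the failure-log entry for an escalated question: the question,
    the retrieved-but-insufficient context, and a timestamp."""
    return {
        "question": question,
        "retrieved": [
            {"id": c["source_id"], "type": c["source_type"],
             "score": c.get("rerank_score")}
            for c in context_chunks
        ],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Reviewing these records weekly turns the escalation log into a concrete index-improvement backlog.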
The logging step is what most implementations skip. Those unanswered questions are the most valuable data you’ll collect. A question the agent couldn’t answer confidently today is a data gap to close tomorrow — usually by adding better descriptions to product.template.description_sale, not by changing the model.
Track the escalation rate explicitly. If it’s above 15-20% in the first two weeks, the index isn’t covering the actual questions sales reps ask. In our experience, the fix is almost always content, not the model. Better product descriptions outperform better models.
Key Takeaways
- Index product.template, pricelist items, and historical quote descriptions separately — they have different structures and different staleness profiles.
- Two-stage retrieval (bi-encoder candidates + cross-encoder reranking) meaningfully improves precision over single-pass cosine similarity, especially for product-specific terminology.
- Prompt the model to abstain when the context is insufficient. A confident wrong answer in a sales context causes real damage.
- For formula-based pricing rules, embed the rule description and compute the price at query time. Never embed a stale computed value.
- Escalation rate is the most useful early metric. If it’s high, fix the index content before touching anything else.
At Trobz, we build these agents as part of AI PoC sprints — defining retrieval quality metrics before integrating into Odoo, testing against real product catalogues, and measuring escalation rates in the first week. If you’re building this for your own catalogue, reach out and we can share the evaluation setup we use.