Transformers in the browser? A concrete example: semantic search over 3,000+ Odoo modules

How we built a fully client-side semantic search engine for 3,000+ Odoo modules: no backend, no API key, no internet required after the first load.

Finding the right module used to mean browsing GitHub, reading README files, and relying on the knowledge of experienced team members. We wanted something better: type what you need, get relevant results instantly. So at Trobz, we built a database of all modules from Odoo and the OCA, our own Trobz modules (both generic and project-specific), and a selection of partner repositories.

The result is odoo-modules.trobz.com: a semantic search engine that runs entirely in your browser, with no backend, no API key, and no internet connection required after the first load. This post explains how it works and why we built it this way.

Traditional search is exact: searching for “bank statement import” won’t find a module described as “reconcile transactions from OFX files”. Synonyms, paraphrases, and domain vocabulary all trip it up.

Semantic search solves this. Instead of matching words, it matches meaning by converting text into numerical vectors (embeddings) and measuring how close two vectors are in space. Queries and documents that mean similar things end up near each other, even if they share no words.
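
As a minimal sketch of the idea (not our production code), "how close two vectors are" is usually measured with cosine similarity:

function cosineSimilarity(a, b) {
  // Cosine similarity: dot product divided by the product of magnitudes.
  // Values near 1 mean "semantically close"; values near 0 mean unrelated.
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}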

The catch: generating embeddings usually requires a server or an external API (OpenAI, Cohere, etc.). We wanted to see whether we could do without either: no dependency on an external service, and no privacy concerns.

Transformers.js: ML Models in the Browser

Transformers.js is a JavaScript port of the Hugging Face Transformers library. It runs models compiled to the ONNX format via ONNX Runtime Web (WebAssembly), directly in the browser: no Python, no GPU, no server.

Model Used

We use all-MiniLM-L6-v2 (quantized), a sentence embedding model that produces 384-dimensional vectors. It’s fast (~50–200ms per query in-browser), compact (~23 MB), and produces high-quality semantic similarity scores.

Example Code

import { pipeline, env } from "./lib/transformers.min.js";

// Serve the model and the WASM runtime from our own static host;
// never fetch anything from the Hugging Face Hub at runtime.
env.allowLocalModels = true;
env.allowRemoteModels = false;
env.localModelPath = "./model/";
env.backends.onnx.wasm.wasmPaths = "./lib/";

// Load the quantized MiniLM sentence-embedding model (~23 MB).
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  quantized: true,
});

// Embed a query, entirely in the browser. Mean pooling plus L2
// normalization yields one 384-dimensional unit vector per input.
const output = await extractor(["expense management"], {
  pooling: "mean",
  normalize: true,
});
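
The call returns a tensor of shape [1, 384]; in Transformers.js (v2), the underlying values are exposed as a typed array via .data, so the query vector can be pulled out like this (a sketch, assuming that Tensor API):

// Copy the pooled embedding out of the tensor: a Float32Array of
// length 384 with unit L2 norm, ready for similarity comparisons.
const queryVector = new Float32Array(output.data);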

The Architecture

Offline Build (Once, at Deploy Time)

  1. Data: Each module is described in a JSON file with purpose and features fields, written by the team or generated with AI assistance.
  2. Embedding: We run generate_embeddings.js (Node.js + the same MiniLM model) to embed every module, producing three vectors per module: combined, purpose-only, and features-only (a sketch of this step follows the list).
  3. Storage: Embeddings are stored as raw BLOB columns in a SQLite database (sqlite_public.db).
  4. Deploy: The database, model weights, and JS dependencies are uploaded to a static file host. No server-side code whatsoever.
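
Here is a condensed sketch of steps 2–3, assuming Node.js with the @xenova/transformers and better-sqlite3 packages; the schema, file layout, and helper names are illustrative, and the real generate_embeddings.js may differ:

import { readFileSync, readdirSync } from "node:fs";
import { pipeline } from "@xenova/transformers";
import Database from "better-sqlite3";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  quantized: true,
});

// One JSON file per module, each with `purpose` and `features` fields.
const modules = readdirSync("./data").map((f) =>
  JSON.parse(readFileSync(`./data/${f}`, "utf8"))
);

const db = new Database("sqlite_public.db");
db.exec(`CREATE TABLE IF NOT EXISTS modules (
  name TEXT PRIMARY KEY,
  emb_combined BLOB, emb_purpose BLOB, emb_features BLOB
)`);
const insert = db.prepare("INSERT OR REPLACE INTO modules VALUES (?, ?, ?, ?)");

// Embed one text and serialize the 384 floats as a raw BLOB.
async function embed(text) {
  const out = await extractor([text], { pooling: "mean", normalize: true });
  return Buffer.from(new Float32Array(out.data).buffer);
}

for (const mod of modules) {
  insert.run(
    mod.name,
    await embed(`${mod.purpose}\n${mod.features.join("\n")}`),
    await embed(mod.purpose),
    await embed(mod.features.join("\n"))
  );
}
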
Online Search (In the Browser)

  1. Load: On first visit, the browser downloads the database (~28 MB) and the model weights (~23 MB). Both are cached by the browser; subsequent visits are instant.
  2. Read: sql.js (SQLite compiled to WASM) reads the database and loads all embedding vectors into Float32Array buffers in memory.
  3. Embed query: The user’s query is embedded in-browser using the same MiniLM model.
  4. Search: Cosine similarity is computed in pure JavaScript as a dot-product loop over all vectors (L2-normalized vectors make cosine similarity equivalent to the dot product); see the sketch after this list.
  5. Rank: Top 50 results are returned, sorted by score, filtered by org if selected.
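
A condensed sketch of steps 2–4, assuming sql.js is already loaded (its initSqlJs entry point is available) and that `extractor` is the pipeline created earlier; the table and column names are illustrative:

// Fetch the database once and open it with sql.js (SQLite in WASM).
const SQL = await initSqlJs({ locateFile: (f) => `./lib/${f}` });
const bytes = new Uint8Array(await (await fetch("./sqlite_public.db")).arrayBuffer());
const db = new SQL.Database(bytes);

// Load every embedding BLOB into a Float32Array, once, at startup.
const rows = [];
const stmt = db.prepare("SELECT name, emb_combined FROM modules");
while (stmt.step()) {
  const r = stmt.getAsObject();
  const blob = r.emb_combined; // Uint8Array holding 384 raw floats
  rows.push({
    name: r.name,
    vec: new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4),
  });
}
stmt.free();

// Per query: embed, then score every module with a plain dot-product
// loop (vectors are L2-normalized, so this equals cosine similarity).
async function search(query) {
  const out = await extractor([query], { pooling: "mean", normalize: true });
  const q = out.data;
  const scored = rows.map(({ name, vec }) => {
    let score = 0;
    for (let i = 0; i < q.length; i++) score += q[i] * vec[i];
    return { name, score };
  });
  return scored.sort((a, b) => b.score - a.score).slice(0, 50);
}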

The entire search pipeline (embedding, similarity computation, ranking) runs client-side in under 300ms.

Two Separate Search Fields

One insight that improved result quality: modules have two distinct types of text: a short purpose (what the module does) and a list of features (how it does it).

We store separate embeddings for each and expose two search inputs:

  • Purpose: “What kind of module are you looking for?” e.g. expense management
  • Features: “What specific behaviour do you need?” e.g. cancel validated expense reports

When both fields are filled, scores are averaged. This lets users progressively narrow results without re-running a single monolithic query.
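
In code terms, the combination is trivial; a sketch (the production scoring may differ in details):

// Combine per-field similarity scores: use whichever field is filled,
// or the average when both are.
function combinedScore(purposeScore, featuresScore) {
  if (purposeScore == null) return featuresScore;
  if (featuresScore == null) return purposeScore;
  return (purposeScore + featuresScore) / 2;
}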

Pros and Cons

Pros

  • No backend costs: The search runs entirely in the browser. Hosting is a static file server with no compute, no database server, no API quotas.
  • No tracking: Queries never leave the user’s device. There are no logs, no analytics on what people search for.
  • Works offline: After the first load, the tool works with no internet connection, useful in client environments with restricted access.
  • Scales freely: Every user runs their own search. More users don’t mean more server load.
  • Reproducible: The model weights are pinned and served locally; results don’t change because an API updated its model.

Cons

  • First load is heavy: Downloading ~50 MB on first visit (model + database) takes a few seconds on a slow connection. We mitigate this with a step-by-step progress indicator (see the sketch after this list).
  • Memory usage: Loading all 3,289 module vectors into Float32Array uses ~5 MB of RAM. Acceptable for a desktop browser, but worth monitoring as the catalog grows.
  • Linear scan: Similarity is computed over all vectors on every search. At 3,289 modules this is fast (~5ms). At 100,000 modules it would need approximate nearest-neighbour indexing (e.g. HNSW). For our scale, brute force is fine.
  • No incremental updates: Adding a module requires regenerating and redeploying the database. We automate this with GitHub Actions; a push to data/ triggers regeneration and deployment automatically.
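
For the progress indicator, Transformers.js accepts a progress_callback option on pipeline(), which reports download progress per file. A sketch of the wiring, where updateProgress is a hypothetical UI helper:

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  quantized: true,
  // Called repeatedly while the model files download.
  progress_callback: (p) => {
    if (p.status === "progress") {
      updateProgress(p.file, p.progress); // p.progress is a percentage
    }
  },
});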

What We Learned

The most surprising thing: a 23 MB quantized model running in WASM is good enough for production use. The quality of MiniLM-L6-v2 embeddings on short technical descriptions is comparable to much larger models for this specific use case.

The second insight: separating purpose and features embeddings matters. A combined embedding averages over both, which dilutes specificity. Two separate fields let the model focus on what the user actually cares about.

Try It

The public version is at odoo-modules.trobz.com/search/: 3,289 OCA and Odoo modules, fully offline after the first load.
