Key Takeaways: Vietnamese expense receipts arrive in three to five distinct formats, and each format has different failure modes for OCR. Treating them as a single problem is why early automation attempts fail. A staged pre-processing pipeline (deskew, binarise, then extract) closes most of the accuracy gap before the LLM is involved. Integration with hr.expense in Odoo is straightforward once the extraction schema is fixed; the harder part is the validation layer. After six months, the average time an employee spends per expense submission dropped from 22 minutes to under 3 minutes.
The finance team at this company wasn’t complaining loudly. They were just quietly spending two hours every morning keying expense receipts into Odoo before anyone else arrived. That’s the kind of inefficiency that doesn’t appear in a ticket system — it becomes invisible because people adapt to it.
The company is a professional services firm operating across Vietnam: 200 employees, heavy on travel and client entertainment. Around 1,800 receipts a month. Three finance staff handling the manual data entry. The receipts arrived via email, WhatsApp, and a shared Dropbox, in no particular order: sometimes as PDFs, often as phone photos, occasionally as scans of crumpled paper from the printer in the Hanoi office.
The Variety Problem Is the Real Problem
Most vendor demos for document AI show clean, flat, well-lit PDFs. That’s not what expense management looks like in practice.
The actual receipt types this company dealt with:
- Hóa đơn GTGT (Vietnamese VAT invoices) — issued by registered vendors, structured with fixed fields (tax code, buyer info, line items, VAT amount). In theory the most machine-readable. In practice, the older printed versions use inconsistent fonts and the carbon copy quality is often poor.
- Restaurant and café receipts — thermal paper, often faded, frequently photographed at an angle. No standardised layout. Some in Vietnamese, some mixed Vietnamese/English, some from international chains in English only.
- Taxi and ride-hailing receipts — Grab and Be receipts are digital PDFs and extract cleanly. Traditional taxi meters produce small thermal slips, often creased.
- Phone photos of paper receipts — the most common format and the messiest. Variable lighting, perspective distortion, fingers in the frame, motion blur.
- Manual receipts — handwritten totals on pre-printed pads. Common for small vendors, wet markets, and informal services.
The mix matters because the right tool for a structured GTGT invoice is wrong for a handwritten receipt. Running everything through a single OCR pass produces plausible-looking output that’s wrong in unpredictable ways. That is worse than failing loudly, because it passes downstream validation and only surfaces at accounting reconciliation.
The Pre-Processing Pipeline
Before any OCR or LLM involvement, each uploaded file goes through a pre-processing stage. This step has a larger effect on final accuracy than model selection.
Step 1: Format normalisation. All inputs are converted to TIFF at 300dpi. PDFs are rendered page by page. Multi-page documents are split — most receipts are single-page, but some GTGT invoices arrive as two-page PDFs.
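PDF page sizes are expressed in points (1/72 inch), so rendering at 300 dpi is mostly bookkeeping: pixels = points / 72 × 300, one image per page. The sketch below covers only that arithmetic and the page split; rasterisation and TIFF encoding are left to whichever PDF renderer and imaging library the pipeline uses. `PageTask` and `split_pdf_pages` are hypothetical names, not part of the original system.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # PDF page sizes are given in points (1/72 inch)
TARGET_DPI = 300


@dataclass(frozen=True)
class PageTask:
    """One single-page unit of work for the rest of the pipeline (hypothetical type)."""
    source: str
    page_index: int  # 0-based page number within the source file
    width_px: int
    height_px: int


def to_pixels(size_pt: float, dpi: int = TARGET_DPI) -> int:
    """Convert a page dimension in PDF points to pixels at the target resolution."""
    return round(size_pt / POINTS_PER_INCH * dpi)


def split_pdf_pages(source: str, page_sizes_pt: list[tuple[float, float]]) -> list[PageTask]:
    """Emit one task per page, so a two-page GTGT invoice becomes two single-page images."""
    return [
        PageTask(source, index, to_pixels(width), to_pixels(height))
        for index, (width, height) in enumerate(page_sizes_pt)
    ]


# An A4 page is roughly 595 x 842 pt, which renders to 2479 x 3508 px at 300 dpi.
tasks = split_pdf_pages("invoice.pdf", [(595, 842), (595, 842)])
```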
Step 2: Orientation detection and deskew. Phone photos taken in portrait mode frequently arrive rotated by 90 degrees; reading the EXIF Orientation tag fixes most of these. Deskew (correcting tilt of up to ~15 degrees) is then applied using a Hough line transform. This alone improved downstream OCR accuracy on phone photos by roughly 12 percentage points in our testing.
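The deskew step can be sketched as a Hough-style vote restricted to near-horizontal angles: for each candidate tilt, rotate the foreground pixels and count how many collapse onto the same text line. This is a minimal pure-Python illustration that assumes foreground pixel coordinates have already been extracted. A production pipeline would more likely run a library implementation such as OpenCV's `cv2.HoughLinesP` on an edge map, and would apply the EXIF rotation first (Pillow's `ImageOps.exif_transpose` does this). `estimate_skew_degrees` is a hypothetical helper.

```python
import math
from collections import Counter


def estimate_skew_degrees(points, max_skew=15.0, step=0.5):
    """Estimate text-line tilt (degrees) from foreground pixel coordinates.

    Hough-style vote: for each candidate angle theta, rotate every (x, y) point by
    -theta and bin the rotated y coordinate. Points on a line tilted by theta all land
    in one bin, so the angle with the tallest bin wins. Image coordinates assumed:
    x grows to the right, y grows downward, so a line descending to the right is positive.
    """
    if not points:
        return 0.0
    best_angle, best_votes = 0.0, -1
    n_steps = int(round(max_skew / step))
    for k in range(-n_steps, n_steps + 1):
        theta = math.radians(k * step)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        votes = Counter(round(y * cos_t - x * sin_t) for x, y in points)
        peak = max(votes.values())
        if peak > best_votes:
            best_angle, best_votes = k * step, peak
    return best_angle


# Synthetic text line tilted by 5 degrees, snapped to integer pixel rows.
line = [(x, round(20 + x * math.tan(math.radians(5)))) for x in range(200)]
skew = estimate_skew_degrees(line)
```

The ±15° search window mirrors the tilt range above; the image would then be rotated to cancel the returned angle before binarisation.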
Step 3: Binarisation. Adaptive thresholding handles the common problem of uneven lighting across a phone photo: bright near a window, dark in the corners. It also helps with faded, low-contrast thermal receipts. Standard Otsu thresholding works for scanned documents but fails on photos with gradient lighting, because it picks a single global threshold for the whole image.
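The difference between the two methods is easiest to see on a synthetic receipt strip whose background brightens from left to right, the way a photo does near a window. The sketch below implements Otsu's global threshold and a local-mean adaptive threshold (the same idea as OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C`) in plain Python. The block size of 7 and offset of 10 are illustrative values, not the ones used in production.

```python
def otsu_threshold(img: list[list[int]]) -> int:
    """Global Otsu threshold: the t maximising between-class variance of <=t vs >t."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    w_low = sum_low = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0 or w_high == 0:
            continue
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Proportional to the between-class variance (the constant 1/total**2 is dropped).
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarise_global(img, t):
    """0 = ink, 255 = background, using one threshold for the whole image."""
    return [[0 if v <= t else 255 for v in row] for row in img]


def adaptive_mean_threshold(img, block=7, c=10):
    """Mark a pixel as ink when it is more than c below the mean of its (clipped) window."""
    h, w, r = len(img), len(img[0]), block // 2
    # integral[y][x] = sum of img over rows < y and columns < x
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            window_sum = (integral[y1][x1] - integral[y0][x1]
                          - integral[y1][x0] + integral[y0][x0])
            mean = window_sum / ((y1 - y0) * (x1 - x0))
            row.append(0 if img[y][x] < mean - c else 255)
        out.append(row)
    return out


# 40 x 10 strip: background brightens left to right (40..235); row 5 is ink, 30 levels darker.
img = [[40 + 5 * x - (30 if y == 5 else 0) for x in range(40)] for y in range(10)]
global_bw = binarise_global(img, otsu_threshold(img))
adaptive_bw = adaptive_mean_threshold(img)
```

On this strip the single Otsu threshold lands mid-range, so the dark background on the left is marked as ink and the faint ink on the right is lost, while the adaptive version recovers the whole ink row.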
Step 4: Receipt type classification. A small image classifier (fine-tuned MobileNetV3) categorises each document into one of the five receipt types above before OCR runs. This determines which extraction prompt is used downstream. The classifier runs in under 200ms per image and reaches ~93% accuracy on the production dataset.
Extraction by Receipt Type
Classification drives prompt selection. There’s no universal extraction prompt that works across all five types; any attempt to write one produces mediocre results everywhere.
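Prompt selection after classification reduces to a lookup table. A minimal sketch, assuming the classifier emits one of five string labels; the label names, the `Route` type and the strategy names are all hypothetical. Grab/Be PDFs are short-circuited before the table because they skip OCR entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    strategy: str    # which extractor / prompt template to use
    needs_ocr: bool  # False when the PDF text layer can be read directly


# Hypothetical labels and strategy names, one entry per receipt type.
ROUTES = {
    "gtgt_invoice": Route("gtgt_structured_prompt", needs_ocr=True),
    "restaurant_cafe": Route("thermal_loose_prompt", needs_ocr=True),
    "taxi_slip": Route("thermal_loose_prompt", needs_ocr=True),
    "phone_photo": Route("generic_receipt_prompt", needs_ocr=True),
    "handwritten": Route("handwritten_best_effort", needs_ocr=True),
}
RIDE_HAILING_PDF = Route("pdf_text_parser", needs_ocr=False)


def route_document(label: str, is_ride_hailing_pdf: bool = False) -> Route:
    """Pick the extraction strategy for one classified document."""
    if is_ride_hailing_pdf:
        # Digital Grab/Be receipts are parsed from their text layer, not OCR'd.
        return RIDE_HAILING_PDF
    try:
        return ROUTES[label]
    except KeyError:
        raise ValueError(f"unknown receipt type: {label!r}") from None
```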
GTGT invoices use a structured extraction prompt that maps directly to mandatory Vietnamese tax invoice fields: seller tax code, buyer tax code, invoice serial, issue date, line items (description, quantity, unit, unit price, VAT rate, VAT amount), and totals. The LLM is instructed to return a JSON object matching a fixed schema. Post-extraction, the seller tax code is validated against the Ministry of Finance’s publicly accessible tax registration database.
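Once the model returns JSON, cheap deterministic checks catch many extraction errors before the tax-code lookup runs. A sketch, assuming hypothetical key names for the schema fields listed above. It checks required keys, tests tax codes against the common 10-digit shape with an optional 3-digit branch suffix (a format check only, not the Ministry of Finance lookup; real tax-code rules have more cases), and confirms that line items, VAT and totals add up within a 1-dong tolerance.

```python
import re
from decimal import Decimal

# Common tax-code shape: 10 digits, optionally followed by a 3-digit branch suffix.
TAX_CODE_RE = re.compile(r"^\d{10}(-\d{3})?$")

# Hypothetical key names for the mandatory GTGT fields.
REQUIRED_FIELDS = (
    "seller_tax_code", "buyer_tax_code", "invoice_serial", "issue_date",
    "line_items", "total_before_vat", "total_vat", "total_amount",
)


def dec(value) -> Decimal:
    """Convert ints, floats or numeric strings to Decimal without float artefacts."""
    return Decimal(str(value))


def validate_gtgt(invoice: dict, tolerance: int = 1) -> list[str]:
    """Return the problems found in an extracted GTGT invoice; empty means it passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in invoice]
    if problems:
        return problems
    for key in ("seller_tax_code", "buyer_tax_code"):
        if not TAX_CODE_RE.match(str(invoice[key])):
            problems.append(f"malformed {key}: {invoice[key]!r}")
    subtotal, vat_total = Decimal(0), Decimal(0)
    for i, item in enumerate(invoice["line_items"]):
        amount = dec(item["quantity"]) * dec(item["unit_price"])
        # VAT rate is stored as a percentage, e.g. 10 for 10%.
        if abs(amount * dec(item["vat_rate"]) / 100 - dec(item["vat_amount"])) > tolerance:
            problems.append(f"line {i}: VAT amount does not match its rate")
        subtotal += amount
        vat_total += dec(item["vat_amount"])
    if abs(subtotal - dec(invoice["total_before_vat"])) > tolerance:
        problems.append("line items do not add up to total_before_vat")
    if abs(vat_total - dec(invoice["total_vat"])) > tolerance:
        problems.append("line VAT does not add up to total_vat")
    if abs(subtotal + vat_total - dec(invoice["total_amount"])) > tolerance:
        problems.append("total_amount differs from total_before_vat + total_vat")
    return problems
```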
Thermal receipts (restaurant, taxi paper) use a looser prompt focused on merchant name, date, and total amount. Line items are extracted opportunistically: if present and legible, great; if not, only the total is captured. Accuracy on totals is ~97%. Line-item accuracy on faded thermal paper is around 68%, which is why we don’t report it as a primary metric.
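For thermal receipts, most of the post-processing is normalising those three fields. A sketch, assuming the model returns raw strings for the total and the date: amounts are parsed as whole dong with dots or commas as thousands separators, dates are read day-first as is usual on Vietnamese receipts, and line items are kept only when the model returned usable ones. The date formats listed and the function names are assumptions.

```python
import re
from datetime import datetime

# Day-first layouts common on Vietnamese receipts (assumed list; extend as needed).
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%d/%m/%y")


def parse_vnd(raw: str) -> int:
    """Parse amounts like '1.250.000đ', '1,250,000 VND' or '125.000,00 ₫' into whole dong."""
    cleaned = re.sub(r"[^\d.,]", "", raw)         # drop currency markers and spaces
    cleaned = re.sub(r"[.,]\d{2}$", "", cleaned)  # drop a two-digit decimal tail such as ',00'
    digits = re.sub(r"\D", "", cleaned)           # remaining separators only group thousands
    if not digits:
        raise ValueError(f"no amount found in {raw!r}")
    return int(digits)


def parse_receipt_date(raw: str) -> str:
    """Return an ISO 'YYYY-MM-DD' string, trying the day-first formats in order."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def normalise_thermal(extracted: dict) -> dict:
    """Keep merchant, date and total; keep line items only when they look usable."""
    result = {
        "merchant_name": extracted["merchant_name"].strip(),
        "date": parse_receipt_date(extracted["date"]),
        "total_amount": parse_vnd(extracted["total_amount"]),
    }
    usable = [
        item for item in extracted.get("line_items") or []
        if item.get("description") and item.get("amount")
    ]
    if usable:
        result["line_items"] = usable
    return result
```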
Grab/Be PDF receipts skip OCR entirely. They’re parsed directly from the PDF text layer using pdfplumber, which gives near-perfect extraction. The only failure mode is text obfuscation: some versions of the Grab receipt PDF render text in a way that confuses most parsers, and those files need a fallback to OCR.
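The PDF path with its OCR fallback could look like the following sketch. `read_pdf_text` uses pdfplumber's `open`, `pages` and `extract_text`. pdfplumber is built on pdfminer.six, which emits `(cid:N)` placeholders for glyphs it cannot map to Unicode, so the sketch treats those, or an empty text layer, as the obfuscation signal. That heuristic, the "Total"/"Tổng" label pattern and the function names are assumptions, not details of Grab's actual receipt layout.

```python
import re
from typing import Callable

# pdfminer.six (used by pdfplumber) emits "(cid:N)" for glyphs it cannot map to Unicode.
CID_TOKEN = re.compile(r"\(cid:\d+\)")
# Hypothetical label pattern: "Total", "Tổng" or "Tổng cộng" followed by an amount.
TOTAL_LINE = re.compile(r"\b(?:Total|Tổng(?: cộng)?)\b\D*([\d.,]+)", re.IGNORECASE)


def read_pdf_text(path: str) -> str:
    """Concatenate the text layer of every page using pdfplumber (third-party)."""
    import pdfplumber  # imported lazily so the parsing logic can be used without it

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def looks_obfuscated(text: str) -> bool:
    """An empty text layer or unmapped glyphs means the PDF text cannot be trusted."""
    return not text.strip() or bool(CID_TOKEN.search(text))


def extract_ride_receipt(
    path: str,
    ocr_fallback: Callable[[str], dict],
    text_reader: Callable[[str], str] = read_pdf_text,
) -> dict:
    """Parse a Grab/Be receipt from its text layer, falling back to OCR when that fails."""
    text = text_reader(path)
    match = None if looks_obfuscated(text) else TOTAL_LINE.search(text)
    # Dots and commas are treated as thousands separators (whole-dong amounts assumed).
    digits = re.sub(r"\D", "", match.group(1)) if match else ""
    if not digits:
        # Unreadable text layer or no usable total: take the OCR path instead.
        return ocr_fallback(path)
    return {"source": "pdf_text", "total_amount": int(digits)}
```

Injecting `text_reader` keeps the fallback logic testable without real PDFs; the `\b` boundaries stop "Subtotal" from matching the "Total" label.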
Handwritten receipts get flagged for human review after a low-confidence extraction attempt. The pipeline extracts what it can (usually total amount, sometimes date) and routes the result to the Odoo approval queue with a thumbnail of the original document. An accountant confirms or corrects the values. This is not a failure state — it’s the designed outcome for a document type that automated extraction genuinely struggles with.
Integration with Odoo Expense
The extraction output is a normalised JSON payload. The integration layer maps this to hr.expense fields and creates the record through Odoo’s external RPC API (the snippet below uses the XML-RPC endpoint).
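import xmlrpc.client

# Standard Odoo external API setup; url, db, username and password come from config.
common = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/common")
uid = common.authenticate(db, username, password, {})
models = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/object")

# `extracted` is the normalised payload; classify_product, resolve_employee,
# VND_CURRENCY_ID and attachment_id are defined elsewhere in the integration layer.
expense_vals = {
    "name": extracted["merchant_name"],
    "date": extracted["date"],  # ISO "YYYY-MM-DD" string
    "total_amount": extracted["total_amount"],
    "product_id": classify_product(extracted),  # maps to expense category
    "employee_id": resolve_employee(submitted_by),
    "company_id": 1,
    "currency_id": VND_CURRENCY_ID,
    "attachment_ids": [(4, attachment_id)],  # link an existing ir.attachment record
    "ref": extracted.get("invoice_serial", ""),
}
expense_id = models.execute_kw(db, uid, password, "hr.expense", "create", [expense_vals])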
The classify_product function maps merchant category codes and receipt types to the company’s Odoo expense product list. Restaurant receipts map to “Client Entertainment”. Taxi receipts map to “Local Transport”. GTGT invoices from registered vendors go to “Vendor Services” or “Office Supplies” based on a keyword match against the line item descriptions.
This is where the most configuration time went. The product mapping is the part no vendor demo shows you because it requires knowing the client’s actual expense categories, VAT treatment rules, and the edge cases their finance team has been handling manually for years.
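A sketch of the category mapping, assuming the receipt type and extracted line items are available on the payload. The three product names come from the description above; the receipt-type labels and the keyword list (Vietnamese and English terms for paper, printer ink and stationery) are hypothetical placeholders. It returns a product name rather than an ID; resolving names to Odoo product IDs is a one-off lookup not shown here, and `classify_product_name` is an illustrative stand-in for `classify_product`. Returning `None` for unmapped types, leaving them to the reviewer, is also an assumption.

```python
import unicodedata
from typing import Optional

# Product names are from the article; receipt-type labels and keywords are hypothetical.
PRODUCT_BY_RECEIPT_TYPE = {
    "restaurant_cafe": "Client Entertainment",
    "taxi_slip": "Local Transport",
    "ride_hailing_pdf": "Local Transport",
}
# Vietnamese and English terms for paper, printer ink and stationery (placeholder list).
OFFICE_SUPPLY_KEYWORDS = ("giấy", "mực in", "văn phòng phẩm", "paper", "toner", "stationery")


def nfc_lower(text: str) -> str:
    """Normalise Unicode composition so OCR output and keywords compare reliably."""
    return unicodedata.normalize("NFC", text).lower()


def classify_product_name(extracted: dict) -> Optional[str]:
    """Map a normalised extraction to an expense product name, or None if unmapped."""
    receipt_type = extracted.get("receipt_type")
    if receipt_type in PRODUCT_BY_RECEIPT_TYPE:
        return PRODUCT_BY_RECEIPT_TYPE[receipt_type]
    if receipt_type == "gtgt_invoice":
        # Keyword match over all line-item descriptions, case-insensitive.
        text = nfc_lower(" ".join(
            item.get("description", "") for item in extracted.get("line_items", [])
        ))
        if any(nfc_lower(keyword) in text for keyword in OFFICE_SUPPLY_KEYWORDS):
            return "Office Supplies"
        return "Vendor Services"
    # Other types (e.g. handwritten) are left for the reviewer to categorise.
    return None
```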
The Approval Workflow
The existing Odoo expense approval workflow was left largely intact. The Document AI pipeline doesn’t bypass human review — it just eliminates the data entry step that preceded it.
High-confidence extractions create a draft hr.expense record and notify the employee via email: “We’ve pre-filled your expense for [merchant] on [date] — please review and submit.” The employee checks the auto-filled form, corrects anything wrong, and submits. The manager approves the expense report as usual.
Low-confidence extractions (confidence score below 0.75, or any handwritten document) create a draft with flagged fields highlighted in yellow and route to the finance team’s review queue first. Finance confirms the extraction before it reaches the employee’s inbox.
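The routing rule above can be expressed as a small function. This sketch assumes the extractor produces a confidence score per field, which is also what would allow individual fields to be highlighted; the type, label and queue names are hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.75  # threshold quoted above


@dataclass
class ExtractionResult:
    receipt_type: str                   # label from the classifier, e.g. "handwritten"
    field_confidence: dict[str, float]  # assumed per-field scores from the extractor


def route_for_review(result: ExtractionResult) -> tuple[str, list[str]]:
    """Return (queue, fields to highlight) for one extracted receipt."""
    flagged = sorted(
        name for name, score in result.field_confidence.items()
        if score < CONFIDENCE_THRESHOLD
    )
    if result.receipt_type == "handwritten" or flagged:
        # Finance confirms the extraction before the employee sees the draft.
        return "finance_review", flagged
    # High confidence: the draft goes straight to the employee to review and submit.
    return "employee_review", flagged
```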
About 18% of receipts go through the finance review queue. That sounds high, but it’s a significant improvement over the baseline of 100%.
What Actually Changed
Six months after go-live:
- Average employee time per expense submission: 22 minutes → 2.8 minutes. The remaining time is the review-and-submit step, which the team doesn’t want to eliminate.
- Finance team data entry time: 3 staff × 2 hours/day → roughly 25 minutes/day across the team, handling edge cases and the 18% review queue.
- Month-end close: the expense accrual process that previously required chasing outstanding receipts for three days now closes in a few hours because most receipts hit Odoo on the day they’re incurred.
- Error rate on expense amounts: down from ~4% (as estimated by the finance team based on monthly correction volume) to under 0.5%.
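The figures above are easier to compare as relative changes. A back-of-envelope check using only numbers quoted in this article (the 1,800 receipts a month and the 18% review share come from earlier sections):

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


finance_before = 3 * 2 * 60   # 3 staff x 2 hours/day, in minutes
finance_after = 25            # minutes/day across the team
per_submission_before, per_submission_after = 22, 2.8  # minutes per submission
receipts_per_month, review_share = 1_800, 0.18
review_per_month = round(receipts_per_month * review_share)

print(f"Finance data entry: {finance_before} -> {finance_after} min/day "
      f"({reduction(finance_before, finance_after):.0%} less)")
print(f"Per submission: {reduction(per_submission_before, per_submission_after):.0%} less time")
print(f"Finance review queue: about {review_per_month} receipts/month")
print(f"Amount error rate: ~4% -> under 0.5%, i.e. at least {0.04 / 0.005:.0f}x lower")
```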
The 0.5% error rate is not zero. Thermal receipts with severe fading still occasionally extract the wrong total, and handwritten amounts are sometimes ambiguous even to a human reviewer. The system flags uncertainty rather than hiding it, which means the errors that do occur are visible rather than buried in reconciliation.
At Trobz, we’ve deployed Document AI pipelines for expense, invoice, and logistics document processing across several SEA clients — if you’re running similar volumes and want to understand what the pipeline looks like for your document mix, get in touch at [email protected].