Key Takeaways: Bank reconciliation is one of the highest-ROI targets for an Odoo AI agent because the matching patterns are learnable and the failure modes are bounded. A well-designed matching hierarchy handles 85–95% of lines automatically. The remaining 5–15% go to a human review queue — and that queue is where the agent adds value by surfacing context, not just flagging uncertainty. Confidence thresholds are not magic numbers; they’re business decisions expressed as code. Every match, automated or human, leaves an audit trail that makes the system improvable over time.
Bank statements don’t reconcile themselves. A busy mid-size company processes 200–500 bank statement lines per month, and a finance team member working through them manually — matching a VND 51,750,000 transfer to INV/2026/00342, hunting for a split payment, checking whether that duplicate-looking entry is actually two separate subscriptions — takes 3–4 hours. Every month.
The Bank Reconciliation Agent cuts that to under 20 minutes. Not by being clever in the LLM sense — by being systematic in ways humans aren’t.
This post covers the matching logic in detail: how the hierarchy works, what confidence scoring looks like in practice, how to calibrate thresholds for your specific payment behavior, and how the human review queue is structured to keep an accountant productive rather than confused.
Why Bank Reconciliation Is the Right Problem for an Agent
Most AI agent projects fail because the problem is too open-ended. Reconciliation succeeds because the decision space is bounded:
- Either a bank statement line matches an account.move (invoice, bill, payment) in Odoo, or it doesn’t
- The signals are structured: amount, reference string, date, partner
- The cost of a wrong match is recoverable — Odoo preserves original entries and a mis-reconciled line can be unreconciled in seconds
That structure means you can build a matching hierarchy with deterministic rules first, probabilistic scoring second, and human judgment last. The agent doesn’t need to reason about reconciliation the way an LLM reasons about open-ended text. It needs to apply a scoring function well and know when it’s uncertain.
This is also what the AI revolution actually looks like in practice — not a general intelligence replacing your accountant, but a structured decision system handling the routine 90% so your accountant can focus on the edge cases that actually require judgment.
The Matching Hierarchy
The agent works through four matching strategies in order, stopping as soon as it finds a result above the auto-reconcile threshold.
Level 1: Exact Amount + Reference
The cleanest case. The bank line carries an amount of 1,500.00 and a reference of INV/2026/00198. Odoo has an open account.move with that exact reference and a matching outstanding balance.
def find_exact_match(st_line, open_moves):
    """
    Level 1: Exact amount and reference match.
    Returns the move and confidence=1.0 if found, else None.
    """
    amount = st_line.amount
    ref = (st_line.payment_ref or "").strip().upper()
    for move in open_moves:
        move_ref = (move.ref or move.name or "").strip().upper()
        if abs(move.amount_residual + amount) < 0.01 and move_ref == ref:
            return move, 1.0
    return None, 0.0
account.bank.statement.line.amount is signed: positive for incoming funds (credit to bank), negative for outgoing. account.move.amount_residual carries the same sign convention from the journal entry side. The + amount check — rather than comparing absolutes — handles this correctly without a sign flip.
Gotcha: References are not always clean. A payment arriving from a Vietnamese bank might read THANH TOAN HOA DON INV/2026/00198 CONG TY ABC THANG 04. Exact matching misses this entirely. That’s what Level 2 handles.
Level 2: Exact Amount + Fuzzy Reference
Same amount requirement, but the reference is extracted from the bank description using a regex rather than compared directly.
import re
INVOICE_REF_PATTERN = re.compile(
    r'\b(?:INV|BILL|REFUND|BNK)/\d{4}/\d{4,6}\b',  # non-capturing group, so findall returns the full reference, not just the prefix
    re.IGNORECASE
)

def find_fuzzy_ref_match(st_line, open_moves):
    """
    Level 2: Exact amount, extracted reference from payment description.
    """
    amount = st_line.amount
    description = st_line.payment_ref or st_line.narration or ""
    extracted_refs = INVOICE_REF_PATTERN.findall(description)
    if not extracted_refs:
        return None, 0.0
    for move in open_moves:
        move_ref = (move.name or "").strip().upper()
        for ref in extracted_refs:
            if move_ref == ref.upper() and abs(move.amount_residual + amount) < 0.01:
                return move, 0.92  # High but not 1.0 — ref was extracted, not direct
    return None, 0.0
Confidence here is 0.92, not 1.0. The reference was extracted from natural language, which means there’s a small chance the regex matched incorrectly or that the same reference string appears on two different documents. High enough to auto-reconcile; low enough to flag a small sample for review auditing.
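One lightweight way to implement that review auditing is to flag a random slice of auto-reconciled lines for periodic spot checks. A minimal sketch — the sample_for_audit helper and its 5% default rate are illustrative, not part of the agent above:

```python
import random

def sample_for_audit(auto_matched_ids, rate=0.05, seed=None):
    """Pick a random subset of auto-reconciled statement line ids
    to route into the review queue as spot checks."""
    if not auto_matched_ids:
        return []
    rng = random.Random(seed)  # seed is only for reproducible runs/tests
    k = max(1, int(len(auto_matched_ids) * rate))
    return rng.sample(auto_matched_ids, k)
```

Routing the sampled lines through the same review queue as genuinely uncertain matches keeps the accountant's workflow unchanged while giving you ongoing precision data on the 0.92-confidence tier.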
Level 3: Date Proximity + Amount + Partner
No clean reference in the description. But the amount matches an open invoice from the right partner, and the bank transaction date falls within a reasonable window of the invoice due date.
def find_partner_date_match(st_line, open_moves):
    """
    Level 3: Amount match + partner + date proximity.
    Confidence degrades as the date gap widens.
    """
    amount = st_line.amount
    partner = st_line.partner_id
    tx_date = st_line.date
    if not partner:
        return None, 0.0
    candidates = [
        m for m in open_moves
        if m.partner_id == partner
        and abs(m.amount_residual + amount) < 0.01
    ]
    best_move, best_conf = None, 0.0
    for move in candidates:
        due_date = move.invoice_date_due or move.date
        gap = abs((tx_date - due_date).days)
        if gap <= 3:
            conf = 0.88
        elif gap <= 10:
            conf = 0.75
        elif gap <= 30:
            conf = 0.55
        else:
            conf = 0.30  # Below threshold — will route to review queue
        if conf > best_conf:
            best_move, best_conf = move, conf
    return best_move, best_conf
Date proximity scoring is where you tune for your business. A company that receives wire transfers from clients who consistently pay 15–20 days late needs a wider window than one with reliable payment behavior. The 30-day cutoff isn’t a universal truth — it’s a starting point for calibration.
One thing worth noting: this level requires a matched partner_id on the bank statement line. Odoo does attempt automatic partner detection based on IBAN and account name, but it misses frequently enough that a preprocessing step to improve partner detection pays off before you even touch the matching logic.
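As a preprocessing sketch, one approach is to normalize the bank's counterparty text and look for known partner names inside it. The guess_partner helper and its dict of known partners are illustrative assumptions, not Odoo API:

```python
import re
import unicodedata

def normalize_counterparty(name):
    """Strip Vietnamese diacritics, collapse whitespace, and uppercase
    so bank-supplied names compare consistently."""
    name = unicodedata.normalize("NFD", name or "")
    name = "".join(ch for ch in name if unicodedata.category(ch) != "Mn")
    name = re.sub(r"\s+", " ", name).strip().upper()
    # NFD leaves the Vietnamese D-with-stroke intact; map it explicitly
    return name.replace("\u0110", "D").replace("\u0111", "D")

def guess_partner(bank_description, known_partners):
    """Return the partner id whose normalized name appears in the
    normalized bank description, else None.
    known_partners: {display_name: partner_id} (illustrative shape)."""
    desc = normalize_counterparty(bank_description)
    for partner_name, partner_id in known_partners.items():
        if normalize_counterparty(partner_name) in desc:
            return partner_id
    return None
```

In production this would run before the matching hierarchy and only write partner_id when Odoo's own IBAN/account-name detection came up empty.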
Level 4: Partial Amount Matching
The hardest case: a single bank transfer covers multiple invoices, or one invoice was partially paid in a prior period. The agent checks whether combinations of open move residuals sum to the statement line amount.
from itertools import combinations
def find_partial_match(st_line, open_moves, max_invoices=4):
    """
    Level 4: The bank line amount matches a combination of open moves.
    Only checks combinations up to max_invoices to stay tractable.
    """
    amount = abs(st_line.amount)
    partner = st_line.partner_id
    # Restrict to same partner where possible to reduce the search space
    candidates = (
        [m for m in open_moves if m.partner_id == partner]
        if partner else open_moves
    )
    for n in range(2, min(max_invoices + 1, len(candidates) + 1)):
        for combo in combinations(candidates, n):
            combo_total = sum(abs(m.amount_residual) for m in combo)
            if abs(combo_total - amount) < 0.01:
                return list(combo), 0.70
    return None, 0.0
Partial matching is computationally expensive for large candidate sets. Limiting to the same partner usually reduces candidates to under 20, which keeps combination checking fast. For a company with 50 open invoices from one partner, checking all 4-combination subsets is ~230,000 iterations — fine in a background cron job, wrong in a synchronous HTTP handler.
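You can sanity-check that search-space estimate directly from the binomial coefficient. The combo_checks helper below is just an illustrative counter, not part of the matcher:

```python
from math import comb

def combo_checks(n_candidates, max_invoices=4):
    """Number of subsets of size 2..max_invoices the partial matcher
    tests against a candidate pool of n_candidates open moves."""
    return sum(comb(n_candidates, k) for k in range(2, max_invoices + 1))

combo_checks(20)  # a few thousand checks for a typical same-partner pool
combo_checks(50)  # roughly a quarter million for an unfiltered pool
```

The curve is steep: halving the candidate pool via partner filtering cuts the work by well over an order of magnitude, which is why the preprocessing step matters more than micro-optimizing the combination loop.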
Confidence is set at 0.70 because combinatorial matches have more ambiguity. Two invoices summing to the right amount is a reasonable signal; it’s also possible the amount is a coincidence. Human confirmation for partial matches is the right default until you’ve observed the false positive rate in your specific data.
Confidence Thresholds and How to Calibrate Them
The agent auto-reconciles when confidence exceeds a threshold, queues for human review otherwise, and ignores the match entirely below a minimum floor.
AUTO_RECONCILE_THRESHOLD = 0.85
REVIEW_QUEUE_THRESHOLD = 0.40 # Below this, the suggestion is too weak to surface
def route_match(st_line, open_moves):
    strategies = [
        ("exact", find_exact_match),
        ("fuzzy_ref", find_fuzzy_ref_match),
        ("partner_date", find_partner_date_match),
        ("partial", find_partial_match),
    ]
    for name, strategy in strategies:
        result, confidence = strategy(st_line, open_moves)
        if result is None:
            continue
        if confidence >= AUTO_RECONCILE_THRESHOLD:
            return "auto", name, result, confidence
        elif confidence >= REVIEW_QUEUE_THRESHOLD:
            return "review", name, result, confidence
    return "unmatched", None, None, 0.0
How to calibrate: Run the agent in shadow mode for 2–4 weeks before enabling auto-reconciliation. Log every match decision alongside what the accountant actually matched (the ground truth). Then plot precision by threshold value. There’s almost always a clean inflection point around 0.85–0.90 where precision reaches 99%+. Below that threshold, it drops fast as the algorithm starts guessing.
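Reducing a shadow-mode log to that precision curve takes only a few lines. This sketch assumes you logged (confidence, suggested_move_id, human_move_id) tuples while auto-reconciliation was still off:

```python
def precision_by_threshold(shadow_log, thresholds=(0.5, 0.7, 0.85, 0.9)):
    """For each candidate threshold, the fraction of would-have-been
    auto-reconciled lines where the agent's suggestion matched the
    accountant's actual choice. None when no line clears the threshold."""
    curve = {}
    for t in thresholds:
        hits = [(s, h) for conf, s, h in shadow_log if conf >= t]
        curve[t] = (sum(1 for s, h in hits if s == h) / len(hits)) if hits else None
    return curve
```

Plot the resulting dict and pick the lowest threshold that still clears your precision target; that becomes AUTO_RECONCILE_THRESHOLD.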
Most implementations reach 85–92% auto-reconciliation rates after 4–6 weeks of calibration. The remaining 8–15% go to human review, which is expected — that’s the genuinely ambiguous tail of the distribution.
The Human Review Queue
The review queue is not a failure mode. It’s where the agent adds value beyond raw automation — by surfacing the right context so an accountant can make a decision in 15 seconds rather than 5 minutes.
Each queued item presents:
- The bank line: amount, date, description, partner (if detected)
- The suggested match: the best candidate and why it was selected
- Confidence score and reason: expressed in plain language, not raw numbers
- Alternative candidates: the next best matches, if any
- One-click actions: Accept / Reject / Create new payment / Skip
def build_review_context(st_line, suggested_match, confidence, alternatives):
    return {
        'statement_line': {
            'id': st_line.id,
            'date': st_line.date,
            'amount': st_line.amount,
            'description': st_line.payment_ref,
            'partner_name': st_line.partner_id.name if st_line.partner_id else None,
        },
        'suggested_match': {
            'move_id': suggested_match.id if suggested_match else None,
            'move_name': suggested_match.name if suggested_match else None,
            'confidence': round(confidence, 2),
            'reason': build_reason_string(st_line, suggested_match, confidence),
        },
        'alternatives': [
            {'move_id': m.id, 'move_name': m.name, 'confidence': round(c, 2)}
            for m, c in (alternatives or [])[:2]
        ],
        'actions': ['accept', 'reject', 'create_payment', 'skip'],
    }

def confidence_label(score):
    if score >= 0.85:
        return "High confidence"
    elif score >= 0.65:
        return "Medium — please verify"
    else:
        return "Low confidence — manual review required"
The reason string is worth investing in. “Partner matched, invoice due 18 days ago” is useful. “Confidence: 0.73” is not. An accountant working through 20 queued items can process human-readable reasons in seconds; raw scores require them to reconstruct context they shouldn’t have to reconstruct.
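The build_reason_string helper referenced in build_review_context could look like the following. A minimal sketch, assuming the same field conventions as the matchers above; SimpleNamespace stands in for Odoo records in the example:

```python
from datetime import date
from types import SimpleNamespace

def build_reason_string(st_line, move, confidence):
    """Turn a match decision into a sentence an accountant can act on.
    The confidence score itself is surfaced separately via confidence_label."""
    if move is None:
        return "No candidate above the minimum confidence floor"
    parts = []
    if st_line.partner_id and st_line.partner_id == move.partner_id:
        parts.append("partner matched")
    due = move.invoice_date_due or move.date
    gap = (st_line.date - due).days
    parts.append(f"invoice due {abs(gap)} days {'ago' if gap >= 0 else 'from now'}")
    if abs(move.amount_residual + st_line.amount) < 0.01:
        parts.append("amount matches exactly")
    reason = ", ".join(parts)
    return reason[0].upper() + reason[1:]

# Stand-in records mimicking the fields the matchers use
line = SimpleNamespace(partner_id=7, date=date(2026, 4, 20), amount=1500.0)
inv = SimpleNamespace(partner_id=7, invoice_date_due=date(2026, 4, 2),
                      date=date(2026, 4, 1), amount_residual=-1500.0)
```

With those stand-ins, the helper produces exactly the kind of sentence argued for above: "Partner matched, invoice due 18 days ago, amount matches exactly".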
Audit Trail
Every reconciliation decision — automated or human — writes to a log that both auditors and the agent itself can use.
from odoo import fields  # needed for fields.Datetime.now()

def log_reconciliation(
    env, st_line, matched_moves, confidence,
    strategy, decision_type, user=None
):
    """
    Writes to a custom model: reconciliation.agent.log
    Fields: statement_line_id, matched_move_ids, confidence,
            strategy, decision_type, decided_by, decided_at
    """
    moves = matched_moves if isinstance(matched_moves, list) else [matched_moves]
    env['reconciliation.agent.log'].sudo().create({
        'statement_line_id': st_line.id,
        'matched_move_ids': [(6, 0, [m.id for m in moves])],
        'confidence': confidence,
        'strategy': strategy,  # 'exact', 'fuzzy_ref', 'partner_date', 'partial', 'human'
        'decision_type': decision_type,  # 'auto' or 'human'
        'decided_by': user.id if user else env.user.id,
        'decided_at': fields.Datetime.now(),
    })
The audit log has two distinct uses. First, compliance: every reconciliation decision is traceable to a specific user (or the agent acting as a system user) with a timestamp. Second, continuous improvement: after 3–6 months of logs, you can identify where human reviewers consistently accept the agent’s suggestion (raise the confidence threshold) and where they consistently override it (fix the algorithm).
The overrides are the most valuable data. They reveal patterns the agent doesn’t handle — a vendor who always pays with a slightly modified reference, a client who splits invoices in a non-obvious way. Log them, cluster them, extend the matching logic.
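Clustering starts with a simple acceptance-rate report per strategy. This sketch assumes you extend reconciliation.agent.log with an accepted flag recording whether the reviewer kept the agent's suggestion — that field is an assumption, not part of the schema above:

```python
from collections import Counter

def strategy_acceptance(log_rows):
    """log_rows: dicts with 'strategy', 'decision_type', and the assumed
    'accepted' flag. Returns per-strategy acceptance rates over the
    human-reviewed slice: consistently high rates mark thresholds you can
    raise, consistently low ones mark matchers that need fixing."""
    totals, kept = Counter(), Counter()
    for row in log_rows:
        if row["decision_type"] != "human":
            continue
        totals[row["strategy"]] += 1
        kept[row["strategy"]] += row["accepted"]
    return {s: kept[s] / totals[s] for s in totals}
```

Run it quarterly over the log export; a strategy sitting above ~98% acceptance in review is a candidate for auto-reconciliation at its current confidence.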
Getting This to Production
The matching code above runs as a scheduled action — ir.cron calling the agent on new unreconciled account.bank.statement.line records every hour. The review queue is a filtered list view with the agent’s context injected via an OWL widget.
The hardest part isn’t the algorithm. It’s getting clean partner data onto statement lines before matching starts. Vietnamese banks in particular send variable-length reference strings, occasional Unicode normalization issues (NFKC vs NFC encoding of Vietnamese diacritics), and inconsistent partner name formatting. Time spent on the normalization layer before tuning confidence thresholds pays back faster than time spent on matching sophistication.
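The diacritics issue in particular is cheap to neutralize up front. A minimal sketch: normalize every incoming description to NFC before any reference or partner matching runs:

```python
import unicodedata

def normalize_bank_text(raw):
    """Collapse composed vs decomposed Vietnamese diacritics to one
    canonical form (NFC) and squeeze whitespace, so later string
    comparisons aren't defeated by encoding differences between banks."""
    return " ".join(unicodedata.normalize("NFC", raw or "").split())

# The same visible text arriving in two different encodings:
composed = "C\u00f4ng ty ABC"      # 'ô' as a single code point
decomposed = "Co\u0302ng  ty ABC"  # 'o' + combining circumflex
```

The two raw strings compare unequal byte-for-byte but identical after normalization, which is exactly the failure mode that silently breaks Level 1 and Level 2 matching.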
Start with Level 1 only. Measure the automatic hit rate — typically 40–60% on the first run. Add Level 2 and measure again. Each layer should contribute 15–25 percentage points until you’re matching 85–95% automatically. If a layer adds less than 5 points, the problem is upstream data quality, not the matching logic.
Key Takeaways
- Use a hierarchy: Exact match → fuzzy reference → partner + date → partial amount. Stop at the first high-confidence result.
- Calibrate thresholds with shadow mode data, not intuition. Run 2–4 weeks of logging before enabling auto-reconciliation.
- The review queue is a product feature, not a fallback. Design it to give accountants context, not just a list of uncertain items.
- Log everything: audit trail data is the raw material for improving the matching algorithm over time.
- Fix partner detection and reference normalization first. These upstream data quality issues limit every downstream matching strategy.
At Trobz, we’ve deployed versions of this agent for Odoo Accounting clients handling statement volumes from 50 lines per month to 2,000-line daily feeds across multi-bank setups. If you’re evaluating whether your reconciliation volume justifies this investment, we’re happy to walk through it — reach out here.