Key Takeaways: Bank reconciliation is one of the highest-ROI targets for an Odoo AI agent because the matching patterns are learnable and the failure modes are bounded. A well-designed matching hierarchy handles 85–95% of lines automatically. The remaining 5–15% go to a human review queue — and that queue is where the agent adds value by surfacing context, not just flagging uncertainty. Confidence thresholds are not magic numbers; they’re business decisions expressed as code. Every match, automated or human, leaves an audit trail that makes the system improvable over time.
Bank statements don’t reconcile themselves. A busy mid-size company processes 200–500 bank statement lines per month, and a finance team member working through them manually — matching a VND 51,750,000 transfer to INV/2026/00342, hunting for a split payment, checking whether that duplicate-looking entry is actually two separate subscriptions — takes 3–4 hours. Every month.
The Bank Reconciliation Agent cuts that to under 20 minutes. Not by being clever in the LLM sense — by being systematic in ways humans aren’t.
This post covers the matching logic in detail: how the hierarchy works, what confidence scoring looks like in practice, how to calibrate thresholds for your specific payment behavior, and how the human review queue is structured to keep an accountant productive rather than confused.
Why Bank Reconciliation Is the Right Problem for an Agent
Most AI agent projects fail because the problem is too open-ended. Reconciliation succeeds because the decision space is bounded:
- Either a bank statement line matches an account.move (invoice, bill, payment) in Odoo, or it doesn’t
- The signals are structured: amount, reference string, date, partner
- The cost of a wrong match is recoverable — Odoo preserves original entries and a mis-reconciled line can be unreconciled in seconds
That structure means you can build a matching hierarchy with deterministic rules first, probabilistic scoring second, and human judgment last. The agent doesn’t need to reason about reconciliation the way an LLM reasons about open-ended text. It needs to apply a scoring function well and know when it’s uncertain.
This is also what the AI revolution actually looks like in practice — not a general intelligence replacing your accountant, but a structured decision system handling the routine 90% so your accountant can focus on the edge cases that actually require judgment.
The Matching Hierarchy
The agent works through four matching strategies in order, stopping as soon as it finds a result above the auto-reconcile threshold.
Level 1: Exact Amount + Reference
The cleanest case. The bank line carries an amount of 1,500.00 and a reference of INV/2026/00198. Odoo has an open account.move with that exact reference and a matching outstanding balance.
def find_exact_match(st_line, open_moves):
    """
    Level 1: Exact amount and reference match.
    Returns the move and confidence=1.0 if found, else None.
    """
    amount = st_line.amount
    ref = (st_line.payment_ref or "").strip().upper()
    for move in open_moves:
        move_ref = (move.ref or move.name or "").strip().upper()
        if abs(move.amount_residual + amount) < 0.01 and move_ref == ref:
            return move, 1.0
    return None, 0.0
account.bank.statement.line.amount is signed: positive for incoming funds (credit to bank), negative for outgoing. account.move.amount_residual carries the same sign convention from the journal entry side. The + amount check — rather than comparing absolutes — handles this correctly without a sign flip.
Gotcha: References are not always clean. A payment arriving from a Vietnamese bank might read THANH TOAN HOA DON INV/2026/00198 CONG TY ABC THANG 04. Exact matching misses this entirely. That’s what Level 2 handles.
Level 2: Exact Amount + Fuzzy Reference
Same amount requirement, but the reference is extracted from the bank description using a regex rather than compared directly.
import re
INVOICE_REF_PATTERN = re.compile(
    r'\b(?:INV|BILL|REFUND|BNK)/\d{4}/\d{4,6}\b',  # non-capturing group, so findall returns the full reference, not just the prefix
    re.IGNORECASE
)

def find_fuzzy_ref_match(st_line, open_moves):
    """
    Level 2: Exact amount, extracted reference from payment description.
    """
    amount = st_line.amount
    description = st_line.payment_ref or st_line.narration or ""
    extracted_refs = INVOICE_REF_PATTERN.findall(description)
    if not extracted_refs:
        return None, 0.0
    for move in open_moves:
        move_ref = (move.name or "").strip().upper()
        for ref in extracted_refs:
            if move_ref == ref.upper() and abs(move.amount_residual + amount) < 0.01:
                return move, 0.92  # High but not 1.0 — ref was extracted, not direct
    return None, 0.0
Confidence here is 0.92, not 1.0. The reference was extracted from natural language, which means there’s a small chance the regex matched incorrectly or that the same reference string appears on two different documents. High enough to auto-reconcile; low enough to flag a small sample for review auditing.
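One lightweight way to implement that review auditing is to flag a random slice of auto-reconciled lines for periodic spot checks. A minimal sketch — the sample_for_audit helper and its 5% default rate are illustrative, not part of the agent above:

```python
import random

def sample_for_audit(auto_matched_ids, rate=0.05, seed=None):
    """Pick a random subset of auto-reconciled statement line ids
    to route into the review queue as spot checks."""
    if not auto_matched_ids:
        return []
    rng = random.Random(seed)  # seed is only for reproducible runs/tests
    k = max(1, int(len(auto_matched_ids) * rate))
    return rng.sample(auto_matched_ids, k)
```

Routing the sampled lines through the same review queue as genuinely uncertain matches keeps the accountant's workflow unchanged while giving you ongoing precision data on the 0.92-confidence tier.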
Level 3: Date Proximity + Amount + Partner
No clean reference in the description. But the amount matches an open invoice from the right partner, and the bank transaction date falls within a reasonable window of the invoice due date.
def find_partner_date_match(st_line, open_moves):
    """
    Level 3: Amount match + partner + date proximity.
    Confidence degrades as the date gap widens.
    """
    amount = st_line.amount
    partner = st_line.partner_id
    tx_date = st_line.date
    if not partner:
        return None, 0.0
    candidates = [
        m for m in open_moves
        if m.partner_id == partner
        and abs(m.amount_residual + amount) < 0.01
    ]
    best_move, best_conf = None, 0.0
    for move in candidates:
        due_date = move.invoice_date_due or move.date
        gap = abs((tx_date - due_date).days)
        if gap <= 3:
            conf = 0.88
        elif gap <= 10:
            conf = 0.75
        elif gap <= 30:
            conf = 0.55
        else:
            conf = 0.30  # Below threshold — will route to review queue
        if conf > best_conf:
            best_move, best_conf = move, conf
    return best_move, best_conf
Date proximity scoring is where you tune for your business. A company that receives wire transfers from clients who consistently pay 15–20 days late needs a wider window than one with reliable payment behavior. The 30-day cutoff isn’t a universal truth — it’s a starting point for calibration.
One thing worth noting: this level requires a matched partner_id on the bank statement line. Odoo does attempt automatic partner detection based on IBAN and account name, but it misses frequently enough that a preprocessing step to improve partner detection pays off before you even touch the matching logic.
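As a preprocessing sketch, one approach is to normalize the bank's counterparty text and look for known partner names inside it. The guess_partner helper and its dict of known partners are illustrative assumptions, not Odoo API:

```python
import re
import unicodedata

def normalize_counterparty(name):
    """Strip Vietnamese diacritics, collapse whitespace, and uppercase
    so bank-supplied names compare consistently."""
    name = unicodedata.normalize("NFD", name or "")
    name = "".join(ch for ch in name if unicodedata.category(ch) != "Mn")
    name = re.sub(r"\s+", " ", name).strip().upper()
    # NFD leaves the Vietnamese D-with-stroke intact; map it explicitly
    return name.replace("\u0110", "D").replace("\u0111", "D")

def guess_partner(bank_description, known_partners):
    """Return the partner id whose normalized name appears in the
    normalized bank description, else None.
    known_partners: {display_name: partner_id} (illustrative shape)."""
    desc = normalize_counterparty(bank_description)
    for partner_name, partner_id in known_partners.items():
        if normalize_counterparty(partner_name) in desc:
            return partner_id
    return None
```

In production this would run before the matching hierarchy and only write partner_id when Odoo's own IBAN/account-name detection came up empty.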
Level 4: Partial Amount Matching
The hardest case: a single bank transfer covers multiple invoices, or one invoice was partially paid in a prior period. The agent checks whether combinations of open move residuals sum to the statement line amount.
from itertools import combinations
def find_partial_match(st_line, open_moves, max_invoices=4):
    """
    Level 4: The bank line amount matches a combination of open moves.
    Only checks combinations up to max_invoices to stay tractable.
    """
    amount = abs(st_line.amount)
    partner = st_line.partner_id
    # Restrict to same partner where possible to reduce the search space
    candidates = (
        [m for m in open_moves if m.partner_id == partner]
        if partner else open_moves
    )
    for n in range(2, min(max_invoices + 1, len(candidates) + 1)):
        for combo in combinations(candidates, n):
            combo_total = sum(abs(m.amount_residual) for m in combo)
            if abs(combo_total - amount) < 0.01:
                return list(combo), 0.70
    return None, 0.0
Partial matching is computationally expensive for large candidate sets. Limiting to the same partner usually reduces candidates to under 20, which keeps combination checking fast. For a company with 50 open invoices from one partner, checking all 4-combination subsets is ~230,000 iterations — fine in a background cron job, wrong in a synchronous HTTP handler.
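You can sanity-check that search-space estimate directly from the binomial coefficient. The combo_checks helper below is just an illustrative counter, not part of the matcher:

```python
from math import comb

def combo_checks(n_candidates, max_invoices=4):
    """Number of subsets of size 2..max_invoices the partial matcher
    tests against a candidate pool of n_candidates open moves."""
    return sum(comb(n_candidates, k) for k in range(2, max_invoices + 1))

combo_checks(20)  # a few thousand checks for a typical same-partner pool
combo_checks(50)  # roughly a quarter million for an unfiltered pool
```

The curve is steep: halving the candidate pool via partner filtering cuts the work by well over an order of magnitude, which is why the preprocessing step matters more than micro-optimizing the combination loop.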
Confidence is set at 0.70 because combinatorial matches have more ambiguity. Two invoices summing to the right amount is a reasonable signal; it’s also possible the amount is a coincidence. Human confirmation for partial matches is the right default until you’ve observed the false positive rate in your specific data.
Confidence Thresholds and How to Calibrate Them
The agent auto-reconciles when confidence exceeds a threshold, queues for human review otherwise, and ignores the match entirely below a minimum floor.
AUTO_RECONCILE_THRESHOLD = 0.85
REVIEW_QUEUE_THRESHOLD = 0.40 # Below this, the suggestion is too weak to surface
def route_match(st_line, open_moves):
    strategies = [
        ("exact", find_exact_match),
        ("fuzzy_ref", find_fuzzy_ref_match),
        ("partner_date", find_partner_date_match),
        ("partial", find_partial_match),
    ]
    for name, strategy in strategies:
        result, confidence = strategy(st_line, open_moves)
        if result is None:
            continue
        if confidence >= AUTO_RECONCILE_THRESHOLD:
            return "auto", name, result, confidence
        elif confidence >= REVIEW_QUEUE_THRESHOLD:
            return "review", name, result, confidence
    return "unmatched", None, None, 0.0
How to calibrate: Run the agent in shadow mode for 2–4 weeks before enabling auto-reconciliation. Log every match decision alongside what the accountant actually matched (the ground truth). Then plot precision by threshold value. There’s almost always a clean inflection point around 0.85–0.90 where precision reaches 99%+. Below that threshold, it drops fast as the algorithm starts guessing.
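Reducing a shadow-mode log to that precision curve takes only a few lines. This sketch assumes you logged (confidence, suggested_move_id, human_move_id) tuples while auto-reconciliation was still off:

```python
def precision_by_threshold(shadow_log, thresholds=(0.5, 0.7, 0.85, 0.9)):
    """For each candidate threshold, the fraction of would-have-been
    auto-reconciled lines where the agent's suggestion matched the
    accountant's actual choice. None when no line clears the threshold."""
    curve = {}
    for t in thresholds:
        hits = [(s, h) for conf, s, h in shadow_log if conf >= t]
        curve[t] = (sum(1 for s, h in hits if s == h) / len(hits)) if hits else None
    return curve
```

Plot the resulting dict and pick the lowest threshold that still clears your precision target; that becomes AUTO_RECONCILE_THRESHOLD.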
Most implementations reach 85–92% auto-reconciliation rates after 4–6 weeks of calibration. The remaining 8–15% go to human review, which is expected — that’s the genuinely ambiguous tail of the distribution.
The Human Review Queue
The review queue is not a failure mode. It’s where the agent adds value beyond raw automation — by surfacing the right context so an accountant can make a decision in 15 seconds rather than 5 minutes.
Each queued item presents:
- The bank line: amount, date, description, partner (if detected)
- The suggested match: the best candidate and why it was selected
- Confidence score and reason: expressed in plain language, not raw numbers
- Alternative candidates: the next best matches, if any
- One-click actions: Accept / Reject / Create new payment / Skip
def build_review_context(st_line, suggested_match, confidence, alternatives):
    return {
        'statement_line': {
            'id': st_line.id,
            'date': st_line.date,
            'amount': st_line.amount,
            'description': st_line.payment_ref,
            'partner_name': st_line.partner_id.name if st_line.partner_id else None,
        },
        'suggested_match': {
            'move_id': suggested_match.id if suggested_match else None,
            'move_name': suggested_match.name if suggested_match else None,
            'confidence': round(confidence, 2),
            'reason': build_reason_string(st_line, suggested_match, confidence),
        },
        'alternatives': [
            {'move_id': m.id, 'move_name': m.name, 'confidence': round(c, 2)}
            for m, c in (alternatives or [])[:2]
        ],
        'actions': ['accept', 'reject', 'create_payment', 'skip'],
    }

def confidence_label(score):
    if score >= 0.85:
        return "High confidence"
    elif score >= 0.65:
        return "Medium — please verify"
    else:
        return "Low confidence — manual review required"
The reason string is worth investing in. “Partner matched, invoice due 18 days ago” is useful. “Confidence: 0.73” is not. An accountant working through 20 queued items can process human-readable reasons in seconds; raw scores require them to reconstruct context they shouldn’t have to reconstruct.
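The build_reason_string helper referenced in build_review_context could look like the following. A minimal sketch, assuming the same field conventions as the matchers above; SimpleNamespace stands in for Odoo records in the example:

```python
from datetime import date
from types import SimpleNamespace

def build_reason_string(st_line, move, confidence):
    """Turn a match decision into a sentence an accountant can act on.
    The confidence score itself is surfaced separately via confidence_label."""
    if move is None:
        return "No candidate above the minimum confidence floor"
    parts = []
    if st_line.partner_id and st_line.partner_id == move.partner_id:
        parts.append("partner matched")
    due = move.invoice_date_due or move.date
    gap = (st_line.date - due).days
    parts.append(f"invoice due {abs(gap)} days {'ago' if gap >= 0 else 'from now'}")
    if abs(move.amount_residual + st_line.amount) < 0.01:
        parts.append("amount matches exactly")
    reason = ", ".join(parts)
    return reason[0].upper() + reason[1:]

# Stand-in records mimicking the fields the matchers use
line = SimpleNamespace(partner_id=7, date=date(2026, 4, 20), amount=1500.0)
inv = SimpleNamespace(partner_id=7, invoice_date_due=date(2026, 4, 2),
                      date=date(2026, 4, 1), amount_residual=-1500.0)
```

With those stand-ins, the helper produces exactly the kind of sentence argued for above: "Partner matched, invoice due 18 days ago, amount matches exactly".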
Audit Trail
Every reconciliation decision — automated or human — writes to a log that both auditors and the agent itself can use.
from odoo import fields  # needed for fields.Datetime.now()

def log_reconciliation(
    env, st_line, matched_moves, confidence,
    strategy, decision_type, user=None
):
    """
    Writes to a custom model: reconciliation.agent.log
    Fields: statement_line_id, matched_move_ids, confidence,
            strategy, decision_type, decided_by, decided_at
    """
    moves = matched_moves if isinstance(matched_moves, list) else [matched_moves]
    env['reconciliation.agent.log'].sudo().create({
        'statement_line_id': st_line.id,
        'matched_move_ids': [(6, 0, [m.id for m in moves])],
        'confidence': confidence,
        'strategy': strategy,  # 'exact', 'fuzzy_ref', 'partner_date', 'partial', 'human'
        'decision_type': decision_type,  # 'auto' or 'human'
        'decided_by': user.id if user else env.user.id,
        'decided_at': fields.Datetime.now(),
    })
The audit log has two distinct uses. First, compliance: every reconciliation decision is traceable to a specific user (or the agent acting as a system user) with a timestamp. Second, continuous improvement: after 3–6 months of logs, you can identify where human reviewers consistently accept the agent’s suggestion (raise the confidence threshold) and where they consistently override it (fix the algorithm).
The overrides are the most valuable data. They reveal patterns the agent doesn’t handle — a vendor who always pays with a slightly modified reference, a client who splits invoices in a non-obvious way. Log them, cluster them, extend the matching logic.
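Clustering starts with a simple acceptance-rate report per strategy. This sketch assumes you extend reconciliation.agent.log with an accepted flag recording whether the reviewer kept the agent's suggestion — that field is an assumption, not part of the schema above:

```python
from collections import Counter

def strategy_acceptance(log_rows):
    """log_rows: dicts with 'strategy', 'decision_type', and the assumed
    'accepted' flag. Returns per-strategy acceptance rates over the
    human-reviewed slice: consistently high rates mark thresholds you can
    raise, consistently low ones mark matchers that need fixing."""
    totals, kept = Counter(), Counter()
    for row in log_rows:
        if row["decision_type"] != "human":
            continue
        totals[row["strategy"]] += 1
        kept[row["strategy"]] += row["accepted"]
    return {s: kept[s] / totals[s] for s in totals}
```

Run it quarterly over the log export; a strategy sitting above ~98% acceptance in review is a candidate for auto-reconciliation at its current confidence.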
Getting This to Production
The matching code above runs as a scheduled action — ir.cron calling the agent on new unreconciled account.bank.statement.line records every hour. The review queue is a filtered list view with the agent’s context injected via an OWL widget.
The hardest part isn’t the algorithm. It’s getting clean partner data onto statement lines before matching starts. Vietnamese banks in particular send variable-length reference strings, occasional Unicode normalization issues (NFKC vs NFC encoding of Vietnamese diacritics), and inconsistent partner name formatting. Time spent on the normalization layer before tuning confidence thresholds pays back faster than time spent on matching sophistication.
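The diacritics issue in particular is cheap to neutralize up front. A minimal sketch: normalize every incoming description to NFC before any reference or partner matching runs:

```python
import unicodedata

def normalize_bank_text(raw):
    """Collapse composed vs decomposed Vietnamese diacritics to one
    canonical form (NFC) and squeeze whitespace, so later string
    comparisons aren't defeated by encoding differences between banks."""
    return " ".join(unicodedata.normalize("NFC", raw or "").split())

# The same visible text arriving in two different encodings:
composed = "C\u00f4ng ty ABC"      # 'ô' as a single code point
decomposed = "Co\u0302ng  ty ABC"  # 'o' + combining circumflex
```

The two raw strings compare unequal byte-for-byte but identical after normalization, which is exactly the failure mode that silently breaks Level 1 and Level 2 matching.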
Start with Level 1 only. Measure the automatic hit rate — typically 40–60% on the first run. Add Level 2 and measure again. Each layer should contribute 15–25 percentage points until you’re matching 85–95% automatically. If a layer adds less than 5 points, the problem is upstream data quality, not the matching logic.
Key Takeaways
- Use a hierarchy: Exact match → fuzzy reference → partner + date → partial amount. Stop at the first high-confidence result.
- Calibrate thresholds with shadow mode data, not intuition. Run 2–4 weeks of logging before enabling auto-reconciliation.
- The review queue is a product feature, not a fallback. Design it to give accountants context, not just a list of uncertain items.
- Log everything: audit trail data is the raw material for improving the matching algorithm over time.
- Fix partner detection and reference normalization first. These upstream data quality issues limit every downstream matching strategy.
At Trobz, we’ve deployed versions of this agent for Odoo Accounting clients handling statement volumes from 50 lines per month to 2,000-line daily feeds across multi-bank setups. If you’re evaluating whether your reconciliation volume justifies this investment, we’re happy to walk through it — reach out here.