The Most Valuable PoC We've Delivered Said "Don't Build This"

A distribution company hired us to automate purchase order approvals with AI. Two weeks later, we told them not to. Here's why that was the right outcome — and what we built instead.

Key Takeaways

The best PoC outcome is sometimes a clear “don’t build this.”
A distribution company asked us to automate PO approvals based on value thresholds; the PoC showed that approval risk was driven by supplier relationship history, a feature that didn’t exist in the structured data.
We recommended a rules-based approval matrix combined with a human-readable supplier reliability score.
The client saved months of development time and got a better system.
A PoC that prevents a bad investment is more valuable than one that justifies the next engagement.

A regional distribution company came to us with a clear brief: build an AI system to auto-approve purchase orders under a certain value threshold. The logic was intuitive — small POs are low-risk, so automating them frees up the procurement team for decisions that actually need human attention. Two weeks later, we delivered the PoC and told them not to build it.

The client was surprised. We weren’t.

The Brief Was Reasonable. The Assumption Wasn’t.

The initial brief rested on a common premise: that PO value is a reasonable proxy for approval risk. Under €5,000? Approve automatically. Over? Route for review. Simple, auditable, easy to explain to management.

The problem is that “low value” and “low risk” are not the same thing in distribution. A €2,000 order from a supplier who has delivered late four times in the past year, operates on 90-day payment terms, and has a pending dispute on a prior invoice is not a low-risk approval. A €4,800 order from a strategic supplier with ten years of clean history is not a high-risk one.

We knew this hypothesis needed testing before any model was built. The PoC wasn’t about building an AI — it was about finding out whether the data supported the premise.

What the Data Actually Said

The PoC dataset: 14 months of purchase.order records, with associated account.move (vendor invoices), res.partner supplier profiles, and manual approval notes from the procurement manager. We extracted structured signals: PO value, delivery timeline, payment terms, supplier age, prior order volume. We also extracted something less structured: the approval notes themselves — short free-text fields where the procurement manager had written things like “delayed last quarter” or “dispute still open” or “reliable, expedite if needed.”

The analysis was revealing. When we modelled approval decisions against structured features, PO value explained almost none of the variance. The dominant predictors were:

Supplier delivery reliability — whether the supplier had late deliveries in the previous six months
Open dispute flag — whether there was an unresolved issue on a prior invoice
Payment term risk — whether terms were net-90 or longer, increasing cash flow exposure

None of these were cleanly available in structured form. Delivery reliability could be inferred from stock.picking records, but only partially — external logistics delays weren’t captured. The open dispute flag didn’t exist as a field; it lived in the approval notes. Payment terms were on the supplier record but not being used for any approval logic.
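To make the point about the notes concrete, here is a minimal sketch of how free-text approval notes could be turned into rough structured signals. The keyword patterns and the function name extract_signals are assumptions for illustration, not the extraction used in the PoC; the three sample notes are the ones quoted above.

```python
import re

# Hypothetical keyword patterns. A real vocabulary would come from
# reading the notes with the procurement manager.
SIGNAL_PATTERNS = {
    "dispute_mentioned": re.compile(r"\bdispute\b", re.IGNORECASE),
    "late_delivery": re.compile(r"\b(late|delay|delayed)\b", re.IGNORECASE),
    "trusted": re.compile(r"\breliable\b", re.IGNORECASE),
}


def extract_signals(note):
    """Return which signals a single approval note mentions."""
    text = note or ""
    return {name: bool(p.search(text)) for name, p in SIGNAL_PATTERNS.items()}


notes = [
    "delayed last quarter",
    "dispute still open",
    "reliable, expedite if needed",
]
signals = [extract_signals(n) for n in notes]
for note, found in zip(notes, signals):
    print(note, "->", [name for name, hit in found.items() if hit])
```

Keyword matching of this kind cannot tell an open dispute from a resolved one ("dispute resolved" would still match), so the output is best treated as a candidate list for manual review rather than a finished field.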

The AI system the client had in mind would have been trained to predict approvals from PO value, while the historical decisions it was imitating were actually made on different criteria. It would have confidently automated approvals while systematically missing the cases that deserved review: the high-frequency small-order suppliers with deteriorating reliability.

Why AI Wasn’t the Answer

There’s a version of this project that builds the AI anyway — trains a classifier on historical approvals, achieves 82% accuracy on held-out data, and ships it. The demo would look good. The model would even be right most of the time.

The problem is the 18% it gets wrong. In a procurement workflow processing 200 POs per week, that’s 36 misclassified POs every week. Some would be over-approvals: orders that should have been reviewed but were approved automatically. Each one is a small financial or relationship risk that accumulates quietly. The procurement manager would start noticing patterns: “it keeps approving things from Supplier X even after that delivery problem.” Trust erodes. The system gets turned off.

We’ve written about this dynamic before in the context of AI roadmapping: the question to ask before building is not “can we build this?” but “does the data support the decision we’re trying to automate?” The answer here was no — not because the data was bad, but because the decision wasn’t what we thought it was.

What We Recommended Instead

The PoC produced two concrete recommendations.

First: a rules-based approval matrix. Three tiers, not two.

Tier 1 (auto-approve, all conditions must hold): PO value under €3,000, supplier delivery reliability above 90% in the past 90 days, no open disputes, payment terms under 60 days.
Tier 3 (always review, any one condition is enough): an open dispute, delivery reliability below 70%, or payment terms over 90 days.
Tier 2 (everything else): route for review with a recommended decision.
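The matrix can be sketched as a plain function. This is an illustration under stated assumptions: the PurchaseOrderContext class and its field names are hypothetical, reliability is expressed as a fraction between 0 and 1, and the thresholds are the ones listed in the tiers.

```python
from dataclasses import dataclass


@dataclass
class PurchaseOrderContext:
    """Hypothetical container for the inputs the matrix needs."""
    value_eur: float
    delivery_reliability_90d: float  # fraction of on-time deliveries, 0..1
    has_open_dispute: bool
    payment_term_days: int


def classify_tier(po):
    """Return 1 (auto-approve), 2 (review with recommendation) or 3 (always review)."""
    # Tier 3 is checked first: any single red flag forces a review.
    if (
        po.has_open_dispute
        or po.delivery_reliability_90d < 0.70
        or po.payment_term_days > 90
    ):
        return 3
    # Tier 1 requires every condition to hold at once.
    if (
        po.value_eur < 3000
        and po.delivery_reliability_90d > 0.90
        and not po.has_open_dispute
        and po.payment_term_days < 60
    ):
        return 1
    # Everything else is routed for review.
    return 2


small_clean = PurchaseOrderContext(2000, 0.95, False, 30)
small_disputed = PurchaseOrderContext(2000, 0.95, True, 30)
larger_clean = PurchaseOrderContext(4800, 0.99, False, 30)
print(classify_tier(small_clean), classify_tier(small_disputed), classify_tier(larger_clean))
```

Checking Tier 3 before Tier 1 means a red flag can never be outvoted by a low PO value, which is the failure mode the value-only threshold had.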

This is not AI. It’s a decision table. It can be built in a day in Odoo using computed fields on purchase.order and an approval workflow triggered by tier classification. It’s auditable, explainable, and adjustable without retraining anything.

Second: a supplier reliability score surfaced as context. Rather than having the AI make the decision, surface the inputs that matter on the PO approval form — a simple score block showing 90-day delivery rate, open dispute indicator, and payment term risk rating, drawn from stock.picking, account.move, and res.partner records. The procurement manager makes the final call, but with better information at hand.

This second piece is where a small amount of engineering adds real value. The score is computed automatically, updated daily, and visible in the existing Odoo approval workflow without any new interface. It makes human judgment faster and more consistent, rather than trying to replace it.
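A minimal sketch of what the score block computes, written as plain Python rather than Odoo code. The Receipt class, the function names, and the low/medium/high cut-offs for payment terms are assumptions for illustration; in Odoo the inputs would be read from stock.picking, account.move and res.partner.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Receipt:
    """Hypothetical stand-in for one incoming delivery record."""
    scheduled: date
    received: date


def on_time_rate(receipts, today, window_days=90):
    """Share of deliveries in the window that arrived on or before schedule.

    Returns None when the supplier has no deliveries in the window.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [r for r in receipts if cutoff <= r.received <= today]
    if not recent:
        return None
    on_time = sum(1 for r in recent if r.received <= r.scheduled)
    return on_time / len(recent)


def payment_term_risk(days):
    """Assumed rating bands, aligned with the 60 and 90 day thresholds."""
    if days > 90:
        return "high"
    if days >= 60:
        return "medium"
    return "low"


def score_block(receipts, has_open_dispute, payment_term_days, today):
    """Assemble the three context items shown on the approval form."""
    return {
        "delivery_rate_90d": on_time_rate(receipts, today),
        "open_dispute": has_open_dispute,
        "payment_term_risk": payment_term_risk(payment_term_days),
    }


today = date(2024, 6, 30)
receipts = [
    Receipt(scheduled=date(2024, 6, 1), received=date(2024, 6, 1)),    # on time
    Receipt(scheduled=date(2024, 5, 10), received=date(2024, 5, 15)),  # late
    Receipt(scheduled=date(2024, 4, 20), received=date(2024, 4, 19)),  # early
    Receipt(scheduled=date(2024, 1, 5), received=date(2024, 1, 20)),   # outside window
]
block = score_block(receipts, has_open_dispute=False, payment_term_days=45, today=today)
print(block)
```

Returning None for suppliers with no recent deliveries keeps "no data" distinct from "0% on time", which matters when the score is read by a person rather than a model.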

How the Client Received It

Honestly, there was a moment of silence on the call. The client had come in expecting to sign off on a development project. We were telling them not to.

What changed the conversation was the PoC output itself. We showed the correlation analysis: PO value vs. approval decision — essentially flat. We showed the approval note text, clustered: the words “late”, “dispute”, and “reliable” appeared more than any value threshold. We showed what a rules-based Tier 1 auto-approval would have looked like on last year’s data — 61% of POs cleared automatically, with zero misclassifications on the cases the procurement manager had flagged as important.

That number (61% auto-approved, with no misclassifications on the cases the procurement manager had flagged) was more compelling than any accuracy metric from an ML model. It was derived from the actual decision criteria the client was already using, made explicit.
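The replay of a rule over historical data can be sketched in a few lines. The record layout, the manager_flagged field, and the four sample records are made up for illustration and are not the client's data; only the Tier 1 conditions come from the recommendation itself.

```python
def is_tier1(po):
    """Tier 1 rule: all four conditions must hold."""
    return (
        po["value_eur"] < 3000
        and po["reliability_90d"] > 0.90
        and not po["open_dispute"]
        and po["payment_term_days"] < 60
    )


def backtest(history, rule):
    """Replay a rule over past POs and count flagged cases it would have auto-approved."""
    if not history:
        raise ValueError("history is empty")
    auto = [po for po in history if rule(po)]
    missed = [po for po in auto if po["manager_flagged"]]
    return {
        "auto_share": len(auto) / len(history),
        "missed_flagged": len(missed),
    }


# Made-up records for illustration only.
history = [
    {"value_eur": 1200, "reliability_90d": 0.97, "open_dispute": False,
     "payment_term_days": 30, "manager_flagged": False},
    {"value_eur": 2000, "reliability_90d": 0.60, "open_dispute": False,
     "payment_term_days": 90, "manager_flagged": True},
    {"value_eur": 2500, "reliability_90d": 0.95, "open_dispute": True,
     "payment_term_days": 30, "manager_flagged": True},
    {"value_eur": 900, "reliability_90d": 0.93, "open_dispute": False,
     "payment_term_days": 45, "manager_flagged": False},
]
result = backtest(history, is_tier1)
print(result)
```

The two outputs map onto the two figures that matter in such a review: how much work the rule removes, and how many of the cases the manager cared about it would have let through.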

The client agreed. We built the rules engine and the supplier reliability score block over the next three weeks. The procurement manager told us six weeks later that the score card alone had already changed her decision on two POs she would otherwise have approved without a second look.

What This Says About PoC Methodology

A PoC has one job: answer the hypothesis. Not build the product, not demonstrate the technology, not impress the stakeholder. Answer the hypothesis.

The hypothesis here was: “PO value is a reliable proxy for approval risk, and an AI system trained on value thresholds can automate safe approvals.” The PoC answered that hypothesis. The answer was no.

That’s a success, not a failure. The alternative — skipping the PoC and building the AI — would have produced a system that performed poorly on the cases that matter, eroded trust with the user who had to live with it, and left the client without the simpler, better solution that was available all along.

The most dangerous consulting posture is to find a way to build what the client asked for, regardless of whether it’s the right thing to build. A PoC is the structured mechanism for avoiding that posture. It’s also why we start with the bottleneck, not the buzzword — the bottleneck in this case was decision quality, and the solution was decision support, not decision automation.

When we run PoCs, we define the failure condition before we start: “if PO value explains less than X% of approval variance, the AI-first approach is wrong.” Defining failure before you start is the only way to avoid working backward from a conclusion you already want to reach.
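One way to make such a failure condition executable is to fix the threshold up front and compute the share of variance afterwards. The sketch below uses the squared Pearson correlation between PO value and a 0/1 approval outcome as a simple stand-in for "variance explained"; that choice, the function names, and the sample numbers are assumptions, and the threshold is left as a required argument because the X in the text is set per project.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return (sxy * sxy) / (sxx * syy)


def ai_first_rejected(po_values, approved, min_r2):
    """True when PO value explains less than the pre-agreed share of variance."""
    return r_squared(po_values, approved) < min_r2


# Made-up numbers: approval does not track value at all here.
po_values = [1000, 2000, 3000, 4000]
approved = [1, 0, 0, 1]
print(r_squared(po_values, approved))
print(ai_first_rejected(po_values, approved, min_r2=0.3))
```

A linear measure on a binary outcome is a simplification; a pseudo-R² from a logistic model would be a reasonable alternative. What matters for the method is that min_r2 is written down before the data is examined.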

At Trobz, some of our most useful PoCs end with a recommendation not to build the AI — and a concrete description of what to build instead. If you’re evaluating an AI project and want an honest read on whether the data supports the approach, we’re available for a scoping conversation.