Key Takeaways: A 14-day sprint is long enough to build something real and short enough to maintain urgency — anything longer invites scope creep and committee decisions. The day-by-day structure below isn’t a marketing framework; it’s the sequence we’ve converged on after discovering what fails at each phase. What separates a successful PoC from a shelf-ware report is whether it reaches production — and that outcome is determined by decisions made on day one, not day fourteen. The handoff matters as much as the build.
Most enterprise AI PoCs end with a slide deck and a “next steps” section that nobody reads. The consultant has billed 40 hours for workshops and architecture diagrams. The stakeholders have seen a polished demo of a model running against clean sample data. Nothing touches production.
We’ve been on both sides of that dynamic. The 14-day sprint constraint was born out of frustration with it.
Why 14 Days
The two-week sprint is not an arbitrary deadline. It’s a forcing function.
Long engagements — six-week PoCs, twelve-week “AI readiness assessments” — accumulate a specific kind of dysfunction. Stakeholders attend workshops but defer decisions. Requirements grow laterally. The team optimizes for the demo, not the deployment. By the time the engagement ends, the champion who sponsored the work has three new priorities and the data has aged.
Two weeks forces three things:
A narrow scope. You can’t build everything in two weeks, so you have to decide what matters most. That decision, made under time pressure, is usually the right one.
Real data. There’s no time to construct a sanitized sample dataset. You’re working with actual Odoo data — sale.order.line, account.move, stock.move — or you’re not running the PoC at all.
A go/no-go decision. At the end of two weeks, the sponsor has seen a system running against their data. The decision to continue is based on evidence, not a pitch.
The Day-by-Day Structure
Here’s the actual cadence, with the rationale behind each phase.
Days 1–2: Discovery Workshop
The first session runs three hours, not one. We bring a business analyst and a technical lead. The sponsor brings the people who actually feel the problem — not the IT director.
The output isn’t a requirements document. It’s a single-sentence problem statement: “The finance team spends four hours a day manually matching vendor invoices to purchase orders, and the error rate is causing supplier relationship issues.” Everything else is secondary.
We also assess data access on day one. If we can’t query the relevant Odoo tables by end of day two, the sprint stops. Data access isn’t a technicality — it’s the precondition for everything.
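In practice, “data access” means running read queries against the live schema on day one, not reviewing a data dictionary. A minimal smoke test looks something like the sketch below; the connection details and read-only role are placeholders for whatever the client’s environment provides.

```python
# Day-one access check: can we actually read the tables the PoC depends on?
# Odoo models like account.move live in Postgres tables named account_move.
# Connection parameters are placeholders for the client's environment.
import psycopg2

TABLES = ["account_move", "purchase_order", "sale_order_line", "stock_move"]

conn = psycopg2.connect(
    host="odoo-db.internal", dbname="odoo",
    user="poc_readonly", password="change-me",
)
with conn, conn.cursor() as cur:
    for table in TABLES:
        cur.execute(f"SELECT count(*) FROM {table}")
        print(f"{table}: {cur.fetchone()[0]:,} rows")
```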
Days 3–4: Data Audit
This is the most honest phase. We pull the actual data, profile it, and report what we find.
Invoice matching sounds clean until you see that 30% of vendor references have typos, the purchase order numbers don’t follow a consistent format, and three years of records are split across two Odoo databases from a migration that was never fully reconciled.
The audit changes the architecture. It almost always does. A matching algorithm designed for clean data needs different tuning thresholds than one built for inconsistent references. Finding this on day four beats finding it on day twelve.
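The profiling itself is not exotic. A sketch of the kind of check we run against vendor references, assuming standard Odoo column names (account_move.ref holds the vendor reference) and an illustrative numbering pattern:

```python
# Day 3-4 profiling sketch: how messy are the vendor references we have to match on?
# Uses the same read-only connection as the day-one check; column names follow
# standard Odoo conventions, and the expected reference pattern is illustrative.
import re

def profile_vendor_references(cur, expected_pattern: str = r"PO\d{5,}") -> None:
    cur.execute("SELECT ref FROM account_move WHERE move_type = 'in_invoice'")
    refs = [row[0] for row in cur.fetchall()]
    if not refs:
        raise SystemExit("no vendor bills found: check access and filters before proceeding")

    missing = sum(1 for r in refs if not r)
    well_formed = sum(1 for r in refs if r and re.fullmatch(expected_pattern, r.strip()))

    print(f"vendor bills:       {len(refs)}")
    print(f"missing reference:  {missing} ({missing / len(refs):.0%})")
    print(f"expected PO format: {well_formed} ({well_formed / len(refs):.0%})")
```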
Day 5: Architecture Decision
One hour, one document, one decision. We present two or three approaches — typically: rule-based matching with configurable thresholds, embedding-based similarity search using pgvector, or an LLM classification layer for edge cases.
Each option gets an honest tradeoff summary: accuracy ceiling, operational complexity, cost at scale, maintenance requirements. The sponsor picks one.
We don’t over-engineer this meeting. The goal is to lock the approach so the build sprint doesn’t stall on architectural debates.
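To keep that meeting concrete, each option comes with a thumbnail of what its core query or call would look like. For the pgvector option, the heart of it is a nearest-neighbour search between invoice-line and purchase-order-line embeddings; the table and column names below are illustrative, since the staging schema only gets created during the build sprint.

```python
# Thumbnail of the pgvector option: nearest-neighbour search from an invoice line's
# embedding to candidate purchase order lines. Table and column names are illustrative.
def top_po_candidates(cur, invoice_line_id: int, limit: int = 5):
    # '<=>' is pgvector's cosine-distance operator; smaller distance means more similar.
    cur.execute(
        """
        SELECT po.po_line_id,
               1 - (po.embedding <=> inv.embedding) AS similarity
        FROM po_line_embeddings AS po,
             invoice_line_embeddings AS inv
        WHERE inv.invoice_line_id = %s
        ORDER BY po.embedding <=> inv.embedding
        LIMIT %s
        """,
        (invoice_line_id, limit),
    )
    return cur.fetchall()
```

The rule-based option swaps the vector search for exact and fuzzy comparisons on reference fields; the LLM layer typically sits on top of either, and only sees the cases that fall below the confidence threshold.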
Days 6–10: Build Sprint
Five days of heads-down implementation. The technical lead builds; the business analyst handles stakeholder questions, data clarifications, and the inevitable discovery that one of the key fields is null for 15% of records.
What gets built isn’t a polished product. It’s a working system connected to real data, running the core logic, with enough logging to understand what it’s doing and why.
In the invoice matching case, this means: a matching pipeline reading from account.move and purchase.order, running similarity comparisons, writing match confidence scores to a staging table, and a basic UI surfacing results for human review. Not a finished feature — a working demonstration of the mechanism.
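Stripped to its skeleton, the mechanism is a candidate-generation loop with a pluggable scoring function. The sketch below uses illustrative names and a placeholder score; the real scoring logic is whatever was chosen on day five.

```python
# Skeleton of the day 6-10 matching pipeline: read vendor bills and purchase orders,
# score candidate pairs, and write confidence scores to a staging table for review.
# Names, the placeholder score, and the staging schema are illustrative.
from dataclasses import dataclass

@dataclass
class Match:
    invoice_id: int
    purchase_order_id: int
    confidence: float

def score(invoice: dict, po: dict) -> float:
    """Placeholder similarity in [0, 1]; replaced by rules, embeddings, or an LLM layer."""
    return 1.0 if invoice["ref"] and invoice["ref"] == po["name"] else 0.0

def run_pipeline(cur, threshold: float = 0.5) -> list[Match]:
    cur.execute("SELECT id, ref, partner_id FROM account_move WHERE move_type = 'in_invoice'")
    invoices = [dict(zip(("id", "ref", "partner_id"), row)) for row in cur.fetchall()]
    cur.execute("SELECT id, name, partner_id FROM purchase_order")
    orders = [dict(zip(("id", "name", "partner_id"), row)) for row in cur.fetchall()]

    matches = []
    for inv in invoices:
        # Only compare against purchase orders from the same vendor.
        for po in (p for p in orders if p["partner_id"] == inv["partner_id"]):
            confidence = score(inv, po)
            if confidence >= threshold:
                matches.append(Match(inv["id"], po["id"], confidence))

    # The review UI reads from this staging table (schema is illustrative).
    cur.executemany(
        "INSERT INTO invoice_po_match_staging (invoice_id, purchase_order_id, confidence)"
        " VALUES (%s, %s, %s)",
        [(m.invoice_id, m.purchase_order_id, m.confidence) for m in matches],
    )
    return matches
```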
Day 11: Stakeholder Demo
The demo runs on live data. No mock objects, no rehearsed inputs.
This is where PoCs optimized for the demo reveal themselves. Systems built on sanitized samples don’t handle real edge cases — the invoice from a vendor who changed payment terms mid-year, the PO split across three delivery orders. Those are exactly the cases that matter to the people in the room.
We show the failures as well as the wins. If the system misclassifies 8% of invoices, we show what those 8% look like and explain why. Trust comes from honesty about limits, not from hiding them.
Days 12–13: Iteration
Two days for the most valuable feedback: the things the sponsor sees in the demo that aren’t wrong, exactly, but don’t reflect how the business actually works.
The confidence threshold that felt right in development often needs adjustment when you see real cases. The UI that made sense to the developer doesn’t match the finance team’s mental model. The output format needs one extra field that wasn’t in the original scope.
This isn’t scope creep. It’s the legitimate cost of building with real data and real users. Two days is enough to address the highest-priority adjustments without reopening the architecture.
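The threshold adjustment in particular is cheap once the finance team has reviewed a sample of the demo’s matches. A sweep over a handful of candidate thresholds, with illustrative review data, is enough to make the tradeoff visible:

```python
# Sweep the confidence threshold against a hand-reviewed sample from the demo.
# Each pair is (match confidence, did the finance team confirm it); data is illustrative.
reviewed = [(0.91, True), (0.84, True), (0.78, False), (0.73, True), (0.66, False), (0.58, False)]

for threshold in (0.60, 0.70, 0.72, 0.80):
    accepted = [(c, ok) for c, ok in reviewed if c >= threshold]
    precision = sum(ok for _, ok in accepted) / len(accepted) if accepted else 0.0
    coverage = len(accepted) / len(reviewed)
    print(f"threshold {threshold:.2f}: precision {precision:.0%}, auto-matched {coverage:.0%}")
```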
Day 14: Go/No-Go
The final session has one agenda item: does this go to production?
By this point, the sponsor has seen the system running for three days. They have a concrete sense of what it does, what it doesn’t do, and what production quality would require. The decision is straightforward.
If yes, handoff planning starts immediately. If no — the data quality is too poor, the business case doesn’t hold, the technical complexity is higher than estimated — that’s a legitimate outcome. Fourteen days to learn that a project isn’t worth pursuing is significantly better than six months.
What the PoC Doesn’t Include
A PoC sprint is not a production deployment. This is worth stating plainly because it’s where post-PoC projects most often stall.
The system built in two weeks lacks:
- Production observability — we add error logging, not alerting or dashboards
- Retraining pipeline — if the model uses learned components, it will drift; we document the retraining cadence but don’t implement it
- Access controls and audit trail — sufficient for a demo, not for a regulated finance environment
- Load testing — the system works at demo scale; production volumes need a separate assessment
These aren’t surprises. The sprint contract explicitly scopes them out. The honest framing: you’re paying for evidence that this is worth building, not for the production build.
The Handoff That Actually Matters
Most PoC engagements have no real handoff. The consultant leaves and the client has a codebase they don’t fully understand, running on infrastructure they didn’t set up, with no documentation of the decisions made.
We do three things differently.
Architecture Decision Records. One page per major decision — why we chose pgvector over a cloud vector store, why the confidence threshold is set at 0.72, what the known failure modes are. Written for the person who inherits this code twelve months from now.
Retraining schedule. If the system uses a machine learning component, we define the retraining trigger (quarterly, or when accuracy drops below a defined threshold on the validation set), the data pipeline to support it, and who owns the process. This is a business decision as much as a technical one.
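The trigger itself is usually small enough to fit on a page, something the owning team can run on a schedule. A sketch, with the accuracy floor and the validation data as placeholders for whatever gets agreed at handoff:

```python
# Sketch of a retraining trigger: compare accuracy on the held-out validation set
# against the agreed floor. The floor, metric, and data source are placeholders.
ACCURACY_FLOOR = 0.90

def needs_retraining(predictions: list, labels: list) -> bool:
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    print(f"validation accuracy: {accuracy:.1%} (floor {ACCURACY_FLOOR:.0%})")
    return accuracy < ACCURACY_FLOOR
```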
Ownership transfer session. One half-day with the internal team who will maintain the system. Not a documentation review — a live walkthrough of the codebase, with them at the keyboard, running the key flows.
The goal is that Troboz is unnecessary six months after production launch. That’s the right outcome.
What a Typical Consultancy Delivers Instead
This isn’t a criticism of any specific firm, but the pattern is consistent enough to name: large enterprise AI consulting practices are optimized to extend engagements. A six-month roadmap with quarterly check-ins is a better business model than a two-week sprint with a clear go/no-go. The incentives are visible in the deliverables.
The 14-day structure is a forcing function for the client as well as the consultant. It requires data access, internal resources, and a decision-maker who can approve a go/no-go on day fourteen. Many organizations can’t or won’t do that — and that’s fine. But it’s worth knowing before you start.
If the answer to “can you give us database access to the relevant tables in two weeks?” is “we’ll need to go through procurement,” a two-week sprint isn’t the right structure. A twelve-week engagement probably isn’t either, for different reasons. The underlying issue is that the organization isn’t ready to run the project, regardless of timeline.
The slide deck PoC persists because it’s low-friction for everyone. It requires no real data access, no internal bandwidth, no hard decisions. It also produces nothing you can put in production. The 14-day sprint is higher friction — and that friction is the point.
At Troboz, the AI advisory sprint is how we start engagements that go to production. If you’re evaluating whether an AI project is worth pursuing, reach out and we’ll tell you whether a sprint fits your situation — or whether something else makes more sense first.