Founders ask me how a real AI MVP gets shipped in 90 days. There is no magic, but there is a rhythm, and the rhythm has not changed much across the products I have shipped at AlbTech Solutions, Neurons Lab, and DigitSapiens. This is the plan, week by week, including the four checkpoints where most projects die quietly.
Weeks 1 and 2. Discovery
The biggest mistake non-technical founders make is starting the build in week 1. Discovery is not optional and it is not bureaucratic. It is the cheapest two weeks of the entire project.
The artifacts you produce in this period are three documents. The workflow map of the work the AI will replace or augment, with the human steps written down and timed. The constraints document covering latency targets, cost ceilings, accuracy bars, and failure handling. The eval plan listing the first 50 to 100 labeled examples and the metric that defines success.
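A concrete way to start the eval plan is to store the labeled examples in a plain file from day one, so nobody argues later about what was agreed. Below is a minimal sketch, not a standard: the field names, the JSONL format, and the sample invoice example are illustrative assumptions to adapt to your workflow.

```python
# eval_plan.py - a minimal shape for the labeled examples in the eval plan.
# Field names and the sample record are illustrative assumptions, not a standard.
import json
from dataclasses import dataclass, asdict

@dataclass
class EvalExample:
    example_id: str
    input_text: str          # what the workflow receives (email, ticket, document)
    expected_output: str     # what a correct result must contain
    labeled_by: str          # the domain expert who signed off on the label
    notes: str = ""          # edge-case context future readers will need

def save_examples(examples: list[EvalExample], path: str = "golden_set.jsonl") -> None:
    """Write the golden set as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(asdict(ex)) + "\n")

if __name__ == "__main__":
    seed = [
        EvalExample(
            example_id="inv-001",
            input_text="Invoice #4821 from Acme GmbH, net 30, total 12,400 EUR",
            expected_output="vendor=Acme GmbH; terms=net 30; total=12400 EUR",
            labeled_by="ops-lead",
        )
    ]
    save_examples(seed)
```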
The output of the two weeks is a go or no-go. If the workflow is not yet stable enough to automate, you stop. If the constraints are incompatible with the budget, you renegotiate scope. If you cannot agree on what success means, you cannot ship.
Checkpoint 1. At the end of week 2, the founder, the product lead, and the lead engineer should sign the eval plan. If you cannot get that signed, the project is not ready to start.
Weeks 3 and 4. The thinnest slice
You build the smallest possible end-to-end version. Real model, real data, real workflow surface, real eval running. It is ugly. The UI is a single page. The retrieval is rough. The prompts are first drafts. That is the point. You are validating that the pipe works end to end, not that any of it is good yet.
By the end of week 4 you should be able to run the eval harness on your slice and produce a baseline number. That number will be bad. The point is that you have a number. From here, every change either moves it up or it does not.
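The harness does not need to be more than a script that runs the slice over the golden set and prints one number. A sketch under stated assumptions: run_slice is a placeholder for your real pipeline call, and the substring check stands in for whatever metric the week-2 eval plan actually defines.

```python
# eval_harness.py - run the thin slice over the golden set, print a baseline.
# run_slice() and the substring scorer are stand-ins; swap in the real pipeline
# and the metric agreed in the eval plan.
import json

def run_slice(input_text: str) -> str:
    """Stand-in for the end-to-end pipeline: retrieval + prompt + model call.
    This echo exists only so the harness runs; replace with the real slice."""
    return input_text

def passes(expected: str, actual: str) -> bool:
    """Simplest possible scoring rule: the expected answer appears in the output."""
    return expected.lower() in actual.lower()

def main(path: str = "golden_set.jsonl") -> None:
    with open(path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    hits = 0
    for ex in examples:
        if passes(ex["expected_output"], run_slice(ex["input_text"])):
            hits += 1
    print(f"baseline: {hits / len(examples):.1%} on {len(examples)} examples")

if __name__ == "__main__":
    main()
```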
Checkpoint 2. If by the end of week 4 you cannot produce an eval score on a labeled set, the project is in trouble. Stop and fix this before adding scope.
Weeks 5 to 7. Build to baseline
Now the team focuses on moving the eval score. Better prompts. Better retrieval. Better chunking. Reranking. Tool use. The team should commit a measurable change daily and run the eval against it. The score moves up. When it stops moving on prompt changes, you bring in retrieval improvements. When retrieval saturates, you look at structural prompt changes or hybrid models.
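To keep the daily cadence honest, append every eval run to a log keyed by the change that produced it, so the team can see at a glance whether a change actually moved the number. A small sketch, assuming a score in the range 0 to 1 from a harness like the one above; the CSV format and helper names are only a suggestion.

```python
# run_log.py - append one line per eval run so every change has a number next to it.
# Assumes a score in [0, 1]; the log format and helpers are illustrative.
import csv
import datetime
import subprocess

def current_commit() -> str:
    """Best-effort short git hash, so a score can be traced back to the code."""
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"

def log_run(change: str, score: float, path: str = "eval_runs.csv") -> None:
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(
            [datetime.date.today().isoformat(), current_commit(), change, f"{score:.3f}"]
        )

# Example: log_run("rerank top-20 with cross-encoder", 0.62)
```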
The discipline at this stage is to resist scope expansion. Founders see the slice working and ask for more workflows. The right answer is no until the first workflow hits its accuracy bar. Two half-built workflows are worth less than one fully shipped workflow.
Checkpoint 3. By end of week 7, the eval score should be at 80 percent of the target you set in week 2. If not, you have a structural problem the team is not surfacing. Stop and find it.
Weeks 8 to 10. Real customer integration
Now you put the slice in front of one real customer or one real internal user. Not a demo, not a controlled environment. A real workflow with real data and real consequences. The eval score will look worse than it did in the lab. That is normal. Real data is messier than your golden set.
What matters in this period is the feedback loop. Every interaction logs the prompt, the response, the retrieval hits, and a thumbs-up or thumbs-down from the user. By the end of week 10 you have 500 to 2,000 real interactions, you have expanded the eval set with the real edge cases, and you have closed two or three structural gaps.
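The logging does not need a platform in week 8. An append-only record per interaction is enough to expand the eval set later. Here is a sketch of the four fields named above written as JSON lines; the schema and function name are assumptions to adapt, not a standard.

```python
# interaction_log.py - append-only record of every real interaction.
# Captures prompt, response, retrieval hits, and the user's rating.
# Field names are illustrative.
import json
import time
import uuid
from typing import Optional

def log_interaction(prompt: str, response: str, retrieval_hits: list[str],
                    user_rating: Optional[str], path: str = "interactions.jsonl") -> str:
    """user_rating is 'up', 'down', or None if the user did not rate."""
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "prompt": prompt,
        "response": response,
        "retrieval_hits": retrieval_hits,   # document or chunk ids that were retrieved
        "user_rating": user_rating,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record["id"]
```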
Checkpoint 4. By end of week 10, your real-world success rate on user-rated interactions should be within 10 percentage points of your eval score. If the gap is larger, your eval set is not representative. Fix the eval set, then proceed.
Weeks 11 and 12. Production hardening
The product moves from one customer to ready-for-five. Logging, observability, rate limiting, cost controls, retry logic, fallback paths, an admin tool for the founder to inspect any failed interaction. None of this is glamorous. All of it is what separates a demo from a product.
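Retry logic and a fallback path can start as a small wrapper around the model call rather than a framework. The sketch below assumes a primary call that can raise, a cheaper fallback model, and exponential backoff; call_primary and call_fallback are placeholders for whatever clients you actually use.

```python
# resilience.py - retry with backoff, then fall back, then hand off to a human.
# call_primary() and call_fallback() are placeholders for the real model clients.
import time

class HandoffToHuman(Exception):
    """Raised when both paths fail; the workflow surface should show the handoff UI."""

def call_primary(prompt: str) -> str:
    raise NotImplementedError("wire to the primary model client")

def call_fallback(prompt: str) -> str:
    raise NotImplementedError("wire to a cheaper or older model")

def answer(prompt: str, retries: int = 3, base_delay: float = 1.0) -> str:
    for attempt in range(retries):
        try:
            return call_primary(prompt)
        except Exception:
            time.sleep(base_delay * (2 ** attempt))   # exponential backoff between attempts
    try:
        return call_fallback(prompt)                  # degraded path, logged as such
    except Exception as exc:
        raise HandoffToHuman("both model paths failed") from exc
```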
The other work in this stretch is documentation, a runbook for on-call, and a clear handoff protocol if the AI fails. What the customer sees, what the support team sees, how the founder gets paged.
By end of week 12 you have a product. Not a polished one. A real one, with a real first customer, a real eval score, a real feedback loop, and a real path to scale.
The team you actually need
For a Bracket 1 or low Bracket 2 MVP, the realistic team is three or four people. One AI product manager or AI architect leading discovery, eval design, prompts, and the customer relationship. One backend engineer building the retrieval and the API. One full-stack engineer building the workflow surface. The optional fourth seat is a designer for two days a week or a domain expert for labeling.
Founders try to ship with two engineers and no AI lead. The engineers do their best work, but they are not paid to challenge the founder on scope, evals, or workflow design. The first time a hard product call needs to be made, the project drifts. Hire or contract someone whose job is to own that call.
What the plan looks like when it works
I have run this 90-day rhythm on the Sainni MVP, on the ConstructionOS proposal engine, on the Chesamel marketing workflow at Neurons Lab, and on the Sunalys solar platform at DigitSapiens. The shape is always the same. Two weeks of discovery. Four to six weeks of building to baseline. Three weeks of real customer use. Two weeks of hardening. The products that hit the rhythm shipped. The ones that skipped discovery either took six months or did not ship.
What the plan looks like when it breaks
It always breaks at one of the four checkpoints. The team cannot agree on the eval plan in week 2. The eval harness does not exist by week 4. The score plateaus at 60 percent of target by week 7. The real-world rate is 25 points below the eval rate at week 10. Each of these has a fix, but only if the founder treats the checkpoint as a real decision and not a milestone to wave through.
If you are starting an AI MVP and you want a second opinion on the discovery artifacts before you commit budget, write to me. I respond within 48 hours.