Methodology · 9 min read

The hidden costs of AI features: tokens, latency, evals, hallucinations

Founders ask me how much an AI feature costs and they almost always mean tokens. Tokens are the line they can see. The line items that drain budget after launch are latency engineering, eval upkeep, hallucination handling, and on-call coverage. Every AI product I have shipped has these costs. Most founders do not plan for them and the surprise lands in month four.

The cost layer founders see

Token cost is real, but it is the easiest layer to estimate and the easiest to optimize. The cost per active user per month for a chat-style AI feature in 2026 lands between 0.50 and 4 euros depending on usage intensity. For a workflow-style AI feature it lands between 0.05 and 0.50 per workflow. These numbers move down quarter over quarter as model providers compete on price.

The mistake founders make is treating this number as the full cost picture. Even at 4 euros per active user, a 5,000 user product is 20,000 euros a month, which sounds painful but is recoverable with reasonable pricing. The other four layers are where the budget actually breaks.
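As a sanity check on that layer, the per-user math fits in a few lines. A minimal sketch, assuming illustrative request volumes and a hypothetical blended rate of 4 euros per million tokens. None of these numbers are provider prices; substitute your model's current rate card.

```python
# Back-of-envelope token spend. All numbers are illustrative assumptions,
# not provider prices: substitute your model's current rate card.

def monthly_token_cost(active_users, requests_per_user,
                       tokens_per_request, eur_per_million_tokens):
    """Estimated monthly token spend in euros for a chat-style feature."""
    total_tokens = active_users * requests_per_user * tokens_per_request
    return total_tokens / 1_000_000 * eur_per_million_tokens

# 5,000 users, 50 chats each, ~2,500 tokens per round trip,
# at a hypothetical blended 4 euros per million tokens.
spend = monthly_token_cost(5_000, 50, 2_500, 4.0)
print(f"{spend:,.0f} euros/month")  # 2,500 euros/month
```

At these assumptions that is 0.50 euros per active user, the low end of the range above. Heavier usage or pricier models push it toward the top of the range.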

Hidden cost 1. Latency engineering

The model returns in 800 milliseconds. The user perceives it as 4 seconds. The difference is everything between the user and the model. Retrieval, prompt construction, response parsing, downstream calls, network hops, front-end rendering. Each adds latency, and a sluggish AI feature is a feature people stop using.

The realistic cost of latency engineering on a production AI feature is 15 to 30 percent of the total engineering time across the project lifetime. That budget pays for streaming, parallel retrieval, response caching, smaller fast-path models for simple cases, and front-end perceived-latency tricks. Founders who plan for token cost but not latency cost end up with products that work but feel broken.
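One of those levers, the smaller fast-path model, can be sketched in a few lines: a cheap heuristic decides whether a request can go to a small, low-latency model or needs the large one. The model names, length threshold, and keyword list are placeholders, not a recommendation.

```python
# Minimal fast-path router. A cheap heuristic sends simple requests to a
# small, fast model and everything else to the large one. The model names
# and thresholds below are hypothetical placeholders.

FAST_MODEL = "small-fast-model"     # hypothetical
SLOW_MODEL = "large-capable-model"  # hypothetical

COMPLEX_HINTS = ("compare", "summarize", "analyze", "explain why")

def pick_model(user_message: str) -> str:
    msg = user_message.lower()
    looks_complex = len(msg) > 400 or any(h in msg for h in COMPLEX_HINTS)
    return SLOW_MODEL if looks_complex else FAST_MODEL

print(pick_model("What time is it in Berlin?"))  # small-fast-model
print(pick_model("Compare these two contracts and explain why one is riskier."))
# large-capable-model
```

In production the heuristic is usually a small classifier rather than keywords, but the shape is the same: spend the big model only where it earns its latency.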

Hidden cost 2. Eval upkeep

Evals are not a one-time setup. They are a permanent cost line. Every prompt change, every model upgrade, every retrieval index rebuild needs to be measured against the eval set. The eval set itself decays. Customer behavior shifts. New edge cases appear in real traffic. The labeled set has to grow and rebalance every quarter.

The realistic cost is 10 to 20 percent of one engineer's time, plus 10 to 30 hours of domain expert labeling per quarter. For a small team, this is the line item that gets quietly cut, and the price is paid in the form of regression that nobody catches until a customer complains.
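A minimal sketch of what that upkeep buys: a regression gate that scores a candidate prompt against the labeled set and blocks the change if the pass rate drops below the baseline. The substring scorer here is a stub; real evals use graders or rubrics suited to the task.

```python
# Regression gate sketch: score candidate outputs against a labeled eval
# set and block the change if the pass rate falls more than `margin` below
# the baseline. The substring check stands in for a real grader.

def pass_rate(outputs, expected):
    hits = sum(1 for out, exp in zip(outputs, expected) if exp in out)
    return hits / len(expected)

def regression_gate(candidate_outputs, expected, baseline_rate, margin=0.02):
    rate = pass_rate(candidate_outputs, expected)
    return rate >= baseline_rate - margin, rate

expected = ["Paris", "Berlin", "Madrid"]
candidate = ["Paris is the capital.", "It is Berlin.", "Rome."]  # one miss
ok, rate = regression_gate(candidate, expected, baseline_rate=1.0)
print(ok, round(rate, 2))  # False 0.67
```

The ongoing cost in the paragraph above is exactly this loop: keeping `expected` honest as traffic shifts, and running the gate on every prompt or model change.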

Hidden cost 3. Hallucination handling

The model will be confidently wrong. The product needs a strategy for when this happens. The strategies have engineering cost.

Confidence scoring on output. Cross-checks against retrieval sources. Citation surfaces in the UI so users can verify. Human-in-the-loop review queues for high-stakes outputs. Audit trails that allow a manual correction to flow back into the eval set. Each of these is a real feature, not a checkbox.
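As one concrete example of the cross-check item, here is a deliberately naive version that flags claims whose cited snippet appears in none of the retrieved sources. Production systems would use embedding similarity or an entailment model rather than substring matching; the claim schema is an assumption for illustration.

```python
# Naive hallucination cross-check: a claim is supported only if its cited
# snippet occurs in at least one retrieved source. Substring matching is a
# stand-in for embedding similarity or an entailment model.

def unsupported_claims(claims, sources):
    """Return claims whose cited snippet appears in none of the sources."""
    return [c for c in claims if not any(c["snippet"] in s for s in sources)]

sources = ["Revenue grew 12 percent in Q3.", "Headcount was flat."]
claims = [
    {"text": "Revenue grew 12 percent in Q3.", "snippet": "grew 12 percent"},
    {"text": "Margins doubled.", "snippet": "Margins doubled"},  # unsupported
]
flagged = unsupported_claims(claims, sources)
print([c["text"] for c in flagged])  # ['Margins doubled.']
```

Everything around this check, the UI that surfaces the flag, the review queue it feeds, the audit trail back into the eval set, is where the real engineering cost sits.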

The cost depends on the stakes. A casual chat feature can ship with light handling. A compliance memo draft cannot. The realistic cost on regulated or high-stakes products is 15 to 30 percent of total engineering, often more than the model integration itself. Founders see the AI feature as the model. Engineers see it as the surrounding system. The engineers are right.

Hidden cost 4. Observability

You cannot operate an AI product without logs of every prompt, every response, every retrieval hit, and every user feedback signal. The tooling alone is 200 to 1,500 euros per month at startup scale, growing fast. The engineering time to instrument it is one to two weeks of a senior engineer at the start, and 5 to 10 percent of one engineer's time ongoing.
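A minimal sketch of that instrumentation: a wrapper that records the prompt, response, latency, and retrieval hits as one structured log line per call. The log schema and the stubbed model call are assumptions, not a particular tool's API.

```python
# Observability wrapper sketch: every model call emits a structured record
# with prompt, response, latency, and retrieval hits. The schema is an
# assumption; the model call is a stub.

import json
import time

def logged_call(model_fn, prompt, retrieval_ids, log):
    start = time.monotonic()
    response = model_fn(prompt)
    record = {
        "prompt": prompt,
        "response": response,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
        "retrieval_ids": retrieval_ids,
    }
    log.append(json.dumps(record))
    return response

log = []
fake_model = lambda p: "stub answer"
logged_call(fake_model, "What changed in Q3?", ["doc-17", "doc-42"], log)
print(json.loads(log[0])["retrieval_ids"])  # ['doc-17', 'doc-42']
```

Shipping these records to a queryable store, and adding the user feedback signal, is the one-to-two-week setup cost mentioned above.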

This is non-negotiable for any production AI feature. The founders who skip it cannot answer simple questions like "why did this customer churn?" or "where is the model failing most often?" They are running an AI product blind.

Hidden cost 5. On-call

AI features fail in unique ways. Provider outages. Quality regressions after a model update. Cost spikes from a misbehaving prompt. Retrieval index corruption. Each of these is a customer-impacting event that needs a human in the loop.

The realistic on-call cost is one engineer's nights and weekends in the first year, with rotation across two engineers in year two. The hidden expense is talent retention. AI on-call is hard, and engineers who carry it without relief leave. Founders who plan for two engineers and run with one pay the cost in attrition.

The five-layer real cost picture

For a 5,000 active user RAG-style product in 2026, a realistic full cost picture per month looks like this.

  - Token and API spend: 10,000 to 20,000 euros.
  - Latency engineering, amortized at 15 percent of one to two engineers: 4,000 to 8,000.
  - Eval upkeep, including domain labeling: 3,000 to 6,000.
  - Hallucination handling and review queues: 5,000 to 12,000 if the product is high-stakes.
  - Observability tooling and on-call: 4,000 to 9,000.

Total real cost per month: 26,000 to 55,000 euros. The token line is roughly a third of the picture.
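The five-layer totals above, written out as arithmetic so you can swap in your own estimates:

```python
# The article's five cost layers for a 5,000-user RAG-style product,
# as (low, high) euro ranges per month. Replace with your own estimates.

layers = {
    "tokens_and_api":         (10_000, 20_000),
    "latency_engineering":    (4_000, 8_000),
    "eval_upkeep":            (3_000, 6_000),
    "hallucination_handling": (5_000, 12_000),
    "observability_oncall":   (4_000, 9_000),
}

low = sum(lo for lo, hi in layers.values())
high = sum(hi for lo, hi in layers.values())
print(low, high)  # 26000 55000

share_low = layers["tokens_and_api"][0] / low
share_high = layers["tokens_and_api"][1] / high
print(round(share_low, 2), round(share_high, 2))  # 0.38 0.36
```

The token share lands at 36 to 38 percent of the total at both ends of the range, which is the "roughly a third" above.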

This is the number founders need when they price the product. Pricing the product based on token cost alone is how AI features end up unprofitable at scale.

How to plan for this from day one

Three habits prevent the surprise.

  1. Build the eval harness, the observability, and the on-call rota in the first six weeks. Not after launch. Treat them as part of the MVP, not as polish.
  2. Price the product against the full cost picture, not the token line. Run the math at three usage tiers and confirm the unit economics work at each.
  3. Allocate ongoing engineering capacity to the four hidden layers. Roughly 30 to 50 percent of one engineer's time, perpetually, on top of new feature work.
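Habit 2 in miniature: the tier math is trivial, the discipline is running it against the full cost picture rather than the token line. The prices and per-user costs below are placeholder assumptions, not recommendations.

```python
# Unit economics check at three usage tiers. All figures are placeholder
# assumptions: (price per user, estimated full per-user cost) in euros/month,
# where cost includes all five layers, not just tokens.

tiers = {
    "light":    (9.0, 3.0),
    "standard": (19.0, 8.0),
    "heavy":    (39.0, 21.0),
}

for name, (price, cost) in tiers.items():
    margin = (price - cost) / price
    print(f"{name}: {margin:.0%} gross margin")
# light: 67% gross margin
# standard: 58% gross margin
# heavy: 46% gross margin
```

If the heavy tier's margin goes negative under the full cost picture, the pricing model, not the model provider, is the problem.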

The line that catches people out

Hallucination handling for high-stakes products. Founders consistently underestimate this and ship products that the customer cannot use without a senior employee reviewing every output. That review is fine if it is part of the workflow design. It is fatal if the founder thought the AI was supposed to remove the reviewer. Decide which world you are in early.

If you want a real cost picture for a feature you are scoping, write to me. I respond within 48 hours and I will share the spreadsheet I use with clients.
