Methodology · 10 min read

RAG, fine-tuning, or agents: a founder's guide

A founder asked me last month whether they should fine-tune, build an agent, or use RAG for their AI feature. They were genuinely unsure, and the engineers had given them three different answers. The decision actually has a clean shape once you separate what each approach is doing. This is the founder-level guide I wish more teams had on the wall.

The three approaches in one sentence each

RAG. You pull relevant information from your own data and put it into the prompt at the moment of the request.

Fine-tuning. You change the model's weights so it is better at a specific task.

Agents. You let the model decide which tools to call, in which order, until a goal is reached.

Each solves a different problem. They can be combined, but the founder-level mistake is treating them as interchangeable.

Reach for RAG when knowledge is the bottleneck

If your product needs the model to answer questions or draft outputs grounded in proprietary information, the answer is RAG. Your customer documents, your support tickets, your product catalog, your supplier price sheets, your past proposals. The model itself does not know your data. RAG gives it the relevant slice at the moment of the request.
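The whole shape fits on one screen. What follows is a minimal sketch, not any specific library's API; `vector_store` and `llm` are stand-ins for whatever vector database and model client you actually use.

```python
# Minimal RAG shape: retrieve your own data, then put it into the prompt.
# Illustrative only: vector_store and llm are stand-ins, not a real API.

def answer_with_rag(question: str, vector_store, llm) -> str:
    # 1. Pull the slices of your proprietary data relevant to this request.
    chunks = vector_store.search(question, top_k=5)

    # 2. Put them into the prompt at the moment of the request.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. One grounded model call. The moat is the data, not the model.
    return llm.complete(prompt)
```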

This is the most common founder use case and it is where I start with almost every client. The Sainni real-time sales coaching platform is a RAG system. The ConstructionOS proposal engine is a RAG system. The compliance memo drafting at RegNexa is a RAG system. The reason is simple. Your moat is your data, and RAG is how the data shows up in the model's answer.

The cost profile is friendly. You pay for embeddings once up front and then incrementally as your data changes, you pay for retrieval at request time, and you pay for the input tokens the retrieved chunks add to the prompt. Latency is usually 200 to 800 milliseconds for retrieval plus the model latency. The whole pipeline is observable, debuggable, and improvable.

The trap with RAG is treating it as a plumbing problem. The retrieval quality is the system. Chunk size, embedding model choice, hybrid search, reranking, recency boosts, metadata filters. Founders who skip this work and reach for fine-tuning instead are usually fixing the wrong layer.
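To make that list concrete, here it is as a config sketch. Every name is illustrative rather than a particular library's schema; the point is that each entry is a lever you tune and measure, not plumbing.

```python
# The knobs that decide whether RAG works. All names are illustrative.
RETRIEVAL_CONFIG = {
    "chunking": {"size_tokens": 400, "overlap_tokens": 50},  # chunk size
    "embedding_model": "your-chosen-embedding-model",        # model choice
    "search": {
        "mode": "hybrid",   # dense vectors plus keyword search
        "top_k": 20,        # cast a wide net first...
    },
    "rerank": {"enabled": True, "keep_top": 5},   # ...then rerank down
    "boosts": {"recency_half_life_days": 90},     # favor fresher documents
    "filters": ["customer_id", "doc_type"],       # metadata filters
}
```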

Reach for fine-tuning when style or format is the bottleneck

Fine-tuning shines for one specific need. You want the model to consistently produce output in a specific style, format, or domain language that prompts cannot reliably enforce. A specific brand voice. A specific JSON shape. A specific medical or legal phrasing.

It is rarely the right tool for adding knowledge. People intuitively think fine-tuning teaches the model new facts. In practice, the cost-effective way to give a model facts is RAG. Fine-tuning teaches the model how to behave on tasks where prompts hit a ceiling.
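One way to internalize the distinction is to look at what fine-tuning data actually is: pairs of input and desired behavior, not an encyclopedia. Here is a sketch of two invented training examples in the chat-message shape most hosted fine-tuning APIs accept.

```python
# Fine-tuning data teaches behavior. The chat-message shape is typical of
# hosted fine-tuning APIs; both examples are invented for illustration.
training_examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarize ticket #4521."},
            {
                "role": "assistant",
                "content": '{"summary": "Login fails after password reset",'
                           ' "severity": "high", "owner": "auth team"}',
            },  # teaches a strict JSON shape, not new facts
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Reply to an unhappy customer."},
            {
                "role": "assistant",
                "content": "We hear you, and we own this. Here is exactly "
                           "what happens next, and when.",
            },  # teaches a brand voice, not new facts
        ]
    },
]
```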

The cost profile is heavier. You pay for training data preparation, for the fine-tune itself, for hosting if you go open weights, and for the perpetual cost of redoing the fine-tune when the next-generation base model arrives. Latency may be similar to or slightly better than the base model's. Observability is harder, because you cannot easily inspect why the fine-tune answered the way it did.

The honest signal that fine-tuning is the right call is that you have run prompts hard, you have a measurable, repeatable failure mode, and the failure is about how the model speaks rather than what it knows. If that signal is not there, fine-tuning is a side quest.

Reach for agents only when the workflow is open-ended

An agent is a model that decides which tools to call and in what order. Useful when the path from question to answer cannot be specified in advance. A research task that may need to call 3 or 30 tools depending on what is found. A debugging session where the model traces a hypothesis. A complex booking flow with branching constraints.
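The control flow fits in a few lines, which is exactly why teams underestimate it. A minimal sketch, assuming an `llm` client that proposes the next step and a dict of callable tools; every name is illustrative.

```python
# The essence of an agent: the model picks the next tool, we execute it,
# and we loop until the model declares the goal reached.

def run_agent(goal: str, llm, tools: dict) -> str:
    history = [{"role": "user", "content": goal}]
    while True:                                  # nothing here stops a loop
        step = llm.next_step(history, tools)     # the model decides
        if step.kind == "final_answer":
            return step.content                  # goal reached
        result = tools[step.tool_name](**step.args)  # we execute its choice
        history.append({"role": "tool", "name": step.tool_name,
                        "content": str(result)})     # model sees the result
```

Note the naked `while True`. That line is where the loops and runaway costs below come from.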

Agents are also the most expensive, slowest, and most fragile of the three approaches. A single user request can call the model 5 to 50 times, costing 1 to 12 dollars and taking 30 seconds to several minutes. Failure modes include loops, runaway costs, and confidently wrong tool selection. Observability is hard. Evals are harder.

The founder rule is to reach for an agent only when the workflow has open-ended branching the user cannot specify in advance. If the user can describe the steps, do not give an agent the steering wheel. A deterministic pipeline of two or three model calls beats an agent on cost, latency, and reliability for almost every business workflow I have shipped.
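For contrast, here is the deterministic alternative: the steps are fixed in code and the model only fills in content. The prompts are invented, and the `llm` and `vector_store` stand-ins are the same illustrative ones as above.

```python
# A deterministic pipeline: two model calls, fixed order, no branching.
# Cost and latency are predictable because the steps never change.

def draft_proposal(request: str, llm, vector_store) -> str:
    # Step 1: always extract structured requirements first.
    requirements = llm.complete(
        f"Extract the requirements as a bullet list:\n{request}"
    )
    # Step 2: always retrieve past work, then always draft. That is it.
    past = vector_store.search(requirements, top_k=5)
    context = "\n\n".join(chunk.text for chunk in past)
    return llm.complete(
        f"Draft a proposal for these requirements:\n{requirements}\n\n"
        f"Ground it in these past proposals:\n{context}"
    )
```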

When agents are right, they are spectacular. The internal AI airport video intelligence project at Neurons Lab had agent characteristics, in the sense that the analysis path branched depending on what was detected. Most founder products do not have that property and should not pay the agent tax.

How to combine them

The three are not mutually exclusive. A common production pattern is RAG plus a small fine-tune plus a constrained agent loop. The RAG provides the knowledge. The fine-tune provides the format. The constrained loop provides the multi-step reasoning, with hard caps on tool calls and budget.
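The hard caps are the part teams skip, so here is what they look like wrapped around the agent loop from earlier. The cap values are examples to tune, not recommendations, and the per-step cost reporting is an assumption about your client, not a given.

```python
# The constrained loop: hard caps on tool calls and spend turn an
# open-ended loop into a bounded one. Cap values are examples only.
MAX_TOOL_CALLS = 8
MAX_COST_USD = 0.50

def run_constrained_agent(goal: str, llm, tools: dict) -> str:
    history = [{"role": "user", "content": goal}]
    calls, spent = 0, 0.0
    while calls < MAX_TOOL_CALLS and spent < MAX_COST_USD:
        step = llm.next_step(history, tools)
        spent += step.cost_usd               # assumes the client reports cost
        if step.kind == "final_answer":
            return step.content
        calls += 1
        result = tools[step.tool_name](**step.args)
        history.append({"role": "tool", "name": step.tool_name,
                        "content": str(result)})
    return "Cap reached: escalate to a human or fall back to a pipeline."
```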

The combination is powerful, but it is a final-mile optimization. Start with RAG. Measure. Add a fine-tune only if a measurable, repeatable shortfall remains. Add agentic patterns only if the workflow needs branching the deterministic pipeline cannot give you.

The founder-level decision tree

  1. Does your product need to answer or draft based on proprietary information? Use RAG.
  2. After running RAG hard, do you still see a measurable, repeatable shortfall in style, format, or domain phrasing? Add a small fine-tune.
  3. Is the workflow open-ended in a way that a deterministic pipeline cannot capture? Add an agent loop, with hard caps.
  4. None of the above? You probably do not need any of them. A short prompt against a frontier model will do the job.
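If you want the tree literally on the wall, here it is as code. Purely illustrative.

```python
def choose_approach(needs_proprietary_data: bool,
                    style_gap_after_running_rag_hard: bool,
                    open_ended_branching: bool) -> list[str]:
    """The founder-level decision tree, in the order above."""
    stack = []
    if needs_proprietary_data:
        stack.append("RAG")                        # step 1
    if style_gap_after_running_rag_hard:
        stack.append("small fine-tune")            # step 2
    if open_ended_branching:
        stack.append("agent loop with hard caps")  # step 3
    return stack or ["short prompt against a frontier model"]  # step 4
```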

The cost shape at scale

For a 10,000 monthly active user product in 2026, the realistic monthly cost shape is roughly this. A RAG-only product runs 1,000 to 6,000 euros per month including embeddings, vector DB, and model API. RAG plus a light fine-tune, 1,500 to 9,000. RAG plus agent loops with caps, 4,000 to 25,000 depending on use intensity. Costs scale superlinearly with agentic patterns. Plan for it before you ship.
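If you want to sanity-check those ranges against your own traffic, the arithmetic is short. Every number below is an assumption; swap in your own pricing and usage.

```python
# Back-of-envelope monthly cost for a 10,000 MAU product.
# Every number is an assumption: replace with your own figures.
mau = 10_000
requests_per_user = 20          # requests per user per month (assumed)
cost_per_model_call = 0.015     # euros per call: retrieval + tokens (assumed)

rag_only = mau * requests_per_user * cost_per_model_call
print(f"RAG-only: ~{rag_only:,.0f} euros/month")        # ~3,000

# Agentic requests multiply model calls per request, which is why
# costs scale superlinearly once agent loops enter the product.
agent_share = 0.10              # share of requests that hit the agent loop
calls_per_agent_request = 12    # average, within the hard cap
agent_extra = (mau * requests_per_user * agent_share
               * calls_per_agent_request * cost_per_model_call)
print(f"With capped agent loops: ~{rag_only + agent_extra:,.0f} euros/month")
```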

What to do this week

Look at your AI feature. If you have not yet shipped, default to RAG and a frontier model. If you are mid-build and the team is reaching for fine-tuning, ask for the labeled failure mode that motivated it. If the team is reaching for agents, ask whether a deterministic two-step pipeline could deliver the same outcome at a tenth of the cost. The answers will usually surprise you.

If you want a second opinion on the architecture choice before you commit, write to me. I respond within 48 hours.
