Opinion · 2026-04-22

Why Most Fine-Tuning Advice Is Wrong

Common fine-tuning recommendations don't hold up under scrutiny. Here's what actually works.

By AI Signal Editorial

Open any tutorial on fine-tuning a language model and you will find the same advice: collect a few thousand examples, use LoRA, train for three epochs at a small learning rate, ship. This advice is not wrong, exactly. It is just optimised for the wrong objective — getting a tutorial to run end-to-end — rather than for producing a model that is measurably better at your task.

Most teams do not need fine-tuning

The first and most important thing to say is that the median team that thinks it needs fine-tuning does not. It needs better prompts, better retrieval, and a better evaluation harness. Fine-tuning is expensive to do well, expensive to maintain, and locks you into a model version. If you have not exhausted prompt engineering and retrieval-augmented generation (RAG), do those first. You will be surprised how often that is enough.
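To make "a better evaluation harness" concrete, here is a minimal sketch rather than a recommended implementation. The `EvalCase` shape, the `Candidate` type, and the choice of normalised exact match as the metric are all assumptions made for illustration; a real task may need a fuzzier or task-specific scorer.

```typescript
// Hypothetical shape of one evaluation case: a prompt and the answer we expect.
interface EvalCase {
  input: string;
  expected: string;
}

// Any system under test: a prompt template, a RAG pipeline, or a fine-tuned model.
type Candidate = (input: string) => Promise<string>;

// Collapse whitespace and case so trivial formatting differences are not counted as failures.
function normalise(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Score already-collected outputs against the expected answers; returns accuracy in [0, 1].
function exactMatchAccuracy(cases: EvalCase[], outputs: string[]): number {
  if (cases.length === 0 || cases.length !== outputs.length) {
    throw new Error("cases and outputs must be non-empty and the same length");
  }
  let correct = 0;
  for (let i = 0; i < cases.length; i++) {
    if (normalise(outputs[i]) === normalise(cases[i].expected)) {
      correct += 1;
    }
  }
  return correct / cases.length;
}

// Run every case through a candidate, one at a time, then score the outputs.
async function evaluate(candidate: Candidate, cases: EvalCase[]): Promise<number> {
  const outputs: string[] = [];
  for (const c of cases) {
    outputs.push(await candidate(c.input));
  }
  return exactMatchAccuracy(cases, outputs);
}
```

Because `evaluate` only needs a function from prompt to answer, the same cases can score a prompt change, a retrieval change, and a fine-tuned model, which is what makes the "do those first" comparison possible at all.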

When you do fine-tune, data quality dominates

For the cases where fine-tuning actually helps (narrow domain language, specific output formats, latency-sensitive distillation), the variable that matters most is data quality, not training recipe. A hundred carefully curated examples will often beat ten thousand mediocre ones. Spend your budget on the data pipeline: deduplication, format validation, hand-review of a random sample, explicit handling of edge cases. The hyperparameters are mostly a distraction.
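The pipeline steps named above (format validation, deduplication, drawing a random sample for hand-review) can be sketched as follows. The `TrainingExample` shape, the validity rule, and the case-insensitive duplicate key are assumptions chosen for illustration; your own format checks will be stricter and task-specific.

```typescript
// Hypothetical record format for one training example.
interface TrainingExample {
  prompt: string;
  completion: string;
}

// Canonical form used only for duplicate detection (an assumed, fairly aggressive rule).
function canon(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Format validation: reject examples with an empty prompt or completion.
function isValid(ex: TrainingExample): boolean {
  return ex.prompt.trim().length > 0 && ex.completion.trim().length > 0;
}

// Deduplication: keep the first occurrence of each canonicalised (prompt, completion) pair.
function deduplicate(examples: TrainingExample[]): TrainingExample[] {
  const seen = new Set<string>();
  const unique: TrainingExample[] = [];
  for (const ex of examples) {
    const key = JSON.stringify([canon(ex.prompt), canon(ex.completion)]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(ex);
    }
  }
  return unique;
}

// Validate first, then deduplicate what survives.
function cleanDataset(examples: TrainingExample[]): TrainingExample[] {
  return deduplicate(examples.filter(isValid));
}

// Draw up to n items for hand-review using a Fisher-Yates shuffle on a copy.
// `random` is injectable so the sample can be made reproducible.
function sampleForReview<T>(items: T[], n: number, random: () => number = Math.random): T[] {
  const pool = items.slice();
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, Math.max(0, Math.min(n, pool.length)));
}
```

This only catches exact duplicates after normalisation. Near-duplicates and edge cases still need the hand-review step, which is why the sampler is part of the pipeline and not an afterthought.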

LoRA is fine, full fine-tune is sometimes better

The reflexive default of "always use LoRA" is also worth questioning. LoRA is great when you are budget-constrained or when you want to swap adapters at serve time. When you are doing a one-off specialisation on a moderate amount of data and you have the compute, a full fine-tune with a well-tuned schedule frequently produces a noticeably better model. The right answer is "measure both," not "always LoRA."
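One way to "measure both" is a paired comparison: score the LoRA model and the full fine-tune on the same evaluation examples and look at the per-example differences. The sketch below assumes you already have one numeric score per example for each model; the `PairedResult` shape and `comparePaired` name are hypothetical.

```typescript
// Summary of a paired comparison between two models scored on the same examples.
interface PairedResult {
  wins: number;      // examples where model A scored higher
  losses: number;    // examples where model B scored higher
  ties: number;      // examples where both scored the same
  meanDelta: number; // average of (scoreA - scoreB) across examples
}

// scoresA[i] and scoresB[i] must be the two models' scores for the same eval example.
function comparePaired(scoresA: number[], scoresB: number[]): PairedResult {
  if (scoresA.length === 0 || scoresA.length !== scoresB.length) {
    throw new Error("score arrays must be non-empty and the same length");
  }
  let wins = 0;
  let losses = 0;
  let ties = 0;
  let total = 0;
  for (let i = 0; i < scoresA.length; i++) {
    const delta = scoresA[i] - scoresB[i];
    total += delta;
    if (delta > 0) {
      wins += 1;
    } else if (delta < 0) {
      losses += 1;
    } else {
      ties += 1;
    }
  }
  return { wins, losses, ties, meanDelta: total / scoresA.length };
}

// Usage: per-example scores (1 = correct, 0 = wrong) for a full fine-tune and a LoRA run.
const fullFineTuneScores = [1, 1, 0, 1];
const loraScores = [1, 0, 0, 0];
console.log(comparePaired(fullFineTuneScores, loraScores));
```

On a small evaluation set a handful of wins either way can be noise, so treat the counts as a prompt to gather more examples or rerun with different seeds, not as a verdict.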

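// Example: a minimal RAG retrieval step.
// `vectorStore` and `crossEncoder` stand in for whatever retrieval and reranking clients you use.
// Cast a wide net with cheap vector search, then rerank and keep only the best few passages.
const hits = await vectorStore.search(queryEmbedding, { topK: 50 });
const reranked = await crossEncoder.rerank(query, hits);
// Join the top six passages into a single context string for the prompt.
const context = reranked.slice(0, 6).map((h) => h.text).join("\n---\n");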