Retrieval-Augmented Generation (RAG), AI glossary

Retrieval-Augmented Generation (RAG) is a pattern, not a product. The idea: instead of asking a language model to answer from its training knowledge alone, you first retrieve relevant information from a knowledge base and augment the model’s input with it.

Common shape:

A user asks a question
The system searches your knowledge base (e.g. your company wiki, support docs, product catalogue)
The most relevant chunks of content are inserted into the model’s prompt
The model generates an answer based on the retrieved content and the question

This lets a general-purpose LLM answer questions about your specific data without needing to retrain the model.
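The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the knowledge base, the `retrieve` and `build_prompt` helpers, and the word-overlap scoring are all hypothetical stand-ins, and the final call to a model is left out because it depends on the provider you use.

```python
import re

# Hypothetical in-memory knowledge base; in practice these would be
# chunks of your wiki, support docs or product catalogue.
KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase with a receipt.",
    "Our Sydney office is open 9am to 5pm, Monday to Friday.",
    "Standard shipping within Australia takes 3 to 5 business days.",
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=2):
    """Rank chunks by how many words they share with the question.

    Word overlap is a deliberately crude stand-in for real search.
    """
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, chunks):
    """Paste the retrieved chunks into the prompt ahead of the question."""
    sources = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer using only the sources below. Cite sources by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


question = "How long does shipping take within Australia?"
top_chunks = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, top_chunks)
print(prompt)  # this string is what you would send to the model
```

Everything that makes a RAG system good or bad lives in `retrieve`. The prompt assembly rarely changes much.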

When you need RAG

You have a knowledge base larger than the model’s context window
You want answers grounded in specific, citable sources
The information changes often (e.g. product catalogue, pricing) and you can’t retrain a model every time it updates

When you don’t need RAG

The data fits comfortably in the model’s context window (200k+ tokens for modern Claude). Just paste it in.
You’re doing one-off analysis. Just paste the document.
The information is general knowledge the model already has well.

For Australian small businesses in 2026, the second case (just paste the document) is more common than people think. Modern context windows are large enough that most operational tasks don’t need RAG. Build RAG only when you’ve outgrown the simple approach.
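A quick way to test whether you have outgrown the simple approach is to estimate the token count of your documents. The sketch below assumes roughly four characters per token, which is only a rule of thumb for English text, and uses the 200k figure quoted above as the window size. The reserved headroom and the function names are hypothetical. For a real decision, use your provider’s token counter.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # figure quoted above; check your model's actual limit
CHARS_PER_TOKEN = 4              # rough assumption for English text, not exact
RESERVED_TOKENS = 20_000         # hypothetical headroom for the question and the answer


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(documents,
                    window=CONTEXT_WINDOW_TOKENS,
                    reserved=RESERVED_TOKENS):
    """Return True if all documents together are likely to fit in one prompt."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= window - reserved


small_handbook = "x" * 40_000      # about 10k estimated tokens
large_archive = "x" * 1_000_000    # about 250k estimated tokens

print(fits_in_context([small_handbook]))  # likely fine to paste
print(fits_in_context([large_archive]))   # too big; retrieval is worth considering
```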

How RAG retrieval typically works

Most RAG systems use embeddings: each chunk of your knowledge base is converted to a numerical vector that represents its meaning. When a question comes in, it’s also converted to a vector, and the system finds the chunks whose vectors are closest to the question’s vector (semantic search).
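To make “closest vector” concrete, here is a small sketch using cosine similarity, a common closeness measure for embeddings. The three-dimensional vectors are made up for illustration. A real embedding model produces vectors with hundreds or thousands of dimensions, and the chunk names and `nearest_chunks` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for the output of a real embedding model.
CHUNK_VECTORS = {
    "refund policy": [0.9, 0.1, 0.0],
    "opening hours": [0.1, 0.9, 0.1],
    "shipping times": [0.0, 0.2, 0.9],
}


def nearest_chunks(question_vector, chunk_vectors, top_k=1):
    """Return the names of the chunks most similar to the question."""
    ranked = sorted(
        chunk_vectors,
        key=lambda name: cosine_similarity(question_vector, chunk_vectors[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Pretend this is the embedding of "Can I get my money back?"
question_vector = [0.8, 0.2, 0.1]
best = nearest_chunks(question_vector, CHUNK_VECTORS)
print(best)  # the refund chunk ranks first for these toy vectors
```

Note that the question shares no words with “refund policy”. The match comes entirely from the vectors pointing in a similar direction, which is what semantic search means in practice.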

Alternatives include keyword search (BM25) and hybrid systems that combine keyword and embedding search. For small data sets, simple keyword search is often surprisingly competitive with embedding-based retrieval.
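For comparison, keyword search needs no embedding model at all. Below is a compact sketch of BM25 scoring using the standard formula with the commonly used defaults `k1=1.5` and `b=0.75`. The sample documents and function names are hypothetical, and a maintained search library will be more robust than this.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with BM25."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # How many documents contain each term at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = term_counts[term]
            if tf == 0:
                continue
            # Rare terms get a higher weight; the "1 +" keeps it non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency, dampened and normalised by document length.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


documents = [
    "Refunds are available within 30 days of purchase.",
    "Shipping within Australia takes 3 to 5 business days.",
    "Our office is open Monday to Friday.",
]
scores = bm25_scores("shipping days australia", documents)
best_index = scores.index(max(scores))
print(documents[best_index])
```

A hybrid system would run something like this alongside the embedding search and merge the two rankings.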

Tools we’ve seen work for Australian SMBs

Claude Projects (built in): paste up to ~200k tokens of context; no infrastructure needed
Anthropic Files API: upload documents to a project, then query them
LlamaIndex / LangChain: Python frameworks for building custom RAG
Pinecone / Weaviate / Chroma: vector databases for serious deployments

Don’t over-engineer. Most operators we work with don’t need a vector database. They need the right document in Claude’s context window.

Related terms


Model Context Protocol

Tool use

Fine-tuning