Glossary

Retrieval-Augmented Generation (RAG)

A pattern where relevant information is retrieved from a knowledge base and given to an AI model before it generates its response. It lets AI answer questions about your specific data without retraining the model.

Retrieval-Augmented Generation (RAG) is a pattern, not a product. The idea: instead of asking a language model to answer from its training knowledge alone, you first retrieve relevant information from a knowledge base and augment the model’s input with it.

Common shape:

  1. User asks a question
  2. The system searches your knowledge base (e.g. your company wiki, support docs, product catalogue)
  3. The most relevant chunks of content are pasted into the model’s prompt
  4. The model generates an answer based on the retrieved content and the question

This lets a general-purpose LLM answer questions about your specific data without needing to retrain the model.
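The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production system. The knowledge base is a hypothetical in-memory list, relevance is a crude shared-word count standing in for real search, and step 4 (calling the model) is left out so the example runs without an API key.

```python
import re

# Hypothetical knowledge base: in practice these chunks would come from your
# wiki, support docs or product catalogue.
KNOWLEDGE_BASE = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Our Sydney warehouse ships orders Monday to Friday.",
    "Gift cards are valid for 3 years from the date of purchase.",
]


def tokenize(text: str) -> set[str]:
    """Lower-cased set of words and numbers, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 2: rank chunks by how many words they share with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(chunk)), chunk) for chunk in chunks]
    scored = [pair for pair in scored if pair[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Step 3: paste the retrieved chunks into the model's prompt."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved)
    return (
        "Answer the question using only the context below. "
        "If the answer isn't in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "How long do refunds take?"  # Step 1: the user's question
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)
# Step 4 (not shown): send `prompt` to the language model of your choice.
```

Only the refund chunk shares a word with the question, so it is the only chunk pasted into the prompt. Real systems replace the word count with keyword or embedding search, but the shape stays the same.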

When you need RAG

  • You have a knowledge base larger than the model’s context window
  • You want answers grounded in specific, citable sources
  • The information changes often (e.g. product catalogue, pricing) and you can’t retrain a model every time it updates

When you don’t need RAG

  • The data fits comfortably in the model’s context window (200k+ tokens for current Claude models). Just paste it in.
  • You’re doing one-off analysis. Just paste the document.
  • The information is general knowledge the model already knows well.

For Australian small businesses in 2026, the “just paste it in” approach covers more cases than people think. Modern context windows are large enough that most operational tasks don’t need RAG. Build RAG only when you’ve outgrown the simple approach.
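To check whether your data fits, a rough estimate is usually enough. The sketch below assumes the common rule of thumb that English text averages about 4 characters per token, a 200,000-token window, and a hypothetical 20,000 tokens held back for instructions, the question and the answer. For exact numbers, use your model provider’s token-counting tools.

```python
import math

# Rule-of-thumb assumption: English text averages about 4 characters per token.
# Real tokenizers vary, so treat this as an estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], context_window: int = 200_000,
                    reserve: int = 20_000) -> bool:
    """True if the documents plus `reserve` tokens of headroom fit in the window.

    `reserve` is a hypothetical allowance for instructions, the question and the answer.
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve <= context_window


print(fits_in_context(["x" * 600_000]))  # ~150k tokens + 20k reserve
print(fits_in_context(["x" * 800_000]))  # ~200k tokens + 20k reserve
```

Under these assumptions, a 600,000-character document (roughly 150,000 tokens) still fits, while an 800,000-character one does not. The second case is where RAG starts to earn its keep.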

How RAG retrieval typically works

Most RAG systems use embeddings: each chunk of your knowledge base is converted to a numerical vector that represents its meaning. When a question comes in, it’s also converted to a vector, and the system finds the chunks whose vectors are closest to the question’s vector (semantic search).
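“Closest” is usually measured with cosine similarity, which compares the direction of two vectors. In this sketch the 3-number vectors are made-up stand-ins for the output of a real embedding model (which typically produces hundreds or thousands of numbers per chunk). `cosine_similarity` and `top_k` are illustrative helpers, not a library API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]],
          k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" standing in for a real embedding model's output.
chunk_vectors = {
    "Refund policy": [0.9, 0.1, 0.0],
    "Shipping times": [0.1, 0.9, 0.1],
    "Gift card terms": [0.6, 0.0, 0.7],
}
question_vector = [0.8, 0.2, 0.1]  # hypothetical vector for "Can I get my money back?"
print(top_k(question_vector, chunk_vectors))
```

The question vector points in almost the same direction as the refund chunk, so “Refund policy” ranks first even though the question shares no words with that title. That is the advantage semantic search has over plain keyword matching.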

Alternatives include keyword search (BM25) and hybrid systems that combine both. For small data sets, simple keyword search is often surprisingly competitive with embedding-based retrieval.
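Here is a sketch of both ideas: Okapi BM25 scoring for keyword search, plus reciprocal rank fusion (RRF), one common way to merge a keyword ranking with an embedding ranking. The parameter values (k1=1.5, b=0.75, k=60) are typical defaults rather than requirements. The semantic ranking in the usage example is hypothetical, and real tools may combine results differently.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case words and numbers; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each doc for the query (Lucene-style IDF, never negative)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = (sum(len(t) for t in tokenized) / n_docs) or 1.0  # guard against all-empty docs
    # Document frequency: number of docs containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this doc
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Rare terms (low document frequency) count for more.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalisation: long docs need more matches to score as highly.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge rankings (lists of doc indices, best first) into one ranking."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


docs = [
    "Refunds are processed within 5 business days.",
    "Orders ship from our Sydney warehouse Monday to Friday.",
    "Gift cards can be refunded only if unused.",
]
keyword_scores = bm25_scores("refunds business days", docs)
keyword_ranking = sorted(range(len(docs)), key=lambda i: keyword_scores[i], reverse=True)
semantic_ranking = [2, 0, 1]  # hypothetical result from an embedding search
print(reciprocal_rank_fusion([keyword_ranking, semantic_ranking]))
```

Document 0 wins on keywords and comes second semantically, so it tops the fused list. Document 2 comes first semantically but matches none of the query words (“refunded” is not “refunds” to a keyword search), so it lands second. RRF only looks at rank positions, which is why it can merge scores that are on completely different scales.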

Tools we’ve seen work for Australian SMBs

  • Claude Projects: paste up to ~200k tokens of context into a project; no infrastructure needed
  • Anthropic Files API: upload documents once, then reference them in later requests
  • LlamaIndex / LangChain: Python frameworks for building custom RAG
  • Pinecone / Weaviate / Chroma: vector databases for serious deployments

Don’t over-engineer. Most operators we work with don’t need a vector database. They need the right document in Claude’s context window.


Want this built for your business?

Book a free 30-minute AI audit. We'll map your business and show you exactly which systems we'd build first. No pitch deck, no scoping fee.
