Embedding
A numerical representation of text (or images, audio) as a vector. Similar meanings produce similar vectors. The foundation of semantic search and most RAG systems.
An embedding is a numerical representation of a piece of content (text, an image, an audio clip) as a list of numbers (a vector). Depending on the model, the list is typically a few hundred to a few thousand numbers long (384 to 3,072 for many popular models).
The key property: similar content produces similar embeddings. The phrases “Australian small business” and “AU SMB” have embeddings that are mathematically close (high cosine similarity). The phrases “Australian small business” and “lemon meringue pie” have embeddings that are far apart (low cosine similarity).
This is the trick behind semantic search, recommendation systems, and the retrieval step of most RAG implementations.
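To make "mathematically close" concrete, here is a minimal sketch of cosine similarity in plain Python. The three 4-number vectors are invented for illustration; real embeddings come from a model and are much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Made-up toy vectors (assumption: NOT output from any real embedding model).
australian_small_business = [0.9, 0.8, 0.1, 0.0]
au_smb = [0.8, 0.9, 0.2, 0.1]
lemon_meringue_pie = [0.0, 0.1, 0.9, 0.8]

close = cosine_similarity(australian_small_business, au_smb)
far = cosine_similarity(australian_small_business, lemon_meringue_pie)

print(f"business vs AU SMB: {close:.3f}")
print(f"business vs pie:    {far:.3f}")
```

With these toy numbers the first score lands near 1 and the second near 0, which is the pattern a real embedding model is trained to produce for related and unrelated phrases.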
How they’re generated
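A specialised neural network (an embedding model) reads your text and outputs the vector. OpenAI, Cohere and Voyage AI all sell embedding APIs (Anthropic does not offer its own embedding model and points users to Voyage AI). Open-source models such as BGE and MiniLM also exist, and are usually run through the Sentence-Transformers library.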
You generate an embedding for each chunk of your knowledge base, store the vectors in a database (Pinecone, Weaviate, Chroma, or just Postgres + pgvector), and at query time:
- Generate the embedding for the user’s query
- Find the chunks whose vectors are closest to the query vector
- Return those chunks (often to feed into an LLM for RAG)
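The query-time steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production setup: the `INDEX` list stands in for a vector database, the 3-number vectors are made up, and `query_vector` stands in for whatever your embedding model would return for the user's question.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, index, k=2):
    """Return the k (score, chunk) pairs whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vector, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical pre-computed index: (chunk text, embedding) pairs.
# In practice these vectors come from an embedding model and live in a database.
INDEX = [
    ("Refund policy: 30 days from delivery.", [0.9, 0.1, 0.0]),
    ("Shipping times within Australia.", [0.1, 0.9, 0.1]),
    ("How to request a refund.", [0.8, 0.2, 0.1]),
]

# Step 1: embed the query (hard-coded stand-in for "can I get my money back?").
query_vector = [0.85, 0.15, 0.05]

# Steps 2 and 3: find the closest chunks and return them.
results = top_k(query_vector, INDEX, k=2)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

A loop like this is fine for a few thousand chunks. Vector databases exist because larger collections need an index to avoid comparing the query against every stored vector.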
When you’ll use them in AU SMB work
If you’re building:
- Internal knowledge bases (“ask our docs anything”)
- Product search that handles synonyms (“hemp face cream” matches “cannabis sativa moisturiser”)
- Similar-content recommendation (“you read X, you might like Y”)
- Customer support that surfaces relevant past tickets
…you’re probably using embeddings under the hood.
For most operational tasks (Xero analysis, draft generation, document summarisation), you do not need embeddings. Just paste the content into the model’s context window. Embeddings come in when you have more content than fits in that window.
Cost
Embedding generation is cheap: typically $0.02 to $0.13 USD per million tokens. You generate embeddings once (or whenever content changes) and reuse them at query time.
The cost of running a vector database is usually higher than the embedding generation itself. Budget $20-50 USD/month for managed Pinecone or similar, or self-host on Postgres for near-zero marginal cost.
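As a rough worked example of the arithmetic, the sketch below prices a one-off embedding run at both ends of the per-token range quoted above. The 5-million-token knowledge base is a hypothetical size chosen for illustration.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """One-off cost of embedding a corpus, given a price per million tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd

# Hypothetical corpus size (assumption, not a benchmark).
corpus_tokens = 5_000_000

low = embedding_cost_usd(corpus_tokens, 0.02)   # cheapest quoted rate
high = embedding_cost_usd(corpus_tokens, 0.13)  # most expensive quoted rate

print(f"One-off embedding cost: ${low:.2f} to ${high:.2f} USD")
```

Even at the top rate, embedding a corpus of that size costs well under the low end of a month's managed vector database hosting, which is why the database is usually the larger line item.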