Agents deep dive

Claude agents for Australian small business: when to build one, when not to, the five we ship most, and the AUD economics

The honest Australian SMB deep dive on Claude agents in 2026. The taxonomy (chat vs Project vs script vs scheduled agent vs multi-agent), the decision tree for build vs stay-with-chat, five real AU SMB build walkthroughs with AUD costs, the five-stage build progression, the traps to avoid, and how this ties to security and Skills.

Jenn, Director, DotVA + Editor, On Autopilot · Melbourne · Published 19/05/2026 · Updated 19/05/2026 · 26 min read

Key takeaways

Most Australian small businesses don't need an agent in their first 6 months of AI use. They need chat, Projects, and free-guide-style playbooks. Agents come after the manual workflow is producing daily value and you find yourself running the same prompt 5+ times a week for a month.
The honest agent taxonomy: chat (passive), Project (persistent context), one-shot script (you trigger it), scheduled agent (runs on cron / event without you), multi-agent orchestrator (rare, expensive, mostly wrong for SMB scale). Most AU SMBs that need 'an agent' actually need a scheduled script.
The 5 we ship most often via DotVA, in order of frequency: (1) overnight customer service triage, (2) inventory low-stock monitor for Shopify, (3) lead enrichment + outreach drafting, (4) weekly briefing producer, (5) document processor for bookkeeping. Each has a typical AUD cost band, typical time-to-build, and a typical first-month outcome.
Agents cost 3-15x what chat costs per equivalent task because they call the model multiple times in a loop. The math works when the alternative is a human's hourly rate. The math fails when the human alternative is already cheap (the agent is overkill) or when the agent runs unbounded (cost blowout). Always wire a budget cap.

In short

The Australian SMB deep dive on Claude agents in 2026. Most operators don’t need an agent in their first 6 months. They need chat, Projects, and the playbooks in our free guides. Agents come after the manual workflow is producing daily value and you find yourself running the same prompt 5+ times a week for a month. This piece is the honest taxonomy (chat vs Project vs script vs scheduled agent vs multi-agent), the build-vs-don’t decision tree, five real AU SMB build walkthroughs with AUD costs, the five-stage progression most operators should follow, the traps to avoid, and how it ties to the security flagship.

Why this piece exists

Two patterns dominate the agent conversation in 2026, and both are wrong for most Australian small businesses.

Pattern one: the gold-rush. “Build an agent for everything. AI does the work. You collect the time savings.” Sold heavily by international consultancies. Almost always over-engineered for AU SMB scale.

Pattern two: the avoidance. “Agents are too complex, too expensive, too risky. Stay with chat.” Common defensive crouch. Costs the operator the genuine productivity gains agents do unlock at the right tier.

The honest middle: agents work for some specific recurring workflows when the manual chat habit is already producing value and you’ve hit the “I’m doing this same prompt 8 times a week” threshold. This piece is the practical map of when, what to build, how, and what it costs.

Part 1: The honest agent taxonomy

The word “agent” in 2026 covers a wide range of things. Six tiers, numbered 0 to 5, each needing less hands-on effort per run than the one before:

Tier 0: Chat (no agent)

You open Claude.ai. You ask a question. You read an answer. You close the chat.

This is not an agent. It’s a passive request-response interaction. Most AU SMB AI use sits here, correctly.

Tier 1: Project (persistent context, still passive)

You set up a Claude Project. Voice file + knowledge files load at the start of every chat. You still open a chat, ask, read, close.

Not an agent either. Just chat with better context. Most paying Pro users should be here.

Tier 2: One-shot script (you trigger it, it runs and stops)

You write a script (or have one built) that calls the Claude API with a specific prompt, possibly with tool access, possibly multi-step. You run it manually. It does its thing. It stops.

Examples: a script that takes your last 30 customer emails as input and outputs draft replies as a markdown file. A script that takes your Xero export and outputs categorisation suggestions. A script that takes your week’s notes and outputs a structured weekly briefing.

This is the first thing many operators call an “agent”. It is agent-shaped (multi-step, tool-using) but you’re the trigger.
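
A minimal sketch of what the first example above (customer emails in, draft replies out as markdown) can look like as a Tier 2 script. It is an illustration, not DotVA's production code. The function names, the prompt wording, and the `call_model` parameter are all hypothetical: `call_model` stands in for whatever function sends a prompt to the Claude API and returns text, so the sketch runs offline with a fake.

```python
from typing import Callable, List


def build_prompt(voice_notes: str, email_body: str) -> str:
    """Combine the operator's voice notes with one customer email."""
    return (
        "Draft a reply in the voice described below.\n\n"
        f"VOICE:\n{voice_notes}\n\n"
        f"CUSTOMER EMAIL:\n{email_body}\n"
    )


def draft_replies(
    emails: List[str],
    voice_notes: str,
    call_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> str:
    """Run once over a batch of emails and return one markdown document."""
    sections = []
    for number, body in enumerate(emails, start=1):
        reply = call_model(build_prompt(voice_notes, body))
        sections.append(f"## Email {number}\n\n{reply.strip()}\n")
    return "\n".join(sections)


# Offline usage with a stand-in model, so nothing is sent anywhere.
fake_model = lambda prompt: "Thanks for getting in touch. (draft)"
markdown = draft_replies(["Do you open Sundays?"], "Warm, brief, no jargon.", fake_model)
```

You run it, it writes the drafts, it stops. Nothing is sent; the output is a file you read.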

Tier 3: Scheduled agent (runs on cron / event without you)

You take the Tier 2 script and wire it to run on a schedule (daily at 6am AEST) or in response to an event (a new email lands, a new Shopify order, a new Calendly booking).

Now it’s a real agent: it operates without a human in the seat. The human comes back at 9am to read the queue of outputs and approve / send / action.

This is where most useful SMB agents sit. Not multi-agent orchestrators. Not real-time customer-facing chatbots. Single-purpose scheduled flows with human-in-the-loop at the output.
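
One scheduling detail worth sketching: most servers run on UTC, so “6am AEST” has to be translated. With a fixed UTC+10 offset, 6am AEST is 20:00 UTC the previous day. The helper below is a hypothetical illustration using only the standard library. It assumes fixed AEST with no daylight saving; states that observe daylight saving would need a named zone such as Australia/Melbourne instead.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: fixed UTC+10 (AEST), no daylight-saving adjustment.
AEST = timezone(timedelta(hours=10), "AEST")


def next_run(now_utc: datetime, run_at: time = time(6, 0)) -> datetime:
    """Return the next occurrence of run_at in AEST, expressed in UTC."""
    local_now = now_utc.astimezone(AEST)
    candidate = datetime.combine(local_now.date(), run_at, tzinfo=AEST)
    if candidate <= local_now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate.astimezone(timezone.utc)
```

A scheduler can sleep until `next_run(datetime.now(timezone.utc))`, or you can hard-code the UTC equivalent in cron and skip the helper entirely.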

Tier 4: Real-time / customer-facing agent

A scheduled agent flips to real-time when it has to respond to a user in seconds (a website chatbot, a phone assistant, a customer-facing booking flow).

Real-time agents add three structural complications: latency budgets, public-facing trust/security boundary, prompt injection exposure. The cost and risk profile jumps significantly.

For most AU SMBs, the right answer here is: don’t build a fully autonomous real-time agent until the scheduled version has been in production for 3+ months and you understand the actual failure modes.

Tier 5: Multi-agent orchestrator

An orchestrator agent coordinates multiple sub-agents (research agent + drafting agent + reviewer agent + publisher agent). Conceptually elegant; operationally expensive and brittle at SMB scale.

We have shipped exactly two true multi-agent systems across all DotVA work. The rest of what looks orchestrator-like is just single-agent flows with branching prompts. Multi-agent is the wrong default; start single.

Part 2: The build-vs-don’t decision tree

Before you build, run this decision tree. If any of the five questions returns a no-build signal, stop.

Question 1: Have you done this manually 5+ times a week for at least a month?

Why it matters: Manual gives you the prompt, the edge cases, the realistic input shape. Without that experience, you build the wrong agent.

No-build signal: You’ve done it twice and decided you need an agent. Almost always wrong. Do it manually for a month. The agent you’d build before vs after that month is radically different and usually better after.

Question 2: Are the inputs bounded and reasonably consistent?

Why it matters: Agents handle the body of the distribution well, the tail badly. If your inputs are wildly varied (customer-supplied PDFs that could be invoices, receipts, contracts, brochures, or blank), the agent will fail more often than it succeeds.

No-build signal: “It depends” is the answer to “what does the input look like?”. You need a narrower scope first, or a pre-processing layer that normalises inputs.

Question 3: Can you check the output before it acts?

Why it matters: Agents that act on the world (send emails, post to social, move money, update records) need a human approval step until you have reason to trust them. The trust comes from observing the output for at least a month of supervised use.

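No-build signal: You want to wire the agent to act without review immediately. That’s not an agent; that’s a liability waiting for its Air Canada moment (the 2024 tribunal decision that held the airline liable for incorrect fare advice given by its website chatbot).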

Question 4: Does the math work?

Why it matters: Agents have real costs: API calls per run, infrastructure, your time monitoring. The savings have to exceed the costs by a comfortable margin to be worth shipping.

Quick math: if the agent saves you 30 minutes/day at your effective hourly rate (call it $80 AUD), it saves $40/day, or about $1,000/month over 25 working days. If the agent costs $100/month in API + $300/month in monitoring, it’s net $600/month positive. Worth building. If the savings are $5/day and the agent costs $100/month, don’t build.

No-build signal: You can’t articulate the math, or the savings are below 2x the cost.
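
The quick math and the 2x rule, as a function you can run against your own numbers. This is a sketch: the function and parameter names are hypothetical, and the 25 working days per month is an assumption, chosen because it is what turns $40/day into the $1,000/month figure used above.

```python
def monthly_case(
    minutes_saved_per_day: float,
    hourly_rate_aud: float,
    api_aud: float,
    monitoring_aud: float,
    working_days: int = 25,  # assumption: working days per month
) -> dict:
    """Monthly savings, cost, net, and whether savings clear 2x the cost."""
    savings = minutes_saved_per_day / 60 * hourly_rate_aud * working_days
    cost = api_aud + monitoring_aud
    return {
        "savings": savings,
        "cost": cost,
        "net": savings - cost,
        "build": savings >= 2 * cost,  # the 2x threshold from the no-build signal
    }


# The example from the text: 30 min/day at $80/hr, $100 API + $300 monitoring.
example = monthly_case(30, 80, api_aud=100, monitoring_aud=300)
```

If `build` comes back false, the workflow is a no-build on Question 4 regardless of how the other four questions went.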

Question 5: Can you afford the first-month monitoring overhead?

Why it matters: Agents drift. Inputs change shape. API providers change defaults. Monitoring catches drift before it causes damage. The first month requires daily review; from month 2, weekly review is fine; from month 6, monthly.

No-build signal: You can’t commit to daily review for the first month. The agent will break, you’ll miss it, the damage compounds.

If all five questions return build-signal, proceed.

Part 3: The five we ship most often

Across 50+ DotVA implementations, these five agent shapes account for ~70% of what we build. In approximate order of frequency:

Agent 1: Overnight customer service triage

Who buys it: Service businesses (cafes, allied health, beauty, salons, dental, vet) with 20-100 customer emails / DMs / form submissions overnight or while the operator is offline.

What it does: At 6am AEST daily, the agent reads the overnight inbox. For each message: classifies intent (booking, enquiry, complaint, supplier, spam), drafts a reply in the operator’s voice, prioritises by urgency. Queues the lot for the operator’s 9am review. Operator spends 10-15 minutes reading + approving instead of 60-90 minutes writing from scratch.

Tools / MCP needed:

Gmail or Outlook MCP server (read inbox)
The operator’s voice file as Project context
Optional: CRM MCP for customer history

Typical AUD cost band:

Setup: DIY $0 + your time (8-12 hours); productised package $497-$1,500 AUD; bespoke $2,000-5,000 AUD
Run cost: $30-80 AUD/month in API + $20-50 AUD/month infrastructure if hosted

Time to build: 4-8 hours DIY with Claude Code; 1-2 days with an agency.

First-month outcome: Operator saves 5-8 hours/week on inbox. The bigger win: every customer gets a same-business-day reply, not the “we’re behind on email, sorry” pattern.

What to watch for:

Drift in tone: review the voice file monthly
New email categories the agent wasn’t set up for (new partnership offers, new vendor approaches)
Customers who work out the replies are AI-drafted and try to exploit that (rare but real)
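
A sketch of the classify-then-queue step, under stated assumptions. It assumes you have asked the model to return JSON with `intent` and `draft_reply` keys (a hypothetical format, not something the Claude API imposes), and that complaints should be read first (a hypothetical priority order). Anything the model returns that doesn't fit is pushed to the top of the queue for manual handling rather than guessed at.

```python
import json
from dataclasses import dataclass

INTENTS = {"booking", "enquiry", "complaint", "supplier", "spam"}
# Hypothetical urgency order: lower number is reviewed first.
PRIORITY = {"complaint": 0, "booking": 1, "enquiry": 2, "supplier": 3, "spam": 4}


@dataclass
class TriageItem:
    message_id: str
    intent: str
    draft_reply: str


def parse_triage(message_id: str, model_output: str) -> TriageItem:
    """Validate the model's JSON; anything malformed goes to manual review."""
    try:
        data = json.loads(model_output)
        intent = data["intent"]
        draft = data["draft_reply"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return TriageItem(message_id, "needs_human", "")
    if not isinstance(intent, str) or intent not in INTENTS:
        return TriageItem(message_id, "needs_human", "")
    return TriageItem(message_id, intent, str(draft))


def review_queue(items):
    """Manual-review items first, then by intent priority."""
    return sorted(items, key=lambda item: PRIORITY.get(item.intent, -1))
```

The operator's 9am review then reads `review_queue(...)` top to bottom, with the items the agent couldn't classify at the head of the list.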

Agent 2: Inventory low-stock monitor for Shopify

Who buys it: Shopify operators with 50-500 SKUs, especially those with seasonal stock or fast-moving items.

What it does: Twice daily (8am + 5pm AEST), the agent reads Shopify inventory via API or MCP. For each SKU: checks current stock against reorder threshold, projects days of cover at recent sales velocity, flags anything heading to stockout in the next 5 business days. Sends an alert via email or Slack with reorder suggestions + supplier contact + draft purchase order.

Tools / MCP needed:

Shopify Admin API or MCP
Email / Slack MCP for alerts
The operator’s supplier list as Project context

Typical AUD cost band:

Setup: productised package $497 AUD setup; bespoke $1,500-3,000 AUD
Run cost: $15-40 AUD/month in API (very cheap; small structured inputs)

Time to build: 6-10 hours DIY; 1-2 days with an agency.

First-month outcome: Stockouts drop materially. We’ve watched Shopify operators move from 3-6 stockouts per month to 0-1 within 6 weeks of deploying.

What to watch for:

Seasonal demand spikes that overwhelm the simple velocity model
New SKUs missing from the agent’s threshold table (manual addition required)
API rate limits if you have many SKUs (Shopify imposes them)
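
The “days of cover” projection is plain arithmetic, and it doesn't need a model call at all. A minimal sketch, assuming a 14-day sales window and counting calendar days rather than business days (both are simplifying assumptions; the field names are hypothetical, not Shopify's):

```python
from dataclasses import dataclass


@dataclass
class Sku:
    code: str
    on_hand: int
    units_sold_last_14_days: int


def days_of_cover(sku: Sku, window_days: int = 14) -> float:
    """Days until stockout at the recent average daily sales rate."""
    daily_rate = sku.units_sold_last_14_days / window_days
    if daily_rate == 0:
        return float("inf")  # no recent sales, so no projected stockout
    return sku.on_hand / daily_rate


def at_risk(skus, horizon_days: float = 5):
    """Codes of SKUs projected to run out within the horizon, most urgent first."""
    flagged = [
        (days_of_cover(sku), sku.code)
        for sku in skus
        if days_of_cover(sku) <= horizon_days
    ]
    return [code for _, code in sorted(flagged)]
```

This is the “simple velocity model” the first watch-item refers to: it has no idea about seasonality, which is exactly why a demand spike can outrun it.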

Agent 3: Lead enrichment + outreach drafting

Who buys it: B2B service businesses (recruitment, financial planning, agencies, mortgage brokers, consultants) with 5-30 inbound leads per week from forms or referrals.

What it does: When a new lead lands (via form submission, Calendly booking, email enquiry), the agent enriches with public data (Clearbit, Apollo, LinkedIn snippet, ABR lookup for Australian businesses), then drafts a personalised outreach email tailored to the specific lead context. Queues for human approval before send.

Tools / MCP needed:

Email + CRM MCP (Pipedrive, HubSpot, or Notion)
Clearbit / Apollo / similar enrichment API (or ABR for AU-specific)
Voice file as Project context

Typical AUD cost band:

Setup: productised $1,500 AUD; bespoke $3,000-6,000 AUD
Run cost: $40-150 AUD/month in API + $50-200 AUD in enrichment service subscriptions

Time to build: 12-25 hours DIY; 3-5 days with an agency.

First-month outcome: Outreach response rate typically lifts 30-80% (personalisation matters). Volume of leads worked through doubles or triples because the friction drops.

What to watch for:

Hallucinated facts about the lead: always human-review before send
Enrichment data going stale (especially job titles)
AU Privacy Act considerations on enrichment data (especially if scraping)

Agent 4: Weekly briefing producer

Who buys it: Solo operators or small-team CEOs who want a structured weekly review without doing it manually.

What it does: Every Sunday at 7pm AEST, the agent pulls: this week’s calendar (Google Calendar / Outlook), this week’s email summary (top senders, top threads), this week’s analytics (GA4 if a website, Stripe if e-commerce, Shopify if retail), this week’s social engagement, this week’s project status (Notion / Linear / Asana). Synthesises into a Monday-morning briefing: what shipped, what stalled, what’s looming, three priorities for the week ahead, the one decision the operator is avoiding.

Tools / MCP needed:

Calendar MCP, Email MCP, GA4 MCP (or Shopify / Stripe), social MCP
Project tool MCP (Notion / Linear / Asana)
The operator’s voice + business priorities as Project context

Typical AUD cost band:

Setup: productised $497-$1,500 AUD; bespoke $2,500-5,000 AUD
Run cost: $15-30 AUD/month in API (one run per week, cheap)

Time to build: 10-20 hours DIY; 2-3 days with an agency.

First-month outcome: Solo operators consistently report 2-3 hours of Monday-morning thinking compressed to 15 minutes of reading. The compounding insight is the bigger win.

What to watch for:

Data sources that update on different cadences (analytics lags 24-48 hours)
Privacy of the briefing: it contains business-sensitive synthesis, so don’t email it to a personal address, and encrypt it at rest
Operator skipping the Monday review and the briefing becoming noise

Agent 5: Document processor for bookkeeping / accounting

Who buys it: Bookkeepers, accountants, BAS agents managing 5-20 client businesses’ transaction coding.

What it does: Triggered when a new receipt or invoice lands in Hubdoc / Dext / shared drive, the agent: OCRs the document, extracts vendor / date / amount / line items, suggests Xero account code with one-line reasoning, flags edge cases or anomalies, drafts the Xero entry for the bookkeeper’s approval. Approval triggers actual posting to Xero via MCP.

Tools / MCP needed:

Document OCR (Anthropic vision API handles most; Textract for harder edge cases)
Xero MCP for write-back
File watcher for trigger

Typical AUD cost band:

Setup: productised $1,500 AUD (per practice); bespoke $4,000-8,000 AUD
Run cost: $30-80 AUD/month per practice in API

Time to build: 15-30 hours DIY; 3-5 days with an agency.

First-month outcome: Bookkeepers save 30-90 minutes per client per month on transaction coding. Error rate stays equivalent (because the human still reviews) but throughput rises.

What to watch for:

Edge-case receipts (handwritten, multi-currency, partly damaged): these get queued for full-manual handling
Hallucinated GST classification: always verify before posting
TPB (Tax Practitioners Board) disclosure obligations (we cover these in our accountants guide)
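
One cheap guard against a wrong GST figure is an arithmetic cross-check before the entry reaches the approval queue. GST is 10%, so on a fully taxable, GST-inclusive total the GST component is one eleventh. The sketch below is a hypothetical helper, not part of any Xero or Anthropic API, and it only labels the entry for the bookkeeper: mixed or GST-free supplies will legitimately fail the one-eleventh test.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def expected_gst(total_inc_gst: Decimal) -> Decimal:
    """GST component of a fully taxable, GST-inclusive total (10% rate, so 1/11)."""
    return (total_inc_gst / Decimal(11)).quantize(CENT, rounding=ROUND_HALF_UP)


def gst_check(
    total_inc_gst: Decimal,
    extracted_gst: Decimal,
    tolerance: Decimal = Decimal("0.02"),  # assumption: allow small rounding gaps
) -> str:
    """Label the extracted GST so the reviewer knows what to look at."""
    if extracted_gst == 0:
        return "gst_free_claimed"  # reviewer confirms the supply really is GST-free
    if abs(extracted_gst - expected_gst(total_inc_gst)) <= tolerance:
        return "consistent_with_10_percent"
    return "mismatch"  # mixed supply, OCR error, or a made-up figure
```

A typical catch: the extraction reports $11.00 GST on a $110.00 total (10% of the inclusive amount) when the one-eleventh figure is $10.00.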

Part 4: The five-stage build progression

Most agents we ship in 2026 follow this progression. Most operators try to skip stages and fail. The stages exist because each one teaches you what the next stage actually needs.

Stage 0: Manual prompt

Open Claude.ai. Run the prompt manually. Do it for at least a month.

Goal: prove the prompt works, observe the edge cases, refine the voice.

Stage 1: Saved Project

Convert the manual prompt into a Claude Project with voice file + knowledge files. Run from the Project for another month.

Goal: verify the context loading produces better output. Refine the Project.

Stage 2: One-shot script

Convert the Project into a script (Python, JavaScript, whatever). You still trigger it manually. Add a budget cap and a logging hook.

Goal: programmatic repeatability. Catch any prompts that don’t translate cleanly outside the chat interface.

Stage 3: Scheduled agent

Add cron / scheduler. Add MCP tool access. Add human-in-the-loop approval for any output that acts on the world.

Goal: automated runs without you triggering. First month: monitor daily.

Stage 4: Production-grade (Agent SDK)

Migrate to Claude Agent SDK for proper agentic loop, budget enforcement, audit logging, retry logic. Add observability dashboards.

Goal: the agent that survives without your daily attention. Reach this by month 3-4.

Stage 5: Multi-agent orchestrator

(Most SMBs never reach here. Optional.) Decompose the agent into specialist sub-agents with an orchestrator. Justify the additional complexity with measurable outcomes.

Goal: scale. Only build this if Stage 4 has been running for 3+ months and you’ve identified specific bottlenecks decomposition would resolve.

The progression saves operators from the most expensive mistake we see: shipping a Stage 4 agent for a workflow you’ve never run manually. Without the manual phase, the agent embeds the wrong assumptions.

Part 5: The Anthropic stack for AU SMB agents

The mid-2026 reference stack:

Layer	What you use	Why
Model	Claude Sonnet 4.6 for most agents; Opus 4.7 for hardest reasoning	Sonnet is the cost-effective workhorse; Opus for tier-3 quality
Loop / orchestration	Claude Agent SDK	Production-ready, handles tool loop, budget caps, audit logs
Tool access	MCP (Model Context Protocol)	Standard for connecting to apps; official MCP servers from Anthropic, Google, GitHub etc.
Hosting	Hetzner Sydney box ($50 AUD/month) or AWS Lambda Sydney	AU data residency where required; Lambda for low-volume; box for control
Scheduling	Linux cron / Cloudflare Workers cron / Trigger.dev	Cron is free; managed services for reliability
Observability	Structured logs + Grafana / Datadog (optional at SMB scale)	Audit trail required for regulated work
Secret management	1Password CLI / AWS Secrets Manager / Doppler	Never commit API keys; rotate quarterly
Budget cap	Hard-coded daily AUD cap in the agent	Prevents runaway cost from a buggy loop

The cost of running this stack for one agent at AU SMB scale (5-100 inputs per day):

Hetzner Sydney box: $50 AUD/month (one box hosts multiple agents)
API: $30-200 AUD/month per agent depending on volume
Observability: $0-50 AUD/month (free tiers cover SMB)
Total: $80-300 AUD/month per agent

For most AU SMBs, one agent generates $1,000-5,000 AUD/month in time savings vs $80-300 AUD/month in cost. Net positive by 5-20x.

Part 6: The traps to avoid

Five real failure modes from our DotVA case base:

Trap 1: No budget cap

We’ve watched a single buggy agent burn $400 AUD in API calls in 90 minutes when an unbounded loop wasn’t caught.

Fix: every agent has a hard-coded daily AUD cap that aborts the run if exceeded. Implement it in the agent’s main loop rather than relying on provider-level limits, which are reactive and tend to be expressed as per-minute rate limits or account-wide spend limits, not as a per-agent daily AUD figure.
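
A minimal sketch of an in-loop cap. The class and function names are hypothetical, and the per-million-token AUD prices are parameters you supply, not Anthropic's actual rates. The sketch also assumes each model call reports its input and output token counts, and it leaves out the daily reset.

```python
class BudgetExceeded(RuntimeError):
    """Raised when estimated spend passes the cap."""


class DailyBudget:
    """Tracks estimated spend for one day and aborts once the cap is passed."""

    def __init__(self, cap_aud: float):
        self.cap_aud = cap_aud
        self.spent_aud = 0.0

    def charge(self, tokens_in, tokens_out, aud_per_mtok_in, aud_per_mtok_out):
        """Add one call's estimated cost; raise if the running total exceeds the cap."""
        cost = (tokens_in * aud_per_mtok_in + tokens_out * aud_per_mtok_out) / 1_000_000
        self.spent_aud += cost
        if self.spent_aud > self.cap_aud:
            raise BudgetExceeded(
                f"spent {self.spent_aud:.2f} AUD, cap {self.cap_aud:.2f} AUD"
            )
        return cost


def run_agent(steps, budget, price_in, price_out):
    """steps yields (tokens_in, tokens_out) per model call; stops at the cap."""
    completed = 0
    for tokens_in, tokens_out in steps:
        budget.charge(tokens_in, tokens_out, price_in, price_out)
        completed += 1
    return completed
```

Because the check sits inside the loop, a runaway loop is stopped on the first call that crosses the cap, not when the invoice arrives.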

Trap 2: Tool sprawl

You give the agent access to “everything just in case”. 17 tools, 12 MCP servers, full filesystem access.

Fix: least privilege. Each agent gets only the tools required for its specific job. Add new tools only when the agent has demonstrated need.
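
Least privilege can be as simple as an allowlist checked before every tool call. A sketch with hypothetical agent and tool names; `registry` stands in for however your build maps tool names to callables:

```python
# Hypothetical allowlist: each agent names only the tools its job requires.
ALLOWED_TOOLS = {
    "inbox_triage": {"read_inbox", "save_draft"},
    "stock_monitor": {"read_inventory", "send_alert"},
}


def dispatch(agent_name, tool_name, registry, **kwargs):
    """Run a tool only if this agent's allowlist names it."""
    if tool_name not in ALLOWED_TOOLS.get(agent_name, set()):
        raise PermissionError(f"{agent_name} may not call {tool_name}")
    return registry[tool_name](**kwargs)
```

An agent missing from the allowlist gets no tools at all, so adding a capability is always a deliberate edit.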

Trap 3: No human-in-the-loop on consequential actions

You ship an agent that sends customer emails, posts to social, updates records, moves money, without human approval.

Fix: human approval gate for every action that touches the outside world, in the first 3 months minimum. After 3 months of supervised use, you can start auto-approving subsets where the agent has been 100% reliable.
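
One way to build the gate: the agent never executes a consequential action itself. It proposes the action, and only approved proposals run. The class names below are hypothetical, and a real build would persist the queue rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], str]  # the real send / post / update, deferred
    approved: bool = False


@dataclass
class ApprovalQueue:
    pending: List[PendingAction] = field(default_factory=list)

    def propose(self, description: str, execute: Callable[[], str]) -> PendingAction:
        """The agent calls this instead of acting directly."""
        action = PendingAction(description, execute)
        self.pending.append(action)
        return action

    def run_approved(self) -> List[str]:
        """Execute approved actions only; everything else stays queued."""
        results = [action.execute() for action in self.pending if action.approved]
        self.pending = [action for action in self.pending if not action.approved]
        return results
```

Auto-approving a subset later is then a small change: set `approved` for the action types that have earned it, and leave the rest for a human.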

Trap 4: No fallback / handoff path

The agent encounters an input it doesn’t know how to handle. It either guesses (bad) or silently fails (worse).

Fix: explicit “I’m not sure” classification. When the agent’s confidence drops below a threshold, route to human review with the specific reason flagged.
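
A sketch of the routing rule. It assumes your prompt asks the model for a `label`, a `confidence` between 0 and 1, and an optional `reason` (all hypothetical field names), and that "unsure" is one of the labels on offer. Self-reported confidence is a rough signal rather than a calibrated probability, so treat the 0.8 threshold as a starting point to tune.

```python
def route(classification: dict, threshold: float = 0.8) -> dict:
    """Send low-confidence or unsure classifications to a human, with a reason."""
    label = classification.get("label")
    confidence = classification.get("confidence")
    if label is None or not isinstance(confidence, (int, float)):
        return {"queue": "human", "reason": "missing label or confidence"}
    if label == "unsure":
        reason = classification.get("reason", "model marked itself unsure")
        return {"queue": "human", "reason": reason}
    if confidence < threshold:
        return {
            "queue": "human",
            "reason": f"confidence {confidence:.2f} below {threshold:.2f}",
        }
    return {"queue": "standard", "reason": ""}
```

The reason string matters as much as the routing: it tells the reviewer why the item landed in front of them.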

Trap 5: No observability

You ship the agent, it runs, and you have no idea what it’s doing.

Fix: structured logs at every step (input, model call, tool call, output). Audit log that’s queryable for “what did this agent do last Tuesday for customer X?”. The audit log is also your evidence trail under the Notifiable Data Breaches (NDB) scheme if something goes wrong.
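
At SMB scale a queryable audit log can start as one JSON object per line. A sketch using only the standard library; the field names are hypothetical, and a real build would write to a file or log service rather than an in-memory stream.

```python
import io
import json
from datetime import datetime, timezone


def log_event(stream, agent: str, step: str, customer_id: str, detail: dict) -> None:
    """Append one JSON object per line so the log stays easy to search."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "step": step,  # e.g. input, model_call, tool_call, output
        "customer_id": customer_id,
        "detail": detail,
    }
    stream.write(json.dumps(record) + "\n")


def query(log_text: str, agent: str, customer_id: str) -> list:
    """Return this agent's events for one customer, in logged order."""
    events = (json.loads(line) for line in log_text.splitlines() if line.strip())
    return [e for e in events if e["agent"] == agent and e["customer_id"] == customer_id]


# Usage with an in-memory stream standing in for a log file.
buffer = io.StringIO()
log_event(buffer, "triage", "input", "c1", {"subject": "Booking question"})
log_event(buffer, "triage", "output", "c1", {"draft_chars": 240})
history = query(buffer.getvalue(), "triage", "c1")
```

Adding a date filter on `ts` gets you the “last Tuesday for customer X” query.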

Part 7: Security and Skills considerations

Two important sibling pieces.

Security

Every agent build inherits the 15 default-gap risks in our AI security flagship Part 2. On top of those, agents introduce three additional attack surfaces:

Prompt injection via inputs: if your agent reads emails / docs / web content, that content can contain malicious instructions
MCP server compromise: every MCP server you attach is a supply-chain dependency
Excessive blast radius: an agent with broad tool access can do significant damage if compromised

The mitigation patterns from the security flagship apply directly. Build the agent through the security-first kickoff prompt. Pre-deploy review every change. Red-team the agent monthly.

Skills

Claude Skills are reusable capability bundles. Agents and Skills are complementary: a Skill is “what the agent knows how to do well”; an Agent is “the loop that uses Skills to accomplish a job”. Most SMB agents we ship include 2-4 internal Skills (e.g. a customer service triage agent uses a “draft email” Skill + a “classify intent” Skill + a “summarise thread” Skill).

If you’re building agents you should also be building Skills as the reusable layer. The full treatment is in the Claude Skills flagship, including five copy-paste SKILL.md examples for the most common AU SMB patterns.

Part 8: The honest economics

When does an agent pay back?

Worked example: an overnight customer service triage agent for a 25-seat Brunswick cafe.

Item	AUD
Manual cost: 60 min/day of inbox at $50/hr effective rate	$1,500 / month
Agent setup (DotVA productised): one-off	$1,500
Agent API + infrastructure: ongoing	$80 / month
First-month monitoring time: about 2 hr/week at $50/hr	$400
Month 1 net	-$480 (still in setup)
Month 2 net	+$1,420
Month 3 net	+$1,420
Cumulative by month 6	+$6,620 AUD positive

Payback: during month 2, when the cumulative position turns positive. Compounding from there.
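
To rerun the table with your own figures, the month-by-month position is a short loop. A sketch with hypothetical names; it assumes setup and monitoring costs land entirely in month 1 and that savings start from day one, which is how the table above treats them.

```python
def cumulative_net(months, monthly_saving, setup_cost, monthly_run_cost,
                   first_month_monitoring):
    """Cumulative net position (AUD) at the end of each month."""
    totals, running = [], 0.0
    for month in range(1, months + 1):
        net = monthly_saving - monthly_run_cost
        if month == 1:
            # One-off costs are charged against the first month only.
            net -= setup_cost + first_month_monitoring
        running += net
        totals.append(running)
    return totals


def payback_month(totals):
    """First month whose cumulative position is positive, or None."""
    for month, total in enumerate(totals, start=1):
        if total > 0:
            return month
    return None


# The cafe example's inputs: $1,500 saving, $1,500 setup, $80 run, $400 monitoring.
cafe = cumulative_net(6, 1500, 1500, 80, 400)
```

If `payback_month` returns `None` over a 12-month horizon, that is your no-build answer.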

The same math fails when: the manual task takes less than 30 minutes/day to begin with, or the operator isn’t going to monitor for the first month, or the agent’s API costs balloon because the inputs are larger than expected.

Always do this math before building. If it doesn’t pencil out even at an $80/hour effective rate, that workflow isn’t worth automating yet. Pick a different workflow.

Part 9: What this doesn’t solve

Be honest about limits.

Strategic decisions. Agents don’t make strategy. They execute on strategy you’ve decided.
Customer relationships. Agents draft. Humans relate.
Hard problems with high stakes. Anything legal / financial / clinical where the cost of a wrong answer is high: agents can assist, but humans are accountable.
Edge cases. Agents work on the body of the distribution. The tails need humans.
The first time you’ve thought of a workflow. Run it manually for a month. Don’t skip the manual phase.

For the workflows that fit, agents are the highest-leverage AI investment most SMBs will make in 2026. Pick the right workflow, follow the five-stage progression, mind the five traps, ship.

Key takeaways

Most AU SMBs don't need an agent in their first 6 months. They need chat, Projects, and the free-guide playbooks. Agents come after you've done the same prompt 5+ times a week for a month.
The taxonomy that matters: chat (passive), Project (persistent context), one-shot script, scheduled agent, multi-agent orchestrator. Most useful SMB agents are scheduled scripts with human-in-the-loop, not multi-agent systems.
The five we ship most: customer service triage, inventory monitor, lead enrichment, weekly briefing producer, document processor. Each has predictable cost bands and time-to-build. Agency setup ranges $497-8,000 AUD; API run cost typically $15-200 AUD/month.
Follow the five-stage progression: manual -> Project -> one-shot script -> scheduled agent -> Agent SDK production. Skip a stage, embed wrong assumptions, ship the wrong agent.
Five traps: no budget cap, tool sprawl, no human-in-the-loop, no fallback path, no observability. Each is preventable; each has caused real DotVA-case incidents we've learned from.

What’s next

AI security for Australian small business for the security overlay that every agent build inherits.
Claude for the not-quite-beginner for the Projects-as-foundation work that should precede any agent build.
Self-hosting AI in Australia if your agents need Tier 4 sovereignty.
Book a free 30-minute audit if you want help running the build-vs-don’t decision tree against your specific workflows.

Sources cited

Anthropic, Claude Agent SDK documentation (mid-2026 stable release)
Anthropic, Model Context Protocol (MCP) specification + official server catalogue
Anthropic, tool use and function calling documentation
ACSC Essential Eight Maturity Model (referenced via AI security flagship)
OAIC Notifiable Data Breaches scheme guidance (referenced via security flagship)
DotVA + On Autopilot internal agent build patterns across 50+ Australian SMB implementations (anonymised composite)

This piece will be updated as the Anthropic agent stack evolves. Last updated: 19/05/2026.

Common questions

What's the difference between Claude Code, an agent, and just a chat with tools enabled?

All three are agentic in 2026 terminology, with different friction levels. Chat with tools enabled is what claude.ai now offers in some modes: you ask, Claude calls tools, returns an answer. Claude Code is the local-terminal agent: it has filesystem access by default, multi-step task execution, loop logic built in. A 'true' production agent is the same Claude API in a loop you've wired up (or via the Claude Agent SDK), running on a server, triggered by schedule or event, with no human in the seat. The progression is friction-decreasing: chat (manual) -> Code (semi-automated) -> production agent (automated).

Do I need the Claude Agent SDK to build an agent?

No, but it makes production agents materially easier. The Claude Agent SDK is Anthropic's framework that handles the agentic loop (think-act-observe), tool integration, MCP server attachment, audit logging, and budget caps as first-class concerns. You can build the same thing from raw API calls; you'll just re-implement what the SDK gives you. For SMB scale, most builds we ship use the SDK with a thin custom wrapper. The SDK is in production-ready release as of mid-2026.

What's MCP and why does every agent piece mention it?

MCP (Model Context Protocol) is the standard way Claude (and other AI models) plug into your existing apps. It's what lets an agent read your Gmail, write to your Xero, query your Shopify, post to your Buffer. Without tool connections like these, an agent is limited to its training data and whatever you paste in. With them, agents become operationally useful. Two patterns: official MCP servers (Anthropic, Google, GitHub, etc.) can generally be trusted; community MCP servers need vetting (see security flagship Part 1.4 for the supply chain risk).

How much does a typical SMB agent cost to run per month in AUD?

Depends on the agent. Light-use scheduled agents (overnight triage running once daily, processing 30-50 inputs): $15-50 AUD/month in API costs. Medium-use (multi-trigger, 100-300 inputs/day): $50-200 AUD/month. Heavy-use (real-time customer-facing chatbot, 1000+ interactions/day): $300-1500 AUD/month. Setup cost is separate: simple DIY $0 plus your time, agency-built $497-$2,000 AUD setup for our productised packages, custom builds $4-12k AUD.

I've heard about multi-agent systems and orchestrators. Should I build one?

Almost certainly not at SMB scale. Multi-agent systems are when you have one orchestrator agent coordinating multiple specialist sub-agents. They're conceptually elegant and operationally complex. The cost goes 3-5x. The failure modes proliferate. We've shipped exactly two true multi-agent systems across all DotVA work; the rest are single-agent flows that look orchestrator-like from the outside but are simpler underneath. Build simple, validate, scale only if you genuinely need the complexity.

When does an agent NOT make sense?

Five clear no-go signals. (1) You run the prompt manually fewer than 5 times a week: manual is faster than building. (2) The task has high-stakes legal/financial/clinical consequences and no good rollback path: keep a human in the loop, or don't build. (3) The task requires real judgement on edge cases: agents are fine on the body of the distribution, bad on the tails. (4) The input source is unstable (file formats change, customer-supplied data is messy): the agent will break weekly. (5) You don't have time to monitor it for the first month: unmonitored agents drift, fail silently, or burn budget. If you can't afford the first-month monitoring overhead, don't build it yet.

Can I have an agent without writing code?

Increasingly yes, especially in 2026. Tier-1 no-code options (n8n, Make.com, Zapier with AI nodes) let you wire 'when X happens, ask Claude Y, then do Z' flows without writing a line of code. Limited compared to a real Agent SDK build, but enough for 60-70% of SMB agent use cases. The other 30-40% (complex multi-step reasoning, MCP integrations, real loop logic) still benefit from a developer + Agent SDK build. Path that works for non-developers: start with n8n / Make / Zapier; if you outgrow it, get a developer or an agency to graduate it.

What about Claude Code as 'the agent that builds the agent'?

Increasingly the right pattern. You use Claude Code to scaffold the production agent: spec the inputs, the outputs, the tools, the loop logic, the budget cap, the audit log. Claude Code generates the code, you review, deploy. The setup cost is dramatically lower than a from-scratch agency build. The output is usually 70-80% production-ready and benefits from a security review before deploying (see Part 6 of our AI security flagship). Most DotVA agent builds in mid-2026 follow this pattern: Claude Code scaffolds, human reviews + adds the safety rails, ships.

