Reinforcement Learning from Human Feedback (RLHF), AI glossary

Reinforcement Learning from Human Feedback (RLHF) is one of the key techniques that turned base language models (which just predict plausible next tokens) into useful assistants (which try to be helpful, harmless and honest).

The shape:

Pre-train a base language model on a huge text corpus
Have humans compare pairs of model outputs and pick the better one on quality, helpfulness and safety
Train a “reward model” to predict those ratings
Use reinforcement learning (classically PPO) to fine-tune the model toward responses that score higher on the reward model, with a penalty that stops it drifting too far from the original model
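To make the reward-model and reinforcement-learning steps concrete, here is a minimal plain-Python sketch. It assumes the commonly used pairwise (Bradley-Terry style) loss for training the reward model and a simple KL-style penalty during the RL step. The function names and the `beta` value are illustrative assumptions, not any lab's actual code; real systems compute these over batches of tokens with neural networks and an RL algorithm such as PPO.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss for a reward model (illustrative sketch).

    Equals -log(sigmoid(reward_chosen - reward_rejected)): small when the
    reward model scores the human-preferred response higher, large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def shaped_reward(rm_score: float, logp_policy: float, logp_reference: float,
                  beta: float = 0.1) -> float:
    """Reward used during the RL step (illustrative sketch).

    The reward-model score minus a penalty for drifting away from the
    original (reference) model. logp_* are log-probabilities of the same
    response under each model; beta is a hypothetical penalty strength.
    """
    kl_estimate = logp_policy - logp_reference
    return rm_score - beta * kl_estimate


if __name__ == "__main__":
    # Reward model agrees with the human ranking: low loss.
    print(round(preference_loss(2.0, 0.5), 3))  # 0.201
    # Reward model disagrees with the human ranking: high loss.
    print(round(preference_loss(0.5, 2.0), 3))  # 1.701
    # A response the reward model likes, slightly penalised for drift.
    print(round(shaped_reward(1.0, logp_policy=-1.0, logp_reference=-1.5), 3))  # 0.95
```

The penalty term is the design choice that matters in practice: without it, the model can learn to "game" the reward model with odd outputs that score well but read badly.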

Anthropic’s Claude family uses RLHF plus an extra technique called Constitutional AI, in which the model is trained against a written “constitution” of principles, with AI feedback guided by those principles supplementing human ratings rather than relying on them alone.

Why this matters in practice

A base language model produces plausible text but is unaligned: it will happily make up facts with confidence, give harmful instructions, or be unhelpfully verbose. RLHF is the bridge that took “GPT-3 in 2020” and turned it into “ChatGPT in 2022”.

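By 2026, the techniques have evolved: RLAIF (reinforcement learning from AI feedback), DPO (Direct Preference Optimization), GRPO (Group Relative Policy Optimization), constitutional methods and mechanistic interpretability-based training. Even so, RLHF is still the canonical reference for “how AI assistants got trained to be assistants.”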

What this means for Australian SMB use

You don’t interact with RLHF directly. But it’s why:

Modern Claude won’t help you commit fraud or scam customers
ChatGPT softens its tone in ways base GPT wouldn’t
Models hedge (“I can help with that, though you should verify…”) on regulated topics like tax and law
Frontier models in 2026 follow instructions much more reliably than 2022 models did

If you’re noticing that one model feels “too cautious” or “too eager” compared to another, you’re usually noticing the different RLHF + constitutional training approaches each lab uses.

Keep reading

Structured outputs

Fine-tuning

In-context learning