A Guide to Transformer AI

How AI Actually Works

Forget the hype. Here's what's really going on inside ChatGPT, Claude, and every AI that's taken over the internet — from the raw mechanics to the real-world impact.

The Core Mechanic

What Comes Next?

The fundamental idea powering every AI language model

Here’s the secret that powers every AI chatbot: all they’re really doing is predicting the next word. That’s it. When ChatGPT writes a poem or Claude explains quantum physics, neither one “understands” anything the way you do. They’re incredibly good at guessing which word should come next.

It’s like autocomplete on your phone, except trained on basically the entire internet and way more sophisticated.

→ Try it: predict the pattern

AI looks at patterns and predicts what comes next. Can you?

Key insight: AI doesn’t “think” or “know” things. It recognizes statistical patterns in language and generates text that looks like it was written by someone who does.
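If you want to see that mechanic as code, here is a tiny Python sketch. The prompt and the probabilities are invented for illustration; a real model computes its probabilities on the fly from billions of learned parameters.

```python
import random

# Toy next-word predictor. These probabilities are made up for illustration;
# a real model computes them from everything it absorbed during training.
next_word_probs = {
    "The cat sat on the": {"mat": 0.55, "sofa": 0.25, "roof": 0.15, "moon": 0.05},
}

def predict_next_word(prompt):
    probs = next_word_probs[prompt]
    # Pick one word at random, weighted by probability. That is all "generation" is,
    # repeated one word at a time until the text is done.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(predict_next_word("The cat sat on the"))  # usually "mat", occasionally something else
```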
The Breakthrough

Paying Attention

How AI figures out which words matter most

Early AI language models were like reading through a straw — they could only look at the last few words at a time. That made them terrible at understanding context. Consider: “The trophy didn’t fit in the suitcase because it was too big.”

What does “it” refer to? You know instantly: the trophy. Because you paid attention to the right word. In 2017, researchers showed how to build AI entirely around that same trick, called the attention mechanism, and the architecture they built became known as the Transformer: the technology behind ChatGPT, Claude, and basically every modern AI chatbot.

→ Interactive: click a word to see attention

Click any word to see which others it “pays attention to.” Brighter = more attention.

The research paper was literally titled “Attention Is All You Need” — and it turned out to be true. Before this, AI struggled with long text. With attention, it could connect ideas across an entire conversation. That’s why the jump from old chatbots to modern AI feels so dramatic.
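For the curious, the core attention calculation fits in a few lines. Here is a minimal sketch using NumPy, with random vectors standing in for words; real models run this on learned vectors, many times in parallel.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: score every word against every other word,
    turn the scores into weights, and take a weighted average of the values."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])                   # how relevant is each word to each other word?
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: scores become attention weights
    return weights @ V, weights                               # blend the word vectors by those weights

# Toy example: a 3-word "sentence", each word represented by a 4-number vector.
np.random.seed(0)
words = np.random.randn(3, 4)
output, weights = attention(words, words, words)  # self-attention: the sentence attends to itself
print(weights.round(2))                           # row i = how much word i attends to each word
```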

Building Blocks

Breaking Language Apart

How AI chops text into pieces it can process

AI doesn’t read words the way you do. It breaks text into smaller pieces called tokens — the Lego bricks of language.

→ Try it: see how text gets tokenized

Type something and see it broken into tokens. Each color = a different token.

Fun fact: A “128K context window” means ~128,000 tokens — roughly a 300+ page book.
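If you want to poke at real tokens yourself, the open-source tiktoken library (one of OpenAI's tokenizers) is an easy way to do it. This sketch assumes you have it installed; other models use different tokenizers, but the idea is the same.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a tokenizer used by GPT-4-era models
text = "Tokenization breaks language apart!"
token_ids = enc.encode(text)

print(token_ids)                              # a list of integer IDs, one per token
print([enc.decode([t]) for t in token_ids])   # the piece of text each ID stands for
print(f"{len(text)} characters -> {len(token_ids)} tokens")
```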
Rolling the Dice

Maybe This, Maybe That

How AI picks from millions of possible next words

AI calculates a probability for every token in its vocabulary — typically 50,000+ options. Then it picks one.

→ Interactive: next-word probabilities

Probabilities for what comes after “The weather today is really...”

Click resample to see different choices from the same probabilities.

Temperature controls creativity: low = predictable, high = creative but potentially weird.
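Here is roughly what temperature does to those probabilities, as a small Python sketch. The numbers are invented to match the example above; a real model does the same reshaping over its whole vocabulary.

```python
import math
import random

def sample_with_temperature(probs, temperature):
    """Reshape a probability distribution with temperature, then sample from it."""
    # Dividing log-probabilities by the temperature sharpens (T < 1)
    # or flattens (T > 1) the distribution before re-normalizing.
    reshaped = [math.exp(math.log(p) / temperature) for p in probs.values()]
    total = sum(reshaped)
    weights = [w / total for w in reshaped]
    return random.choices(list(probs), weights=weights, k=1)[0]

# Made-up probabilities for what follows "The weather today is really..."
probs = {"nice": 0.40, "hot": 0.25, "bad": 0.20, "unpredictable": 0.10, "purple": 0.05}

print(sample_with_temperature(probs, 0.2))  # low temperature: almost always "nice"
print(sample_with_temperature(probs, 1.5))  # high temperature: the weird options show up more often
```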

The Whiteboard

The Context Window

How AI “remembers” your conversation (and why it forgets)

Every time you send a message, the AI receives your entire conversation so far as one giant input. This is the context window — like a whiteboard. When full, older parts fall off.

→ Visualized: the context window

Earlier messages fade as the window fills up.

Every conversation is independent — start a new chat and the AI has zero memory of previous ones.
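Here is a rough sketch of why old messages fall off the whiteboard. The token counting is faked with a word count and the window is tiny on purpose; real systems count actual tokens against windows of 128,000 or more.

```python
CONTEXT_WINDOW = 50   # pretend the model can only see 50 "tokens"

def fit_into_context(messages):
    """Keep the most recent messages that fit in the window; drop the oldest."""
    kept, used = [], 0
    for message in reversed(messages):   # walk backwards from the newest message
        cost = len(message.split())      # fake token count (real systems use a tokenizer)
        if used + cost > CONTEXT_WINDOW:
            break                        # the whiteboard is full; everything older is forgotten
        kept.append(message)
        used += cost
    return list(reversed(kept))

chat = [f"message number {i} with a few extra words" for i in range(1, 21)]
print(fit_into_context(chat))   # only the last handful of messages survive
```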

Size Matters

Bigger and Bigger

Parameters, data, and compute

A model’s size is measured in parameters: tiny numerical knobs that get adjusted during training. GPT-2 had 1.5 billion. Today’s frontier models are estimated to have a trillion or more.

→ Visualized: model scale

Each dot = 1 million parameters.

Training a frontier model can cost over $100 million in compute alone.
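Some quick back-of-the-envelope math makes the scale concrete. The GPT-2 figure comes from above; the frontier figure is an assumption for illustration, since exact sizes aren’t public.

```python
# Rough, illustrative numbers only.
models = {
    "GPT-2": 1.5e9,            # 1.5 billion parameters
    "Frontier model": 1.0e12,  # assume roughly a trillion parameters
}

BYTES_PER_PARAM = 2  # storing each parameter as a 16-bit number

for name, params in models.items():
    gigabytes = params * BYTES_PER_PARAM / 1e9
    print(f"{name}: {params:.1e} parameters, about {gigabytes:,.0f} GB just to store the weights")
```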

Getting Better

Oops, Try Again

How AI learns by getting things wrong

Show the AI a sentence with the last word hidden, ask it to guess the missing word, then reveal the answer. If it guessed wrong, nudge its parameters so it does a little better next time. Do this trillions of times. The measure of “how wrong” it was is called the loss.

→ Visualized: training loss over time

Watch loss decrease as the model trains.

This is gradient descent — the model follows the “downhill slope” of its errors.
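Gradient descent itself is simple enough to fit in a toy example. This one adjusts a single knob to minimize a made-up loss; real training does the same thing for billions of knobs at once.

```python
# Toy gradient descent: find the x that minimizes the loss (x - 3)^2.
x = 10.0              # start somewhere wrong
learning_rate = 0.1

for step in range(30):
    loss = (x - 3) ** 2            # how wrong are we?
    gradient = 2 * (x - 3)         # which direction is uphill? (we move the other way)
    x -= learning_rate * gradient  # nudge the knob a little downhill
    if step % 10 == 0:
        print(f"step {step:2d}: x = {x:.3f}, loss = {loss:.4f}")

print(f"final x = {x:.3f}")        # close to 3, where the loss is smallest
```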

The Human Touch

Good Job! (Human Feedback)

How AI learns to be helpful instead of just smart

This stage is called RLHF: Reinforcement Learning from Human Feedback. Human reviewers pick which of two AI responses is better, and the model learns to produce more of the preferred kind.

→ Try it: be the human reviewer

Prompt: “Explain gravity to a 5-year-old.”

Response A:

You know how when you throw a ball up, it always comes back down? That’s gravity! The Earth is like a big magnet pulling everything toward it.

Response B:

Gravity is a fundamental force described by Einstein’s general relativity as the curvature of spacetime, governed by the field equations.
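Behind the scenes, clicks like yours become preference data. Here is a rough Python sketch of its shape; the scoring function is a made-up stand-in, since a real reward model is itself a neural network trained on these comparisons.

```python
# The raw material of RLHF: preference pairs collected from human reviewers.
preference_data = [
    {
        "prompt": "Explain gravity to a 5-year-old.",
        "chosen": "You know how when you throw a ball up, it always comes back down? That's gravity!",
        "rejected": "Gravity is a fundamental force described by Einstein's general relativity...",
    },
    # ...millions more comparisons like this one
]

JARGON = {"fundamental", "relativity", "spacetime", "equations"}

def reward(response):
    # Stand-in scorer for illustration only: fewer jargon words = friendlier for a 5-year-old.
    # A real reward model is learned from the human comparisons, not hand-written rules.
    return -sum(word.strip(".,!?").lower() in JARGON for word in response.split())

for pair in preference_data:
    # Training adjusts the model until preferred responses reliably outscore rejected ones.
    print(reward(pair["chosen"]) > reward(pair["rejected"]))
```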

The Bigger Picture

Machine Learning Concepts

The core ideas behind all AI — not just chatbots

Transformers are just one branch of Machine Learning (ML) — the science of getting computers to learn from data. Here are the key concepts.

Here’s how these concepts connect in a typical ML pipeline:

→ Interactive: train a mini classifier

Click cells to label them as 🐱 Cat or 🐶 Dog. Watch the model’s confidence change.

Click once = Cat • Again = Dog • Again = Reset

Key takeaway: Machine learning isn’t magic — it’s math and data. Every AI system follows the same pipeline: collect data, build a model, train it, evaluate it, deploy it, keep improving.
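Here is that pipeline shrunk to a few lines of Python. The animals and measurements are invented, and the “model” is the simplest one possible (just find the most similar example), but the steps are the real ones.

```python
# The ML pipeline in miniature: collect data, build a model, "train" it, evaluate it, use it.
# Features are made up: (weight in kg, ear length in cm).

# 1. Collect labelled data
training_data = [
    ((4.0, 6.0), "cat"), ((5.0, 7.0), "cat"), ((3.5, 5.5), "cat"),
    ((20.0, 10.0), "dog"), ((30.0, 12.0), "dog"), ((25.0, 11.0), "dog"),
]

# 2-3. Build and "train" the model: nearest-neighbour simply memorizes the examples
def predict(features):
    def distance(example):
        (w, e), _label = example
        return (w - features[0]) ** 2 + (e - features[1]) ** 2
    return min(training_data, key=distance)[1]

# 4. Evaluate on examples the model has never seen
test_data = [((4.5, 6.5), "cat"), ((22.0, 10.5), "dog")]
correct = sum(predict(features) == label for features, label in test_data)
print(f"accuracy: {correct}/{len(test_data)}")

# 5. Deploy: classify a brand-new animal
print(predict((28.0, 11.5)))   # -> "dog"
```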
What AI Can Do

AI Solutions & Capabilities

The different kinds of things AI makes possible

AI isn’t just chatbots. It’s a family of technologies, each solving different problems.

The key pattern: Most real products combine multiple capabilities. A self-driving car uses vision + prediction + autonomy. The magic happens when building blocks work together.
Using AI Right

Responsible AI

The principles that should guide how we build and use AI

AI can do incredible things — but “can” doesn’t mean “should.” AI needs guardrails, just like cars need road rules.

→ Interactive: the responsible AI checklist

Click each principle to check it off.

How responsible AI applies to each solution type:

Remember: Technology is never neutral. Every AI reflects the choices and biases of its builders. Responsible AI isn’t about slowing progress — it’s about making sure progress benefits everyone.
Eyes Open

The Dark Side

Real risks happening right now

Like any powerful technology, AI can be misused. These aren’t science fiction.

Bottom line: These are reasons to understand AI, not fear it. Push for smart rules and responsible use.
The Bright Side

Opportunities

How AI could genuinely make the world better

The opportunities are staggering. AI is a multiplier for human capability.

Your generation is the first to grow up with AI as a daily tool. Learn it, stay critical, use it responsibly — and help build the rules for a future where AI serves everyone.