Chapter 1 - The Big Idea
Imagine hiring someone to memorize and truly understand an encyclopedia.
You could hire a genius with a large brain, or you could give a smaller brain a
perfectly organised, step-by-step version of the same encyclopedia. Which matters
more - the size of the brain, or the quality of the book?
That's the core of what we're studying. An AI model's "parameters" are its brain cells. Its training "tokens" are the chunks of text it reads. We want to know: what is the smallest possible brain that can truly understand a given amount of information?
This question has never been formally answered below one million parameters. We're
starting with just ten thousand. That's not a typo, and it's deliberate: it puts us
in completely uncharted territory, where every experiment runs in minutes on a laptop.
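To make "ten thousand parameters" concrete, here is a back-of-the-envelope sketch that counts the parameters of a hypothetical tiny transformer-style model. Every dimension below (vocabulary size, embedding width, layer count) is a made-up illustration, not the configuration this book actually uses:

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic;
# not the configuration used in this book.
vocab_size = 256   # tiny character-level vocabulary
d_model    = 20    # embedding width
n_layers   = 1     # a single transformer block

embed = vocab_size * d_model          # token embedding table
attn  = 4 * d_model * d_model         # Q, K, V, and output projections
mlp   = 2 * d_model * (4 * d_model)   # up- and down-projection
per_layer = attn + mlp

total = embed + n_layers * per_layer
print(total)  # 9920 - about ten thousand, for these made-up sizes
```

Notice how quickly the budget disappears: even with a 256-entry vocabulary, the embedding table alone eats more than half of it.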
...
The Two Questions We're Asking
Question A - Forward
"I have this much data. How small can my AI be?"
You want an AI that knows everything about tax law. How many parameters does it actually need to genuinely understand that domain?
Question B - Inverse
"My AI has only 10K params. How much can it learn?"
You're building an AI for a tiny edge device. What's the maximum knowledge you can squeeze in, and how should you write that knowledge?
Why This Matters in the Real World
If we can answer these questions, we can build dramatically smaller, cheaper,
faster AIs for specific tasks. A tax assistant AI doesn't
need to know how to write poetry. If we know the exact minimum size it needs to
understand the law, that changes how we build and deploy AI everywhere.
...
What "Understanding" Means in This Experiment
"Understanding" is a fuzzy word, so we pin it down with four concrete, measurable
tests. A model only counts as having truly understood its training data if it passes
all four:
1. It answers questions it has never seen. Not regurgitating memorised answers - genuinely applying what it learned to new combinations and situations.
2. It reasons across multiple steps. Chaining facts together: "A is bigger than B, B is bigger than C, therefore A is bigger than C."
3. It knows what it doesn't know. Confident on things it learned, appropriately uncertain on things outside its training. This is called calibration.
4. Its knowledge transfers to adjacent tasks. Its internal representations are good enough that a tiny 500-parameter add-on can learn a related task very fast.
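The third test, calibration, is the easiest to turn into a number. One standard measure is expected calibration error (ECE): bucket the model's predictions by how confident it was, then compare average confidence to actual accuracy inside each bucket. A minimal sketch, using made-up predictions rather than any real model output:

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """ECE: bucket predictions by confidence, then sum the
    size-weighted gap between mean confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# Made-up example: an overconfident model.
confs = [0.9, 0.9, 0.9, 0.9]  # claims 90% certainty every time
right = [1, 0, 1, 0]          # but is right only half the time
print(expected_calibration_error(confs, right))  # gap of about 0.4
```

A perfectly calibrated model scores 0: when it says "90% sure," it is right 90% of the time. The overconfident model above says 90% but delivers 50%, so nearly all of that 0.4 gap is wasted confidence.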