Chapter 1 - The Big Idea
Imagine hiring someone to memorize and truly understand an encyclopedia.
You could hire a genius with a large brain, or you could give a smaller brain a
perfectly organised, step-by-step version of the same encyclopedia. Which matters
more - the size of the brain, or the quality of the book?
That's the core of what we're studying. An AI model's "parameters" are its brain cells. Its training "tokens" are the chunks of text it reads. We want to know: what is the smallest possible brain that can truly understand a given amount of information?
This question has never been formally answered below one million parameters. We're
starting with just ten thousand. That's not a typo, and it's deliberate: it puts us
in completely uncharted territory, where every experiment runs in minutes on a laptop.
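To make "ten thousand parameters" concrete, here is a back-of-the-envelope sketch that counts the parameters of a hypothetical tiny transformer-style model. Every dimension below (vocabulary size, embedding width, layer count) is a made-up illustration, not the configuration this book actually uses:

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic;
# not the configuration used in this book.
vocab_size = 256   # tiny character-level vocabulary
d_model    = 20    # embedding width
n_layers   = 1     # a single transformer block

embed = vocab_size * d_model          # token embedding table
attn  = 4 * d_model * d_model         # Q, K, V, and output projections
mlp   = 2 * d_model * (4 * d_model)   # up- and down-projection
per_layer = attn + mlp

total = embed + n_layers * per_layer
print(total)  # 9920 - about ten thousand, for these made-up sizes
```

Notice how quickly the budget disappears: even with a 256-entry vocabulary, the embedding table alone eats more than half of it.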
...
The Two Questions We're Asking
Question A - Forward
"I have this much data. How small can my AI be?"
You want an AI that knows everything about tax law. How many parameters does it actually need to genuinely understand that domain?
Question B - Inverse
"My AI has only 10K params. How much can it learn?"
You're building an AI for a tiny edge device. What's the maximum knowledge you can squeeze in, and how should you write that knowledge?
Why This Matters in the Real World
If we can answer these questions, we can build dramatically smaller, cheaper,
faster AIs for specific tasks. A tax assistant AI doesn't
need to know how to write poetry. If we know the exact minimum size it needs to
understand the law, that changes how we build and deploy AI everywhere.
...
What "Understanding" Means in This Experiment
"Understanding" is a fuzzy word, so we pin it down with four concrete, measurable
tests. A model only counts as having truly understood its training data if it passes
all four:
1. It answers questions it has never seen. Not regurgitating memorised answers - genuinely applying what it learned to new combinations and situations.
2. It reasons across multiple steps. Chaining facts together: "A is bigger than B, B is bigger than C, therefore A is bigger than C."
3. It knows what it doesn't know. Confident on things it learned, appropriately uncertain on things outside its training. This is called calibration.
4. Its knowledge transfers to adjacent tasks. Its internal representations are good enough that a tiny 500-parameter add-on can learn a related task very fast.
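The third test, calibration, is the easiest to turn into a number. One standard measure is expected calibration error (ECE): bucket the model's predictions by how confident it was, then compare average confidence to actual accuracy inside each bucket. A minimal sketch, using made-up predictions rather than any real model output:

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """ECE: bucket predictions by confidence, then sum the
    size-weighted gap between mean confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# Made-up example: an overconfident model.
confs = [0.9, 0.9, 0.9, 0.9]  # claims 90% certainty every time
right = [1, 0, 1, 0]          # but is right only half the time
print(expected_calibration_error(confs, right))  # gap of about 0.4
```

A perfectly calibrated model scores 0: when it says "90% sure," it is right 90% of the time. The overconfident model above says 90% but delivers 50%, so nearly all of that 0.4 gap is wasted confidence.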