This is a small GPT-style language model I built from scratch following Andrej Karpathy’s
Zero to Hero series. It’s a character-level
transformer — 4 layers, 4 attention heads, 256-dimensional embeddings — trained
on the Tiny Shakespeare dataset (~1MB of text).
The Zero to Hero series is an excellent intro to ML and building out models in
PyTorch. I coded along with Karpathy, and it took me around a month to get through
the whole series. I highly recommend it, but I would definitely say you need to
code along to get the most out of it.
My implementation ended up diverging slightly because I was coding along in the
REPL rather than a Colab or Jupyter notebook, so I needed to expose certain methods
and organise the code slightly differently. The substance is the same, though.
Try it out
Here is the model running entirely in your browser via ONNX Runtime.
It’s around 12MB of weights loaded into WebAssembly.
You’ll need to let it load the first time. Subsequent runs use the cache, so
they should be quick.
The model uses a character-level vocabulary of 65 characters: upper- and
lower-case letters, plus whitespace and punctuation. Each character is
a token with a corresponding numeric ID, and it’s these IDs that the model learns
to predict. The context window is 128 tokens, i.e. 128 characters.
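As a rough illustration of the character-to-ID mapping described above, here is a minimal sketch of a character-level tokenizer. The names `stoi`, `itos`, `encode` and `decode` are chosen for this example and are not necessarily the ones used in my code; the short `text` string stands in for the full training file.

```python
# Illustrative character-level tokenizer (names are assumptions for this sketch).
text = "First Citizen:\nBefore we proceed any further, hear me speak."

# The vocabulary is every distinct character in the text, sorted for a stable order.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer ID
itos = {i: ch for ch, i in stoi.items()}      # integer ID -> character


def encode(s: str) -> list[int]:
    """Map each character of `s` to its integer ID."""
    return [stoi[ch] for ch in s]


def decode(ids: list[int]) -> str:
    """Map a list of integer IDs back to a string."""
    return "".join(itos[i] for i in ids)


ids = encode("hear me")
print(ids)          # one ID per character
print(decode(ids))  # round-trips back to the original string
```

Run over the whole training file instead of this one-line sample, the same construction would yield the 65-entry vocabulary.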
It works well, given it’s just predicting the next character — you can see it
clearly outputs Shakespeare-ish text, and you also get a flavour of the iambic
pentameter.
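Next-character prediction turns into text generation by sampling one character at a time and feeding it back in. The sketch below shows that loop under stated assumptions: `next_char_probs` is a hypothetical stand-in for the model (it returns a probability for each candidate character), and `uniform_model` is a toy placeholder, not the trained network.

```python
import random

BLOCK_SIZE = 128  # context window, in characters


def generate(next_char_probs, prompt, n_new, block_size=BLOCK_SIZE, rng=random):
    """Extend `prompt` by `n_new` characters, one sampled character at a time.

    `next_char_probs(context)` is a hypothetical stand-in for the model:
    it takes the context string and returns {character: probability}.
    """
    out = prompt
    for _ in range(n_new):
        context = out[-block_size:]  # the model only sees the last block_size chars
        probs = next_char_probs(context)
        candidates = list(probs)
        weights = [probs[c] for c in candidates]
        # Sample rather than take the argmax, so the output varies between runs.
        out += rng.choices(candidates, weights=weights, k=1)[0]
    return out


def uniform_model(context):
    """Toy placeholder: every character in a tiny alphabet is equally likely."""
    alphabet = "abc \n"
    return {c: 1 / len(alphabet) for c in alphabet}


sample = generate(uniform_model, "First Citizen:\n", n_new=20, rng=random.Random(0))
print(repr(sample))
```

Cropping the context to the last 128 characters is what lets generation continue indefinitely even though the model itself has a fixed window.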
Something particularly nice about this choice of dataset is that it
looks superficially like a chat: each speaker’s name is followed by their lines,
much like turns in a conversation. In the real world, models like GPT are fine-tuned
on example chats between users and helpful assistants, and they learn to predict
the next token that would occur in a chat.
I’ll write up a detailed notebook on transformer architecture soon.
Training data
The model was trained on the Tiny Shakespeare dataset, a ~1MB file containing
40,000 lines of Shakespeare. Here’s a sample:
shakespeare.txt (training data)
First Citizen:
Before we proceed any further, hear me speak.
All:
Speak, speak.
First Citizen:
You are all resolved rather to die than to famish?
All:
Resolved. resolved.
First Citizen:
First, you know Caius Marcius is chief enemy to the people.
All:
We know't, we know't.
First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?
All:
No more talking on't; let it be done: away, away!
Second Citizen:
One word, good citizens.
First Citizen:
We are accounted poor citizens, the patricians good.
What authority surfeits on would relieve us: if they
would yield us but the superfluity, while it were
wholesome, we might guess they relieved us humanely;
but they think we are too dear: the leanness that
afflicts us, the object of our misery, is as an
inventory to particularise their abundance; our
sufferance is a gain to them Let us revenge this with
our pikes, ere we become rakes: for the gods know I
speak this in hunger for bread, not in thirst for revenge.
Second Citizen:
Would you proceed especially against Caius Marcius?
All:
Against him first: he's a very dog to the commonalty.
Second Citizen:
Consider you what services he has done for his country?
First Citizen:
Very well; and could be content to give him good
report fort, but that he pays himself with being proud.
Second Citizen:
Nay, but speak not maliciously.
First Citizen:
I say unto you, what he hath done famously, he did
it to that end: though soft-conscienced men can be
content to say it was for his country he did it to
please his mother and to be partly proud; which he
is, even till the altitude of his virtue.
Second Citizen:
What he cannot help in his nature, you account a
vice in him. You must in no way say he is covetous.
First Citizen:
If I must not, I need not be barren of accusations;
he hath faults, with surplus, to tire in repetition.
What shouts are these? The other side o' the city
is risen: why stay we prating here? to the Capitol!
All:
Come, come.
First Citizen:
Soft! who comes here?
Second Citizen:
Worthy Menenius Agrippa; one that hath always loved
the people.
First Citizen:
He's one honest enough: would all the rest were so!
MENENIUS:
What work's, my countrymen, in hand? where go you
With bats and clubs? The matter? speak, I pray you.
First Citizen:
Our business is not unknown to the senate; they have
had inkling this fortnight what we intend to do,
which now we'll show 'em in deeds. They say poor
suitors have strong breaths: they shall know we
have strong arms too.
MENENIUS:
Why, masters, my good friends, mine honest neighbours,
Will you undo yourselves?
First Citizen:
We cannot, sir, we are undone already.
MENENIUS:
I tell you, friends, most charitable care
Have the patricians of you. For your wants,
Your suffering in this dearth, you may as well
Strike at the heaven with your staves as lift them
Against the Roman state, whose course will on
The way it takes, cracking ten thousand curbs
Of more strong link asunder than can ever
Appear in your impediment. For the dearth,
The gods, not the patricians, make it, and
Your knees to them, not arms, must help. Alack,
You are transported by calamity
Thither where more attends you, and you slander
The helms o' the state, who care for you like fathers,
When you curse them as enemies.
Architecture summary
The model is a decoder-only transformer with:
- 65-character vocabulary — every unique character in the training set (letters, punctuation, whitespace)
- 4 transformer blocks, each with multi-head self-attention + feedforward
- 4 attention heads per block (64-dim per head)
- 256-dimensional embeddings (token + positional)
- 128-token context window
- ~3M parameters total (about 12MB as 32-bit floats)
Built with PyTorch and exported to ONNX for browser inference.
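For a sense of where the parameters go, here is a back-of-the-envelope count for the hyperparameters listed above. The layer layout is an assumption (a GPT-style block with bias-free query/key/value projections, an output projection, a 4x-wide feedforward layer, two LayerNorms per block, and an untied output head); the real model may differ in these details, so treat the result as an estimate.

```python
# Hyperparameters from the architecture summary.
vocab_size, block_size = 65, 128
n_embd, n_head, n_layer = 256, 4, 4

head_dim = n_embd // n_head  # embedding width split evenly across heads

# Embedding tables: one row per token ID and one per position.
tok_emb = vocab_size * n_embd
pos_emb = block_size * n_embd

# Per-block counts (assumed layout, see the paragraph above).
attn = 3 * n_embd * n_embd + (n_embd * n_embd + n_embd)  # q, k, v (no bias) + output proj
ffn = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)  # d -> 4d -> d
layer_norms = 2 * (2 * n_embd)  # two LayerNorms, each with a scale and a shift
block = attn + ffn + layer_norms

final_ln = 2 * n_embd
lm_head = n_embd * vocab_size + vocab_size  # projects back to vocabulary logits

total = tok_emb + pos_emb + n_layer * block + final_ln + lm_head

print(f"head_dim = {head_dim}")
print(f"total parameters ~ {total / 1e6:.2f}M")
print(f"float32 size ~ {total * 4 / 1e6:.1f} MB")
```

Under these assumptions the total comes to roughly 3.2M parameters. Nearly all of them sit in the four transformer blocks, and the embedding tables are tiny because the vocabulary is only 65 characters.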