How to build a small language model from scratch
DevBlog
Jun 1, 2026

Here's exactly how it works 🧵
No APIs. No pre-trained weights. Just PyTorch and math.
Step 1: Feed it text
I grabbed 50,000 children's stories from TinyStories, a dataset hosted on Hugging Face. Then I used a tokenizer to split the text into tokens (whole words or pieces of words) and map each one to a number.
"Once upon a time" → [7406, 3504, 257, 640]
Computers don't read words. They read numbers.
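Here's a minimal sketch of the text-to-numbers step. It is a toy word-level tokenizer, not the one used for the model: the post doesn't say which tokenizer that was, and GPT-style tokenizers work on subword pieces with far larger vocabularies, so the IDs below won't match the ones above. The function names are made up for illustration.

```python
def build_vocab(texts):
    """Assign an integer ID to every distinct lowercase word, in first-seen order."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn a string into a list of token IDs (raises KeyError on unknown words)."""
    return [vocab[word] for word in text.lower().split()]


def decode(ids, vocab):
    """Turn a list of token IDs back into a string."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


# Two made-up sentences stand in for the 50,000 stories.
stories = ["once upon a time there was a cat", "the cat sat upon a mat"]
vocab = build_vocab(stories)
ids = encode("Once upon a time", vocab)
print(ids)                 # [0, 1, 2, 3]
print(decode(ids, vocab))  # once upon a time
```

A real tokenizer also has to handle punctuation, capital letters and words it has never seen, which is the main reason subword schemes are used in practice.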
Step 2: Build the brain
I coded a GPT-style Transformer from scratch using Claude. It has 3 key ingredients:
→ Embeddings: convert each token's number into a rich vector that captures meaning
→ Self-Attention: lets every token look at the tokens that came before it to understand context. This is how "bank" can tell whether you mean a river or money.
→ 6 stacked Transformer Blocks: each one learns deeper patterns, from grammar all the way to storytelling
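The self-attention ingredient can be sketched in a few lines. This is plain Python instead of PyTorch so every step is visible, and it makes two simplifying assumptions: it is a single head, and it uses the input vectors directly as queries, keys and values, where a real Transformer would first pass them through learned linear projections. The numbers are made up.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def causal_self_attention(queries, keys, values):
    """Single-head attention: position t mixes the values of positions 0..t."""
    dim = len(queries[0])
    outputs, all_weights = [], []
    for t, query in enumerate(queries):
        # Score the current token against itself and every earlier token only.
        scores = [dot(query, keys[s]) / math.sqrt(dim) for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the weighted average of the visible value vectors.
        mixed = [sum(w * values[s][d] for s, w in enumerate(weights))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three tokens, each a 2-dimensional vector (illustrative values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

Each printed row is one token's attention weights. Row `t` has `t + 1` entries because a GPT-style model never looks at tokens that come later in the sequence.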
30 million parameters total. All built by hand.
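To see where a number like 30 million can come from, here's a rough parameter count for a GPT-2-style layout. Only the 6 blocks and the approximate total come from this post. The vocabulary size, model width and context length are assumptions picked to land near that total, and the actual model may be laid out differently.

```python
def gpt_param_count(vocab_size, d_model, n_layers, context_len):
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings."""
    token_emb = vocab_size * d_model
    pos_emb = context_len * d_model
    # Attention: fused Q/K/V projection plus output projection, each with a bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model, then project back, each with a bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms per block, scale and shift
    block = attn + mlp + layer_norms
    final_norm = 2 * d_model
    return token_emb + pos_emb + n_layers * block + final_norm


# Assumed hyperparameters (not stated in the post), except n_layers=6.
total = gpt_param_count(vocab_size=50257, d_model=384, n_layers=6, context_len=256)
print(f"{total:,} parameters")  # 30,044,544 parameters
```

Under these assumptions roughly two thirds of the parameters sit in the token embedding table, which is typical for small models with a large vocabulary.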
Step 3: Teach it to predict
The training loop is brutally simple:
→ Show the model: "Once upon a"
→ The model guesses the next word and gets it wrong.
→ Calculate how wrong it was (the loss).
→ Adjust every single weight slightly in the direction that reduces the loss.
→ Repeat 5,000 times.
That's it. Grammar, story structure, common sense: all of it emerges from this one loop.
Step 4: Run it on a GPU
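The loop is small enough to write out in full for a toy model. The sketch below is not the Transformer from this post: it swaps in a bigram table (one row of next-word scores per word) and computes the gradient by hand, where the real thing would use PyTorch autograd and an optimizer. The shape of the loop is the same, though: guess, measure the loss, nudge the weights, repeat. The corpus, learning rate and epoch count are made up.

```python
import math

text = "once upon a time there was a cat . once upon a time there was a dog ."
words = text.split()
vocab = sorted(set(words))
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = [word_to_id[w] for w in words]
pairs = list(zip(ids[:-1], ids[1:]))  # (current token, next token)

# The whole "model": one row of next-token scores (logits) per current token.
logits = [[0.0] * len(vocab) for _ in vocab]


def softmax(row):
    """Convert a row of logits into probabilities."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def train_epoch(learning_rate=0.5):
    """One pass over the data; returns the average loss seen before each update."""
    total_loss = 0.0
    for current, target in pairs:
        probs = softmax(logits[current])        # the model guesses the next word
        total_loss += -math.log(probs[target])  # how wrong it was (cross-entropy)
        for j, p in enumerate(probs):
            # Gradient of the loss with respect to each logit in this row.
            grad = p - (1.0 if j == target else 0.0)
            logits[current][j] -= learning_rate * grad  # small step downhill
    return total_loss / len(pairs)


losses = [train_epoch() for _ in range(200)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")

best = max(range(len(vocab)), key=lambda j: logits[word_to_id["upon"]][j])
print("after 'upon' the model predicts:", vocab[best])
```

The loss never reaches zero here because "a" is followed by three different words in the corpus, so the best the model can do is spread its probability across them.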
I connected the code to Modal.com and trained on an NVIDIA A10G cloud GPU.
30 minutes. Less than $1.
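As a quick sanity check, those two numbers pin down what the GPU can cost per hour. The sketch uses only the figures above and doesn't assume any particular Modal price.

```python
training_hours = 30 / 60  # "30 minutes" from the post
budget_usd = 1.00         # "Less than $1" from the post

# cost = hours * hourly_rate, so the rate must stay below budget / hours.
max_hourly_rate = budget_usd / training_hours
print(f"Implied GPU price: under ${max_hourly_rate:.2f} per hour")
```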
The result?
"Once upon a time, there was a little boy named Timmy. He fell down and hurt his knee. His mom gave him a band-aid. Timmy felt better."
For the codebase, email amaanprogramming@gmail.com
A model I built myself is writing coherent stories.
If you want to truly understand AI, don't just use the tools. Build them.