Hey! I’ve been thinking about a special Christmas gift for my subscribers. How about the sixth chapter of my upcoming book, The Hundred-Page Language Models Book, which I just put online (in addition to the other five chapters)?
In this chapter, you’ll read about the Transformer architecture, exploring:
The decoder block
Self-attention
Multi-head attention
Rotary position embeddings (RoPE)
Residual connections
Root mean square normalization (RMSNorm)
You’ll find plenty of math, illustrations, and Python code. By the end, you’ll have trained your own Transformer-based language model from scratch.
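To give you a small taste of the kind of code inside, here's a minimal, simplified sketch of RMSNorm in PyTorch (an illustrative version for this newsletter, not the book's exact listing):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root mean square normalization: rescale each vector by its RMS."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain

    def forward(self, x):
        # Root mean square over the feature dimension (last axis)
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

# Example: normalize a batch of 3 token embeddings of size 8
x = torch.randn(3, 8)
print(RMSNorm(8)(x).shape)  # torch.Size([3, 8])
```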
What better way to spend the holidays than by learning something new from a fun-to-read book?
Enjoy and Happy Holidays!