Last Updated on 2023-12-06 by Clay
Introduction
Although I work with AI and research it all day, there are too many new architectures, breakthroughs, and theories; many things I only grasp at a rough conceptual level before setting them aside. A deeper understanding usually has to wait until I actually need to adjust a model's architecture myself.
Recently, the Large Language Models (LLMs) that have been all the rage this past year are exactly what I'm talking about. Naturally, I know most of them are decoder-only structures, with a few being encoder-decoder, and I get that almost all are auto-regressive models.
But ask me to suddenly describe the architecture of Llama 2? Maybe I could scrape something together from memory about how it tweaks the original Transformer design.
Want me to suddenly dish on what makes Mistral special? I could chat about how its Sliding Window Attention (SWA) isn't exactly new, and that Mistral's secret sauce might lie in its high-quality dataset...
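To make the sliding-window idea concrete, here's a minimal NumPy sketch (my own illustration, not Mistral's actual implementation): instead of letting every position attend to the full causal history, each position only sees a fixed-size window of recent positions.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean attention mask: position i may attend to positions j
    with i - window < j <= i (causal, but limited look-back)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(6, 3)
# Each row i has at most `window` True entries, ending at column i,
# so attention cost grows linearly with sequence length, not quadratically.
```

With `window=3`, position 5 can only look back at positions 3, 4, and 5; information from earlier tokens still propagates, but only indirectly, layer by layer.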
But ask me to detail the entire process of how data flows from the input all the way to the final LM head output? Oh no! I might sketch you a flowchart, but without Google, I can't draw a detailed model architecture.
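For the record, the rough data path the site animates can be sketched in a few lines of toy NumPy. This is a deliberately stripped-down illustration with hypothetical random weights, a single attention head, and no LayerNorm or MLP blocks; nothing like a real model, just the shape of the flow: token ids → embeddings → causal self-attention → LM head logits.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, seq = 50, 16, 4

# Hypothetical toy weights -- a real model learns these during training.
tok_emb = rng.normal(size=(vocab, d_model))      # token embedding table
pos_emb = rng.normal(size=(seq, d_model))        # learned positional embeddings
W_qkv   = rng.normal(size=(d_model, 3 * d_model))
W_out   = rng.normal(size=(d_model, d_model))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

tokens = np.array([3, 14, 15, 9])                # input token ids

# 1) Embed tokens and add positional information.
x = tok_emb[tokens] + pos_emb

# 2) One causal self-attention layer (single head, residual connection).
q, k, v = np.split(x @ W_qkv, 3, axis=-1)
scores = q @ k.T / np.sqrt(d_model)
scores[np.triu_indices(seq, k=1)] = -np.inf      # causal mask: no peeking ahead
x = x + softmax(scores) @ v @ W_out

# 3) LM head: project back to the vocabulary (weights tied to the embedding).
logits = x @ tok_emb.T
next_token = int(logits[-1].argmax())            # greedy pick for the next token
```

A real GPT stacks dozens of such blocks, each wrapped with LayerNorm and an MLP, but the input-to-LM-head skeleton is exactly this.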
And then today, a colleague shared a super cool website with me: LLM Visualization (bbycroft.net).
It's a site showcasing the classic GPT architecture in 3D! Not only does it have detailed 3D models, but the creator also put together a series of tutorials. By clicking the explanatory play buttons on the left side of the screen, you can animate the 3D models, simulating how data is processed after entering the model.
It's not just a clear view of the model's architecture, but also an easy way to grasp what these different models are actually doing.
Moreover, since it's a 1:1 scale of the original models, you can compare them side by side. Seeing the massive GPT-3 (with its 175 billion parameters) makes the nano-gpt look like a tiny speck of dust!
I tried running a few example animations and felt like I gained some nice insights. Plus, now I can finally throw this website in the faces of my family and friends, showing them what I tangle with every day.