Tap nodes to trace data flow. Switch topics below to explore how LLMs, Agentic AI, MCP, ML Pipelines, Vector DBs and MLOps actually work — visually.
→ How it works: How a large language model transforms a prompt into an output token — tokenisation, attention, feed-forward layers and the autoregressive loop.
Self-attention lets each token attend to every other token simultaneously — O(n²) complexity but massively parallelisable on GPUs.
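A minimal sketch of that all-pairs scoring (scaled dot-product attention, single head, with the learned Q/K/V projections omitted for brevity — the nested loop over i and j is exactly the O(n²) cost):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    """X: list of n token vectors of dim d. Q = K = V = X here for
    simplicity; real models apply learned projections first."""
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        # token i scores against every token j -> n*n scores in total
        scores = [sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        w = softmax(scores)
        # output i is a weighted mix of all value vectors
        out.append([sum(w[j] * X[j][k] for j in range(n)) for k in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)
```

On a GPU the inner loops become one batched matrix multiply, which is why the quadratic cost is tolerable in practice.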
Multi-head attention splits the embedding space into subspaces, each learning different relationship types (syntax, semantics, coreference).
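The subspace split can be sketched like this — each head attends over its own slice of the embedding, and the outputs are concatenated back to full width (real models use learned per-head projections rather than plain slicing):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(X):
    # single-head attention over vectors X (Q = K = V = X for brevity)
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        scores = [sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        w = softmax(scores)
        out.append([sum(w[j] * X[j][k] for j in range(n)) for k in range(d)])
    return out

def multi_head(X, heads):
    d = len(X[0])
    hd = d // heads                                  # dims per head
    per_head = []
    for h in range(heads):
        sub = [x[h * hd:(h + 1) * hd] for x in X]    # "project" = slice here
        per_head.append(attend(sub))                 # each head attends alone
    # concatenate head outputs back to the full embedding width
    return [[v for h in range(heads) for v in per_head[h][i]]
            for i in range(len(X))]

X = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
Y = multi_head(X, heads=2)
```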
RAG retrieves relevant docs at inference time — no weight updates, always fresh data, but adds latency.
Fine-tuning bakes knowledge into weights — faster inference, but requires retraining as data changes. Combine both for best results.
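The RAG half of that trade-off can be sketched end to end — retrieve the most similar document at query time and prepend it to the prompt (toy bag-of-words embeddings stand in for a real embedding model; `docs`, `retrieve` and `build_prompt` are illustrative names):

```python
import math
from collections import Counter

def embed(text):
    # stand-in for a real embedding model: bag-of-words counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "MCP servers expose resources and tools over stdio",
    "HNSW graphs enable fast approximate nearest neighbour search",
]

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # retrieved context is injected at inference time -- no weight updates
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("how does nearest neighbour search work")
```

The extra retrieval hop is where RAG's latency comes from; fine-tuning pays that cost once, at training time, instead.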
Function calling relies on a proprietary, per-model JSON schema; MCP is a single open protocol any model can speak.
MCP servers expose *resources* (persistent data) and *tools* (actions) over stdio or SSE — no vendor lock-in.
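A schematic of that exchange — MCP speaks JSON-RPC 2.0, and the sketch below dispatches simplified `resources/read` and `tools/call` requests to an in-memory handler (the payload shapes are abbreviated from the spec, and `NOTES`/`add` are hypothetical examples):

```python
import json

NOTES = {"notes://today": "standup at 10:00"}   # a *resource*: persistent data

def add(a, b):                                  # a *tool*: an action
    return a + b

def handle(request):
    req = json.loads(request)
    if req["method"] == "resources/read":
        uri = req["params"]["uri"]
        return {"jsonrpc": "2.0", "id": req["id"],
                "result": {"contents": [{"uri": uri, "text": NOTES[uri]}]}}
    if req["method"] == "tools/call":
        value = add(**req["params"]["arguments"])
        return {"jsonrpc": "2.0", "id": req["id"],
                "result": {"content": [{"type": "text", "text": str(value)}]}}
    return {"jsonrpc": "2.0", "id": req["id"],
            "error": {"code": -32601, "message": "method not found"}}

reply = handle(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 2, "b": 3}},
}))
```

In a real server these messages travel over stdio or SSE; the dispatch logic is the same either way.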
ReAct (Reason + Act) interleaves thinking traces with tool calls — a chain-of-thought step precedes every action.
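The Thought → Action → Observation loop can be sketched with a scripted policy standing in for the model (the `calculator` tool and `scripted_policy` are illustrative stand-ins):

```python
def calculator(expr):
    # demo-only arithmetic tool; eval with builtins stripped
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def scripted_policy(question, observations):
    """Stand-in for the LLM: returns (thought, action or None)."""
    if not observations:                     # first step: reason, then act
        return ("Thought: I need arithmetic.", ("calculator", "6 * 7"))
    return (f"Thought: the tool returned {observations[-1]}.", None)

def react(question, max_steps=4):
    trace, observations = [], []
    for _ in range(max_steps):
        thought, action = scripted_policy(question, observations)
        trace.append(thought)                # Reason
        if action is None:
            return observations[-1], trace   # final answer
        tool, arg = action
        obs = TOOLS[tool](arg)               # Act, then Observe
        trace.append(f"Observation: {obs}")
        observations.append(obs)

answer, trace = react("What is 6 times 7?")
```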
Reflexion adds an evaluator that writes verbal feedback to memory so the next iteration avoids past mistakes.
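A toy version of that actor/evaluator/memory loop — the evaluator writes verbal feedback, and the next attempt conditions on it (the guessing task and all function names are hypothetical; a real Reflexion actor and evaluator are LLM calls):

```python
def actor(task, memory):
    guess = task["start"]
    if any("too low" in note for note in memory):
        guess += 10                          # crude use of past feedback
    return guess

def evaluator(task, attempt):
    if attempt == task["target"]:
        return None                          # success: no feedback needed
    return ("too low: increase the value" if attempt < task["target"]
            else "too high: decrease the value")

def reflexion(task, max_trials=5):
    memory = []
    for _ in range(max_trials):
        attempt = actor(task, memory)
        feedback = evaluator(task, attempt)
        if feedback is None:
            return attempt, memory
        memory.append(feedback)              # verbal feedback persists
    return attempt, memory

result, memory = reflexion({"start": 32, "target": 42})
```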
HNSW (Hierarchical Navigable Small World) graphs enable sub-millisecond approximate nearest-neighbour search at billion-vector scale.
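The core move is a greedy walk over a neighbour graph, sketched below on a tiny hand-built graph — real HNSW adds the hierarchy of sparser upper layers (to pick a good entry point) and careful neighbour selection at insert time, both omitted here:

```python
def dist(a, b):
    # squared Euclidean distance is enough for comparisons
    return sum((x - y) ** 2 for x, y in zip(a, b))

points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
          3: (3.0, 0.0), 4: (3.0, 1.0)}
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def greedy_search(query, entry=0):
    cur = entry
    while True:
        # hop to the neighbour closest to the query; stop at a local minimum
        nxt = min(edges[cur], key=lambda n: dist(points[n], query))
        if dist(points[nxt], query) >= dist(points[cur], query):
            return cur
        cur = nxt

nearest = greedy_search((2.9, 0.9))
```

Because each hop only inspects a node's few neighbours, search cost grows roughly logarithmically with the dataset, not linearly.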
Hybrid search combines dense vector similarity with sparse BM25 keyword scores — critical for real-world RAG precision.
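One common way to fuse the two ranked lists is reciprocal-rank fusion (RRF), sketched here with hypothetical document IDs (the dense and sparse rankings would come from a vector index and BM25 respectively):

```python
def rrf(rankings, k=60):
    """Reciprocal-rank fusion: score(doc) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc_b", "doc_a", "doc_c"]    # from vector similarity
sparse_ranking = ["doc_a", "doc_c", "doc_b"]   # from BM25 keyword scores

fused = rrf([dense_ranking, sparse_ranking])
```

RRF needs no score calibration between the two systems — it only uses ranks, which is why it is a popular default for hybrid RAG retrieval.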
Feature stores decouple feature computation from training/serving, eliminating training-serving skew — the #1 silent killer in prod ML.
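The skew disappears when training and serving call the same registered feature function — a toy registry standing in for a real feature store (the decorator, registry, and `days_since_signup` feature are illustrative):

```python
import datetime as dt

FEATURES = {}

def feature(fn):
    FEATURES[fn.__name__] = fn       # register once, reuse everywhere
    return fn

@feature
def days_since_signup(user, today):
    return (today - user["signup"]).days

user = {"signup": dt.date(2024, 1, 1)}

# offline training job and online serving path hit the SAME definition,
# so the computed value cannot drift between the two
train_value = FEATURES["days_since_signup"](user, dt.date(2024, 1, 31))
serve_value = FEATURES["days_since_signup"](user, dt.date(2024, 1, 31))
```

Skew creeps in when teams reimplement the feature twice (SQL for training, application code for serving); a single registered definition removes that failure mode by construction.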
Shadow deployments run new models in parallel receiving live traffic without serving their outputs — risk-free A/B testing.
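A minimal sketch of the routing: the shadow model sees every request, its output is logged for offline comparison, and only the primary's result is ever returned (both models and the log are hypothetical stand-ins):

```python
import logging

def primary_model(x):
    return x * 2                     # the model whose output users see

def shadow_model(x):
    return x * 2 + 1                 # candidate under evaluation

SHADOW_LOG = []                      # compared offline, never served

def serve(x):
    live = primary_model(x)
    try:
        SHADOW_LOG.append((x, shadow_model(x), live))
    except Exception:
        # a crashing shadow must never affect the user-facing response
        logging.exception("shadow model failed")
    return live

result = serve(21)
```

Because the shadow path is wrapped in a try/except and its output is never returned, a broken candidate can be evaluated on live traffic with zero user-facing risk.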