Tap nodes to trace data flow. Switch topics below to explore how LLMs, Agentic AI, MCP, ML Pipelines, Vector DBs and MLOps actually work — visually.
→ How it works: How a large language model transforms a prompt into an output token — tokenisation, attention, feed-forward layers and the autoregressive loop.
Self-attention lets each token attend to every other token simultaneously — O(n²) complexity but massively parallelisable on GPUs.
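A minimal sketch of that all-pairs scoring (scaled dot-product attention, single head, with the learned Q/K/V projections omitted for brevity — the nested loop over i and j is exactly the O(n²) cost):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    """X: list of n token vectors of dim d. Q = K = V = X here for
    simplicity; real models apply learned projections first."""
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        # token i scores against every token j -> n*n scores in total
        scores = [sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        w = softmax(scores)
        # output i is a weighted mix of all value vectors
        out.append([sum(w[j] * X[j][k] for j in range(n)) for k in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)
```

On a GPU the inner loops become one batched matrix multiply, which is why the quadratic cost is tolerable in practice.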
Multi-head attention splits the embedding space into subspaces, each learning different relationship types (syntax, semantics, coreference).
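The subspace split can be sketched like this — each head attends over its own slice of the embedding, and the outputs are concatenated back to full width (real models use learned per-head projections rather than plain slicing):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(X):
    # single-head attention over vectors X (Q = K = V = X for brevity)
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        scores = [sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        w = softmax(scores)
        out.append([sum(w[j] * X[j][k] for j in range(n)) for k in range(d)])
    return out

def multi_head(X, heads):
    d = len(X[0])
    hd = d // heads                                  # dims per head
    per_head = []
    for h in range(heads):
        sub = [x[h * hd:(h + 1) * hd] for x in X]    # "project" = slice here
        per_head.append(attend(sub))                 # each head attends alone
    # concatenate head outputs back to the full embedding width
    return [[v for h in range(heads) for v in per_head[h][i]]
            for i in range(len(X))]

X = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
Y = multi_head(X, heads=2)
```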
RAG retrieves relevant docs at inference time — no weight updates, always fresh data, but adds latency.
Fine-tuning bakes knowledge into weights — faster inference, but requires retraining as data changes. Combine both for best results.
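The RAG half of that trade-off can be sketched end to end — retrieve the most similar document at query time and prepend it to the prompt (toy bag-of-words embeddings stand in for a real embedding model; `docs`, `retrieve` and `build_prompt` are illustrative names):

```python
import math
from collections import Counter

def embed(text):
    # stand-in for a real embedding model: bag-of-words counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "MCP servers expose resources and tools over stdio",
    "HNSW graphs enable fast approximate nearest neighbour search",
]

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # retrieved context is injected at inference time -- no weight updates
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("how does nearest neighbour search work")
```

The extra retrieval hop is where RAG's latency comes from; fine-tuning pays that cost once, at training time, instead.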
Function calling relies on a proprietary, per-model JSON schema; MCP is a single open protocol any model can speak.
MCP servers expose *resources* (persistent data) and *tools* (actions) over stdio or SSE — no vendor lock-in.
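A schematic of that exchange — MCP speaks JSON-RPC 2.0, and the sketch below dispatches simplified `resources/read` and `tools/call` requests to an in-memory handler (the payload shapes are abbreviated from the spec, and `NOTES`/`add` are hypothetical examples):

```python
import json

NOTES = {"notes://today": "standup at 10:00"}   # a *resource*: persistent data

def add(a, b):                                  # a *tool*: an action
    return a + b

def handle(request):
    req = json.loads(request)
    if req["method"] == "resources/read":
        uri = req["params"]["uri"]
        return {"jsonrpc": "2.0", "id": req["id"],
                "result": {"contents": [{"uri": uri, "text": NOTES[uri]}]}}
    if req["method"] == "tools/call":
        value = add(**req["params"]["arguments"])
        return {"jsonrpc": "2.0", "id": req["id"],
                "result": {"content": [{"type": "text", "text": str(value)}]}}
    return {"jsonrpc": "2.0", "id": req["id"],
            "error": {"code": -32601, "message": "method not found"}}

reply = handle(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 2, "b": 3}},
}))
```

In a real server these messages travel over stdio or SSE; the dispatch logic is the same either way.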
ReAct (Reason + Act) interleaves thinking traces with tool calls — a chain-of-thought step precedes every action.
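The Thought → Action → Observation loop can be sketched with a scripted policy standing in for the model (the `calculator` tool and `scripted_policy` are illustrative stand-ins):

```python
def calculator(expr):
    # demo-only arithmetic tool; eval with builtins stripped
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def scripted_policy(question, observations):
    """Stand-in for the LLM: returns (thought, action or None)."""
    if not observations:                     # first step: reason, then act
        return ("Thought: I need arithmetic.", ("calculator", "6 * 7"))
    return (f"Thought: the tool returned {observations[-1]}.", None)

def react(question, max_steps=4):
    trace, observations = [], []
    for _ in range(max_steps):
        thought, action = scripted_policy(question, observations)
        trace.append(thought)                # Reason
        if action is None:
            return observations[-1], trace   # final answer
        tool, arg = action
        obs = TOOLS[tool](arg)               # Act, then Observe
        trace.append(f"Observation: {obs}")
        observations.append(obs)

answer, trace = react("What is 6 times 7?")
```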
Reflexion adds an evaluator that writes verbal feedback to memory so the next iteration avoids past mistakes.
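A toy version of that actor/evaluator/memory loop — the evaluator writes verbal feedback, and the next attempt conditions on it (the guessing task and all function names are hypothetical; a real Reflexion actor and evaluator are LLM calls):

```python
def actor(task, memory):
    guess = task["start"]
    if any("too low" in note for note in memory):
        guess += 10                          # crude use of past feedback
    return guess

def evaluator(task, attempt):
    if attempt == task["target"]:
        return None                          # success: no feedback needed
    return ("too low: increase the value" if attempt < task["target"]
            else "too high: decrease the value")

def reflexion(task, max_trials=5):
    memory = []
    for _ in range(max_trials):
        attempt = actor(task, memory)
        feedback = evaluator(task, attempt)
        if feedback is None:
            return attempt, memory
        memory.append(feedback)              # verbal feedback persists
    return attempt, memory

result, memory = reflexion({"start": 32, "target": 42})
```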
HNSW (Hierarchical Navigable Small World) graphs enable sub-millisecond approximate nearest-neighbour search at billion-vector scale.
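The core move is a greedy walk over a neighbour graph, sketched below on a tiny hand-built graph — real HNSW adds the hierarchy of sparser upper layers (to pick a good entry point) and careful neighbour selection at insert time, both omitted here:

```python
def dist(a, b):
    # squared Euclidean distance is enough for comparisons
    return sum((x - y) ** 2 for x, y in zip(a, b))

points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
          3: (3.0, 0.0), 4: (3.0, 1.0)}
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def greedy_search(query, entry=0):
    cur = entry
    while True:
        # hop to the neighbour closest to the query; stop at a local minimum
        nxt = min(edges[cur], key=lambda n: dist(points[n], query))
        if dist(points[nxt], query) >= dist(points[cur], query):
            return cur
        cur = nxt

nearest = greedy_search((2.9, 0.9))
```

Because each hop only inspects a node's few neighbours, search cost grows roughly logarithmically with the dataset, not linearly.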
Hybrid search combines dense vector similarity with sparse BM25 keyword scores — critical for real-world RAG precision.
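One common way to fuse the two ranked lists is reciprocal-rank fusion (RRF), sketched here with hypothetical document IDs (the dense and sparse rankings would come from a vector index and BM25 respectively):

```python
def rrf(rankings, k=60):
    """Reciprocal-rank fusion: score(doc) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc_b", "doc_a", "doc_c"]    # from vector similarity
sparse_ranking = ["doc_a", "doc_c", "doc_b"]   # from BM25 keyword scores

fused = rrf([dense_ranking, sparse_ranking])
```

RRF needs no score calibration between the two systems — it only uses ranks, which is why it is a popular default for hybrid RAG retrieval.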
Feature stores decouple feature computation from training/serving, eliminating training-serving skew — the #1 silent killer in prod ML.
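The skew disappears when training and serving call the same registered feature function — a toy registry standing in for a real feature store (the decorator, registry, and `days_since_signup` feature are illustrative):

```python
import datetime as dt

FEATURES = {}

def feature(fn):
    FEATURES[fn.__name__] = fn       # register once, reuse everywhere
    return fn

@feature
def days_since_signup(user, today):
    return (today - user["signup"]).days

user = {"signup": dt.date(2024, 1, 1)}

# offline training job and online serving path hit the SAME definition,
# so the computed value cannot drift between the two
train_value = FEATURES["days_since_signup"](user, dt.date(2024, 1, 31))
serve_value = FEATURES["days_since_signup"](user, dt.date(2024, 1, 31))
```

Skew creeps in when teams reimplement the feature twice (SQL for training, application code for serving); a single registered definition removes that failure mode by construction.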
Shadow deployments run new models in parallel receiving live traffic without serving their outputs — risk-free A/B testing.
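A minimal sketch of the routing: the shadow model sees every request, its output is logged for offline comparison, and only the primary's result is ever returned (both models and the log are hypothetical stand-ins):

```python
import logging

def primary_model(x):
    return x * 2                     # the model whose output users see

def shadow_model(x):
    return x * 2 + 1                 # candidate under evaluation

SHADOW_LOG = []                      # compared offline, never served

def serve(x):
    live = primary_model(x)
    try:
        SHADOW_LOG.append((x, shadow_model(x), live))
    except Exception:
        # a crashing shadow must never affect the user-facing response
        logging.exception("shadow model failed")
    return live

result = serve(21)
```

Because the shadow path is wrapped in a try/except and its output is never returned, a broken candidate can be evaluated on live traffic with zero user-facing risk.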