blog.terminal — where silicon meets syntax

★ featured

tutorials · Jun 14, 2026 · 18 min

Inference, end to end

A visual walkthrough of LLM inference from prompt to token — prefill vs decode, the KV-cache, precision, speculative decoding, the memory hierarchy, and how the big labs serve at scale.

Inference · KV-cache · GPU

cat inference-end-to-end.md →

◆ "if your timing closure strategy is 'hope', you need a new strategy." #silicon
◆ "every bug in RTL is a feature in the testbench." #silicon
◆ "the best code is the code you delete." #software
◆ "coffee is just a liquid PCB solvent for the brain." #life
◆ "version control is not optional. yes, even for HDL." #tutorials
◆ "0xDEADBEEF is a valid emotional state." #thoughts
◆ "premature optimization is the root of all evil. late optimization is the root of all rewrites." #software

// 0x01.

research.feeds

scheduled deep-research jobs. each feed runs on its own cadence and appends a new edition — a post + dashboard — every cycle.

active · ↻ weekly

~/research/

inference engines

weekly tracker of the LLM inference-engine landscape — vLLM, SGLang, TensorRT-LLM, LMDeploy, llama.cpp and friends. open the live dashboard →

2 editions

LAST RUN

W24 ✓

NEXT RUN

3d 04h 36m 50s

cd inference-engines →

active · ↻ weekly

~/research/

frontier models

weekly tracker of the AI model landscape — frontier closed/open, small & on-device, agentic frameworks, classifiers. open the live dashboard →

2 editions

LAST RUN

W24 ✓

NEXT RUN

0d 00h 00m 00s

cd frontier-models →

active · ↻ weekly

~/research/

inference silicon

weekly tracker of the chips powering inference — NVIDIA/AMD GPUs, wafer-scale, dataflow LPUs, inference ASICs, RISC-V. open the live dashboard →

2 editions

LAST RUN

W24 ✓

NEXT RUN

0d 04h 36m 50s

cd inference-silicon →

// 0x02.

entries

tutorials · featured

Inference, end to end

A visual walkthrough of LLM inference from prompt to token — prefill vs decode, the KV-cache, precision, speculative decoding, the memory hierarchy, and how the big labs serve at scale.

Jun 14, 2026 · 18 min

Inference · KV-cache

silicon · featured

How Inference Chips Are Built

From transistors and dataflow to wafer-scale engines — a visual tour of how the silicon behind AI inference is designed, and why startups think they can unseat the GPU.

Jun 12, 2026 · 16 min

Silicon · ASIC

silicon · featured

RTL to GDS: A Complete Walkthrough

From Verilog to tapeout — the full ASIC design flow broken down into digestible steps. Covering synthesis, P&R, timing closure, and the dark arts of DRC.

Apr 28, 2026 · 12 min

ASIC · RTL

software · featured

Building a Cyberpunk Portfolio with Astro

How I turned terminal aesthetics and chip schematics into a personal site. Starfields, FSM diagrams, and unhealthy amounts of CSS.

Apr 15, 2026 · 8 min

Astro · CSS

silicon

Neural Networks on FPGAs: A Practical Guide

Deploying quantized models on Xilinx Zynq. Covers HLS, resource budgeting, and why your first attempt will always be too slow.

Mar 30, 2026 · 15 min

FPGA · AI

tutorials

My Vim Setup for ASIC Design

Mar 18, 2026 · 6 min

Vim · Workflow

// 0x03.

bulletin

◇ backlog (3)

FPGA neural net vs. GPU

Head-to-head latency and throughput on quantized inference.

↗ Neural Networks on FPG…

Migrate EDA scripts to Python 3.12

Exorcising the last Python 2 ghosts from the synthesis flow.

↗ Automating EDA Flows w…

Coffee shop tier list update

Re-ranking the local espresso spots. Methodology disputed.

↗ // devlog: ranking eve…

◆ in_progress (3)

Timing closure on 7nm block

Chasing the last 40ps of setup slack across three clock domains.

↗ The Art of Timing Clos…

RISC-V branch predictor series

A multi-part deep dive on speculative fetch and recovery.

↗ Designing a Toy RISC-V…

Starfield performance on mobile

Profiling the canvas starfield to hold a steady 60fps on phones.

↗ Procedural Starfield w…

✓ shipped (2)

vim + SystemVerilog LSP tutorial

Completion, lint and go-to-def for RTL right inside vim.

↗ My Vim Setup for ASIC …

RTL-to-GDS series, part 2

From gate-level netlist all the way to final tapeout.

↗ RTL to GDS: A Complete…
