Developer Tools / AI Infrastructure

LLM inference at scale

An open-source handbook for production LLM serving and inference at scale, covering GPU fundamentals, KV caching, batching, quantization, speculative decoding, and inference engines such as vLLM, SGLang, and TensorRT-LLM.

Tags: AI tool, developer tools, gpu, handbook, inference, llm-serving, open-source, production

Why it was accepted

The page clearly presents a focused AI infrastructure resource for people who build or operate production LLM systems. The README shows broad topic coverage, quick-start steps, and concrete chapters on serving, optimization, and scaling, which makes it a useful public listing.

Weakness

The snapshot shows the handbook's structure, but it does not reveal how much of each chapter is already written, whether the labs are complete, or how actively the project is maintained beyond the latest commit and repository metadata.

Review status

Last evaluated 14 days ago. Current rank #290. Down 6 spots in the rankings.

Related listings

#1 CodeGraph

Developer Tools / AI for Code

CodeGraph is a local code knowledge graph for AI coding agents like Claude Code, Cursor, Codex, OpenCode, and Hermes Agent. It aims to cut token use, tool calls, and runtime by letting agents query pre-indexed code structure instead of scanning files repeatedly.

Rank unchanged. Last evaluated 27 days ago.

#3 LLMRender

Developer Tools / React Libraries

A lightweight React Markdown renderer with built-in LaTeX, syntax highlighting, streaming-safe rendering, and security-focused defaults.

Down 1 spot. Last evaluated 7 days ago.

#6 Version Sentinel

Developer Tools / AI Coding Guardrails

A Claude Code plugin that blocks dependency edits until a fresh, source-cited version check is recorded, helping prevent hallucinated or stale package versions across npm, pip, Poetry/uv, Cargo, and NuGet.

Up 95 spots. Last evaluated 45 days ago.

#7 Omni

Developer Tools / Search & Retrieval

Omni is a local-first semantic search app for macOS that indexes text, code, PDFs, images, audio, and video on-device. It supports multilingual search, private offline use, and exposes a local endpoint for agents to query indexed files.

Down 3 spots. Last evaluated 14 days ago.