AgentDish directory

llm-inference

Accepted listings with this tag.

Listing Category Score Trend Checked
#44 ↓ -3
tiny-vllm

Open-source C++ and CUDA LLM inference engine inspired by vLLM, with a teaching-focused course that walks through model serving, batching, KV cache, and attention kernels.

Developer Tools / AI Inference / LLM Serving 88 ↓ -3 3 days ago

A research article from Applied Compute on how agentic, tool-using workloads differ from traditional LLM benchmarks, with production observations, workload profiles, and an open-source harness for replaying traces.

Research / Knowledge Work 87 ↓ -34 27 days ago
#98 ↑ +2
ZSE v2.0.0

A pure-Python LLM inference engine and server with CUDA/HIP/Metal code generation, OpenAI-compatible API support, built-in RAG, and multi-GPU backend support.

Developer Tools / AI / ML Infrastructure 86 ↑ +2 15 hours ago

Google Developers Blog post about integrating DFlash, a diffusion-style speculative decoding framework, into the vLLM TPU ecosystem to improve LLM serving speed on TPU v5p.

Developer Tools / Code Assistant 78 ↓ -83 27 days ago
#465 ↓ -12
vLLM-Compile

A public slide deck about vLLM-compile, a project focused on bringing compiler optimizations to LLM inference and speeding up torch.compile for vLLM workflows.

Developer Tools / Code Assistant 72 ↓ -12 27 days ago