AgentDish directory

speculative decoding

Accepted listings with this tag.

Listing Category Score Trend Checked

Google Developers Blog post about integrating DFlash, a diffusion-style speculative decoding framework, into the vLLM TPU ecosystem to improve LLM serving speed on TPU v5p.

Developer Tools / Code Assistant 78 ↓ -83 27 days ago

arXiv paper on a self-speculative decoding framework for speeding up reasoning LLM inference on edge hardware, with hardware co-design and reported speedups.

Research / AI/ML Paper 77 → 0 4 days ago