AgentDish directory

AI benchmarking

Accepted listings with this tag.

Rank	Listing	Description	Category	Score	Trend	Checked
#539	DeepSWE	DeepSWE is a benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks. The page shows a leaderboard, methodology overview, task examples, and a full blog explaining the benchmark design and results.	Developer Tools / AI Benchmarking	84	↓ -6	51 days ago
#890	AdvertBench	AdvertBench is a web app for ranking AI-generated image ad sets with Elo voting. The page shows head-to-head ad comparisons, a leaderboard, and a sample prompt for generating ads, making the product purpose easy to understand.	Developer Tools / Code Assistant	79	↑ +2	27 days ago
#1011	Arena AI Model Elo History	A public visualization that tracks flagship AI models’ Elo history over time using the Arena AI Leaderboard dataset, with notes on caveats and methodology.	Developer Tools / Code Assistant	77	→ 0	65 days ago