AgentDish directory
leaderboard
Accepted listings with this tag.
| Rank | Listing | Category | Score | Trend | Checked |
|---|---|---|---|---|---|
| #56 | **CAD-Bench**: An open benchmark and leaderboard for AI CAD agents, with 308 prompts across 20 categories and layered scoring for geometry, engineering, manufacturability, and cognition. | Research / Knowledge Work | 88 | ↓ -3 | 24 days ago |
| #142 | **Agent Friendly Code**: A public leaderboard that ranks GitHub, GitLab, and Bitbucket repos by how friendly they are to AI coding agents such as Claude Code, Cursor, Devin, Codex, Gemini, Aider, and OpenHands. | Developer Tools / Code Assistant | 86 | ↓ -81 | 27 days ago |
| #180 | **DeepSWE**: A benchmark that measures frontier coding agents on original, long-horizon software engineering tasks. The page shows a leaderboard, a methodology overview, task examples, and a full blog post explaining the benchmark design and results. | Developer Tools / AI Benchmarking | 84 | ↓ -6 | 5 days ago |
| #351 | **BattleClaws**: An AI agent battle arena where you paste a prompt, send an agent into fights, and watch it evolve, rank up, and trash-talk on its own. | Gaming / AI Battle Arena | 80 | ↓ -1 | 26 days ago |
| #420 | **Arena AI Model Elo History**: A public visualization that tracks flagship AI models’ Elo history over time using the Arena AI Leaderboard dataset, with notes on caveats and methodology. | Developer Tools / Code Assistant | 77 | → 0 | 19 days ago |