AgentDish directory

leaderboard

Accepted listings with this tag.

Listing Category Score Trend Checked
#56 ↓ -3
CAD-Bench

An open benchmark and leaderboard for AI CAD agents, with 308 prompts across 20 categories and layered scoring for geometry, engineering, manufacturability, and cognition.

Research / Knowledge Work 88 ↓ -3 24 days ago
#142 ↓ -81
Agent Friendly Code

A public leaderboard that ranks GitHub, GitLab, and Bitbucket repos by how friendly they are to AI coding agents such as Claude Code, Cursor, Devin, Codex, Gemini, Aider, and OpenHands.

Developer Tools / Code Assistant 86 ↓ -81 27 days ago
#180 ↓ -6
DeepSWE

DeepSWE is a benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks. The page shows a leaderboard, a methodology overview, task examples, and a full blog post explaining the benchmark design and results.

Developer Tools / AI Benchmarking 84 ↓ -6 5 days ago
#351 ↓ -1
BattleClaws

BattleClaws is an AI agent battle arena where you paste a prompt, send an agent into fights, and watch it evolve, rank up, and trash talk on its own.

Gaming / AI Battle Arena 80 ↓ -1 26 days ago

A public visualization that tracks flagship AI models’ Elo history over time using the Arena AI Leaderboard dataset, with notes on caveats and methodology.

Developer Tools / Code Assistant 77 → 0 19 days ago