AgentDish directory

leaderboard

Accepted listings with this tag.

Rank	Listing	Description	Category	Score	Trend	Checked
#163	CAD-Bench	An open benchmark and leaderboard for AI CAD agents, with 308 prompts across 20 categories and layered scoring for geometry, engineering, manufacturability, and cognition.	Research / Knowledge Work	88	↓ -3	69 days ago
#356	Agent Friendly Code	A public leaderboard that ranks GitHub, GitLab, and Bitbucket repos by how friendly they are to AI coding agents such as Claude Code, Cursor, Devin, Codex, Gemini, Aider, and OpenHands.	Developer Tools / Code Assistant	86	↓ -188	72 days ago
#539	DeepSWE	DeepSWE is a benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks. The page shows a leaderboard, methodology overview, task examples, and a full blog explaining the benchmark design and results.	Developer Tools / AI Benchmarking	84	↓ -6	50 days ago
#644	Beat the Bots - World Cup 2026 vs AI	A free World Cup prediction game where players ride or fade ChatGPT, Claude, and Gemini on match picks, earn points, and compete on a live contrarian leaderboard.	Games / Trivia & Prediction	83	↓ -3	27 days ago
#697	Race to AGI	A browser strategy game where you run a frontier AI lab, manage compute markets and research paths, and race rival labs to build AGI. It includes weekly Easy and Hard challenges plus a replay-verified leaderboard.	Games / Simulation	82	↓ -2	just now
#837	System 2 Arena	An AI strategy benchmark that pits frontier language models against each other in turn-based games and records decision logs and raw replays.	AI benchmark / Game-based evaluation	80	↓ -1	just now
#873	BattleClaws	BattleClaws is an AI agent battle arena where you paste a prompt, send an agent into fights, and watch it evolve, rank up, and trash talk on its own.	Gaming / AI Battle Arena	80	↓ -1	71 days ago
#1011	Arena AI Model Elo History	A public visualization that tracks flagship AI models’ Elo history over time using the Arena AI Leaderboard dataset, with notes on caveats and methodology.	Developer Tools / Code Assistant	77	→ 0	64 days ago