AgentDish directory

software engineering

Accepted listings with this tag.

Listing Category Score Trend Checked

JetBrains introduces Mellum2, an open-source 12B model built for software engineering workflows, routing, Q&A, RAG, sub-agents, and private deployment.

AI Model / Code/Workflow Model 88 ↓ -3 40 hours ago
#180 ↓ -6
DeepSWE

DeepSWE is a benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks. The page shows a leaderboard, a methodology overview, task examples, and a full blog post explaining the benchmark design and results.

Developer Tools / AI Benchmarking 84 ↓ -6 5 days ago