AgentDish directory
eval harness
Accepted listings with this tag.
| Listing | Category | Score | Trend | Checked | |
|---|---|---|---|---|---|
| A workbench report comparing MiniMax M3 and GLM 5.2 on autonomous coding tasks, with scored results, latency and cost data, task-type breakdowns, and examples of where each model performed better. | Developer Tools / Code Assistant | 81 | ↑ +2 | 10 hours ago | Details |
| #575 Dropstone: Dropstone is a versioned coding agent runtime that routes work through open-weight models on US-hosted, no-retention infrastructure. The report explains its monthly re-baselining process, safety boundary, and cost model for Fast, Pro, and Heavy tiers. | Developer Tools / Code Assistant | 78 | ↑ +6 | 12 days ago | Details |