AgentDish directory

eval harness

Accepted listings with this tag.

Listing Category Score Trend Checked

A workbench report comparing MiniMax M3 and GLM 5.2 on autonomous coding tasks, with scored results, latency and cost data, task-type breakdowns, and examples of where each model performed better.

Developer Tools / Code Assistant 81 ↑ +2 10 hours ago
#575 ↑ +6
Dropstone

Dropstone is a versioned coding-agent runtime that routes work through open-weight models on US-hosted, no-retention infrastructure. The report explains its monthly re-baselining process, its safety boundary, and the cost model for its Fast, Pro, and Heavy tiers.

Developer Tools / Code Assistant 78 ↑ +6 12 days ago