AgentDish directory

testing

Accepted listings with this tag.

Rank	Listing	Description	Category	Score	Trend	Checked
#108	Excalibur	Excalibur is an open-source AI coding agent for product engineers. It runs from the terminal, supports multiple model providers, and covers the full workflow from discovery and planning through implementation, verification, review, and shipping.	Developer Tools / Code Assistant	88	↓ -3	12 days ago
#170	LLM-test-kit	An open-source CLI for testing LLM prompts across consistency, latency, cost, and behavior, with HTML reports and support for OpenAI and Anthropic.	Developer Tools / Testing	88	↓ -129	73 days ago
#230	llm-mock	Python package for recording real LLM API responses and replaying them in tests so LLM-driven code can run deterministically without live API calls.	Developer Tools / Testing	87	↓ -4	58 days ago
#231	dari-docs	CLI for testing and improving documentation with simulated developer agents. It checks whether docs are clear enough for agents to complete real tasks, reports where they get stuck, and can generate proposed edits.	Developer Tools / Documentation	87	↓ -4	58 days ago
#437	Libretto Playwright PR Agents	An open-source tool that watches Playwright failures, inspects the live page, and opens a GitHub PR with a proposed fix.	Developer Tools / Code Assistant	84	↓ -6	24 hours ago
#440	I Use My Browser Tools to Find Bugs in My Browser Tools	Omnideck describes a self-testing browser-tool workflow where an AI agent runs daily checks on real websites, diagnoses failures against its own source code, and files GitHub issues with proposed fixes.	Developer Tools / Code Assistant	84	↓ -6	2 days ago
#487	AI DevOps Engine	An open-source, self-hosted AI DevOps pipeline that ingests GitHub webhooks, generates code patches with an LLM, runs them in network-isolated Docker sandboxes, and posts validated fixes back as PR comments.	Developer Tools / AI DevOps / CI/CD	84	↓ -6	24 days ago
#566	JDS	A Copilot skill suite that enforces structured coding workflows for AI-assisted development, with design, planning, TDD, debugging, verification, and cleanup steps plus a live task-graph visualization server.	Developer Tools / AI Coding Assistants	84	↓ -6	64 days ago
#717	Make No Mistakes	An open-source enforcement layer for AI coding agents that adds frozen specs, tamper-detected tests, independent verification, and hard-blocking gates so unverified work cannot pass.	Developer Tools / AI Coding	82	↓ -2	12 days ago
#746	LainDOS	A tiny single-tasking DOS clone written from scratch in x86 assembly, built to boot and run period games in emulators. The repo includes a detailed README, build instructions, docs, and automated QEMU regression testing.	Developer Tools / Operating Systems	82	↓ -2	36 days ago
#808	Faultsense agent	A source-available JavaScript browser agent for asserting end-to-end behavior in real user sessions. The repo explains the annotation-based approach, how it runs in staging and production, and how to install and initialize it via CDN or npm.	Developer Tools / Testing	81	↑ +2	15 days ago
#1014	Agent Eval	A GitHub repo for evaluating agentic AI pipeline systems, with guidance for defining metrics, building eval cases, running repeatable tests, and tracking regressions.	Developer Tools / Copywriting	77	↑ +75	73 days ago
#1035	Pure Effect	Pure Effect is a small JavaScript library for building testable business logic as data-driven commands instead of relying on mocks. It includes primitives like Success, Failure, Command, Ask, Retry, and Parallel, plus helpers like effectPipe and runEffect for composing and executing flows.	Developer Tools / Code Assistant	76	↓ -1	49 days ago