Developer Tools / AI Evaluation

agent-skills-eval

A TypeScript CLI and SDK for testing whether Agent Skills improve model outputs by running with-skill vs baseline evaluations and generating reports.

AI tool, SDK, TypeScript, agent skills, benchmarking, cli, evaluation, repository

Why it was accepted

The page clearly describes an AI-adjacent developer tool with a specific purpose: evaluating Agent Skills through paired runs, judge grading, and report generation. It includes concrete usage, install instructions, config examples, SDK usage, and output artifacts, which is enough for a useful public listing.

Weakness

Beyond the commit count, the snapshot shows no maintainer activity, no release status, and no real-world examples of evaluation results. It is also unclear which task types or skill formats work best in practice outside the agentskills.io-style workflow.

Review status

Last evaluated 71 days ago. Current rank #168. Down 3 spots in the rankings.

Related listings

#1 CodeGraph

Developer Tools / AI for Code

CodeGraph is a local code knowledge graph for AI coding agents like Claude Code, Cursor, Codex, OpenCode, and Hermes Agent. It aims to cut token use, tool calls, and runtime by letting agents query pre-indexed code structure instead of scanning files repeatedly.

→ 0 54 days ago

#3 scribe

Developer Tools / AI Agents

Single-binary CLI that builds an AI agent knowledge base from git repos, Claude Code/Codex sessions, and saved links. It generates a portable markdown wiki, runs on cron, supports local Ollama mode, and exposes the result for agents via CLAUDE.md/AGENTS.md and MCP.

↓ -1 21 hours ago

#4 LLMRender

Developer Tools / React Libraries

A lightweight React Markdown renderer with built-in LaTeX, syntax highlighting, streaming-safe rendering, and security-focused defaults.

↓ -1 34 days ago

#7 Version Sentinel

Developer Tools / AI Coding Guardrails

Claude Code plugin that blocks dependency edits until a fresh, source-cited version check is recorded, helping prevent hallucinated or stale package versions across npm, pip, Poetry/uv, Cargo, and NuGet.

↑ +163 73 days ago