AgentDish directory

AI Research AI Tools

Accepted listings in this category.

Listing Category Score Trend Checked
#338 ↓ -10
MarCognity-AI

An open-source research framework for structured LLM evaluation, claim verification, and source-grounded reflective reasoning. The repo describes modular components for retrieval, semantic scoring, skeptical claim checking, and benchmark-style epistemic assessment.

Category: AI Research / Evaluation / Verification Framework; Score: 81; Trend: ↓ -10; Checked: 28 days ago

arXiv paper describing AVA, a GenAI platform for policy and development research built on 4,000+ World Bank reports. The abstract highlights multilingual support, evidence-based synthesis, citation verifiability, and reasoned abstention when queries cannot be supported.

Category: AI Research / Trustworthy Generative AI; Score: 78; Trend: ↑ +6; Checked: 5 days ago

Agora-1 is a multi-agent world model from Odyssey that simulates shared real-time environments for up to four participants, human or AI, with a focus on gaming, robotics, reinforcement learning, and foundation model research.

Category: AI Research / World Models; Score: 78; Trend: ↑ +6; Checked: 14 days ago

Apple Machine Learning Research paper proposing LaDiR, a reasoning framework that combines a VAE-based latent space with latent diffusion to improve LLM text reasoning and iterative refinement.

Category: AI Research / LLM Reasoning; Score: 78; Trend: ↑ +5; Checked: 27 days ago
#429 ↓ -1
Hyperagents

Research paper introducing hyperagents, a self-referential agent framework that combines a task agent and a meta agent into one editable program. The abstract describes a DGM-based system that improves both task performance and its own improvement process across domains.

Category: AI Research / Self-Improving Agents; Score: 76; Trend: ↓ -1; Checked: 10 days ago

A GitHub research project documenting a long-form, multi-model analysis of LLM behavior across Claude, Gemini, ChatGPT, and Grok. The repo includes an executive summary, screenplay, technical white paper, and archive of logs and chat records.

Category: AI Research / LLM Evaluation & Analysis; Score: 75; Trend: → 0; Checked: 7 days ago

A GitHub research project that measures how gpt-4.1 responds when asked to pick a random number between 1 and 100, using 10,000 API calls and comparing the results to a uniform baseline.

Category: AI Research / Model Behavior Analysis; Score: 74; Trend: ↓ -1; Checked: 8 days ago