AgentDish directory
AI Research AI Tools
Accepted listings in this category.
| Listing | Category | Score | Trend | Checked | |
|---|---|---|---|---|---|
| #338 MarCognity-AI: An open-source research framework for structured LLM evaluation, claim verification, and source-grounded reflective reasoning. The repo describes modular components for retrieval, semantic scoring, skeptical claim checking, and benchmark-style epistemic assessment. | AI Research / Evaluation / Verification Framework | 81 | ↓ -10 | 28 days ago | Details |
| arXiv paper describing AVA, a GenAI platform for policy and development research built on 4,000+ World Bank reports. The abstract highlights multilingual support, evidence-based synthesis, citation verifiability, and reasoned abstention when queries cannot be supported. | AI Research / Trustworthy Generative AI | 78 | ↑ +6 | 5 days ago | Details |
| #388 Agora-1: The Multi-Agent World Model: Agora-1 is a multi-agent world model from Odyssey that simulates shared real-time environments for up to four participants, human or AI, with a focus on gaming, robotics, reinforcement learning, and foundation model research. | AI Research / World Models | 78 | ↑ +6 | 14 days ago | Details |
| Apple Machine Learning Research paper proposing LaDiR, a reasoning framework that combines a VAE-based latent space with latent diffusion to improve LLM text reasoning and iterative refinement. | AI Research / LLM Reasoning | 78 | ↑ +5 | 27 days ago | Details |
| #429 Hyperagents: Research paper introducing hyperagents, a self-referential agent framework that combines a task agent and a meta agent into one editable program. The abstract describes a DGM-based system that improves both task performance and its own improvement process across domains. | AI Research / Self-Improving Agents | 76 | ↓ -1 | 10 days ago | Details |
| A GitHub research project documenting a long-form, multi-model analysis of LLM behavior across Claude, Gemini, ChatGPT, and Grok. The repo includes an executive summary, screenplay, technical white paper, and archive of logs and chat records. | AI Research / LLM Evaluation & Analysis | 75 | → 0 | 7 days ago | Details |
| #445 GPT Guesses Between 1 and 100: A GitHub research project that measures how gpt-4.1 responds when asked to pick a random number between 1 and 100, using 10,000 API calls and comparing the results to a uniform baseline. | AI Research / Model Behavior Analysis | 74 | ↓ -1 | 8 days ago | Details |