AI Research / LLM Evaluation & Analysis

A 400-hour forensic audit of LLMs using multi-model context saturation

A GitHub research project documenting a long-form, multi-model analysis of LLM behavior across Claude, Gemini, ChatGPT, and Grok. The repo includes an executive summary, screenplay, technical white paper, and archive of logs and chat records.

Tags: AI tool, Research, behavior-analysis, benchmarking, evaluation, github, llm, multi-model

Why it was accepted

The page clearly presents an AI-related research project with a defined methodology, named models, and multiple visible artifacts beyond a landing page. It offers enough evidence for a public directory listing focused on LLM evaluation and behavioral analysis.

Weakness

The crawl does not show the actual white paper content, the experiment setup in detail, or whether the Google Drive archive is publicly accessible. It is also hard to tell from the snapshot alone how reproducible the findings are.

Review status

Last evaluated 52 days ago. Current rank #1054. Holding steady in the rankings.

Related listings

#263 Prometheus

AI Research / Autonomous Research Systems

An autonomous research system that runs on a single workstation and aggressively checks its own claims with adversarial self-verification, replication, and calibration audits.

↑ +2 7 days ago

#733 Socrates

AI Research / Multi-agent systems

Open-source multi-agent protocol for AI research agents. It pairs a tool-using Scientist with an advisor restricted to asking questions and approving plans. The README includes quick-start setup plus notes on reproducing results on MLE-bench/Kaggle tasks.

↓ -2 22 days ago

#818 EuroMesh

AI Research / Analysis / Reports

A sourced model and short report exploring whether Europe could train a sovereign frontier AI model using public compute it already owns, with reproducible code, datasets, and a PDF report.

↑ +2 32 days ago

#835 MarCognity-AI

AI Research / Evaluation / Verification Framework

An open-source research framework for structured LLM evaluation, claim verification, and source-grounded reflective reasoning. The repo describes modular components for retrieval, semantic scoring, skeptical claim checking, and benchmark-style epistemic assessment.

↓ -35 73 days ago