Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
Official code repo for NeurIPS 2025 Spotlight paper, "Debate or Vote: Which Yields Better Decisions in Multi-Agent LLMs?"
Framework: Multi-Agent LLMs For Conversational Task-Solving (MALLM)
Research-backed methodology for multi-AI collaborative decision-making with structured debate, consensus synthesis, and bias reduction
Source code for the paper: Hear Both Sides: Efficient Multi-Agent Debate via Diversity-Aware Message Retention
Human-in-the-loop adversarial workflows for high-stakes research audit: from ChatGPT-Gemini duels to 4-model MAD.
Code for "Multiple LLM Agents Debate for Equitable Cultural Alignment" [ACL 2025 Oral]
Code review, but with 5 models arguing first.
A brutally fault-tolerant Mixture-of-Agents (MoA) pipeline built in pure Python. Designed to orchestrate chaotic, round-robin LLM proxy endpoints through a rigorous 4-stage Agentic Workflow (Generate ➔ Cross-Critique ➔ Rebuttal ➔ Judge). Built to eradicate hallucination and guarantee absolute accuracy in complex, multi-step reasoning tasks.
Three Claude Code skills for working with Codex CLI: codex-bridge (one-shot Codex calls), mad-build (Claude+Codex collaboration with cross-review), and mad-research (three-stream adversarial audit of papers, grants, reports with anonymized cross-critique and fresh-Codex synthesis).
Enable autonomous AI agents to optimize LLM training code through iterative experiments and improve models overnight without manual intervention
Research paper on how agentic debate pipelines can be constructed to reduce hallucinations in LLMs with open-source and commercial models
Generate research papers autonomously by chatting with OpenClaw, using Python 3.11+, with a self-evolving framework and extensive test coverage.
AI Agent Workspace Redesign: A structured multi-agent debate methodology for managing AI agent workspaces (memory, file organization, protection tiers, boot sequences)
Supporting code for the study on multi-agent debate protocols
NeurIPS paper code: evaluating and enhancing Large Language Models (LLMs) on mathematical datasets through a Multi-Agent Debate architecture, without traditional fine-tuning or Retrieval-Augmented Generation techniques. This project explores strategies to boost LLM capabilities in mathematical reasoning.
Multi-LLM debate orchestrator that drives ChatGPT, Claude, and DeepSeek web UIs (no API keys) through a 5-phase loop: propose → critique → revise → synthesize → ratify-or-veto. Editorial dark UI.
Build autonomous experiment loops that edit files, run tests, and keep only improvements for any project type
Build autonomous ML research in Elixir: design, train, and iterate GPT models across GPUs with fault-tolerant BEAM concurrency
Run your decisions through a jury of 12 AI minds before you commit.
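Several entries above describe staged debate loops, such as Generate ➔ Cross-Critique ➔ Rebuttal ➔ Judge. The following is a minimal sketch of that four-stage shape only, not the implementation of any listed repository. The `Agent` type, the `run_debate` function, and the prompt wording are all hypothetical; each agent is assumed to be a plain callable that takes a prompt string and returns text.

```python
from typing import Callable, Dict

# Hypothetical agent interface: prompt in, text out. A real agent would wrap
# an LLM client; none of the listed projects is assumed to use this shape.
Agent = Callable[[str], str]


def run_debate(question: str, agents: Dict[str, Agent], judge: Agent) -> str:
    """Run one generate / cross-critique / rebuttal / judge round."""
    # Stage 1 (generate): every agent drafts an answer independently.
    drafts = {name: agent(f"Answer: {question}") for name, agent in agents.items()}

    # Stage 2 (cross-critique): each agent critiques the other agents' drafts.
    critiques = {}
    for name, agent in agents.items():
        others = "\n".join(f"[{n}] {d}" for n, d in drafts.items() if n != name)
        critiques[name] = agent(f"Critique these answers to '{question}':\n{others}")

    # Stage 3 (rebuttal): each agent revises its own draft using the
    # critiques written by the other agents.
    finals = {}
    for name, agent in agents.items():
        feedback = "\n".join(c for n, c in critiques.items() if n != name)
        finals[name] = agent(f"Revise your answer '{drafts[name]}' given:\n{feedback}")

    # Stage 4 (judge): a separate agent reads all revised answers and decides.
    transcript = "\n".join(f"[{n}] {a}" for n, a in sorted(finals.items()))
    return judge(f"Question: {question}\nPick the best final answer:\n{transcript}")
```

With N agents, this sketch makes 3N agent calls plus one judge call per round. Swapping the judge for a majority vote over `finals` would give the voting variant that some of the entries compare against debate.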