Algorithmic Research Group

We study how software and industrial systems recursively improve themselves in real-world settings.

Study Failure: AI-driven GPU Kernel Optimization

I recently completed what I thought was a comprehensive study of AI-driven GPU kernel optimization. Over 131,520 optimization attempts...

gpu / optimization / machine learning

Learning to Rank Architectures: A Small Model That Guides Neural Architecture Search

I trained a tiny recursive reasoning model to rank architectures by predicted performance, then used it to guide search. It achieved...

nas / architecture search / machine learning

ARIA Benchmark: How Much Machine Learning Do AI Models Actually Know?

A benchmark suite for evaluating AI agents on real machine learning research tasks — including task definitions, a baseline agent, and...

agent-evaluation / benchmarks / python

ArXiv Research Code Dataset: 129K Research Repositories

agent-evaluation / benchmarks / python

ArXivDLInstruct: 778K Research Code Functions for Instruction Tuning

agent-evaluation / benchmarks / python

DeltaMLBench: Can AI Agents Improve on Published ML Research?

agent-evaluation / benchmarks / python

Teaching Models to Bluff: Measuring Deception, Belief, and Coordination in LLM Secret Hitler

I wired up five LLM agents to play the social-deduction game Secret Hitler with structured logging.

ai-research / agi / recursive-improvement

ML Research Benchmark: Can AI Agents Do Real ML Research?

A benchmark suite for evaluating AI agents on real machine learning research tasks — including task definitions, a baseline agent, and...

agent-evaluation / benchmarks / python

Recursive self-improvement is beginning to shape both software and industry. In software, AI systems are increasingly involved in designing, training, and optimizing other AI systems. Progress compounds through algorithmic advances and improvements in computing hardware.

In the physical world, similar dynamics are emerging in robotics, manufacturing, and supply chains, where automated systems increasingly optimize the processes that produce, deploy, and refine them.

Algorithmic Research Group studies recursive systems in practice. We measure progress, develop tools, and analyze how recursive improvement changes the behavior and capabilities of real-world systems.

Our work focuses on understanding the dynamics, limits, and impacts of self-improving systems across software and industrial domains.