Our Research

Code
Our open-source baseline agent for ML Research Benchmark. It provides a foundation for comparing and evaluating how agents perform on machine learning research and development tasks.
The tasks for ML Research Benchmark, a benchmark designed to evaluate the capabilities of AI agents in accelerating ML research and development. The benchmark consists of 9 competition-level tasks that span the spectrum of activities typically undertaken by ML researchers.
The AI Agent State Library is a library designed to manage the state and decision-making processes of AI agents. At its core, it implements the concept of finite state machines, a computational model used to design systems with a finite number of states and transitions between those states.
ARIA Benchmarks is a suite of closed-book benchmarks designed to assess a model's knowledge and understanding of machine learning research and methodologies.
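The finite state machine idea behind the AI Agent State Library can be illustrated with a minimal sketch. This is not the library's API: the StateMachine class, its methods, and the state and event names ("idle", "planning", "acting", "task_received", and so on) are all hypothetical, chosen only to show a finite set of states with explicit transitions between them.

```python
from dataclasses import dataclass, field


@dataclass
class StateMachine:
    """A tiny finite state machine: a current state plus a transition table.

    Hypothetical illustration only; not the AI Agent State Library's API.
    """

    state: str
    # Maps (current_state, event) -> next_state.
    transitions: dict = field(default_factory=dict)

    def add_transition(self, source, event, target):
        """Register that `event` moves the machine from `source` to `target`."""
        self.transitions[(source, event)] = target

    def fire(self, event):
        """Apply `event` to the current state and return the new state."""
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition from {self.state!r} on {event!r}")
        self.state = self.transitions[key]
        return self.state


def build_agent_machine():
    """Build an example agent loop with made-up states and events."""
    machine = StateMachine(state="idle")
    machine.add_transition("idle", "task_received", "planning")
    machine.add_transition("planning", "plan_ready", "acting")
    machine.add_transition("acting", "step_done", "planning")
    machine.add_transition("acting", "task_done", "idle")
    return machine
```

Because every legal move is listed in the transition table, an event that is not valid in the current state raises an error instead of silently putting the agent into an undefined state.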
Datasets & Models

ArXiv DL Instruct Dataset
ArXivDLInstruct is a dataset for instruction tuning on Python research code. It comprises 778,152 functions from research code on ArXiv and provides, for each function, a detailed prompt for generating it along with a short description.
AlgorithmicResearchGroup/ArXivDLInstruct (2.26 GB)
ArXiv Research Code Dataset
The arxiv_research_code dataset contains over 21.8 GB of source code files, all of them referenced in ArXiv papers. It serves as a curated dataset for code LLMs.
We have also split the dataset by its most prominent languages.
ArXiv Instruct Tuning Dataset
A series of datasets consisting of 50,000 question-answer pairs derived from ArXiv abstracts. Questions are generated using the t5-base model, and answers are generated using the GPT-3.5-turbo model.

ArXiv QA BEIR Datasets
A series of BEIR-style question-answer datasets derived from ArXiv.
ArXiv Semantic Search Models
A series of ArXiv semantic search models trained on ArtifactAI/arxiv-beir-500k-generated-queries, a large corpus of 500k question/abstract pairs extracted from the ArXiv dataset. The models are designed to encode sentences from academic papers for semantic similarity and information retrieval tasks. They map sentences and paragraphs to a 768-dimensional dense vector space.
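Once sentences are mapped to dense vectors, semantic search typically reduces to ranking documents by vector similarity to the query. The sketch below assumes cosine similarity as the scoring function, which the page does not specify, and uses tiny 3-dimensional toy vectors in place of real 768-dimensional embeddings. The function names and document ids are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for embeddings; a real model would produce 768 values per text.
toy_docs = {
    "abstract_a": [1.0, 0.0, 0.0],
    "abstract_b": [0.0, 1.0, 0.0],
    "abstract_c": [0.7, 0.7, 0.0],
}
toy_query = [1.0, 0.1, 0.0]
ranking = rank_documents(toy_query, toy_docs)
```

In practice the query and abstract vectors would come from one of the models above, and the same ranking step would apply unchanged.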
ArXiv LED Summarization Models
A led-large-16384 model for summarizing ArXiv papers. It takes paper abstracts and full documents as input and outputs summaries of the papers.