
Our Research

Code

Our open-source baseline agent for ML Research Benchmark. This agent provides a foundation for comparing and evaluating machine learning research and development tasks that agents can perform.

The tasks for ML Research Benchmark, a benchmark designed to evaluate how well AI agents can accelerate ML research and development. The benchmark consists of 9 competition-level tasks that span the spectrum of activities typically undertaken by ML researchers.

The AI Agent State Library is a library designed to manage the state and decision-making processes of AI agents. At its core, it implements the concept of finite state machines, a computational model used to design systems with a finite number of states and transitions between those states. 
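To make the finite state machine idea concrete, here is a minimal sketch in plain Python. It does not use the AI Agent State Library and does not reflect its API. The class name, method names, states, and events below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StateMachine:
    """A tiny finite state machine: one current state plus a transition table.

    Hypothetical illustration only; not the AI Agent State Library's API.
    """
    state: str
    # Maps (current_state, event) -> next_state.
    transitions: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_transition(self, source: str, event: str, target: str) -> None:
        """Register that `event` moves the machine from `source` to `target`."""
        self.transitions[(source, event)] = target

    def trigger(self, event: str) -> str:
        """Apply `event` to the current state, or fail if it is not allowed."""
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} is not allowed in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state


# Hypothetical agent lifecycle: idle -> planning -> executing -> idle.
agent = StateMachine(state="idle")
agent.add_transition("idle", "receive_task", "planning")
agent.add_transition("planning", "plan_ready", "executing")
agent.add_transition("executing", "task_done", "idle")

agent.trigger("receive_task")
agent.trigger("plan_ready")
print(agent.state)
```

Because every allowed move is listed in the transition table, an event that is not valid in the current state raises an error instead of silently putting the agent into an undefined state.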

ARIA Benchmarks is a suite of closed-book benchmarks designed to assess a model's knowledge and understanding of machine learning research and methodologies.

Datasets & Models

ArXiv DL Instruct Dataset

ArXivDLInstruct is a dataset for instruction tuning on Python research code. It comprises 778,152 functions from research code on ArXiv, and provides a detailed prompt for generating each function along with a short description.


ArXiv Research Code Dataset

The arxiv_research_code dataset contains over 21.8 GB of source code files referenced strictly in ArXiv papers. It serves as a curated dataset for Code LLMs.
 


We have also broken the dataset out into the most prominent languages.

ArXiv Instruct Tuning Dataset

A series of datasets consisting of 50,000 question-answer pairs derived from ArXiv abstracts. Questions are generated using the t5-base model, while the answers are generated using the GPT-3.5-turbo model.
 

ArXiv QA BEIR Datasets

A series of BEIR-style question-answer datasets derived from ArXiv.


ArXiv Semantic Search Models

A series of ArXiv Semantic Search models trained on ArtifactAI/arxiv-beir-500k-generated-queries, a large corpus of 500k question/abstract pairs extracted from the ArXiv dataset. The models are designed to encode sentences from academic papers for semantic similarity and information retrieval tasks. They map sentences and paragraphs to a 768-dimensional dense vector space.
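As a rough illustration of how dense vectors support retrieval, the sketch below ranks documents by cosine similarity to a query vector. It uses only the Python standard library. The vectors are hand-written 3-dimensional stand-ins, not 768-dimensional embeddings produced by these models, and the document IDs and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for abstract embeddings (assumption: real ones would have 768 dims).
docs = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 1.0, 0.0],
    "paper_c": [0.5, 0.5, 0.0],
}
query = [1.0, 0.0, 0.0]  # toy stand-in for an embedded question

results = rank(query, docs)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In practice the query and the abstracts would be encoded by the same model, so that a question and the abstract answering it land close together in the vector space.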


ArXiv LED Summarization Models

A led-large-16384 model to summarize ArXiv papers. Inputs are the abstracts of papers and full documents, and outputs are the summaries of the papers.


Advancing AI Together

We value the power of collaboration and are actively seeking partnerships with academic institutions, AI research labs, and individual researchers to drive innovation together.


©2024 Algorithmic Research Group. All Rights Reserved.
