Find open-source science resources

Domain-Specific Research Agents

Strongest open-source automated theorem prover in Lean 4: the 8B model matches DeepSeek-Prover-V2-671B at 84.6% on MiniF2F, and the 32B model reaches 90.4% with self-correction, using scaffolded data synthesis and verifier-guided proof refinement (Princeton, 2025)

LLM-Peer-Review

Academic Review & Evaluation

Web application for LLM-assisted manuscript review and annotation

NewtonBench (ICLR 2026)

First benchmark evaluating LLMs' ability to rediscover scientific laws through interactive experimentation across 324 tasks in 12 physics domains, featuring memorization-resistant metaphysical shifts of canonical laws (HKUST)

SciCode

Research coding benchmark curated by scientists with 338 subproblems across 16 subdomains (physics, math, materials, biology, chemistry), evaluating LLMs on realistic scientific programming tasks with gold-standard solutions (NeurIPS 2024)

BuildArena

First physics-aligned interactive benchmark for LLM agents in engineering construction, with agents designing rockets, cars, and bridges in a physics simulator backed by a 3D spatial geometry library

ScienceBoard (ICLR 2026)

Evaluating multimodal autonomous agents in realistic scientific workflows across real scientific software environments (KAlgebra, Celestia, GRASS GIS, Lean 4, etc.) with VM-based evaluation infrastructure and agent trajectories

MLE-Bench (OpenAI, 2024)

Benchmark evaluating AI agents on 75 curated Kaggle-style ML engineering competitions with reproducible Docker-based grading harness, human baselines, and end-to-end task lifecycle, used as a primary benchmark for autonomous ML research agents (e.g., InternAgent #1 at 36.44%)

PaperBench (OpenAI, 2025)

Benchmark evaluating AI agents' ability to replicate 20 ICML 2024 Spotlight/Oral papers from scratch, with 8,316 gradable tasks and author-co-developed rubrics

AIRS-Bench (Meta, 2026)

Benchmark quantifying end-to-end autonomous AI research abilities of LLM agents across 20 tasks from SOTA machine learning papers spanning NLP, code, math, biochemical modelling, and time series forecasting, with normalized score metrics against human SOTA and HuggingFace dataset

ScienceAgentBench (ICLR 2025)

102 executable tasks from 44 peer-reviewed papers across 4 disciplines with containerized evaluation

PantheonOS (Stanford, 2025)

Autonomous Research Systems (2023-2025 Breakthroughs)

Evolvable and privacy-preserving multi-agent framework automating, scaling, and accelerating data sciences with a particular focus on end-to-end single-cell biology analyses; features agentic code evolution, multi-agent team orchestration, distributed architecture, and a community marketplace with 1,000+ curated agents and skills (428+ stars)

EvoScientist

Autonomous Research Systems (2023-2025 Breakthroughs)

Self-evolving AI scientist with 6 specialized sub-agents (plan/research/code/debug/analyze/write) and persistent memory, #1 on DeepResearch Bench II and AstaBench, supporting multi-provider LLMs and multi-channel deployment (Apache 2.0, 2026)

UniScientist

Autonomous Research Systems (2023-2025 Breakthroughs)

Universal scientific research intelligence covering 50+ disciplines, repositioning LLMs as cross-disciplinary generators with human experts as verifiers; 30B model outperforms Claude Opus and GPT on 5 research benchmarks

autoresearch

Autonomous Research Systems (2023-2025 Breakthroughs)

Andrej Karpathy's autonomous LLM research framework: an AI agent runs overnight experiments on a real training setup, looping through code edit → 5-minute training run → evaluation, for roughly 100 experiments per night on a single GPU
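
The edit, train, evaluate loop in this entry can be pictured as a greedy keep-if-better search. The sketch below is illustrative only and is not autoresearch's actual code: `propose_edit` and `train_and_eval` are hypothetical stand-ins (a random hyperparameter perturbation and a toy loss), and the 8-hour night is an assumption used to sanity-check the "~100 experiments" figure.

```python
import random

def experiments_per_night(hours: float = 8.0, minutes_per_run: float = 5.0) -> int:
    """Number of fixed-length runs that fit in one night (8 hours is an assumption)."""
    return int(hours * 60 // minutes_per_run)

def propose_edit(config: dict, rng: random.Random) -> dict:
    """Hypothetical stand-in for the agent's code edit: rescale one hyperparameter."""
    new = dict(config)
    new["lr"] = config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])
    return new

def train_and_eval(config: dict) -> float:
    """Hypothetical stand-in for a 5-minute training run: toy loss, minimal at lr = 0.01."""
    return (config["lr"] - 0.01) ** 2

def run_night(config: dict, n_experiments: int, seed: int = 0):
    """Greedy loop: try an edit, evaluate it, keep it only if the loss improves."""
    rng = random.Random(seed)
    best_loss = train_and_eval(config)
    history = [best_loss]
    for _ in range(n_experiments):
        candidate = propose_edit(config, rng)
        loss = train_and_eval(candidate)
        if loss < best_loss:
            config, best_loss = candidate, loss
        history.append(best_loss)
    return config, best_loss, history

budget = experiments_per_night()  # 480 minutes / 5 minutes per run = 96 runs
best_config, best_loss, history = run_night({"lr": 0.1}, budget)
```

Under these assumptions the budget works out to 96 runs, in line with the roughly 100 experiments per night quoted in the entry. A real agent would edit training code rather than a single number, but the accept-or-revert structure is the same.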

POPPER

Autonomous Research Systems (2023-2025 Breakthroughs)

Automated hypothesis testing with agentic sequential falsifications

Curie

Autonomous Research Systems (2023-2025 Breakthroughs)

Automated and rigorous experiments using AI agents for scientific discovery

Aviary

Autonomous Research Systems (2023-2025 Breakthroughs)

Language agent gymnasium for challenging scientific tasks including DNA manipulation, literature search, and protein engineering

Robin

Autonomous Research Systems (2023-2025 Breakthroughs)

FutureHouse's end-to-end multi-agent system for scientific discovery, orchestrating literature search agents (Crow/Falcon) and a data analysis agent (Finch); described as the first AI-generated drug discovery, identifying ripasudil as a novel therapeutic candidate for dry AMD (2025)

LabClaw

Autonomous Research Systems (2023-2025 Breakthroughs)

Skill operating layer for biomedical AI agents with 211 production-ready SKILL.md files across 7 domains (biology, pharmacology, medicine, data science, literature search), enabling modular dry-lab reasoning and protocol composition for Stanford LabOS-compatible agents

ToolUniverse

Autonomous Research Systems (2023-2025 Breakthroughs)

Democratizing AI scientists by transforming any LLM into research systems with 600+ scientific tools (Harvard MIMS)

freephdlabor

Autonomous Research Systems (2023-2025 Breakthroughs)

First fully customizable open-source multiagent framework automating complete research lifecycle from idea conception to LaTeX papers with dynamic workflows

InternAgent

Autonomous Research Systems (2023-2025 Breakthroughs)

Closed-loop multi-agent system from hypothesis to verification across 12 scientific tasks, #1 on MLE-Bench (36.44%)

AIDE (WecoAI, arXiv 2025)

Autonomous Research Systems (2023-2025 Breakthroughs)

LLM-driven machine learning engineering agent using agentic tree search to autonomously draft, debug and benchmark ML code; wins 4× more medals than the best linear agent on OpenAI's MLE-Bench (75 Kaggle competitions) (1.3K+ stars, MIT License)

AI-Researcher

Autonomous Research Systems (2023-2025 Breakthroughs)

Autonomous pipeline from literature review→hypothesis→algorithm implementation→publication-level writing with Scientist-Bench evaluation

AutoResearchClaw

Autonomous Research Systems (2023-2025 Breakthroughs)

Fully autonomous research from idea to paper with multi-agent debate, citation verification, and OpenClaw integration (11K+ stars, 2026)

AlphaResearch

Autonomous Research Systems (2023-2025 Breakthroughs)

Autonomous algorithm discovery combining evolutionary search with peer-review reward models, achieving best-known performance on circle packing problems

Kosmos

Autonomous Research Systems (2023-2025 Breakthroughs)

Extended autonomy AI scientist with 200 parallel agent rollouts, 42K lines of code execution, 1.5K papers analyzed per run, achieving 79.4% accuracy and 7 scientific discoveries (Edison Scientific)

DeepScientist

Autonomous Research Systems (2023-2025 Breakthroughs)

First system progressively surpassing human SOTA on frontier AI tasks (183.7%, 1.9%, 7.9% improvements), month-long autonomous discovery with 20,000+ GPU hours

Virtual Lab (Stanford Zou Group, Nature 2025)

Autonomous Research Systems (2023-2025 Breakthroughs)

AI-human collaborative research platform where a human researcher works with a team of LLM agents via team and individual meetings to perform scientific research; demonstrated by designing new SARS-CoV-2 nanobodies with wet-lab validation

OpenEvolve

Autonomous Research Systems (2023-2025 Breakthroughs)

Open-source implementation of AlphaEvolve's evolutionary coding agent paradigm, enabling LLMs to autonomously discover and optimize algorithms through iterative evolution, matching the approach behind DeepMind's breakthrough matrix multiplication discovery (6.2K+ stars, 2025)
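
The "iterative evolution" in this entry follows the general shape of an evolutionary search: propose variants, score them with an evaluator, keep the best. The sketch below shows only that generic loop and does not use OpenEvolve's API. Candidates are plain lists of numbers rather than programs, `mutate` is a hypothetical stand-in for an LLM-proposed rewrite, and `evaluate` is a toy scoring function with a made-up target.

```python
import random

def evaluate(candidate):
    """Toy evaluator (higher is better); the target vector is a made-up example."""
    target = [3.0, -1.0, 2.0]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def mutate(parent, rng):
    """Hypothetical stand-in for an LLM-proposed rewrite: perturb one component."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child

def evolve(generations=200, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [[0.0, 0.0, 0.0] for _ in range(population_size)]
    best_scores = []
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Elitist selection: parents compete with children, so the best score never drops.
        ranked = sorted(population + children, key=evaluate, reverse=True)
        population = ranked[:population_size]
        best_scores.append(evaluate(population[0]))
    return population[0], best_scores

best, best_scores = evolve()
```

In an LLM-driven system the mutation step would be a model call that rewrites source code and the evaluator would run and score that code; the selection logic around them can stay this simple.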

FunSearch (DeepMind, Nature 2023)

Autonomous Research Systems (2023-2025 Breakthroughs)

First system to make novel, verifiable scientific discoveries by pairing LLMs with evolutionary search, solving open problems in combinatorics (cap set problem) and discovering improved heuristics for online bin packing

Awesome-LLM-KG

Knowledge Graph Resources

Comprehensive collection of papers on unifying LLMs and knowledge graphs

KoPA

Knowledge Graph Construction

Structure-aware prefix adaptation for integrating LLMs with knowledge graphs (ACM MM 2024)

GraphGen

Knowledge Graph Construction

Knowledge graph-guided synthetic data generation for LLM fine-tuning, achieving strong performance on scientific QA (GPQA-Diamond) and math reasoning (AIME)

iText2KG

Knowledge Graph Construction

Incremental knowledge graph construction using LLMs with entity extraction and Neo4j visualization

Claude Prism

Scientific Writing & Collaboration

Offline-first scientific writing workspace powered by Claude, integrating LaTeX, Python, and 100+ scientific skills with local execution, Zotero integration, and privacy-focused design (2026)

Obsidian Smart Connections

Scientific Writing & Collaboration

AI-powered note linking and research graph navigation

Zotero-GPT (MuiseDestiny)

Literature Management Plugins

Classic open-source plugin for document Q&A and summarization within Zotero

PapersGPT for Zotero

Literature Management Plugins

Multi-PDF conversation, retrieval, and citation in Zotero with commercial/local models (Ollama), MCP support

llm-for-zotero

Literature Management Plugins

Research agent system deeply integrated with Zotero supporting Agent Mode, skills, multi-model backends (OpenAI-compatible, Claude Code, WebChat, Codex), and MinerU PDF parsing for literature Q&A, summarization, figure inspection, and source comparison (1.3K+ stars, 2026)

AutoR

Human-centered research OS with terminal-first harness and local browser Studio, turning research work into reproducible artifact-backed runs through a 9-stage workflow with human approval gates, resume/rollback controls, and venue-aware manuscript packaging (1K+ stars, 2026)

OpenBioMed

Open-source biomedical AI platform integrating multimodal foundation models (BioMedGPT, PharmolixFM, LangCell) with agentic workflows and 45+ Claude Code skills for drug discovery, protein engineering, and single-cell omics analysis (PharMolix & Tsinghua AIR, 1K+ stars, 2023-2026)

Notebook Intelligence (NBI)

AI coding assistant for JupyterLab with agent mode, supporting arbitrary LLM providers (2025+)

Jupyter AI (JupyterLab Extension)

Official Jupyter extension with `%%ai` magic commands and sidebar chat assistant, connecting multiple model providers and local inference

STORM

LLM agent system synthesizing Wikipedia-like long-form research articles from scratch through multi-perspective question asking, web retrieval, and citation-grounded report generation, with Co-STORM extension for collaborative human-LLM knowledge curation conversations (Stanford OVAL, NAACL 2024 & EMNLP 2024)

paper-reviewer

Generate comprehensive reviews from arXiv papers and convert to blog posts

Valsci

Self-hostable scientific claim-verification and literature-review tool combining Semantic Scholar retrieval, bibliometric scoring, and LLM-based evidence synthesis for large-batch validation workflows

OpenScholar

Retrieval-augmented LM synthesizing scientific literature from 45M papers with human-expert-level citation accuracy, outperforming GPT-4o by 5% on ScholarQABench (Nature 2026, UW & Ai2)

PaperQA2

High-accuracy RAG for scientific PDFs with citation support, agentic RAG, and contradiction detection

TableBank