Find open-source science resources

Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.

3,084 of 5,674 resources

Showing 2,851-2,900

Extract figures, tables, captions, and section titles from scholarly PDFs

Docling-powered parsing with UI/CLI demonstration for rapid prototyping

Parse scientific papers to structured fields (title/author/sections/references)

Machine learning software for extracting structured metadata from scholarly documents

Large-scale PDF/LaTeX/JATS parsing to standardized JSON for millions of papers

High-accuracy PDF→Markdown/JSON/HTML conversion, specialized for tables/formulas/code blocks with benchmark scripts

Production-grade ETL for transforming complex documents into structured formats, with open-source API

Advanced OCR with PP-StructureV3 document parsing, 13% accuracy improvement, supports 80+ languages

Toolkit for linearizing academic PDFs into LLM-ready text with high accuracy and structure preservation, optimized for scientific literature extraction

Neural optical understanding for academic documents, transforms scientific PDFs to Markdown with mathematical formula support

Comprehensive toolkit for high-quality PDF content extraction with layout detection, formula recognition, and OCR

SOTA multimodal document parsing with 1.2B parameters outperforming GPT-4o, converts PDFs to LLM-ready Markdown/JSON

Automated code generation from machine learning research papers into runnable implementations (4.5K+ stars, 2025)

Large-scale chart summarization datasets for training chart description capabilities

Universal chart comprehension and reasoning model

Automated academic illustration generation for AI scientists, converting research papers into publication-ready figures using VLMs and diffusion models with iterative refinement (PKU & Google Research, 6.2K+ stars, 2026)

Transform arXiv research papers into engaging presentations and YouTube-ready videos

First benchmark for automatic video generation from scientific papers (NeurIPS 2025)

Azure Semantic Kernel multi-agent PPT generation reference

Convert PDF files into editable slides with three lines of code

AI-powered tool that automatically converts academic papers (PDF) into presentation slides

Transform arXiv papers into Beamer slides using LLMs

Beyond text-to-slides generation with PPTEval multi-dimensional evaluation (EMNLP 2025)

Multimodal LLM for understanding and generating scientific charts and diagrams

Multi-agent system with Parser-Planner-Painter architecture converting `paper.pdf` to editable `poster.pptx`, outperforms GPT-4o with 87% fewer tokens

Comprehensive collection of 125+ ready-to-use scientific skill modules for Claude AI across bioinformatics, cheminformatics, clinical research, ML, and materials science

Programmatic data labeling and weak supervision

Multi-type data labeling and annotation tool

Secure text-to-visualization through standardized chart specifications

Automated data visualization with minimal code

Conversational data analysis using natural language

A curated list of molecular docking software, datasets, and other closely related resources.

A list of papers, data sets, and other resources for machine learning for small-molecule drug discovery.

A curated list of Python resources related to chemistry.

The chemoinformatics and drug discovery section of the deeplearning-biology repo.

A teaching platform for computer-aided drug design (CADD) using open source packages and data.

Webapp for generating conformers

Wrapper for RDKit's RunReactants to improve stereochemistry handling

An automated workflow for the generation and storage of DFT calculations for organic molecules.

Python-centric Cookiecutter for Molecular Computational Chemistry Packages by [MolSSI](https://molssi.org/)

Open Parser for Systematic IUPAC nomenclature

Parsers and algorithms for computational chemistry logfiles.

Analysis of molecular dynamics trajectories.

Automates and standardizes ligand preparation for AutoDock Vina.

Cheminformatics extension for the SQLAlchemy database toolkit.

Open source web framework for small molecule analysis based on Django.

[RDKit](http://www.rdkit.org/) and [OSRA](https://cactus.nci.nih.gov/osra/) in the [Bottle](http://bottlepy.org/docs/dev/) on [Tornado](http://www.tornadoweb.org/en/stable/).

Chemical Information from the Web.

A Python package for optimizing chemical reactions using machine learning (contains 10 algorithms and several benchmarks).

A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models in R.