Find open-source science resources

Website & Interactive Content Generation

AI-powered pipeline converting papers into interactive websites, posters, and multimedia presentations with "Let's Make Your Paper Alive!" philosophy

3737 months ago

Transformer encoder-decoder for de novo peptide sequencing from tandem mass spectrometry, translating MS/MS spectra directly to peptide sequences without reference databases, enabling identification of novel peptides for immunopeptidomics, antibody repertoires, and metaproteomes (Noble Lab UW, Nature Communications 2024)

1872 days ago

stk

General Chemistry

A library for building, manipulating, analyzing and automatic design of molecules, including a genetic algorithm.

2844 months ago

DISCO

Protein & Drug Discovery

General multimodal protein design framework enabling DNA-encoding of chemistry for programmable enzyme design and diverse protein generation through diffusion-based generative modeling (190+ stars, Apache 2.0, 2026)

1901 week ago

FitSNAP

Force Fields

A Package For Training SNAP Interatomic Potentials for use in the LAMMPS molecular dynamics package.

1867 months ago

GPL-2.0

PyLabRobot

Lab Automation & Robotics

Interactive and hardware-agnostic SDK for laboratory automation, enabling programmatic control of liquid handlers, plate readers, and other lab instruments across multiple vendors; foundational infrastructure for self-driving laboratories and AI-driven experimental execution (447+ stars)

4502 days ago

AfterQC

Sequence Processing

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data.

2146 years ago

Helical

Genomics & Bioinformatics

Unified framework for state-of-the-art pre-trained bio foundation models across genomics and transcriptomics, providing standardized interfaces and pipelines for DNA, RNA, and single-cell models including Evo 2, Geneformer, scGPT, and UCE with streamlined inference, benchmarking, and fine-tuning workflows (213+ stars, 2024-2025)

2153 weeks ago

AGPL-3.0

ChemFormula

General Chemistry

ChemFormula provides a class for working with chemical formulas. It allows parsing chemical formulas, calculating formula weights, and generating formatted output strings (e.g. in HTML, LaTeX, or Unicode).

336 months ago

Equiformer

Machine Learning for Physics

Equivariant graph attention Transformer (ICLR2023)

2821 year ago

DeepAnalyze

Data Analysis & Visualization

First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports

4.2K1 month ago

sam2interval

Genomics

A Python script that converts positional information from a SAM dataset into interval format with 0-based start and 1-based end. CIGAR string of SAM format is used to compute the end coordinate.

373 months ago

PennyLane

Specialized Frameworks

Cross-platform library for differentiable programming of quantum computers with automatic differentiation, enabling hybrid quantum-classical machine learning for quantum chemistry, quantum physics, and NISQ algorithm research (Xanadu, 3k+ stars)

3.2K2 days ago

CHGNet

Materials Discovery

Universal pretrained neural network potential with charge and magnetic moment awareness, trained on 1.5M+ Materials Project inorganic structures for charge-informed molecular dynamics and phase diagram prediction (Berkeley, Nature Machine Intelligence 2023 Cover)

3833 months ago

NOASSERTION

DeepChem

Machine Learning

Deep learning library for Chemistry based on Tensorflow

6.8K2 months ago

cellSAM

Medical AI & Clinical Applications

Foundation model for universal cell segmentation achieving state-of-the-art performance across bacteria, tissue, yeast, cell culture, and diverse imaging modalities (brightfield, fluorescence, phase), with pip-installable inference and Napari plugin (vanvalenlab/Caltech, bioRxiv 2024)

1956 months ago

NOASSERTION

Genie 2

Protein & Drug Discovery

Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)

1921 year ago

perses

Generative Molecular Design

Experiments with expanded ensembles to explore chemical space.

1996 months ago

TimesFM (Google Research)

General Science Models

Pretrained time series foundation model for long-horizon forecasting across diverse scientific domains including climate variables, biomedical signals, and physical observations; decoder-only Transformer architecture with strong zero-shot generalization (19.8K+ stars, Apache 2.0, 2024-2025)

20.1K5 days ago

AlphaGeometry

Domain-Specific Research Agents

DeepMind's Olympiad-level geometry theorem prover combining neural language model with symbolic deduction engine, AlphaGeometry2 solves 84% of IMO geometry problems (42/50) at gold-medalist level (Nature 2024)

4.8K4 months ago

SlideDeck AI

Slides & Presentation Generation

Co-create PowerPoint presentations with Generative AI from documents or topics

3582 weeks ago

ProDy

Molecular Dynamics

A Python package for protein dynamics analysis

5462 months ago

NOASSERTION

OpenChem

Machine Learning

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend.

7452 years ago

pyensembl

Data

Pythonic Access to the Ensembl database.

4001 week ago

EdinOmics Dash App

Metabolomics

An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.

CC-BY-4.0

GONetView

Ontology and terminology

Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.

Circlator

Sequence assembly

Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.

EUCAIM ETL toolset

Data identity and mapping

Modular toolchain for an extensible and customizable ETL pipeline that extracts, transforms, and loads clinical data and medical imaging metadata, applying dataset-specific mappings to generate outputs compatible with the EUCAIM Common Data Model (CDM). Its design aims to minimize manual data preparation efforts and facilitate customization and integration with other components, such as data quality assurance tools. Containerized, currently supports input datasets in CSV, JSON, XLSX.

EvoBind

Protein folding, stability and design

Design of linear and cyclic peptide binders from protein sequence information.

miniconda

Software management

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

Proprietary

NuclearPhaser

Genomics

NuclearPhaser is a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs. This is an overview of the phasing pipeline for dikaryons.

GPL-3.0

CompuCell3D

Systems biology

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

GPL-3.0

nanosv

Structural genomics

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

generate_count_matrix

Transcriptomics

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.

gc_derivatization

Metabolomics

In silico derivatization for GC. The GC-derivatization tool converts carbonyl groups to C═N-OCH3 (MeOX) and transforms acidic protons into -Si(CH3)3 (TMS). Key functionalities include checking for specific groups, removing derivatization groups, and adding derivatization groups to molecules.

Image Duplicates Checker

Data quality management

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]

Data Integration Quality Check Tool (DIQCT)

Data quality management

A tool that checks the clinical metadata quality (validity, completeness), the integrity between images and clinical metadata provided as well as their accuracy, the de-identification protocol applied, and existence of annotation together with the consistency between the images and the annotation files and informs the user on corrective actions prior to data upload.

Proprietary

JCVI

Sequence assembly

JCVI is a versatile toolkit for comparative genomics analysis. It is a collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

BSD-2-Clause

DICOM-SEG Annotation

Data quality management

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.

QP-Insights Uploader

Medical imaging

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.