Find open-source science resources

Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.

31 of 5,674 resources

Provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks.

1.2K2 days ago
Jupyter Notebook
MIT

CluMSID is a tool that aids the identification of features in untargeted LC-MS/MS analysis by the use of MS2 spectra similarity and unsupervised statistical methods. It offers functions for a complete and customisable workflow from raw data to visualisations and is interfaceable with the xmcs family of preprocessing packages.

104 years ago
R
MIT

Stamps Seurat, SingleCellExperiment, and SummarizedExperiment objects with a persistent metadata passport. For Seurat objects the passport is stored in the misc slot; for SingleCellExperiment and SummarizedExperiment objects it is stored in the metadata slot. Tracks animal info, experiment details, lineage (parent/child relationships), RDS registry numbers, processing logs, and custom fields. Includes an interactive Shiny gadget to fill and update the passport, and a read mode to print the full passport to console. The passport persists inside the RDS file with no external files needed.

31 month ago
R
MIT

Computation Pipeline library for python widely used in science and bioinformatics.

1754 years ago
Python
MIT

E(3)-equivariant neural network interatomic potentials achieving DFT accuracy with up to 1000× less training data than invariant models, foundational architecture behind MACE and Allegro (Harvard, MIT, Nature Communications 2022)

9144 days ago
Python
MIT

A library for building, manipulating, analyzing and automatic design of molecules, including a genetic algorithm.

2844 months ago
Python
MIT

Interactive and hardware-agnostic SDK for laboratory automation, enabling programmatic control of liquid handlers, plate readers, and other lab instruments across multiple vendors; foundational infrastructure for self-driving laboratories and AI-driven experimental execution (447+ stars)

4502 days ago
Python
MIT

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data.

2146 years ago
Python
MIT

LLM papers for scientific discovery

3456 months ago
MIT

ChemFormula provides a class for working with chemical formulas. It allows parsing chemical formulas, calculating formula weights, and generating formatted output strings (e.g. in HTML, LaTeX, or Unicode).

336 months ago
Python
MIT

Equivariant graph attention Transformer (ICLR2023)

2821 year ago
Python
MIT

First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports

4.2K1 month ago
Python
MIT

A Python script that converts positional information from a SAM dataset into interval format with 0-based start and 1-based end. CIGAR string of SAM format is used to compute the end coordinate.

373 months ago
Python
MIT

Deep learning library for Chemistry based on Tensorflow

6.8K2 months ago
Python
MIT

Experiments with expanded ensembles to explore chemical space.

1996 months ago
Python
MIT

Co-create PowerPoint presentations with Generative AI from documents or topics

3582 weeks ago
Python
MIT

A Swiss Army knife for genome arithmetic.

1K1 year ago
C
MIT

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend.

7452 years ago
Python
MIT

lumpy: a general probabilistic framework for structural variant discovery.

3423 months ago
C
MIT

Tools for NanoString Technologies GeoMx Technology. Package to easily graph on top of an OME-TIFF image. Plotting annotations can range from tissue segment to gene expression.

SCAN is a microarray normalization method to facilitate personalized-medicine workflows. Rather than processing microarray samples as groups, which can introduce biases and present logistical challenges, SCAN normalizes each sample individually by modeling and removing probe- and array-specific background noise using only data from within each array. SCAN can be applied to one-channel (e.g., Affymetrix) or two-channel (e.g., Agilent) microarrays. The Universal exPression Codes (UPC) method is an extension of SCAN that estimates whether a given gene/transcript is active above background levels in a given sample. The UPC method can be applied to one-channel or two-channel microarrays as well as to RNA-Seq read counts. Because UPC values are represented on the same scale and have an identical interpretation for each platform, they can be used for cross-platform data integration.

ProteoMM is a statistical method to perform model-based peptide-level differential expression analysis of single or multiple datasets. For multiple datasets ProteoMM produces a single fold change and p-value for each protein across multiple datasets. ProteoMM provides functionality for normalization, missing value imputation and differential expression. Model-based peptide-level imputation and differential expression analysis component of package follows the analysis described in “A statistical framework for protein quantitation in bottom-up MS based proteomics" (Karpievitch et al. Bioinformatics 2009). EigenMS normalisation is implemented as described in "Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition." (Karpievitch et al. Bioinformatics 2009).

Tools for NanoString Technologies nCounter Technology. Provides support for reading RCC files into an ExpressionSet derived object. Also includes methods for QC and normalizaztion of NanoString data.

Tools for NanoString Technologies GeoMx Technology. Package provides functions for reading in DCC and PKC files based on an ExpressionSet derived object. Normalization and QC functions are also included.

Robust model-based clustering using a t-mixture model with Box-Cox transformation. Note: users should have GSL installed. Windows users: 'consult the README file available in the inst directory of the source distribution for necessary configuration instructions'.

It is a web-application for visual and interactive gene expression analysis. Phantasus is based on Morpheus – a web-based software for heatmap visualisation and analysis, which was integrated with an R environment via OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis or differential expression analysis with limma package are supported.

Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.

Minigraph is a sequence-to-graph mapper and graph constructor. For graph generation, it aligns a query sequence against a sequence graph and incrementally augments an existing graph with long query subsequences diverged from the graph.

RBPBench is a multi-function tool to evaluate CLIP-seq and other related genomic region data using a comprehensive collection of known RNA-binding protein (RBP) binding motifs. RBPBench can be used for a variety of purposes, from RBP motif search (database or user-supplied RBP motifs) in genomic regions, over motif enrichment and co-occurrence analysis, in-depth comparisons over multiple datasets via sequence and genomic annotation statistics, to benchmarking CLIP-seq peak caller methods as well as comparisons across cell types and CLIP-seq protocols. RBPBench supports both sequence and structure motifs, as well as regular expressions (sequence and structure patterns). Moreover, users can easily provide their own motif collections.