Find open-source science resources
Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.
Filters
Domain
Language
License(1)
Source
Type
35 of 5,662 resources
A library for building, manipulating, analyzing and automatic design of molecules, including a genetic algorithm.
G-quadruplexes (G4s) are unique nucleic acid secondary structures predominantly found in guanine-rich regions and have been shown to be involved in various biological regulatory processes. G4SNVHunter is an R package designed to rapidly identify genomic sequences with G4-forming propensity and to accurately screen user-provided single nucleotide variants—as well as other small-scale variants such as indels and MNVs—for their potential to destabilize these structures. This allows researchers to then screen these critical variants for deeper study, digging into how they might influence biological functions—think gene regulation, for instance—by impairing G4 formation propensity.
Interactive and hardware-agnostic SDK for laboratory automation, enabling programmatic control of liquid handlers, plate readers, and other lab instruments across multiple vendors; foundational infrastructure for self-driving laboratories and AI-driven experimental execution (447+ stars)
sRACIPE implements a randomization-based method for gene circuit modeling. It allows us to study the effect of both the gene expression noise and the parametric variation on any gene regulatory circuit (GRC) using only its topology, and simulates an ensemble of models with random kinetic parameters at multiple noise levels. Statistical analysis of the generated gene expressions reveals the basin of attraction and stability of various phenotypic states and their changes associated with intrinsic and extrinsic noises. sRACIPE provides a holistic picture to evaluate the effects of both the stochastic nature of cellular processes and the parametric variation.
Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data.
CCPlotR is an R package for visualising results from tools that predict cell-cell interactions from single-cell RNA-seq data. These plots are generic and can be used to visualise results from multiple tools such as Liana, CellPhoneDB, NATMI etc.
LLM papers for scientific discovery
ChemFormula provides a class for working with chemical formulas. It allows parsing chemical formulas, calculating formula weights, and generating formatted output strings (e.g. in HTML, LaTeX, or Unicode).
Equivariant graph attention Transformer (ICLR2023)
First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports
A Python script that converts positional information from a SAM dataset into interval format with 0-based start and 1-based end. CIGAR string of SAM format is used to compute the end coordinate.
Deep learning library for Chemistry based on Tensorflow
An Apache-based persistent URL (PURL) service
Experiments with expanded ensembles to explore chemical space.
Co-create PowerPoint presentations with Generative AI from documents or topics
OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend.
lumpy: a general probabilistic framework for structural variant discovery.
Babelon is a simple standard for managing ontology translations and language profiles. Profiles are managed as TSV files, see for example https://github.com/obophenotype/hpo-translations/tree/main/babelon. The goal of Babelon as a data model and vocabulary is to capture the minimum data required to capture important metadata such as confidence and precision of translation.
Tools for NanoString Technologies GeoMx Technology. Package to easily graph on top of an OME-TIFF image. Plotting annotations can range from tissue segment to gene expression.
SCAN is a microarray normalization method to facilitate personalized-medicine workflows. Rather than processing microarray samples as groups, which can introduce biases and present logistical challenges, SCAN normalizes each sample individually by modeling and removing probe- and array-specific background noise using only data from within each array. SCAN can be applied to one-channel (e.g., Affymetrix) or two-channel (e.g., Agilent) microarrays. The Universal exPression Codes (UPC) method is an extension of SCAN that estimates whether a given gene/transcript is active above background levels in a given sample. The UPC method can be applied to one-channel or two-channel microarrays as well as to RNA-Seq read counts. Because UPC values are represented on the same scale and have an identical interpretation for each platform, they can be used for cross-platform data integration.
ProteoMM is a statistical method to perform model-based peptide-level differential expression analysis of single or multiple datasets. For multiple datasets ProteoMM produces a single fold change and p-value for each protein across multiple datasets. ProteoMM provides functionality for normalization, missing value imputation and differential expression. Model-based peptide-level imputation and differential expression analysis component of package follows the analysis described in “A statistical framework for protein quantitation in bottom-up MS based proteomics" (Karpievitch et al. Bioinformatics 2009). EigenMS normalisation is implemented as described in "Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition." (Karpievitch et al. Bioinformatics 2009).
Tools for NanoString Technologies nCounter Technology. Provides support for reading RCC files into an ExpressionSet derived object. Also includes methods for QC and normalizaztion of NanoString data.
Tools for NanoString Technologies GeoMx Technology. Package provides functions for reading in DCC and PKC files based on an ExpressionSet derived object. Normalization and QC functions are also included.
Robust model-based clustering using a t-mixture model with Box-Cox transformation. Note: users should have GSL installed. Windows users: 'consult the README file available in the inst directory of the source distribution for necessary configuration instructions'.
It is a web-application for visual and interactive gene expression analysis. Phantasus is based on Morpheus – a web-based software for heatmap visualisation and analysis, which was integrated with an R environment via OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis or differential expression analysis with limma package are supported.
Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.
NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.
Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
Minigraph is a sequence-to-graph mapper and graph constructor. For graph generation, it aligns a query sequence against a sequence graph and incrementally augments an existing graph with long query subsequences diverged from the graph.
RBPBench is a multi-function tool to evaluate CLIP-seq and other related genomic region data using a comprehensive collection of known RNA-binding protein (RBP) binding motifs. RBPBench can be used for a variety of purposes, from RBP motif search (database or user-supplied RBP motifs) in genomic regions, over motif enrichment and co-occurrence analysis, in-depth comparisons over multiple datasets via sequence and genomic annotation statistics, to benchmarking CLIP-seq peak caller methods as well as comparisons across cell types and CLIP-seq protocols. RBPBench supports both sequence and structure motifs, as well as regular expressions (sequence and structure patterns). Moreover, users can easily provide their own motif collections.
This service fills a gap between services like prefix.cc and LOV or looking up the original vocabulary specification. Not all vocabularies (or schema or ontology, whatever you want to call them) provide an HTML view. If you resolve some of the common prefixes all you get back is some RDF serialization which is not ideal. (from <https://prefix.zazuko.com/about>)
With the growing number of available genomes, the need for an environment to support effective comparative analysis increases. The original SEED Project was started in 2003 by the [Fellowship for Interpretation of Genomes (FIG)](http://thefig.info/) as a largely unfunded open source effort. Argonne National Laboratory and the University of Chicago joined the project, and now much of the activity occurs at those two institutions (as well as the University of Illinois at Urbana-Champaign, Hope college, San Diego State University, the Burnham Institute and a number of other institutions). The cooperative effort focuses on the development of the comparative genomics environment called the SEED and, more importantly, on the development of curated genomic data. This prefix provides identifiers for molecular roles that describe the function of one or more proteins in microbes and plants.
The DCAT-AP conversion to a LinkML Schema is the intended point of truth for the DCAT-AP+ schema, but could be used alternatively as a LinkML representation of DCAT-AP for other Projects. It is a port of DCAT-AP to the LinkML world that is as faithful to the original as possible. This Persistent Identifier does not only provide the SHACL Shape, but could also be used as described [here](https://github.com/perma-id/w3id.org/tree/cecbc2e5f40d928f05ed5306d24fc60db0e7bb21/nfdi-de/dcat-ap-plus). DCAT-AP+ is a [LinkML](https://linkml.io/)-based extension of the [DCAT Application Profile 3.0](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) that adds a provenance layer for describing how a dataset was generated and what it is about, using the [Starting Point Terms of PROV-O](https://www.w3.org/TR/prov-o/#description-starting-point-terms), the [QUDT ontology](https://www.qudt.org/), and [Dublin Core Terms](http://purl.org/dc/terms/).