Find open-source science resources
Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.
FASTQ and SAM quality control using Python.
A pipeline for preprocessing short and long sequencing reads, built with Nextflow.
Customizable pipeline for differential expression analysis with an intuitive GUI.
Batteries-included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction.
A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results.
A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes.
A list of pipeline resources.
Workflow standard developed by the Broad.
Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments.
Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, and providing powerful file naming and comprehensive audit reports for every output.
A Python-based workflow manager.
A Workflow Management System geared towards scientific workflows.
A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high-performance computing (HPC) environments.
A small language for defining pipeline stages and linking them together to make pipelines.
A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities.
Create an index on a compressed text file.
Write-once-read-many table for large datasets.
Table file index.
Sort genomic files according to a specified order.
Fast FASTQ filtering by matching reads against one or more regex patterns.
A wee tool for random access into BGZF files.
Easily submit PBS jobs with a script template. Multiple input files are supported.
Another cross-platform, efficient, practical and pretty CSV/TSV toolkit.
Utilities for working with CSV/Tab-delimited files.
Syntax highlighting for computational biology file formats (SAM, VCF, GTF, FASTA, PDB, etc.) in vim/less/gedit/sublime.
Modular and universal bioinformatics: Bionode provides pipeable UNIX command line tools and JavaScript APIs for bioinformatics analysis workflows.
Git repo of useful single line commands.
A compressor for common genomic file formats (BAM, CRAM, FASTQ, VCF, etc.).
Easily get SRA download links and other information.
Go Get Data: a command line interface for obtaining genomic data.
Java framework for processing biological data.
Biocaml aims to be a high-performance user-friendly library for Bioinformatics.
A Go library and command line utility for engineering organisms.
The modern C++ library for sequence analysis.
Rust implementations of algorithms and data structures useful for bioinformatics.
Freely available tools for biological computing in Python, with included cookbook, packaging and thorough documentation. Part of the [Open Bioinformatics Foundation](http://open-bio.org/). Contains the very useful [Entrez](https://biopython.org/DIST/docs/api/Bio.Entrez-module.html) package for API access to the NCBI databases.
International association of users & developers of open source Perl tools for bioinformatics, genomics and life sciences.
Phantasus is a web application for visual and interactive gene expression analysis. It is based on Morpheus, a web-based software for heatmap visualisation and analysis, which was integrated with an R environment via the OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis, or differential expression analysis with the limma package are supported.
The Open Neuroscience Graph (openneuroscience.org) is an open-access, curated knowledge graph that maps the open science ecosystem in neuroscience as a browsable digital garden. Built from an Obsidian vault and published as a static website using Quartz, the project replaces traditional linear presentation with a networked structure of interlinked Markdown notes. Bidirectional links, full-text search, and an integrated graph visualization allow users to navigate thematic relationships dynamically rather than sequentially. The complete source material is openly available to sustain, replicate, and extend the resource, including all Markdown content, media attachments, Quartz configuration files, and site customizations. Researchers, educators, and open-science practitioners may explore the site directly, download the vault for offline use in Obsidian, or fork the material to build new, derivative knowledge bases. PID=https://doi.org/10.5281/zenodo.20181900
DigestedProteinDB provides a scalable computational infrastructure for indexing and querying peptide cleavage data. Designed for seamless integration into high-throughput mass spectrometry pipelines, it enables low-latency searches and advanced filtering of digested protein datasets to accelerate experimental spectra cross-referencing.
PlantiSMASH is a specialized extension of antiSMASH for the identification and analysis of biosynthetic gene clusters (BGCs) in plant genomes. It supports advanced plant-specific detection rules and features for comparative genomics, visualization, and more.
LifeSoaks was designed to find solvent channels in macromolecular structures solved by X-ray crystallography. It predicts their accessibility by molecules through an automated annotation of so-called bottleneck radii. It simplifies the process of manually checking a crystal structure for solvent channels. Bottleneck radii can be calculated for solvent channels and small molecule binding sites. The tool is ideally suited for channel analyses before the actual soaking experiments to select the most promising experimental conditions and crystal forms. LifeSoaks runs fully automated and will finish within seconds to minutes for moderately sized crystals.
Three-dimensional protein structures play a vital role in drug design. Structure-based design necessitates an in-depth examination of the available quality data before using the structure in computational experiments and for method evaluation. StructureProfiler assists in automatically profiling sets of protein-ligand complex structures based on multiple quality indicators, ranging from model characteristics, e.g., the R factor, and active site features, e.g., bond length deviations, to ligand properties such as electron density support and the validity of torsion angles.
The electron density score for individual atoms (EDIA) quantifies the electron density fit of each atom in a crystallographically resolved structure. Multiple EDIA values can be combined using the power mean to compute the EDIAm, i.e., the electron density score for a group of several atoms. It enables users to score a set of atoms, such as a ligand, a residue, or an active site.
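The power-mean aggregation mentioned in the EDIA entry can be sketched as follows. This is a minimal illustration of a generalized mean, not the EDIA implementation: the function name `power_mean`, the sample scores, and the exponent of -2 are assumptions chosen for the example. A negative exponent weights low values more heavily, so a single poorly supported atom pulls the combined score down more than it would in an arithmetic mean.

```python
import math


def power_mean(values, exponent):
    """Generalized (power) mean of strictly positive values.

    Hypothetical helper for illustration; not part of any EDIA tool.
    """
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    if exponent == 0:
        # The limit exponent -> 0 is the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / len(values))
    mean_of_powers = sum(v ** exponent for v in values) / len(values)
    return mean_of_powers ** (1.0 / exponent)


# Made-up per-atom scores for a small group of atoms (assumption).
atom_scores = [0.5, 1.0]
# Exponent -2 is an assumed value for this sketch.
combined = power_mean(atom_scores, -2)
arithmetic = sum(atom_scores) / len(atom_scores)
print(combined < arithmetic)  # the low score dominates the power mean
```

With a negative exponent the inputs must be strictly positive, which is why the sketch rejects zero or negative scores rather than guessing how they should be handled.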
Primerpickr is an open-source tool for rational primer design, powered by the aggregation of public usage of PCR primers.
Protoss is a fully automated hydrogen atom placement tool for protein-ligand complexes. It adds missing hydrogen atoms to protein structures and detects reasonable protonation states, tautomeric states, and hydrogen coordinates of both protein and ligand molecules by optimizing the hydrogen bond network.
WarPP predicts the position and orientation of water molecules in small-molecule binding sites. It places and scores water molecules in binding sites of crystallographic structures based on EDIAscorer results and interaction geometries as known from experimentally solved protein structures. WarPP was validated on a high-quality set of 1,500 protein-ligand complexes, containing 20,000 crystallographically observed water molecules. It is sufficiently fast for high-throughput analyses. It correctly places water molecules in approx. 80% of the cases. Users can export the predictions as PDB files for, e.g., molecular docking with JAMDA.
GeoMine enables the automated mining of protein-ligand binding sites. Based on individually designed queries, users can search for spatial interaction patterns in huge collections of protein-ligand complexes and binding pockets. The regularly updated GeoMine database relies on the free database systems SQLite and PostgreSQL. It supports radius-based pockets (based on ligands) and predicted pockets (based on DoGSite3) for query generation. Query management is based on XML (for the REST service) or JSON (in the GUI mode). Its output consists of the query-based superpositions of the matched binding sites and statistics on matching points, distances, and angles.
SIENA is a software pipeline enabling the fully automated construction of protein structure ensembles from the PDB. Starting with a single query structure, all binding sites with high sequence similarity are extracted from the PDB, aligned, and superimposed. SIENA also handles complicated cases, such as comparing binding sites at protein domain interfaces or within multimeric proteins.
MicroMiner assists in identifying single-residue substitutions in protein structure databases. It searches protein residue environments with local sequence and structural similarity based on the SIENA methodology. Users can search for structural mutations in the entire PDB, their in-house structure collection, or (subsets of) the AlphaFold Database. They can use the method to explore the mutation landscape of proteins with experimental or predicted structures. MicroMiner can be applied to single domains or even protein-protein or protein-ligand interfaces. Several filter options are available to simplify downstream analysis.