Find open-source science resources

Metabolomics

An interactive platform that performs statistical analyses on metabolomics datasets and lets users visualise the results easily. The interface gives users control over creating figures suited to their reporting and publication needs.

CC-BY-4.0

GONetView

Ontology and terminology

Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.

PoseView

PoseView automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. It adheres to the conventions of chemical structure diagram generation. The quality of the resulting diagrams is comparable to manually drawn examples from books and scientific publications.

Circlator

Sequence assembly

Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.

AlphaFind v2

Proteins

AlphaFind v2 is a tool for fast, structure‑based search for protein structures against the AlphaFold DB (https://alphafold.ebi.ac.uk/) and TED DB (https://ted.cathdb.info/). The tool uses protein‑level embeddings to provide a rapid pre‑filter, with top candidates undergoing TM-score, RMSD and residue‑level alignment computations. Four complementary search modes are available: (i) whole‑chain search, (ii) pLDDT‑aware search that restricts similarity to high‑confidence regions, (iii) domain search against the TED database, and (iv) multidomain search that combines several chain‑level matches into a single score. Users can restrict queries to a given organism, CATH superfamily or to proteins with experimental structures, and submit queries by UniProt/AlphaFold identifier. Results comprise a ranked list with similarity metrics, rich metadata and an interactive 3‑D superposition view. The service is freely accessible at https://alphafind.ics.muni.cz/.
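
To illustrate the re-ranking metrics, the minimal Python sketch below computes RMSD and TM-score from residue pairs that are assumed to be already aligned and superposed. It uses the standard TM-score distance scale d0 = 1.24·(L − 15)^(1/3) − 1.8 for a target length L, clamped to a 0.5 Å minimum for short chains (an assumption about edge-case handling). A real TM-score calculation also searches for the superposition that maximizes the score; this sketch omits that step and is not AlphaFind's implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def tm_d0(target_length: int) -> float:
    """Distance scale d0 of the TM-score; clamped to 0.5 Å for short chains (assumption)."""
    if target_length <= 21:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation over matched, already-superposed residue pairs."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def tm_score(a: Sequence[Point], b: Sequence[Point], target_length: int) -> float:
    """TM-score of aligned pairs for a fixed superposition, normalized by target_length."""
    if len(a) != len(b):
        raise ValueError("aligned coordinate lists must have equal length")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2) for p, q in zip(a, b))
    return total / target_length
```

Because each aligned pair contributes at most 1, a perfect match over all residues of the target gives 1.0, while aligning only half of a target's residues caps the score at 0.5.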

EUCAIM ETL toolset

Data identity and mapping

Modular toolchain for an extensible and customizable ETL pipeline that extracts, transforms, and loads clinical data and medical imaging metadata, applying dataset-specific mappings to generate outputs compatible with the EUCAIM Common Data Model (CDM). Its design aims to minimize manual data preparation efforts and facilitate customization and integration with other components, such as data quality assurance tools. The toolset is containerized and currently supports input datasets in CSV, JSON and XLSX formats.

Apache-2.0
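
A dataset-specific mapping step of this kind can be pictured as a lookup from source column names to target fields, plus a value conversion per field. The Python sketch below assumes a hypothetical CSV with columns `patient_id`, `sex` and `age_years` and maps them to hypothetical target field names; it does not reproduce the actual EUCAIM CDM schema or the toolset's mapping format.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical dataset-specific mapping: source column -> (target field, converter).
# The target field names are illustrative, not the actual EUCAIM CDM schema.
MAPPING: Dict[str, Tuple[str, Callable[[str], object]]] = {
    "patient_id": ("subject_id", str.strip),
    "sex": ("sex", lambda v: {"M": "male", "F": "female"}.get(v.strip().upper(), "unknown")),
    "age_years": ("age_at_diagnosis", lambda v: int(v) if v.strip() else None),
}


def extract_csv(text: str) -> List[Dict[str, str]]:
    """Extract: read CSV rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]],
              mapping: Dict[str, Tuple[str, Callable[[str], object]]]) -> List[Dict[str, object]]:
    """Transform: rename columns and convert values according to the mapping."""
    records = []
    for row in rows:
        record = {}
        for source, (target, convert) in mapping.items():
            if row.get(source) is not None:  # columns absent from this dataset are skipped
                record[target] = convert(row[source])
        records.append(record)
    return records


def load(records: List[Dict[str, object]]) -> str:
    """Load: serialize the harmonized records (here simply to a JSON string)."""
    return json.dumps(records, indent=2)
```

Keeping the mapping as data rather than code is what lets the same pipeline be reused across datasets: only the mapping table changes.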

EvoBind

Protein folding, stability and design

Design of linear and cyclic peptide binders from protein sequence information.

FigCanvas

FigCanvas is an AI scientific figure generator for life-science researchers. It produces publication-ready biological diagrams (mechanism diagrams, pathway figures, cell biology visuals), CONSORT and methodology flowcharts, and data visualizations such as volcano plots from text prompts or uploaded datasets. The tool turns methods-section text or structured data into editable vector figures suitable for manuscripts, posters, and slides, helping researchers iterate on figures without rebuilding them in Illustrator.

miniconda

Software management

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

Proprietary

seqlib

Molecular genetics

seqlib is a type-safe Rust library for working with DNA and RNA sequences.

Rust

Not licensed

NuclearPhaser

Genomics

NuclearPhaser is a method for phasing dikaryotic genomes into their two haplotypes using Hi-C contact graphs.

CompuCell3D

Systems biology

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

ParseSNP

Personalised medicine

A small (<720 KB) C++ Windows utility that lets you load Ancestry, 23andMe, FTDNA or Genes for Good raw DNA files, search them, merge them and convert them to Ancestry format. It can also create files from peer-reviewed publications and compare them with your loaded data to give your genetic predisposition for the condition you entered the data for, as well as a statistical risk if odds ratio (OR) values are included. Example files for Type 2 diabetes risk factors are included with the program. (I have Type 2 diabetes, so I could test the results.)

C++
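
The conversion and risk steps can be sketched in Python as below. The sketch assumes the tab-separated 23andMe layout (comment lines starting with `#`, then rsid, chromosome, position and genotype) and writes Ancestry-style rows with one column per allele; using `0` for a no-call and writing single-allele calls twice are assumptions about the target format. The risk function multiplies each odds ratio once per copy of the risk allele, which presumes independent SNPs with multiplicative effects; it is an illustrative model, not ParseSNP's own calculation.

```python
import math
from typing import Dict, Iterable, List, Tuple

Call = Tuple[str, int, str]  # (chromosome, position, genotype)


def parse_23andme(lines: Iterable[str]) -> Dict[str, Call]:
    """Parse 23andMe-style raw data: '#' comment lines, then rsid, chromosome, position, genotype."""
    calls: Dict[str, Call] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        calls[rsid] = (chromosome, int(position), genotype)
    return calls


def to_ancestry_rows(calls: Dict[str, Call]) -> List[str]:
    """Write Ancestry-style rows with one column per allele.
    Assumptions: '--' no-calls become '0'/'0'; single-allele calls are written twice."""
    rows = ["rsid\tchromosome\tposition\tallele1\tallele2"]
    for rsid, (chromosome, position, genotype) in calls.items():
        if genotype == "--":
            allele1 = allele2 = "0"
        elif len(genotype) == 1:
            allele1 = allele2 = genotype
        else:
            allele1, allele2 = genotype[0], genotype[1]
        rows.append(f"{rsid}\t{chromosome}\t{position}\t{allele1}\t{allele2}")
    return rows


def combined_odds_ratio(calls: Dict[str, Call], risk: Dict[str, Tuple[str, float]]) -> float:
    """Multiply each SNP's odds ratio once per copy of its risk allele.
    risk maps rsid -> (risk allele, OR); SNPs absent from the data are skipped.
    Assumes independent SNPs with multiplicative effects."""
    log_or = 0.0
    for rsid, (allele, odds_ratio) in risk.items():
        if rsid in calls:
            copies = calls[rsid][2].count(allele)
            log_or += copies * math.log(odds_ratio)
    return math.exp(log_or)
```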

s3segmenter

Bioimaging

S3segmenter is a MATLAB-based set of functions that generates single-cell (nuclei and cytoplasm) label masks.

MATLAB

Not licensed

nanosv

Structural genomics

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

generate_count_matrix

Transcriptomics

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
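
The merging logic can be sketched in Python as follows. It assumes each input is a header-less, tab-separated file with a feature name and an integer count per line, and that the input names become the column headers; the Galaxy tool's actual options and file layout may differ.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_counts(text: str) -> Tuple[List[str], List[int]]:
    """Parse a header-less, tab-separated counts file: feature name, integer count."""
    features, counts = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row:  # skip blank lines
            features.append(row[0])
            counts.append(int(row[1]))
    return features, counts


def count_matrix(samples: Dict[str, str]) -> List[List[str]]:
    """Combine per-sample count files into one matrix with one column per input."""
    if not samples:
        raise ValueError("at least one input is required")
    names = list(samples)
    parsed = [read_counts(samples[name]) for name in names]
    features = parsed[0][0]
    for name, (feats, _) in zip(names, parsed):
        if feats != features:  # the feature column must be identical in every input
            raise ValueError(f"feature column of {name!r} differs from the first input")
    matrix = [["feature"] + names]
    for i, feature in enumerate(features):
        matrix.append([feature] + [str(counts[i]) for _, counts in parsed])
    return matrix
```

Writing the combined file is then a matter of joining each row with tabs.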

NP-Likeness

Small molecules

Natural Product-likeness calculator v2.1: calculates the natural product-likeness of small molecules based on open data on natural products.

Java

LGPL-2.0-or-later
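
Fragment-based natural product-likeness scores are commonly computed by summing, over the fragments of a molecule, the log ratio of the fragment's frequency in natural products to its frequency in synthetic molecules (each normalized by the size of its reference set), then dividing by the number of atoms. The Python sketch below implements that formula on precomputed fragment identifiers. Whether NP-Likeness v2.1 uses exactly this formula and the base-10 logarithm is an assumption, as is the pseudocount of 1 added here to avoid division by zero; generating the fragments themselves requires a cheminformatics toolkit and is not shown.

```python
import math
from typing import Iterable, Mapping


def np_likeness(
    fragments: Iterable[str],
    np_counts: Mapping[str, int],
    sm_counts: Mapping[str, int],
    np_total: int,
    sm_total: int,
    n_atoms: int,
    pseudocount: float = 1.0,
) -> float:
    """Sum of per-fragment log10 frequency ratios (NP vs synthetic), divided by atom count.

    np_counts / sm_counts: number of molecules containing each fragment in the
    natural-product and synthetic reference sets; np_total / sm_total: set sizes.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    score = 0.0
    for fragment in fragments:
        np_i = np_counts.get(fragment, 0) + pseudocount  # pseudocount is an assumption
        sm_i = sm_counts.get(fragment, 0) + pseudocount
        score += math.log10((np_i / sm_i) * (sm_total / np_total))
    return score / n_atoms
```

Positive contributions come from fragments that are over-represented in natural products, negative ones from fragments typical of synthetic molecules.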

NanoporeDB

Computational biology

NanoporeDB is an open-access structural database dedicated to the exploration, analysis, and design of protein nanopores, which serve as essential molecular gateways in biological membranes and form the basis of many advanced biosensing and sequencing technologies. This platform integrates large-scale structure-guided mining and deep learning-based modeling using AlphaFold-Multimer and AlphaFold3 to provide about 7,000 high-confidence multimeric nanopore structures. Each entry includes detailed information on membrane embedding, pore geometry annotation, and constriction profiling to support functional and biophysical interpretation. Through an interactive 3D visualization interface and quantitative parameters such as tilt angle, insertion depth, and pore geometry, NanoporeDB enables researchers to explore nanopore diversity, discover novel scaffolds, and accelerate innovation in molecular sensing, precision diagnostics, and synthetic biology.

SNP-sites

Genomics

Finds SNP sites from a multi-FASTA alignment file.
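
Finding SNP sites amounts to scanning the alignment column by column and keeping positions where the sequences disagree. The Python sketch below treats a column as variable when it contains at least two different unambiguous bases (A, C, G, T), ignoring gaps and other symbols such as N; this handling of gaps and ambiguity codes is an assumption, and SNP-sites' own rules and output formats are not reproduced.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Read a multi-FASTA string into {sequence id: upper-case sequence}."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else f"seq{len(records) + 1}"), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def snp_sites(alignment: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return (1-based position, alignment column) for every variable column."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences in a multiple alignment must have equal length")
    sites = []
    for pos in range(length):
        column = "".join(s[pos] for s in seqs)
        bases = {b for b in column if b in "ACGT"}  # gaps and N are ignored
        if len(bases) > 1:
            sites.append((pos + 1, column))
    return sites
```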

minigraph

Genomics

Minigraph is a sequence-to-graph mapper and graph constructor. For graph generation, it aligns a query sequence against a sequence graph and incrementally augments an existing graph with long query subsequences diverged from the graph.

gc_derivatization

Metabolomics

In silico derivatization for GC. The GC-derivatization tool converts carbonyl groups to C═N-OCH3 (MeOX) and transforms acidic protons into -Si(CH3)3 (TMS). Key functionalities include checking for specific groups, removing derivatization groups, and adding derivatization groups to molecules.
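
The two conversions change a molecule's formula by fixed increments: MeOX turns C=O into C=N-OCH3 (net +CH3N per carbonyl group) and each TMS group replaces an acidic hydrogen with Si(CH3)3 (net +C3H8Si per proton). The Python sketch below applies those increments for given group counts, with glucose in its open-chain form (one aldehyde, five hydroxyls) as a worked example. Unlike the tool, it does not detect the groups from a structure; the counts are supplied by hand.

```python
from collections import Counter
from typing import Mapping

# Monoisotopic masses (Da) of the elements involved.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "Si": 27.9769265325}
MEOX_DELTA = {"C": 1, "H": 3, "N": 1}  # C=O -> C=N-OCH3
TMS_DELTA = {"C": 3, "H": 8, "Si": 1}  # X-H -> X-Si(CH3)3


def derivatize(formula: Mapping[str, int], n_carbonyl: int, n_acidic_h: int) -> Counter:
    """Return the formula after n_carbonyl MeOX and n_acidic_h TMS derivatizations."""
    result = Counter(formula)
    for element, count in MEOX_DELTA.items():
        result[element] += n_carbonyl * count
    for element, count in TMS_DELTA.items():
        result[element] += n_acidic_h * count
    return result


def monoisotopic_mass(formula: Mapping[str, int]) -> float:
    return sum(MONOISOTOPIC[element] * n for element, n in formula.items())


def hill(formula: Mapping[str, int]) -> str:
    """Hill notation: C, then H, then the other elements alphabetically."""
    order = [e for e in ("C", "H") if formula.get(e)]
    order += sorted(e for e in formula if e not in ("C", "H") and formula[e])
    return "".join(e + (str(formula[e]) if formula[e] != 1 else "") for e in order)


glucose = {"C": 6, "H": 12, "O": 6}
meox_tms = derivatize(glucose, n_carbonyl=1, n_acidic_h=5)
print(hill(meox_tms))  # C22H55NO6Si5
print(round(monoisotopic_mass(meox_tms) - monoisotopic_mass(glucose), 4))  # 389.2242
```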

Image Duplicates Checker

Data quality management

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review; no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content and fully agnostic to metadata changes, consider using https://bio.tools/image_duplicate_check_tool.

Apache-2.0
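
The exact-copy and near-duplicate tiers can be sketched in Python on plain pixel lists, as below. The sketch assumes the pixel data have already been decoded from DICOM (not shown), uses SHA-256 for the exact-match hash, and flags near duplicates with cosine similarity and mean absolute difference (MAD) under hypothetical thresholds; the metadata tier and SSIM are omitted, and the thresholds are not the tool's settings. Like the tool, it only reports findings.

```python
import hashlib
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pixel_hash(pixels: Sequence[int]) -> str:
    """SHA-256 of the pixel values, encoded as 16-bit unsigned little-endian integers (assumption)."""
    data = b"".join(p.to_bytes(2, "little") for p in pixels)
    return hashlib.sha256(data).hexdigest()


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_duplicates(images: Dict[str, Sequence[int]], cos_min: float = 0.999,
                    mad_max: float = 2.0) -> List[Tuple[str, str, str]]:
    """Report (image, image, kind) pairs for review; nothing is modified or deleted.
    cos_min and mad_max are hypothetical thresholds."""
    hashes = {name: pixel_hash(px) for name, px in images.items()}
    findings = []
    for (n1, p1), (n2, p2) in combinations(images.items(), 2):
        if hashes[n1] == hashes[n2]:
            findings.append((n1, n2, "exact"))  # tier 1: identical pixel data
        elif (len(p1) == len(p2) and p1
              and cosine(p1, p2) >= cos_min and mean_abs_diff(p1, p2) <= mad_max):
            findings.append((n1, n2, "near-duplicate"))  # tier 2: very similar pixel data
    return findings
```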

Data Integration Quality Check Tool (DIQCT)

Data quality management

A tool that checks the quality (validity, completeness) of clinical metadata, the integrity between the images and the provided clinical metadata as well as their accuracy, the de-identification protocol applied, and the existence of annotations together with the consistency between the images and the annotation files. It informs the user of corrective actions to take before data upload.

Proprietary

METALizer

METALizer predicts the coordination geometry of metal ions in metalloproteins. Users can compare potential coordination geometries to those found in the examined structure. The predicted coordination geometries and the observed metal interaction distances can be interactively compared to statistics calculated based on the PDB.
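
One simple way to compare an observed coordination sphere with ideal geometries is to compute all ligand-metal-ligand angles and measure their deviation from each ideal angle set, as in the Python sketch below. The reference angle sets, the sorted-angle matching and the RMS angular deviation score are illustrative assumptions, not METALizer's method, which compares against statistics derived from the PDB.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))  # about 109.47 degrees

# Ideal ligand-metal-ligand angle sets (degrees) for a few common geometries.
IDEAL: Dict[str, List[float]] = {
    "tetrahedral": [TETRAHEDRAL] * 6,
    "square planar": [90.0] * 4 + [180.0] * 2,
    "trigonal bipyramidal": [90.0] * 6 + [120.0] * 3 + [180.0],
    "octahedral": [90.0] * 12 + [180.0] * 3,
}


def bond_angle(metal: Point, a: Point, b: Point) -> float:
    """Angle a-metal-b in degrees."""
    u = [a[i] - metal[i] for i in range(3)]
    v = [b[i] - metal[i] for i in range(3)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise


def rank_geometries(metal: Point, ligands: Sequence[Point]) -> List[Tuple[float, str]]:
    """Return (RMS angular deviation, geometry) pairs, best match first."""
    observed = sorted(bond_angle(metal, a, b) for a, b in combinations(ligands, 2))
    ranking = []
    for name, ideal in IDEAL.items():
        if len(ideal) != len(observed):
            continue  # geometry has a different coordination number
        ideal_sorted = sorted(ideal)
        dev = math.sqrt(sum((o - i) ** 2 for o, i in zip(observed, ideal_sorted)) / len(ideal))
        ranking.append((dev, name))
    return sorted(ranking)
```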

PoseEdit

PoseEdit automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. The structure mining tool GeoMine also uses this model to describe binding sites. In addition, users can manipulate the diagrams by translating, rotating or mirroring parts of the structure, and by adding or removing interactions. Furthermore, users can add individual labels or adjust existing ones. Users can download the final 2D diagrams for a binding site of interest in JSON or SVG format.

JCVI

Sequence assembly

JCVI is a versatile toolkit for comparative genomics analysis. It is a collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

BSD-2-Clause

DICOM-SEG Annotation

Data quality management

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.

Apache-2.0
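
The attribute-type checks follow the DICOM data element types: Type 1 attributes must be present with a value, Type 2 attributes must be present but may be empty, Type 3 attributes are optional, and the conditional Types 1C and 2C apply the Type 1 or Type 2 rule only when their condition holds. The Python sketch below applies these rules to a dataset represented as a plain keyword-to-value dictionary. The `REQUIREMENTS` rows stand in for the Excel requirements file; their keywords, types and allowed values are illustrative, not an authoritative reading of the Segmentation IOD.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Set


@dataclass
class Requirement:
    keyword: str
    attr_type: str  # "1", "1C", "2", "2C" or "3"
    condition: Optional[Callable[[Dict[str, Any]], bool]] = None  # used by 1C/2C
    allowed: Optional[Set[str]] = None  # optional set of permitted values


def is_empty(value: Any) -> bool:
    return value is None or value == "" or value == []


def validate(dataset: Dict[str, Any], requirements: List[Requirement]) -> List[str]:
    issues = []
    for req in requirements:
        if req.attr_type in ("1C", "2C") and (req.condition is None or not req.condition(dataset)):
            continue  # condition not met: presence is not checked in this sketch
        present = req.keyword in dataset
        value = dataset.get(req.keyword)
        if req.attr_type != "3" and not present:
            issues.append(f"{req.keyword}: missing (Type {req.attr_type})")
        elif req.attr_type in ("1", "1C") and is_empty(value):
            issues.append(f"{req.keyword}: empty, but Type {req.attr_type} requires a value")
        elif present and req.allowed is not None and not is_empty(value) and value not in req.allowed:
            issues.append(f"{req.keyword}: invalid value {value!r}")
    return issues


# Illustrative requirement rows (not the authoritative Segmentation IOD requirements).
REQUIREMENTS = [
    Requirement("SegmentationType", "1", allowed={"BINARY", "FRACTIONAL"}),
    Requirement("MaximumFractionalValue", "1C",
                condition=lambda ds: ds.get("SegmentationType") == "FRACTIONAL"),
    Requirement("ContentLabel", "1"),
    Requirement("ContentDescription", "2"),
    Requirement("SeriesDescription", "3"),
]
```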

xgt

xgt is a command-line tool for programmatic access to the GTDB REST API. It provides four subcommands: search (genome queries with pagination), genome (cards, metadata, taxonomic history), taxon (lineage and genome set retrieval), and diff (per-rank taxonomic comparison between any two GTDB releases). All subcommands support batch input, JSON/CSV/TSV output, file splitting, and automatic retry. Implemented in Rust as a self-contained binary with no runtime dependencies.
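
The per-rank comparison performed by the `diff` subcommand can be illustrated with a small Python sketch that compares two GTDB lineage strings in GTDB's semicolon-separated format with rank prefixes (`d__` for domain through `s__` for species). It assumes both lineages have already been retrieved for the same genome from two releases; it does not call the GTDB API and is not xgt's Rust implementation.

```python
from typing import Dict, List, Tuple

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")
PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def parse_lineage(lineage: str) -> Dict[str, str]:
    """Split a 'd__...;p__...;...;s__...' lineage into a rank -> name mapping."""
    parts = [part.strip() for part in lineage.split(";")]
    if len(parts) != len(RANKS):
        raise ValueError(f"expected {len(RANKS)} ranks, got {len(parts)}")
    names = {}
    for rank, prefix, part in zip(RANKS, PREFIXES, parts):
        if not part.startswith(prefix):
            raise ValueError(f"{part!r} does not start with {prefix!r}")
        names[rank] = part[len(prefix):]  # may be empty for an unassigned rank
    return names


def rank_diff(old: str, new: str) -> List[Tuple[str, str, str]]:
    """Return (rank, old name, new name) for every rank whose name changed."""
    before, after = parse_lineage(old), parse_lineage(new)
    return [(rank, before[rank], after[rank]) for rank in RANKS if before[rank] != after[rank]]
```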

QP-Insights Uploader

Medical imaging

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

CC-BY-NC-ND-4.0

DoGSiteScorer

Protein binding sites

DoGSiteScorer is a grid-based automated pocket detection and analysis tool. It applies a Difference of Gaussian filter to detect potential binding pockets and splits them into sub-pockets. The method solely uses the 3D structure of the protein. Global properties, describing the size, shape, and chemical features of the predicted (sub-)pockets, are calculated. By default, a simple druggability score based on a linear combination of the three descriptors describing volume, hydrophobicity, and enclosure is provided for each (sub-)pocket. Furthermore, a subset of meaningful descriptors is incorporated in a support vector machine (libsvm) to predict the (sub-)pocket druggability score (values are between zero and one). The higher the score, the more druggable the pocket is estimated to be.

DoGSite3

Protein binding sites

DoGSite3 was developed for predicting robust and reliable small molecule binding sites and computing their geometrical and chemical descriptors. It is based on the grid-based DoGSite algorithm for predicting pockets and their sub-pockets. The new tool is largely rotation- and translation-invariant due to a normalization procedure before binding site prediction. Known ligands in the structure can be used to bias the grid by sufficiently buried ligand fragments. The output encompasses novel chemical binding site descriptors considering solvent accessibility. Compared to its predecessor, it shows increased robustness through comprehensive parameter optimization. A DoGSite3 run finishes within seconds.

JAMDA

Molecular modelling

JAMDA enables the preparation of individual protein structures and the docking of small molecules in preprocessed binding sites of choice. JAMDA simplifies protein-ligand docking through automatic preprocessing protocols for the protein and the binding sites of interest. With default settings, the JAMDAscore scoring function retrieved 75% of the native poses within the three highest-ranked solutions for high-quality protein-ligand complexes. Individual configurations for protein preparation are available, e.g., considering protein ensembles, relevant binding site water molecules, or cofactors. A user-defined number of input conformations for the ligands of interest can be generated fully automatically using Conformator. Alternatively, users can provide externally prepared ligand conformers.
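
A figure such as "75% of the native poses within the three highest-ranked solutions" is a top-k success rate. The Python sketch below computes it from the RMSD of each ranked pose to the native ligand pose; the 2 Å cutoff for counting a pose as native-like is a common convention assumed here, since the description does not state the threshold.

```python
from typing import Sequence


def top_k_success_rate(ranked_rmsds: Sequence[Sequence[float]], k: int = 3,
                       cutoff: float = 2.0) -> float:
    """Fraction of complexes with a pose within `cutoff` Å RMSD among the top k ranked poses.

    ranked_rmsds[i] lists the RMSDs (Å) to the native ligand pose of complex i's
    docking solutions, best-scored first. The 2.0 Å default is an assumption.
    """
    if not ranked_rmsds:
        raise ValueError("no complexes given")
    hits = sum(1 for poses in ranked_rmsds if any(r <= cutoff for r in poses[:k]))
    return hits / len(ranked_rmsds)
```

A complex with no returned poses simply counts as a failure, so the rate reflects both scoring quality and docking robustness.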

HyPPI

HyPPI classifies a protein-protein complex based on its interaction type into permanent, transient, or crystal artifact. Permanent protein-protein complexes are only stable in their complexed state. Their subunits would denature upon dissociation of the protein-protein complex. Transient protein-protein complexes are stable in the complexed as well as in the monomeric form, depending on the necessary function of the complex. Crystal artifacts have no biological function and are artificially formed during the crystallization process. The discrimination is performed using two characteristics of the protein-protein complex, the hydrophobicity of the interface (ΔGhydrophobic) and the quotient of interface area ratios (IF-quotient). The IF-quotient considers whether the protein-protein interface is symmetric.

RBPBench

RNA

RBPBench is a multi-function tool to evaluate CLIP-seq and other related genomic region data using a comprehensive collection of known RNA-binding protein (RBP) binding motifs. RBPBench can be used for a variety of purposes, ranging from RBP motif search (with database or user-supplied RBP motifs) in genomic regions, through motif enrichment and co-occurrence analysis and in-depth comparisons of multiple datasets via sequence and genomic annotation statistics, to benchmarking CLIP-seq peak calling methods and comparisons across cell types and CLIP-seq protocols. RBPBench supports both sequence and structure motifs, as well as regular expressions (sequence and structure patterns). Moreover, users can easily provide their own motif collections.
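
Regular-expression motif search in genomic regions can be sketched with Python's `re` module, as below. Wrapping the pattern in a lookahead reports overlapping occurrences, sequences are upper-cased with U converted to T so that patterns are written in the DNA alphabet, and both example patterns are hypothetical; RBPBench's own pattern syntax, structure patterns and statistics are not reproduced.

```python
import re
from typing import Dict, List, Tuple


def find_motif_hits(regions: Dict[str, str], pattern: str) -> List[Tuple[str, int, int, str]]:
    """Return (region id, 1-based start, 1-based end, matched text) for each occurrence.

    Wrapping the pattern in a lookahead makes overlapping occurrences visible.
    """
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for region_id, sequence in regions.items():
        # Upper-case and convert U to T so patterns can use the DNA alphabet.
        normalized = sequence.upper().replace("U", "T")
        for match in regex.finditer(normalized):
            text = match.group(1)
            start = match.start()
            hits.append((region_id, start + 1, start + len(text), text))
    return hits


regions = {"peak1": "acgTTTTTacg", "peak2": "GGACUGGAC"}
print(find_motif_hits(regions, "TTTT"))      # [('peak1', 4, 7, 'TTTT'), ('peak1', 5, 8, 'TTTT')]
print(find_motif_hits(regions, "GGAC[AT]"))  # [('peak2', 1, 5, 'GGACT')]
```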