Find open-source science resources

Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.

3,084 of 5,674 resources

Showing 2,951–3,000

Pythonic access to FASTA files.

Python wrapper for [bedtools](https://github.com/arq5x/bedtools).

Python library for blazing-fast genomic interval operations and genomic file format I/O on Polars DataFrames.

Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF.

A port of [pyVCF](https://github.com/jamescasbon/PyVCF) using Cython for speed.

Access to Biological Web Services from Python.

Pythonic access to the UCSC Genome database.

Genetic variant annotation and effect prediction toolbox.

Predicts whether an amino acid substitution affects protein function.

**Comes with samtools!** - Reads simulator.

Tools for adding mutations to existing `.bam` files, used for testing mutation callers.

GFF and GTF file manipulation and interconversion.

Suite of tools to handle gene annotations in any GTF/GFF format.

VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).

A C++ library for parsing and manipulating VCF files.

Annotate a VCF with other VCFs/BEDs/tabixed files.

Telseq is a tool for estimating telomere length from whole genome sequence data.

Fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs.

Displaying sequence statistics for next-generation sequencing.

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.

Automate common SAM & BAM conversions.

Collection of tools for working with BAM files.

Structural variant calling and genotyping with existing tools, but, smoothly.

GRIDSS: the Genomic Rearrangement IDentification Software Suite.

Structural variant and indel caller for mapped sequencing data.

Structural variant discovery by integrated paired-end and split-read analysis.

samtools/bcftools are a suite of tools for manipulating NGS data and can be used to call variants.

A polymorphic Bayesian genotyping model with wide applicability.

Variant Discovery in High-Throughput Sequencing Data.

Bayesian haplotype-based polymorphism discovery and genotyping.

Deep learning-based variant caller.

A software package for estimating gene and isoform expression levels from RNA-Seq data.

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

Ultra-fast, sensitive search and clustering suite for protein and nucleotide sequence sets.

Partial-Order Alignment for fast alignment and consensus of multiple homologous sequences.

An ultrafast protein aligner for `blastp` and `blastx` like searches.

A system for rapidly aligning entire genomes, whether in complete or draft form.

SIMD C library for global, semi-global, and local pairwise sequence alignments.

The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment.

BWA-MEM drop-in replacement: 2-3x faster, 2-5x cheaper, 100% identical output on standard CPUs.

Burrows-Wheeler Aligner for pairwise alignment between DNA sequences.

An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Scalable gVCF merging and joint variant calling for population sequencing projects.

Scalable genomic analysis.

UNIX-style FASTA manipulation tools.

Toolkit for processing sequences in FASTA/Q formats.

Convenient file format conversion using Biopython.

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang.

Aggregate results from bioinformatics analyses across many samples into a single report.

FASTQ/A short-reads pre-processing tools: Demultiplexing, trimming, clipping, quality filtering, and masking utilities.
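
Several entries above deal with reading and manipulating FASTA files. As a rough illustration of what such toolkits automate, here is a minimal standard-library sketch that parses FASTA text and reports per-record length and GC content. The names `read_fasta`, `gc_fraction`, and the `EXAMPLE` data are hypothetical and are not taken from any of the listed tools, which handle indexing, compression, and malformed input far more robustly.

```python
from io import StringIO
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            tokens = line[1:].split()
            # The record ID is the first whitespace-delimited token.
            name = tokens[0] if tokens else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# Hypothetical in-memory example data, used instead of a file on disk.
EXAMPLE = ">seq1 demo record\nACGT\nGGCC\n>seq2\nATAT\n"

records = dict(read_fasta(StringIO(EXAMPLE)))
for record_id, sequence in records.items():
    print(record_id, len(sequence), gc_fraction(sequence))
```

For real data, one of the dedicated libraries in this listing is the better choice; the sketch only shows the record structure those tools work with.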