Casanovo

Genomics & Bioinformatics
Actively maintained · 187 stars · updated 2 days ago
Python
Apache-2.0

Transformer encoder-decoder for de novo peptide sequencing from tandem mass spectrometry, translating MS/MS spectra directly to peptide sequences without reference databases, enabling identification of novel peptides for immunopeptidomics, antibody repertoires, and metaproteomes (Noble Lab UW, Nature Communications 2024)
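Because no reference database is searched, a predicted peptide can still be checked against the measured precursor ion: its theoretical m/z should agree with the observed precursor m/z within a small tolerance. The Python sketch below shows that arithmetic. It assumes standard monoisotopic residue masses, unmodified cysteine, and a hypothetical 20 ppm tolerance. The function names (`peptide_mass`, `precursor_mz`, `matches_precursor`) are illustrative and are not part of Casanovo's API.

```python
# Monoisotopic residue masses in daltons (standard values; cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one H2O per peptide (free N- and C-termini)
PROTON = 1.007276   # proton mass, added once per charge


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]^z+ precursor ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def matches_precursor(seq: str, observed_mz: float, charge: int,
                      tol_ppm: float = 20.0) -> bool:
    """True if the theoretical precursor m/z is within tol_ppm (hypothetical default) of the observed m/z."""
    theoretical = precursor_mz(seq, charge)
    ppm_error = (observed_mz - theoretical) / theoretical * 1e6
    return abs(ppm_error) <= tol_ppm


print(round(precursor_mz("PEPTIDE", 2), 4))        # → 400.6872
print(matches_precursor("PEPTIDE", 400.6880, 2))   # → True (about 1.9 ppm off)
print(matches_precursor("PEPTIDE", 400.7200, 2))   # → False (about 82 ppm off)
```

A relative (ppm) tolerance is used here because mass-measurement error on high-resolution instruments tends to scale with m/z; an absolute tolerance in daltons would work the same way with a different comparison.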

README

Casanovo: De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model

Casanovo is a state-of-the-art deep learning tool designed for de novo peptide sequencing. Powered by a transformer neural network, Casanovo "translates" peaks in MS/MS spectra into amino acid sequences with remarkable precision. Casanovo can be used to find unexpected peptide sequences in any data-dependent acquisition, bottom-up tandem mass spectrometry dataset, and is particularly useful for immunopeptidomics,…
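The peaks being "translated" are largely fragment ions. In the common b/y fragmentation model, each b ion corresponds to a peptide prefix and each y ion to a suffix, so the mass gap between neighbouring ions in a ladder equals one residue mass. The hypothetical helpers below are a generic sketch, not Casanovo's code (Casanovo learns the spectrum-to-sequence mapping with a transformer rather than explicit mass arithmetic). They assume singly charged ions, standard monoisotopic residue masses, unmodified cysteine, and a hypothetical 0.02 Da gap tolerance.

```python
# Monoisotopic residue masses in daltons (standard values; cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O carried by the C-terminal (y-ion) side
PROTON = 1.007276   # charge carrier for singly charged ions


def fragment_ladder(seq: str) -> tuple[list[float], list[float]]:
    """Return singly charged b-ion and y-ion m/z values (b1..b(n-1), y1..y(n-1))."""
    b_ions, y_ions = [], []
    prefix = 0.0
    for aa in seq[:-1]:            # b ions: growing N-terminal prefixes
        prefix += RESIDUE_MASS[aa]
        b_ions.append(prefix + PROTON)
    suffix = WATER
    for aa in reversed(seq[1:]):   # y ions: growing C-terminal suffixes
        suffix += RESIDUE_MASS[aa]
        y_ions.append(suffix + PROTON)
    return b_ions, y_ions


def residues_for_gap(gap: float, tol: float = 0.02) -> list[str]:
    """Amino acids whose residue mass matches a peak-to-peak gap within tol (Da)."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)


b, y = fragment_ladder("PEPTIDE")
print(round(b[0], 4), round(y[0], 4))   # → 98.06 148.0604 (b1 and y1)
print(residues_for_gap(b[1] - b[0]))    # → ['E'] (gap between b2 and b1)
```

Reading gaps this way breaks down quickly on real spectra (missing peaks, noise, modifications), and some residues cannot be told apart by mass at all: leucine and isoleucine are identical, so `residues_for_gap(113.084)` returns both.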

Source attribution

  • GitHub: github.com/noble-lab/casanovo
  • Awesome AI for Science: github.com/noble-lab/casanovo

Related resources

Unified framework for state-of-the-art pre-trained bio foundation models across genomics and transcriptomics, providing standardized interfaces and pipelines for DNA, RNA, and single-cell models including Evo 2, Geneformer, scGPT, and UCE with streamlined inference, benchmarking, and fine-tuning workflows (213+ stars, 2024-2025)

215 stars · updated 3 weeks ago
Python
AGPL-3.0

RNA foundation model trained on millions of RNA sequences for generalist RNA sequence understanding, enabling downstream structure prediction, function annotation, and representation learning for non-coding RNAs (ml4bio, 372+ stars)

Generative AI framework for inverse design of 3D RNA structure and function using geometric deep learning, learning design rules from 3D structures to capture complex tertiary interactions (pseudoknots, non-canonical base pairs) with expert-level accuracy for designing functional RNAs including aptamers and ribozymes (bioRxiv 2025)

GenBio AI's software stack for the AI-Driven Digital Organism, supporting adaptation and finetuning of multiscale biological foundation models across DNA, RNA, protein, structure, and single-cell tasks with reproducible CLIs and pretrained model zoo (2025)

Arc Institute's 40B-parameter genome foundation model trained on 9 trillion nucleotides from all domains of life, supporting 1M base pair context for generalist DNA/RNA/protein prediction and design (Nature 2026)

End-to-end RNA 3D structure prediction using RNA language model pretrained on 23.7M sequences, outperforming existing methods and human expert groups on RNA-Puzzles and CASP15 (Nature Methods 2024)