songlab/tokenizer-dna-mlm
README
license: mit
tags: dna, biology, genomics

Tokenizer for masked language modeling of DNA sequences.
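To make "tokenizer for masked language modeling" concrete, here is a minimal, self-contained sketch of nucleotide-level tokenization followed by random masking. It does not reproduce this repository's tokenizer: the vocabulary, the token ids, the `encode` and `mask_tokens` helpers, and the 15% masking rate are all assumptions chosen for illustration, and the actual tokenizer files may differ.

```python
import random

# Hypothetical toy vocabulary; the real tokenizer's tokens and ids may differ.
VOCAB = {"[PAD]": 0, "[MASK]": 1, "[UNK]": 2, "A": 3, "C": 4, "G": 5, "T": 6}

# Label value for positions that should not contribute to the loss.
# -100 is a common ignore-index convention, used here as an assumption.
IGNORE_INDEX = -100


def encode(seq):
    """Map each nucleotide to an id; unknown symbols (e.g. N) map to [UNK]."""
    return [VOCAB.get(base, VOCAB["[UNK]"]) for base in seq.upper()]


def mask_tokens(ids, mask_prob=0.15, seed=0):
    """Replace a random subset of ids with [MASK] and return (inputs, labels).

    labels holds the original id at masked positions and IGNORE_INDEX elsewhere.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in ids:
        if rng.random() < mask_prob:
            inputs.append(VOCAB["[MASK]"])
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels


if __name__ == "__main__":
    ids = encode("ACGTNACGT")
    inputs, labels = mask_tokens(ids, mask_prob=0.3, seed=42)
    print(ids)
    print(inputs)
    print(labels)
```

A model trained this way sees `inputs` and is asked to predict the original nucleotide at each masked position, which is what `labels` records.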
Source attribution
- HuggingFace — songlab/tokenizer-dna-mlm
Related resources
scvi-tools/tabula-sapiens-blood-stereoscope
by scvi-tools
Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type-specific rate of gene expression. The model's predictions are meant to be used afterward for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-fat-condscvi
by scvi-tools
CondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The model's predictions are meant to be used afterward for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-heart-scanvi
by scvi-tools
ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches, and impute dropouts. In addition to scVI's capabilities, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
AIRI-Institute/moderngena-base
by AIRI-Institute
ModernGENA is a DNA foundation model based on ModernBERT (a modernized BERT-style encoder architecture) adapted for genomic sequence modeling. ModernGENA base is the 377M-parameter version introduced in the paper Back to BERT in 2026: ModernGENA as a Strong, Efficient Baseline for…
scvi-tools/tabula-sapiens-fat-scvi
by scvi-tools
ScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches, and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
Xaira-Therapeutics/X-Cell
by Xaira-Therapeutics
A diffusion language model for genome-scale perturbation prediction across diverse cellular contexts.