Find open-source science resources

Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.

172 of 5,674 resources

Showing 51-100

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

634 · 1 year ago
Python
270 · 5 months ago

This model is a BERT-like sequence classifier for 221 human protein drug targets, fine-tuned from gbyuvd/chemselfies-base-bertmlm on a dataset derived from ChEMBL34 (Zdrazil et al. 2023). It predicts potential drug targets using chemical structures represented as SELFIES (Self-Referencing Embedded…

9 · 1 year ago
Python

The ibm/biomed.omics.bl.sm.ma-ted-458m model is a biomedical foundation model trained on over 2 billion biological samples across multiple modalities, including proteins, small molecules, and single-cell gene data. Designed for robust performance, it achieves state-of-the-art results over a variety…

2.3K · 1 year ago

ESM Cambrian is a parallel model family to our flagship ESM3 generative models. While ESM3 focuses on controllable generation of proteins for therapeutic and many other applications, ESM C focuses on creating representations of the underlying biology of proteins.

6.6K · 6 days ago

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

15 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

28 · 5 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

25 · 11 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

15 · 11 months ago
Python

InstaNovo: De novo Peptide Sequencing Model

35 · 2 weeks ago

InstaNovo: De novo Peptide Sequencing Model

36 · 2 weeks ago

ChemGPT 4.7M: ChemGPT is based on the GPT-Neo model and was introduced in the paper Neural Scaling of Deep Chemical Models.

4.2K · 3 years ago
Python

ChemGPT 1.2B: ChemGPT is based on the GPT-Neo model and was introduced in the paper Neural Scaling of Deep Chemical Models.

3.4K · 3 years ago
Python

ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-ESOL-101: biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

22 · 1 year ago

ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-BBBP-101: biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

22 · 1 year ago

ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-QM7-101: biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

19 · 1 year ago

ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-FREESOLV-101: biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

15 · 1 year ago

ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-HIV-101: biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

15 · 1 year ago

Protein solubility is a critical factor in both pharmaceutical research and production processes, as it can significantly impact the quality and function of a protein. This is an example for finetuning ibm/biomed.omics.bl.sm-ted-458m for protein solubility prediction (binary classification) based…

64 · 1 year ago

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery. This is an example of finetuning ibm/biomed.omics.bl.sm-ted-400 for the task. Prediction of binding affinities using pKd, the negative logarithm of the dissociation constant, which reflects the…

172 · 1 year ago

T-cell receptor (TCR) binding to immunogenic peptides (epitopes) presented by major histocompatibility complex (MHC) molecules is a critical mechanism in the adaptive immune system, essential for antigen recognition and triggering immune responses.

44 · 1 year ago

Drugs must satisfy stringent criteria for both efficacy and safety. This model predicts the likelihood of failure in clinical toxicity trials for small-molecule drugs, represented using SMILES (Simplified Molecular Input Line Entry System) strings.

24 · 1 year ago

Drugs must satisfy stringent criteria for both efficacy and safety. This model predicts the likelihood of FDA approval for small-molecule drugs, represented using SMILES (Simplified Molecular Input Line Entry System) strings.

26 · 1 year ago

Drugs targeting the central nervous system must meet stringent criteria for both efficacy and safety, including their ability to penetrate the blood-brain barrier (BBB). This model predicts the likelihood of small-molecule drugs crossing the BBB, a critical factor in CNS drug development.

35 · 1 year ago

Base model: google/gemma-4-26b-it
Architecture: MoE, 26B total / ≈4B active parameters (1 shared expert + 8 routed from a pool of 128 per MoE layer, 30 MoE layers)
Method: Activation-directed expert surgery, 128 → 64 experts per layer (50% reduction)
Quantization: Q4_K_M (≈9.7 GB on disk)
Tags:…

58 · 1 week ago

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. This model version was continually pretrained on ~14 million cancer transcriptomes…

13 · 5 months ago
Python

or·a·cle /ˈôrəkəl/: a source of wise counsel; one who provides authoritative knowledge. From Latin ōrāculum, meaning divine announcement. In computer science, an oracle is a black box that always returns the correct answer: you don't ask it how it knows, you ask and it answers.

136 · 2 months ago
Python

ONNX export of the Cellpose cpsam (Cellpose-SAM) model for cell segmentation in microscopy images.

0 · 3 months ago

darkknight25/deepseek-16b-medical-GPT is a fine-tuned version of deepseek-ai/deepseek-l6b-moe-chat, optimized for medical question answering, reasoning, and clinical summarization using QLoRA and open-access healthcare datasets.

0 · 10 months ago
Python

Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.

27 · 8 months ago
Python

For a convenient overview and download list, visit our model page for this model.

389 · 10 months ago
Python
0 · 11 months ago

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

24 · 5 months ago
Python

COMO (Closed-loop Optical Molecule recOgnition) is a deep learning framework for Optical Chemical Structure Recognition (OCSR). It recognizes chemical structure diagrams from images and predicts SMILES strings with atom-level 2D coordinates and bond matrices.

0 · 1 day ago

ChemFIE-BED is a sentence-transformers model based on gbyuvd/chemselfies-base-bertmlm, fine-tuned on roughly 2 million (for now) pairs of valid molecules' SELFIES (Krenn et al. 2020) taken from COCONUTDB (Sorokina et al. 2021) and ChEMBL34 (Zdrazil et al. 2023).

632 · 6 months ago
Python

This model is a lightweight model pre-trained on SELFIES (Self-Referencing Embedded Strings) representations of molecules. It is trained on 2.7M unique and valid molecules taken from COCONUTDB and ChEMBL34, with 7.3M total generated masked examples.

13 · 7 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 87% better perplexity than standard knowledge distillation at 20x compression.

20 · 2 months ago
Python

5 · 2 months ago

InstaNovo-P is a specialized transformer-based model for de novo peptide sequencing from phosphoproteomics mass spectrometry data. This model is specifically trained and optimized for identifying phosphorylated peptides and their modification sites.

28 · 2 weeks ago

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

26 · 11 months ago
Python

Genos-m is a foundation model for human-associated microbial genomes. It is trained to model microbial DNA sequences at single-nucleotide resolution and supports ultra-long genomic contexts up to one million tokens.

24 · 4 days ago
Python

ModernGENA base: ModernGENA is a DNA foundation model based on ModernBERT (a modernized BERT-style encoder architecture) adapted for genomic sequence modeling. ModernGENA base is the 377M-parameter version introduced in the paper Back to BERT in 2026: ModernGENA as a Strong, Efficient Baseline for…

495 · 1 month ago

Abstract:

875 · 2 years ago
Python

This repository contains the full Bio-DINO DINOv2 training weights for a SoViT-150M/14 Vision Transformer trained on natural photographs of living organisms. It is the companion release to the Birder backbone checkpoints.

132 · 4 days ago

ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. Building on scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…

0 · 2 months ago