Open Science Index

Find open-source science resources

Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.

Filters

Domain

text-generation: 21
fill-mask: 17
text-classification: 9
image-text-to-text: 5
feature-extraction: 4
question-answering: 3
sentence-similarity: 3
token-classification: 3
text-ranking: 2
automatic-speech-recognition: 1
image-classification: 1
image-feature-extraction: 1
(None): 7

Language (1)

Python: 79
(None): 93

License

(None): 79

Source (1)

huggingface: 79
github: 46
awesome-ai-for-science: 22
bio.tools: 17
bioregistry: 14
awesome-python-chemistry: 5
awesome-bioinformatics: 3
awesome-cheminformatics: 3

Type

AI model: 79

79 of 5,674 resources

Showing 51–79

littleworth/protgpt2-distilled-tiny

by littleworth

text-generation

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 87% better perplexity than standard knowledge distillation at 20x compression.

↓20 · 2 months ago
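The complementary-regularizer recipe itself is not spelled out in this listing, so the sketch below shows only the conventional pieces it is compared against and builds on: standard knowledge distillation (a temperature-scaled KL term between teacher and student), plain label smoothing (not the calibration-aware variant), and perplexity as the evaluation metric. The values of `temperature`, `alpha`, and `smoothing` are illustrative assumptions, not this model's training settings.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, target, temperature=2.0, alpha=0.5, smoothing=0.1):
    """Standard KD loss at one token position (illustrative hyperparameters).

    alpha * T^2 * KL(teacher_T || student_T)
    + (1 - alpha) * cross-entropy(label-smoothed target, student)
    """
    vocab = len(student_logits)
    p_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student_t) if pt > 0)
    p_student = softmax(student_logits)
    # Plain label smoothing: eps spread uniformly, (1 - eps) extra on the true token.
    smoothed = [smoothing / vocab] * vocab
    smoothed[target] += 1.0 - smoothing
    ce = -sum(q * math.log(p) for q, p in zip(smoothed, p_student))
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

For example, a model that assigns probability 0.25 to every observed token has `perplexity([math.log(0.25)] * 4)` of about 4.0, so "lower is better" when comparing a distilled student with its baseline.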

andrewdalpino/ESM2-150M-Protein-Cellular-Component

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

↓26 · 11 months ago
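Predicting a GO "subgraph" rather than isolated terms usually means the output respects the ontology's hierarchy: a protein annotated with a term also carries every ancestor of that term (the true-path rule). A minimal sketch, assuming per-term scores and a child-to-parents map are available; the term IDs and threshold are hypothetical, and how this model actually post-processes its predictions is not stated here.

```python
def propagate_to_ancestors(terms, parents):
    """Close a set of GO term IDs under the parent relation (depth-first walk)."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue
        closed.add(term)
        stack.extend(parents.get(term, ()))  # root terms have no parents
    return closed

def predict_subgraph(scores, parents, threshold=0.5):
    """Keep terms scoring at or above `threshold`, then add all their ancestors."""
    hits = {term for term, score in scores.items() if score >= threshold}
    return propagate_to_ancestors(hits, parents)

# Hypothetical toy ontology: T3 -> T2 -> T1 and T4 -> T1.
toy_parents = {"T3": ["T2"], "T2": ["T1"], "T4": ["T1"]}
```

With scores `{"T3": 0.9, "T4": 0.2}`, only `T3` passes the threshold, and propagation returns `{"T1", "T2", "T3"}`, a connected subgraph ending at the root.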

BGI-HangzhouAI/Genos-m

by BGI-HangzhouAI

text-generation

Genos-m is a foundation model for human-associated microbial genomes. It is trained to model microbial DNA sequences at single-nucleotide resolution and supports ultra-long genomic contexts up to one million tokens.

↓24 · 5 days ago
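At single-nucleotide resolution each base becomes one token, so a one-million-token context covers roughly one million base pairs of sequence. The sketch below illustrates that mapping with a hypothetical vocabulary and token IDs; Genos-m's real tokenizer and special tokens are not described in this listing.

```python
# Hypothetical vocabulary: special tokens first, then one token per nucleotide symbol.
VOCAB = {"<pad>": 0, "<unk>": 1, "A": 2, "C": 3, "G": 4, "T": 5, "N": 6}

def tokenize(sequence):
    """Map each nucleotide to its own token ID (one token per base)."""
    return [VOCAB.get(base, VOCAB["<unk>"]) for base in sequence.upper()]

def chunk(ids, context_length):
    """Split a long token sequence into windows of at most `context_length` tokens."""
    return [ids[i:i + context_length] for i in range(0, len(ids), context_length)]
```

Because the mapping is one-to-one, a 1,000,000-base sequence yields exactly 1,000,000 tokens and fits in a single one-million-token window; coarser tokenizers (k-mers or BPE) would trade that resolution for shorter sequences.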

Acryl-aLLM/ALLM.H-Bv4-Gemma4-31B-BF16

by Acryl-aLLM

text-generation

↓13 · 1 month ago

Junhauwong/Surge-Cognition-4x8B

by Junhauwong

text-generation

↓32 · 5 days ago

prithivMLmods/Indian-Western-Food-34

by prithivMLmods

image-classification

↓27 · 1 year ago

BioMistral/BioMistral-7B-GGUF

by BioMistral

text-generation

↓875 · 2 years ago

Dr-BERT/DrBERT-7GB

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized models have since emerged to handle specific domains more effectively.

↓1.4K · 2 years ago

Dr-BERT/DrBERT-4GB

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized models have since emerged to handle specific domains more effectively.

↓309 · 2 years ago

OpenMed/OpenMed-NER-ChemicalDetect-ElectraMed-33M

by OpenMed

token-classification

Specialized model for chemical entity recognition: identifies chemical compounds and substances in biomedical literature.

↓52 · 9 months ago
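Token-classification models like this one emit one tag per token, and a common labeling scheme is BIO (Begin, Inside, Outside), which must be grouped back into entity spans. A minimal sketch of that decoding step, assuming whole-word tokens and hypothetical tag names `B-CHEM`/`I-CHEM`; the model's actual label set is not given here, and real pipelines also handle subword tokens and character offsets, which this ignores.

```python
def decode_bio(tokens, tags):
    """Group per-token BIO tags into (entity_text, label) spans."""
    entities = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label)
        if starts_new:
            # Close any open entity, then start a new one (a stray I- is treated leniently as B-).
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open entity
        else:
            # "O" closes any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities
```

For example, tagging "Treatment with sodium chloride and aspirin" as `O O B-CHEM I-CHEM O B-CHEM` decodes to `[("sodium chloride", "CHEM"), ("aspirin", "CHEM")]`.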

ConvergeBio/virtual-cell-patient

by ConvergeBio

feature-extraction

A patient-level disease classification model trained on single-cell RNA-seq data. Given a matrix of gene expression profiles (one row per cell), the model produces a disease-category prediction for the patient.

↓69 · 2 weeks ago
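The description says the model takes a matrix with one row per cell and returns a single prediction per patient, so some step must reduce many cells to one output. The sketch assumes the simplest such reduction, mean pooling over cells followed by a linear classifier; the weights, biases, and label names are hypothetical, and the model's real aggregation method is not described in this listing.

```python
def mean_pool(cells):
    """Average per-cell vectors (rows) into one patient-level vector."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]

def classify_patient(cells, weights, biases, labels):
    """Score each disease category with a linear head on the pooled vector; return the top label."""
    pooled = mean_pool(cells)
    scores = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, biases)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]
```

With two cells `[[1.0, 0.0], [3.0, 2.0]]`, the pooled vector is `[2.0, 1.0]`; an identity weight matrix, zero biases, and hypothetical labels `["disease_a", "healthy"]` then give `"disease_a"`. Mean pooling makes the prediction independent of cell order and of how many cells were sequenced.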

zeroentropy/zerank-1-small-reranker

by zeroentropy

In search engines, rerankers improve the accuracy of a retrieval system by reordering the candidate results it returns.

↓12.7K · 2 months ago
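A reranker is usually the second stage of a two-stage search: a fast retriever narrows a large collection to a few candidates, and a slower, more accurate model reorders them. In the sketch, `score_fn` is a hypothetical stand-in for the reranking model (zerank-1-small's actual interface is not shown here), and the word-overlap retriever is a toy assumption.

```python
def first_stage(query, documents, k):
    """Cheap retrieval: keep the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,  # stable: ties keep their original order
    )
    return ranked[:k]

def rerank(query, candidates, score_fn):
    """Reorder candidates by a (more expensive) relevance score, highest first."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
```

The design point is cost: `score_fn` is called only on the `k` survivors of the first stage, so an expensive model that reads the query and document together can still be used on a large collection.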

EPFLiGHT/Apertus-70B-MeditronFO

by EPFLiGHT

text-generation

Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.

↓397 · 1 week ago

Dr-BERT/DrBERT-4GB-CP-PubMedBERT

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized models have since emerged to handle specific domains more effectively.

↓357 · 2 years ago

Dr-BERT/DrBERT-4GB-CP-CamemBERT

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized models have since emerged to handle specific domains more effectively.

↓0 · 2 years ago

UEG/interface

by UEG

text-classification

↓0 · 3 years ago

sagawa/ReactionT5v1-forward

by sagawa

This is a ReactionT5 model pre-trained to predict the products of chemical reactions.

↓63 · 1 year ago

UmbrellaInc/Prototype-Virus-1B

by UmbrellaInc

question-answering

↓8 · 2 months ago

andrewdalpino/ESM2-150M-Protein-Biological-Process

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

↓3 · 11 months ago

PurvaTijare/PPTStab

by PurvaTijare

tabular-regression

PPTStab: Prediction and design of thermostable proteins with a desired melting temperature

↓0 · 1 year ago

sichengwang04/Qwen3-8B-syco_med-gated-attention-FT

by sichengwang04

text-generation

Qwen3-8B-syco_med-gated-attention-FT is a plug-and-play gated-attention checkpoint released for AI safety research.

↓0 · 6 days ago

OpenMed/OpenMed-PII-SuperClinical-Small-44M-v1

by OpenMed

token-classification

PII Detection Model | 44M Parameters | Open Source

↓27K · 4 months ago
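Once a PII detector has produced labeled character spans, masking them is a small string operation. A minimal sketch, assuming hypothetical `(start, end, label)` spans with end-exclusive offsets and hypothetical label names; the model's actual label set and output format are not given in this listing.

```python
def redact(text, spans, mask="[{label}]"):
    """Replace each (start, end, label) character span with a label placeholder.

    Spans are applied right to left so that earlier offsets stay valid
    after each replacement changes the string length.
    """
    for start, end, label in sorted(spans, key=lambda span: span[0], reverse=True):
        text = text[:start] + mask.format(label=label) + text[end:]
    return text
```

For example, masking the spans covering "John Smith" and "01/02/1980" in "Patient John Smith, DOB 01/02/1980." with labels `NAME` and `DATE` yields "Patient [NAME], DOB [DATE]."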

Lolimipsu/so_vits_yuuka_voice_model

by Lolimipsu

question-answering

↓0 · 2 years ago

Hamdan003/inventmol-r1

by Hamdan003

text-generation

Target-Conditioned Molecular Ideation Model for Drug Discovery Research

↓0 · 5 days ago

google/medsiglip-448

by google

zero-shot-image-classification

↓32.3K · 10 months ago

oriel9p/protsent-esm2-150M

by oriel9p

sentence-similarity

↓0 · 2 weeks ago

oriel9p/protsent-esm2-35M

by oriel9p

sentence-similarity

↓0 · 2 weeks ago

ScientaLab/eva-rna

by ScientaLab

feature-extraction

↓15 · 2 months ago

unsloth/medgemma-27b-it-GGUF

by unsloth

image-text-to-text

Quantized with Unsloth Dynamic 2.0, which Unsloth reports achieves higher accuracy than other leading quantization methods.

↓7.7K · 10 months ago
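GGUF files store quantized weights, and many GGUF quantization types work block by block: a group of weights shares one scale factor and each weight is stored as a small integer. The sketch below is a generic symmetric 8-bit block quantizer, similar in spirit to llama.cpp's Q8_0 type (blocks of 32 values); it is not Unsloth Dynamic 2.0, whose method is not described here, and rounding details differ from the C implementation.

```python
def quantize_block(values):
    """Symmetric absmax 8-bit quantization of one block: returns (scale, integer codes in [-127, 127])."""
    amax = max(abs(v) for v in values)
    scale = amax / 127 if amax > 0 else 0.0
    codes = [round(v / scale) if scale else 0 for v in values]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate values: each code times the shared block scale."""
    return [scale * q for q in codes]

def quantize(values, block_size=32):
    """Quantize a flat weight list block by block; the last block may be shorter."""
    return [quantize_block(values[i:i + block_size]) for i in range(0, len(values), block_size)]
```

Each block stores one float scale plus one small integer per weight, and the reconstruction error per weight is at most half the block's scale; smaller blocks track local magnitude better at the cost of storing more scales.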
