Find open-source science resources

jackxinning/Leanly_AI

by jackxinning

question-answering

6.2K

3 weeks ago

Meow-Omni 1 is the world’s first Multimodal Large Language Model (MLLM) specifically engineered for Computational Ethology. It natively co-embeds four distinct modalities—Text, Video, Audio, and Biological Time-Series—to decode the latent intentions of non-verbal species.

252

1 week ago

microsoft/NatureLM-8x7B-Inst

by microsoft

Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including…

240

11 months ago

microsoft/NatureLM-8x7B

by microsoft

Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including…

33

11 months ago

birder-project/vit_reg4_so150m_p14_ls_dino-v2-bio

by birder-project

Genomics & Bioinformatics

image-feature-extraction

vit_reg4_so150m_p14_ls_dino-v2-bio is a Bio-DINO image encoder for natural photographs of living organisms. It uses a SoViT-150M/14 Vision Transformer with 4 register tokens and 133.6M backbone parameters, trained with a DINOv2-style self-supervised objective on approximately 31 million curated images…

3.9K

3 days ago

Arc Virtual Cell Atlas

Tool

Biology & Medicine

Curated open dataset collection of 602M+ observational and perturbational single-cell profiles for accelerating virtual cell model creation, integrating Tahoe-100M and scBaseCount data with Google Cloud Marketplace distribution (Arc Institute, 2025-2026)

CellRank

Tool

Probabilistic framework for inferring cell fate decisions and trajectory dynamics from multi-view single-cell data using Markov chains and machine learning, integrating RNA velocity, pseudotime, and metabolic labeling to predict differentiation paths and terminal states (scverse/Theis Lab, 449+ stars, BSD 3-Clause)

GDM Science Skills

Tool

Research Workbench & Plugins

Google DeepMind's official collection of agentic science skills accelerating scientific workflows with better grounding and higher token efficiency, integrating insights from AlphaGenome, AFDB, UniProt and 30+ other databases and tools (2026)

DataCite Ontology

An ontology that enables the metadata properties of the DataCite Metadata Schema Specification (i.e., a list of metadata properties for the accurate and consistent identification of a resource for citation and retrieval purposes) to be described in RDF.

4

1 week ago

XSLT

cryoem

0

5 years ago

covoc

3

3 years ago

Makefile

Chemical Entity Materials and Reactions Ontological Framework

A data model for managing information about chemical entities, ranging from atoms through molecules to complex mixtures.

23

3 days ago

CC0-1.0

The COVID-19 epidemiology and monitoring ontology

The COVID-19 epidemiology and monitoring ontology (CEMO) provides a common ontological model that makes quantitative epidemiological data for monitoring the COVID-19 outbreak machine-readable and interoperable. This facilitates the exchange, integration, and analysis of the data and ultimately supports an evidence-based rapid response.

7

3 years ago

TeX

CC0-1.0

Cellosaurus

14

2 years ago

CC-BY-4.0

Curriculum Course Syllabus Ontology

CCSO is an educational ontology acting as a data model for concepts and entities within an academic setting, enabling also the annotation of potentially available resources. The ontology aims to conceptualize educational entities within Curriculum and Syllabus with appropriate coverage and quality, in order to support rich services on top for improving curriculum management and automatically enabling syllabus semantic processes. (from homepage)

0

5 months ago

HTML

GPL-3.0

Citation Counting and Context Characterisation Ontology

An ontology that permits the number of in-text citations of a cited source to be recorded, together with their textual citation contexts, along with the number of citations a cited entity has received globally on a particular date.

0

6 years ago

CodeMeta

schema

An extension of Schema.org to annotate metadata on software projects

348

1 month ago

Apache-2.0

Bibliographic Reference Ontology

An ontology meant to define bibliographic records, bibliographic references, and their compilation into bibliographic collections and bibliographic lists, respectively.

0

6 years ago

Bibliometric Data Ontology

An ontology that allows the description of numerical and categorical bibliometric data (e.g., journal impact factor, author h-index, categories describing research careers) in RDF.

0

6 years ago

Babelon

Babelon is a simple standard for managing ontology translations and language profiles. Profiles are managed as TSV files; see, for example, https://github.com/obophenotype/hpo-translations/tree/main/babelon. The goal of Babelon as a data model and vocabulary is to capture the minimum data required to record important metadata such as the confidence and precision of a translation.

10

2 months ago

Jupyter Notebook

MIT

atomistic

An EMMO-based domain ontology for atomistic and electronic modelling.

1

2 months ago

CC-BY-4.0

Biological and Environmental Research Variable Ontology

A representation of variables appearing in models in the environmental research space.

4

5 days ago

HTML

Algorithm Metadata Vocabulary

Algorithm Metadata Vocabulary is a vocabulary for capturing and storing the metadata about the algorithms (a procedure or a set of rules that is followed step-by-step to solve a problem, especially by a computer). There are uncountable algorithms present in every area (e.g., Computer Science, Mathematics), which makes it hard for specialists, academicians, application engineers, and so forth to discover, distinguish, select, and reuse them. [from repository]

0

3 years ago

CC0-1.0

Academic Event Ontology

The academic event ontology, currently still in development and thus unstable, is an OBO-compliant reference ontology for describing academic events such as conferences, workshops, or seminars and their series. It is being developed as part of the [ConfIDent project](https://projects.tib.eu/confident/) to allow RDF representations of the academic events and series stored and curated in the [ConfIDent platform](https://www.confident-conference.org/index.php/main_page).

14

1 year ago

Makefile

CC-BY-4.0

The Artificial Intelligence Ontology

This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases.

49

1 year ago

Jupyter Notebook

addicto

5

2 weeks ago

scvi-tools/tabula-sapiens-heart-stereoscope

by scvi-tools

Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type-specific rate of gene expression. The model's predictions are meant to be used afterward to deconvolve a second, spatial transcriptomics dataset in Stereoscope.

0

2 months ago

scvi-tools/tabula-sapiens-heart-scanvi

by scvi-tools

ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. Extending scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…

0

2 months ago

scvi-tools/tabula-sapiens-fat-stereoscope

by scvi-tools

Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type-specific rate of gene expression. The model's predictions are meant to be used afterward to deconvolve a second, spatial transcriptomics dataset in Stereoscope.

0

2 months ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-TOXCAST-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

5

1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-TOX21-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

13

1 year ago

Dr-BERT/DrBERT-4GB-CP-PubMedBERT

by Dr-BERT

fill-mask

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to handle specific domains more effectively.

357

2 years ago

Dr-BERT/DrBERT-4GB-CP-CamemBERT

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to handle specific domains more effectively.

0

2 years ago

Sevenlee/kkk

by Sevenlee

image-segmentation

0

3 years ago

UEG/interface

by UEG

text-classification

0

3 years ago

sagawa/ReactionT5v1-forward

by sagawa

This is a ReactionT5 model pre-trained to predict the products of reactions.

63

1 year ago

UmbrellaInc/Prototype-Virus-1B

by UmbrellaInc

question-answering

8

2 months ago

ByteDance-Seed/bamboo_mixer

by ByteDance-Seed

This repository contains the official model of the paper "A Unified Predictive and Generative Solution for Liquid Electrolyte Formulation".

0

9 months ago

SaeedLab/SpeCollate

by SaeedLab

feature-extraction

3

2 months ago

InstaDeepAI/instanovoplus-v1.1.0

by InstaDeepAI

text-generation

InstaNovoPlus is a diffusion-based model for de novo peptide sequencing from mass spectrometry data. This model leverages multinomial diffusion for accurate, database-free peptide identification for large-scale proteomics experiments.

4

7 months ago

InstaDeepAI/winnow-helaqc-model

by InstaDeepAI

Winnow recalibrates confidence scores and provides FDR control for de novo peptide sequencing (DNS) workflows. This repository contains the calibrator trained on HeLa Single Shot data as referenced in our paper: De novo peptide sequencing rescoring and FDR estimation with Winnow.

0

2 weeks ago

InstaDeepAI/winnow-general-model

by InstaDeepAI

Winnow recalibrates confidence scores and provides FDR control for de novo peptide sequencing (DNS) workflows. This repository hosts a pretrained, general-purpose calibrator that maps raw InstaNovo model confidences and complementary features (mass error, retention time, chimericity, beam features,…

0

2 weeks ago

andrewdalpino/ESM2-150M-Protein-Biological-Process

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

3

11 months ago

PurvaTijare/PPTStab

by PurvaTijare

tabular-regression

PPTStab: Prediction and Designing of thermostable proteins with a desired melting temperature

0

1 year ago

scvi-tools/tabula-sapiens-fat-scanvi

by scvi-tools

ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. Extending scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…

0

2 months ago

scvi-tools/tabula-sapiens-fat-scvi

by scvi-tools

ScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.

0

2 months ago

scvi-tools/tabula-sapiens-eye-stereoscope

by scvi-tools

Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type-specific rate of gene expression. The model's predictions are meant to be used afterward to deconvolve a second, spatial transcriptomics dataset in Stereoscope.

0

2 months ago

scvi-tools/tabula-sapiens-eye-condscvi

by scvi-tools

CondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The model's predictions are meant to be used afterward to deconvolve a second, spatial transcriptomics dataset in DestVI.

0

2 months ago

scvi-tools/tabula-sapiens-eye-scanvi

by scvi-tools