Find open-source science resources
Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.
5,674 resources indexed
Showing 301–350
Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.
jackxinning/Leanly_AI
by jackxinning
smgjch/Meow-Omni-1
by smgjch
Meow-Omni 1 is the world’s first Multimodal Large Language Model (MLLM) specifically engineered for Computational Ethology. It natively co-embeds four distinct modalities (Text, Video, Audio, and Biological Time-Series) to decode the latent intentions of non-verbal species.
microsoft/NatureLM-8x7B-Inst
by microsoft
Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including…
microsoft/NatureLM-8x7B
by microsoft
Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including…
vitreg4so150mp14ls_dino-v2-bio is a Bio-DINO image encoder for natural photographs of living organisms. It uses a SoViT-150M/14 Vision Transformer with 4 register tokens and 133.6M backbone parameters, trained with a DINOv2-style self-supervised objective on approximately 31 million curated images…
Curated open dataset collection of 602M+ observational and perturbational single-cell profiles for accelerating virtual cell model creation, integrating Tahoe-100M and scBaseCount data with Google Cloud Marketplace distribution (Arc Institute, 2025-2026)
Probabilistic framework for inferring cell fate decisions and trajectory dynamics from multi-view single-cell data using Markov chains and machine learning, integrating RNA velocity, pseudotime, and metabolic labeling to predict differentiation paths and terminal states (scverse/Theis Lab, 449+ stars, BSD 3-Clause)
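The Markov-chain idea in the entry above can be illustrated with a toy example. This is a minimal sketch in plain Python, not the listed framework's API: the function name `fate_probabilities`, the four-state transition matrix, and the state labels are all hypothetical. It only shows how probabilities of reaching terminal (absorbing) states can be read off a chain by propagating a distribution forward.

```python
def fate_probabilities(transition, terminal_states, n_steps=500):
    """Approximate, for each start state, the probability of ending in each terminal state.

    transition[i][j] is the probability of moving from state i to state j.
    Terminal states are assumed to be absorbing (transition[t][t] == 1.0).
    """
    n = len(transition)
    result = {}
    for start in range(n):
        # Begin with all probability mass on the start state.
        dist = [0.0] * n
        dist[start] = 1.0
        # Repeatedly push the distribution one step through the chain.
        for _ in range(n_steps):
            dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        result[start] = {t: dist[t] for t in terminal_states}
    return result


# Hypothetical toy chain: 0 = progenitor, 1 = intermediate, 2 = fate A, 3 = fate B.
TOY_TRANSITIONS = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.3, 0.2],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

fates = fate_probabilities(TOY_TRANSITIONS, terminal_states=[2, 3])
print(fates[0])
```

In this toy chain a progenitor cell ends in fate A with probability of about 0.6 and in fate B with about 0.4, because the intermediate state exits to the two fates in a 0.3 : 0.2 ratio. A real analysis would estimate the transition matrix from data (for example from RNA velocity or pseudotime) rather than write it by hand.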
Google DeepMind's official collection of agentic science skills accelerating scientific workflows with better grounding and higher token efficiency, integrating insights from AlphaGenome, AFDB, UniProt and 30+ other databases and tools (2026)
An ontology that enables the metadata properties of the DataCite Metadata Schema Specification (i.e., a list of metadata properties for the accurate and consistent identification of a resource for citation and retrieval purposes) to be described in RDF.
A data model for managing information about chemical entities, ranging from atoms through molecules to complex mixtures.
The COVID-19 Epidemiology and Monitoring Ontology (CEMO) provides a common ontological model that makes quantitative epidemiological data for monitoring the COVID-19 outbreak machine-readable and interoperable, facilitating its exchange, integration, and analysis in support of evidence-based rapid response.
CCSO is an educational ontology that acts as a data model for concepts and entities within an academic setting and also enables the annotation of available resources. The ontology aims to conceptualize educational entities within Curriculum and Syllabus with appropriate coverage and quality, in order to support rich services that improve curriculum management and automatically enable semantic processing of syllabi. (from homepage)
An ontology that permits the number of in-text citations of a cited source to be recorded, together with their textual citation contexts, along with the number of citations a cited entity has received globally on a particular date.
An extension of Schema.org to annotate metadata on software projects
An ontology meant to define bibliographic records, bibliographic references, and their compilation into bibliographic collections and bibliographic lists, respectively.
An ontology that allows the description of numerical and categorical bibliometric data (e.g., journal impact factor, author h-index, categories describing research careers) in RDF.
Babelon is a simple standard for managing ontology translations and language profiles. Profiles are managed as TSV files; see, for example, https://github.com/obophenotype/hpo-translations/tree/main/babelon. The goal of Babelon as a data model and vocabulary is to capture the minimum data required to record important metadata such as the confidence and precision of a translation.
An EMMO-based domain ontology for atomistic and electronic modelling.
A representation of variables appearing in models in the environmental research space.
Algorithm Metadata Vocabulary is a vocabulary for capturing and storing metadata about algorithms (procedures or sets of rules followed step by step to solve a problem, especially by a computer). Countless algorithms exist in every area (e.g., Computer Science, Mathematics), which makes it hard for specialists, academics, application engineers, and others to discover, distinguish, select, and reuse them. [from repository]
The academic event ontology, currently still in development and thus unstable, is an OBO compliant reference ontology for describing academic events such as conferences, workshops or seminars and their series. It is being developed as part of the [ConfIDent project](https://projects.tib.eu/confident/) to allow RDF representations of the academic events and series stored and curated in the [ConfIDent platform](https://www.confident-conference.org/index.php/main_page).
This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases.
scvi-tools/tabula-sapiens-heart-stereoscope
by scvi-tools
Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type-specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-heart-scanvi
by scvi-tools
ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition to what scVI offers, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-fat-stereoscope
by scvi-tools
Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type-specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-TOXCAST-101
biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…
ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-TOX21-101
biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…
In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to handle specific domains more effectively.
Dr-BERT/DrBERT-4GB-CP-CamemBERT
by Dr-BERT
In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to handle specific domains more effectively.
Sevenlee/kkk
by Sevenlee
sagawa/ReactionT5v1-forward
by sagawa
This is a ReactionT5 model pre-trained to predict the products of reactions.
ByteDance-Seed/bamboo_mixer
by ByteDance-Seed
This repository contains the official model of the paper A Unified Predictive and Generative Solution for Liquid Electrolyte Formulation.
InstaDeepAI/instanovoplus-v1.1.0
by InstaDeepAI
InstaNovoPlus is a diffusion-based model for de novo peptide sequencing from mass spectrometry data. This model leverages multinomial diffusion for accurate, database-free peptide identification for large-scale proteomics experiments.
InstaDeepAI/winnow-helaqc-model
by InstaDeepAI
Winnow recalibrates confidence scores and provides FDR control for de novo peptide sequencing (DNS) workflows. This repository contains the calibrator trained on HeLa Single Shot data as referenced in our paper: De novo peptide sequencing rescoring and FDR estimation with Winnow.
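To illustrate why calibrated confidence scores make FDR control possible, here is a minimal sketch of one generic approach. It is not Winnow's implementation or API: the function name `accept_at_fdr` and the example scores are hypothetical, and the sketch assumes each score is a well-calibrated probability that the prediction is correct. Under that assumption, the expected false discovery rate of an accepted set is the mean of `1 - p` over its members.

```python
def accept_at_fdr(confidences, max_fdr=0.05):
    """Return the largest top-ranked set whose estimated FDR stays within max_fdr.

    Assumes each confidence is a calibrated probability of being correct,
    so the expected number of errors in a set is the sum of (1 - p).
    """
    ranked = sorted(confidences, reverse=True)
    accepted = 0
    expected_errors = 0.0
    for k, p in enumerate(ranked, start=1):
        expected_errors += 1.0 - p
        # Estimated FDR of the top-k predictions.
        if expected_errors / k <= max_fdr:
            accepted = k
    return ranked[:accepted]


# Hypothetical calibrated confidences for five predicted peptides.
scores = [0.99, 0.6, 0.98, 0.3, 0.9]
kept = accept_at_fdr(scores, max_fdr=0.05)
print(kept)
```

With these example scores, the three most confident predictions are kept: their estimated FDR is about 0.043, and adding the fourth (0.6) would raise it to about 0.13. The estimate is only as good as the calibration, which is the reason a recalibration step matters before thresholding.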
InstaDeepAI/winnow-general-model
by InstaDeepAI
Winnow recalibrates confidence scores and provides FDR control for de novo peptide sequencing (DNS) workflows. This repository hosts a pretrained, general-purpose calibrator that maps raw InstaNovo model confidences and complementary features (mass error, retention time, chimericity, beam features,…
An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…
PurvaTijare/PPTStab
by PurvaTijare
PPTStab: Prediction and Designing of thermostable proteins with a desired melting temperature
scvi-tools/tabula-sapiens-fat-scanvi
by scvi-tools
ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition to what scVI offers, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-fat-scvi
by scvi-tools
ScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
scvi-tools/tabula-sapiens-eye-stereoscope
by scvi-tools
Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type-specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-eye-condscvi
by scvi-tools
CondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-eye-scanvi
by scvi-tools
ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition to what scVI offers, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…