Find open-source science resources

mradermacher/Dans-PersonalityEngine-V1.3.0-24b-i1-GGUF

by mradermacher

For a convenient overview and download list, visit our model page for this model.

389 | 10 months ago

mradermacher/Dans-PersonalityEngine-V1.2.0-24b-i1-GGUF

by mradermacher

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

624 | 1 year ago

prov-gigatime/GigaTIME

by prov-gigatime

image-to-image

279 | 5 months ago

SaltySander/MOSAIC

by SaltySander

0 | 11 months ago

google/medasr

by google

automatic-speech-recognition

12.2K | 3 weeks ago

cambridgeltl/SapBERT-from-PubMedBERT-fulltext

by cambridgeltl

feature-extraction

datasets: UMLS

1.8M | 2 years ago

nvidia/geneformer_V2_104M

by nvidia

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

24 | 5 months ago

openadmet/pxr-chemeleon-baseline

by openadmet

Warning: This is a baseline model trained on publicly available data. While we've done our best to curate the data, the model performance is quite poor. Proceed with caution.

26 | 2 weeks ago

nvidia/AMPLIFY_120M

by nvidia

Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.

680 | 8 months ago

VariantFiltering

Genetics

Filter genetic variants using different criteria such as inheritance model, amino acid change consequence, minor allele frequencies across human populations, splice site strength, conservation, etc.

4 | 7 months ago

scToppR

Pathways

scToppR provides an easy-to-use API wrapper for the ToppGene web platform, used for gene ontology and functional enrichment research. The package also integrates visualization tools, making it a convenient tool directly connecting ToppGene to code-based workflows in R. The tool can also easily save results into different formats.

7 | 1 month ago

NOASSERTION

CCPlotR

SingleCell

CCPlotR is an R package for visualising results from tools that predict cell-cell interactions from single-cell RNA-seq data. These plots are generic and can be used to visualise results from multiple tools such as Liana, CellPhoneDB, and NATMI.

47 | 2 months ago

HTML

Awesome LLM Scientific Discovery

📋 Paper Collections & Repositories

LLM papers for scientific discovery

345 | 6 months ago

ChemFormula

General Chemistry

ChemFormula provides a class for working with chemical formulas. It allows parsing chemical formulas, calculating formula weights, and generating formatted output strings (e.g. in HTML, LaTeX, or Unicode).

33 | 6 months ago

Equiformer

Machine Learning for Physics

Equivariant graph attention Transformer (ICLR2023)

282 | 1 year ago

DeepAnalyze

Data Analysis & Visualization

First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports

4.2K | 1 month ago

NFDI-MatWerk aims to establish a digital infrastructure for Materials Science and Engineering (MSE), fostering improved data sharing and collaboration. This repository provides comprehensive documentation for the NFDI-MatWerk Ontology (MWO) v3.0.0, a foundational framework designed to structure research data and enhance interoperability within the MSE community. To ensure compliance with top-level ontology standards, MWO v3.0.0 is aligned with the Basic Formal Ontology (BFO) and incorporates the modular approach of the NFDIcore mid-level ontology, enriching metadata through standardized classes and properties. MWO addresses key aspects of MSE research data, including the NFDI-MatWerk community structure, covering task areas, infrastructure use cases, projects, researchers, and organizations. It also describes essential NFDI resources, such as software, workflows, ontologies, publications, datasets, metadata schemas, instruments, facilities, and educational materials. Additionally, MWO represents NFDI-MatWerk services, academic events, courses, and international collaborations. As the foundation for the MSE Knowledge Graph, MWO facilitates efficient data integration and retrieval, promoting collaboration and knowledge representation across MSE domains. This digital transformation enhances data discoverability and reusability, and accelerates scientific exchange, innovation, and discoveries by optimizing research data management and accessibility. (from repository)

1 | 2 weeks ago

Makefile

CC0-1.0

International Histocompatibility Workshop cell lines

The International Histocompatibility Working Group provides a comprehensive inventory of HLA reference genes to support worldwide research in immunogenetics. We also offer selected cell lines and DNA from our substantial DNA Bank of more than 1,000 cell lines from selected families, as well as individuals with diverse ethnicity and immunologic characteristics.

Galaxy Training Network

Identifiers in the GTN correspond to training materials in various formats (markdown, slides, video). Users can apply the concepts they learn directly within the framework via Galaxy workflows.

365 | 1 day ago

HTML

CC-BY-4.0

DECIPHER CNV Syndromes

CNV syndromes in the DECIPHER genomics database that are linked to Human Phenotype Ontology terms

arrayexpress

Keylab/COMO

by Keylab

COMO (Closed-loop Optical Molecule recOgnition) is a deep learning framework for Optical Chemical Structure Recognition (OCSR). It recognizes chemical structure diagrams from images and predicts SMILES strings with atom-level 2D coordinates and bond matrices.

0 | 1 day ago

ibm-research/biomed.omics.bl.sm.ma-ted-458m.moleculenet_bbbp

by ibm-research

Drugs targeting the central nervous system must meet stringent criteria for both efficacy and safety, including their ability to penetrate the blood-brain barrier (BBB). This model predicts the likelihood of small-molecule drugs crossing the BBB, a critical factor in CNS drug development.

34 | 1 year ago

ibm-research/biomed.omics.bl.sm.ma-ted-458m.tcr_epitope_bind

by ibm-research

T-cell receptor (TCR) binding to immunogenic peptides (epitopes) presented by major histocompatibility complex (MHC) molecules is a critical mechanism in the adaptive immune system, essential for antigen recognition and triggering immune responses.

45 | 1 year ago

ibm-research/biomed.omics.bl.sm.ma-ted-458m.protein_solubility

by ibm-research

Protein solubility is a critical factor in both pharmaceutical research and production processes, as it can significantly impact the quality and function of a protein. This is an example for finetuning ibm/biomed.omics.bl.sm-ted-458m for protein solubility prediction (binary classification) based…

62 | 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-LIPOPHILICITY-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image,…

1.8K | 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-SIDER-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

10 | 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-BACE-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

1.6K | 2 months ago

ibm-research/biomed.sm.mv-te-84m

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of molecules in a foundation model…

12.1K | 2 months ago

gbyuvd/chemembed-chemselfies

by gbyuvd

sentence-similarity

ChemFIE-BED is a sentence-transformers model based on gbyuvd/chemselfies-base-bertmlm, fine-tuned on around 2 million pairs (for now) of valid molecules' SELFIES (Krenn et al. 2020) taken from COCONUTDB (Sorokina et al. 2021) and ChemBL34 (Zdrazil et al. 2023).

632 | 6 months ago

gbyuvd/chemselfies-base-bertmlm

by gbyuvd

This model is a lightweight model pre-trained on SELFIES (Self-Referencing Embedded Strings) representations of molecules. It is trained on 2.7M unique and valid molecules taken from COCONUTDB and ChemBL34, with 7.3M total generated masked examples.

13 | 7 months ago

littleworth/protgpt2-distilled-medium

by littleworth

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 31% better perplexity than standard knowledge distillation at 3.8x compression.

65 | 2 months ago

littleworth/protgpt2-distilled-tiny

by littleworth

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 87% better perplexity than standard knowledge distillation at 20x compression.

20 | 2 months ago

ncfrey/ChemGPT-1.2B

by ncfrey

ChemGPT is based on the GPT-Neo model and was introduced in the paper "Neural Scaling of Deep Chemical Models".

2.6K | 3 years ago

ncfrey/ChemGPT-19M

by ncfrey

ChemGPT is based on the GPT-Neo model and was introduced in the paper "Neural Scaling of Deep Chemical Models".

6.9K | 3 years ago

ncfrey/ChemGPT-4.7M

by ncfrey

ChemGPT is based on the GPT-Neo model and was introduced in the paper "Neural Scaling of Deep Chemical Models".

4.1K | 3 years ago

seyonec/ChemBERTa-zinc-base-v1

by seyonec

Deep learning for chemistry and materials science remains a novel field with lots of potential. However, the transfer-learning-based methods that are popular in areas such as NLP and computer vision have not yet been effectively developed in computational chemistry and machine learning.

254.7K | 5 years ago

Prior-Labs/tabpfn_2_6

by Prior-Labs

tabular-classification

TabPFN-2.6 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a forward pass. Inference code can be found at https://github.com/PriorLabs/tabPFN.

11.6K | 1 month ago

SaeedLab/ProteoRift

by SaeedLab

feature-extraction

Github | Cite

5 | 2 months ago

InstaDeepAI/instanovo-phospho-v1.0.0

by InstaDeepAI

InstaNovo-P is a specialized transformer-based model for de novo peptide sequencing from phosphoproteomics mass spectrometry data. This model is specifically trained and optimized for identifying phosphorylated peptides and their modification sites.

28 | 2 weeks ago

InstaDeepAI/instanovo-v1.0.0

by InstaDeepAI

InstaNovo: De novo Peptide Sequencing Model

35 | 2 weeks ago

InstaDeepAI/instanovo-v1.1.0

by InstaDeepAI

InstaNovo: De novo Peptide Sequencing Model

34 | 2 weeks ago

andrewdalpino/ESM2-150M-Protein-Cellular-Component

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

26 | 11 months ago

andrewdalpino/ESM2-150M-Protein-Molecular-Function

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

13 | 11 months ago

andrewdalpino/ESM2-35M-Protein-Biological-Process

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

28 | 11 months ago

andrewdalpino/ESM2-35M-Protein-Molecular-Function

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

20 | 11 months ago

andrewdalpino/ESM2-35M-Protein-Cellular-Component

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

17 | 11 months ago