gbyuvd/chemselfies-base-bertmlm

fill-mask
Maintenance: light · by gbyuvd · 131 · updated 7 months ago
Python

This is a lightweight model pre-trained on SELFIES (Self-Referencing Embedded Strings) representations of molecules. It was trained on 2.7M unique, valid molecules taken from COCONUTDB and ChemBL34, which yielded 7.3M generated masked examples in total.
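The figures above work out to roughly 2.7 masked examples per molecule on average (7.3M / 2.7M). The source does not say how those examples were generated, so the following is only a minimal sketch of one plausible scheme: split a SELFIES string into bracketed tokens and mask one token per example. The names `split_selfies` and `make_masked_examples` are hypothetical, and dot-separated fragments are not handled.

```python
import random
import re

# Each SELFIES symbol is a bracketed token such as [C], [=Branch1] or [N+1].
# Assumption: anything outside brackets (spaces, '.') is ignored by this sketch.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]")


def split_selfies(selfies: str) -> list[str]:
    """Split a SELFIES string into its bracketed tokens."""
    return TOKEN_PATTERN.findall(selfies)


def make_masked_examples(selfies: str, n_examples: int, seed: int = 0):
    """Return (masked_text, label) pairs, each with exactly one [MASK].

    Hypothetical scheme: pick distinct positions at random and mask one
    token per example. This is not confirmed to be the author's procedure.
    """
    tokens = split_selfies(selfies)
    rng = random.Random(seed)
    # Never request more distinct positions than there are tokens.
    n = min(n_examples, len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n))

    examples = []
    for pos in positions:
        masked = tokens.copy()
        label = masked[pos]      # the token the model should recover
        masked[pos] = "[MASK]"
        examples.append((" ".join(masked), label))
    return examples
```

Calling `make_masked_examples("[C][C][=Branch1][C][=O][O]", 3)` returns three space-separated strings, each paired with the token that was hidden.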

README

datasets:
- COCONUTDB
- ChemBL34
language: code
library_name: transformers
metrics:
- perplexity
- accuracy
pipeline_tag: fill-mask
tags:
- fill-mask
- chemistry
- selfies
- drug-discovery
- herbal
- coconutdb
- chembl34
- drugs
- molecules
- compounds
- ranger21
- madgrad
widget:
- text: >-
    [C] [C] [=Branch1] [C] [MASK] [O] [C] [C] [N+1] [Branch1] [C] [C] [Branch1] [C] [C] [C]
  example_title: '[=O]'
- text: >-
    [O-1] [P] [=Branch1] [C] [=O] [Branch1] [C] [MASK] [O] [P] [=Branch1] [C] [=O] [Branch1] [C] [O-1] [O-1] .[99Tc+4]…
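As a rough illustration of how the first widget example might be sent to the model, the sketch below uses the `fill-mask` pipeline from `transformers`. It assumes the checkpoint loads through the standard pipeline API and that its tokenizer uses the literal `[MASK]` token shown in the widget text. `top_predictions` is a hypothetical helper name, and running it downloads the model.

```python
MODEL_ID = "gbyuvd/chemselfies-base-bertmlm"

# First widget example from the model card metadata;
# the card gives '[=O]' as its example title.
WIDGET_EXAMPLE = (
    "[C] [C] [=Branch1] [C] [MASK] [O] [C] [C] [N+1] "
    "[Branch1] [C] [C] [Branch1] [C] [C] [C]"
)


def top_predictions(text: str, k: int = 5):
    """Return (token, score) pairs for the single [MASK] in `text`.

    Assumption: the model is loadable with the generic fill-mask pipeline.
    """
    # Imported lazily so this module can be loaded without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]
```

Calling `top_predictions(WIDGET_EXAMPLE)` would list the highest-scoring candidate tokens for the masked position; no output is shown here because it depends on the downloaded weights.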

Source attribution

  • HuggingFace: gbyuvd/chemselfies-base-bertmlm

Related resources

This model is a BERT-like sequence classifier for 221 human protein drug targets, fine-tuned from gbyuvd/chemselfies-base-bertmlm on a dataset derived from ChemBL34 (Zdrazil et al. 2023). It predicts potential drug targets using chemical structures represented as SELFIES (Self-Referencing Embedded…

9 · 1 year ago
Python

Target-Conditioned Molecular Ideation Model for Drug Discovery Research

0 · 4 days ago
Python

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to handle specific domains more effectively.

357 · 2 years ago
Python

ChemFIE-SA is a BERT-like sequence classifier for predicting synthesis accessibility given a SELFIES string of a compound, fine-tuned from gbyuvd/chemselfies-base-bertmlm on DeepSA's expanded dataset from Wang et al. 2023.

9 · 1 year ago
Python

Deep learning for chemistry and materials science remains a novel field with a lot of potential. However, transfer-learning methods, which are popular in areas such as NLP and computer vision, have not yet been effectively developed at the intersection of computational chemistry and machine learning.

254.2K · 5 years ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of human single-cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.

20.2K · 1 month ago
Python