
Input datasets of the SMILES canonicalization model #5

@DreamMemory001


First of all, I think this is fantastic work. I have a couple of questions about it:

i: In Fig. 1 of the paper, the canonical SMILES of Benzylpenicillin is
image
but the one I get from the ChEMBL website is
image
Likewise, in Fig. 3, the canonical SMILES of CHEMBL351484 differs from the one shown on the ChEMBL website.
When I use RDKit to generate these canonical SMILES, I get the same result as the ChEMBL website.
Because of this, I am a little confused.
ii: Where can I find the input dataset of the SMILES canonicalization model, i.e. the 17,657,995 canonicalization pairs written in reaction format and separated by ‘ >> ’? As I understand it, each pair contains a non-canonical SMILES on the left side and the canonical SMILES of the same molecule on the right side.
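
To make clear which format I mean, here is a minimal sketch of how I would expect to read such a file. It assumes one pair per line with `>>` as the separator. The function name `parse_pairs` and the sample lines are hypothetical and are not taken from the paper's data.

```python
from typing import Iterable, Iterator, Tuple

# Assumed separator between the non-canonical and canonical SMILES.
SEPARATOR = ">>"


def parse_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (non_canonical, canonical) tuples from 'left >> right' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        left, sep, right = line.partition(SEPARATOR)
        if not sep:
            raise ValueError(f"missing '{SEPARATOR}' separator in line: {line!r}")
        yield left.strip(), right.strip()


# Hypothetical sample lines (ethanol written in two different ways).
sample = ["OCC >> CCO", "C(C)O >> CCO", ""]
pairs = list(parse_pairs(sample))
print(pairs)
```

If the real files use a different layout (for example, a header line or extra columns), the parsing above would need to be adjusted.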

I look forward to your reply. Thanks.
