Input datasets for the SMILES canonicalization model

First of all, I think this is fantastic work. I have a couple of questions about it:

i: In Fig. 1 of the paper, the canonical SMILES of benzylpenicillin is
![image](https://user-images.githubusercontent.com/32425458/136384857-ee2f3465-c699-459d-ae27-65ac9534a351.png)
but the one I get from the ChEMBL website is
![image](https://user-images.githubusercontent.com/32425458/136385183-864c972c-9a9c-4fb9-a758-0d6a2c135133.png)
The same applies to Fig. 3: the canonical SMILES of CHEMBL351484 differs from the one shown on the ChEMBL website.
When I use `rdkit` to generate these canonical SMILES, I get the same result as the ChEMBL website.
Because of this, I am a little confused.
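For reference, this is roughly how I compare the strings with `rdkit`. It is only a minimal sketch: the helper names `canonical_smiles` and `same_molecule` are my own and do not come from the paper or this repository.

```python
from typing import Optional


def canonical_smiles(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    # Imported inside the function so the helpers can be loaded without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)  # returns None for invalid SMILES
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)  # canonical output is RDKit's default


def same_molecule(smiles_a: str, smiles_b: str) -> bool:
    """Check whether two SMILES strings canonicalize to the same RDKit string."""
    a = canonical_smiles(smiles_a)
    b = canonical_smiles(smiles_b)
    return a is not None and a == b
```

With this, calling `same_molecule` on the SMILES from the paper and the one from ChEMBL would at least tell me whether the two strings describe the same structure.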
ii: Where can I find the input dataset for the SMILES canonicalization model? I mean the 17,657,995 canonicalization pairs written in reaction format and separated by ‘ >> ’, where each pair contains a non-canonical SMILES on the left side and the canonical SMILES of the same molecule on the right side.
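To make clear which format I mean, here is a minimal sketch of how I would expect to read such a file. It assumes one pair per line, which is my guess and not something the paper states; `parse_pair` and `read_pairs` are hypothetical helper names, and the toluene line in the comment is only an illustration, not a record from the dataset.

```python
from typing import Iterable, Iterator, Tuple

SEPARATOR = " >> "  # separator between the two sides, as described in the paper


def parse_pair(line: str) -> Tuple[str, str]:
    """Split 'non-canonical >> canonical' into its two SMILES strings."""
    left, sep, right = line.strip().partition(SEPARATOR)
    left, right = left.strip(), right.strip()
    if not sep or not left or not right:
        raise ValueError(f"not a canonicalization pair: {line!r}")
    return left, right


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (non_canonical, canonical) tuples, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_pair(line)


# Illustrative line only: "c1ccccc1C >> Cc1ccccc1"
```

Any iterable of lines works as input to `read_pairs`, so an open file handle could be passed directly.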

I hope to get your reply. Thanks.




