First of all, I think this is fantastic work. I have a couple of questions about it:
i: In Fig. 1 of the paper, the canonical SMILES of Benzylpenicillin is

but the one I get from the ChEMBL website is

The same applies to Fig. 3: the canonical SMILES of CHEMBL351484 in the paper differs from the one on the ChEMBL website.
I also used RDKit to generate the canonical SMILES for these molecules, and I get the same result as the ChEMBL website.
Because of these differences, I am a little confused.
ii: Where can I find the input dataset for the SMILES canonicalization model? I mean the 17,657,995 canonicalization pairs written in reaction format and separated by ‘ >> ’, where each pair contains a non-canonical SMILES on the left side and the canonical SMILES for the same molecule on the right side.
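To check that I understand the pair format correctly, here is a minimal sketch of how I would read one line of such a file. The function name `parse_pair` and the example line are my own assumptions for illustration; they are not taken from the paper or its dataset.

```python
def parse_pair(line: str) -> tuple[str, str]:
    """Split one 'non-canonical >> canonical' line into its two SMILES strings."""
    # partition() splits on the first '>>' only; sep is empty if '>>' is absent.
    left, sep, right = line.strip().partition(">>")
    left, right = left.strip(), right.strip()
    if not sep or not left or not right:
        raise ValueError(f"not a valid canonicalization pair: {line!r}")
    return left, right


# Hypothetical example line (two spellings of ethanol), not from the real dataset.
example = "OCC >> CCO"
noncanonical, canonical = parse_pair(example)
print(noncanonical, canonical)
```

If the real files use a different layout (for example, extra columns or tokenized SMILES), please let me know.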
I look forward to your reply. Thanks.