
Questions about Uni-FEP-Benchmarks data processing: ChEMBL versioning, assay filtering, target–PDB mapping, reference PDB selection #56


@0ut0fcontrol

Hi maintainers,

Thanks for releasing Uni-FEP-Benchmarks.
I have a few technical questions about its data processing pipeline (ChEMBL + PDB → benchmarks):

1. ChEMBL version / update plan

  • Was Uni-FEP-Benchmarks built using ChEMBL v35?
  • I ask because the most recent EGFR assay in the benchmark, CHEMBL5260455, is dated 2023-04-18, well before the ChEMBL 35 release (December 2024), so I cannot tell from the data alone which release was used.
  • Do you plan to update the benchmark following new ChEMBL releases (ChEMBL is now at v37)?

2. Assay / affinity endpoint filtering

  • Which affinity types are included (Ki/Kd only, or also IC50/EC50, etc.)?
  • What filters are applied for assay/data quality (assay type, confidence score, curated flags, removing ambiguous units, replicates/uncertainty/outlier handling, etc.)?
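
To make this question concrete, here is a minimal sketch of the kind of filter I have in mind. The field names follow the ChEMBL activities/assays schema, but the allowed endpoint set and every threshold below are my own assumptions for illustration, not the benchmark's actual settings.

```python
# Hypothetical filter: NOT the Uni-FEP-Benchmarks pipeline, only an illustration
# of the choices being asked about (endpoint types, units, assay quality flags).

ALLOWED_TYPES = {"Ki", "Kd", "IC50"}  # assumption: which endpoints are accepted


def keep_activity(row: dict) -> bool:
    """Return True if one ChEMBL activity record passes the assumed filters."""
    return (
        row.get("standard_type") in ALLOWED_TYPES
        and row.get("standard_relation") == "="            # drop censored values (">", "<")
        and row.get("standard_units") == "nM"              # drop ambiguous or non-molar units
        and row.get("assay_type") == "B"                   # binding assays only
        and int(row.get("confidence_score") or 0) >= 9     # assumed cutoff: direct single-protein target
        and not row.get("data_validity_comment")           # drop records flagged by curators
        and not int(row.get("potential_duplicate") or 0)   # drop likely duplicated measurements
    )


def filter_activities(rows):
    """Keep only the records that pass keep_activity, preserving order."""
    return [r for r in rows if keep_activity(r)]
```

Knowing which of these conditions the benchmark actually applies, and with which values, would answer most of question 2.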

3. Target → PDB mapping

  • How is a ChEMBL target mapped to PDB structures (e.g., UniProt/SIFTS mapping, sequence alignment/similarity search)?
  • How are mutations, engineered constructs, missing segments, or isoforms handled?
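
For reference, this is roughly what I mean by a UniProt/SIFTS mapping. It is a sketch only: it assumes a SIFTS-style flat file (such as pdb_chain_uniprot.csv) with `PDB`, `CHAIN`, and `SP_PRIMARY` columns and `#`-prefixed comment lines, and the helper name `pdb_chains_for_uniprot` is hypothetical. It does not address mutations, engineered constructs, or isoforms, which is exactly what the second bullet asks about.

```python
import csv
from typing import Iterable, List, Tuple


def pdb_chains_for_uniprot(lines: Iterable[str], accession: str) -> List[Tuple[str, str]]:
    """List (PDB ID, chain) pairs mapped to a UniProt accession.

    `lines` is any iterable of text lines from a SIFTS-style CSV
    (assumed columns: PDB, CHAIN, SP_PRIMARY, ...).
    """
    # Skip comment lines so the first remaining line is the column header.
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    reader = csv.DictReader(data_lines)
    return [
        (row["PDB"], row["CHAIN"])
        for row in reader
        if row["SP_PRIMARY"] == accession
    ]
```

With an open file handle this would be called as `pdb_chains_for_uniprot(handle, accession)`, where the accession is the UniProt ID behind the ChEMBL target.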

4. Reference PDB selection

  • When multiple PDB structures are available for the same target, what is the selection strategy for the reference structure?
  • For targets with distinct conformational states (e.g., GPCR agonist vs antagonist), how do you ensure the selected structure matches the relevant state?

5. Reproducible pipeline / scripts

  • Are there scripts or a reproducible workflow available for the ChEMBL + PDB → benchmark generation? If not, any pointers to the key steps would be appreciated.

Thanks
