Hi maintainers,
Thanks for releasing Uni-FEP-Benchmarks.
I have a few technical questions about the Uni-FEP-Benchmarks data processing pipeline (ChEMBL + PDB → benchmarks):
1. ChEMBL version / update plan
- Was Uni-FEP-Benchmarks built using ChEMBL v35?
- For example, the latest EGFR assay, CHEMBL5260455, is dated 2023-04-18, which predates the ChEMBL 35 release (December 2024).
- Do you plan to update the benchmark following new ChEMBL releases (ChEMBL is now at v37)?
2. Assay / affinity endpoint filtering
- Which affinity types are included (Ki/Kd only, or also IC50/EC50, etc.)?
- What filters are applied for assay/data quality (assay type, confidence score, curated flags, removing ambiguous units, replicates/uncertainty/outlier handling, etc.)?
3. Target → PDB mapping
- How is a ChEMBL target mapped to PDB structures (e.g., UniProt/SIFTS mapping, sequence alignment/similarity search)?
- How are mutations, engineered constructs, missing segments, or isoforms handled?
4. Reference PDB selection
- When multiple PDB structures are available for the same target, what is the selection strategy for the reference structure?
- For targets with distinct conformational states (e.g., GPCR agonist vs antagonist), how do you ensure the selected structure matches the relevant state?
5. Reproducible pipeline / scripts
- Are there scripts or a reproducible workflow available for the ChEMBL + PDB → benchmark generation? If not, any pointers to the key steps would be appreciated.
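
To make question 2 concrete, here is a minimal sketch of the kind of record-level filter I have in mind. The field names follow ChEMBL's activity/assay schema (`standard_type`, `standard_relation`, `standard_units`, `standard_value`, `assay_type`, `confidence_score`). The `keep_record` helper, the allowed endpoint set, and the thresholds are my own assumptions for illustration, not the criteria the benchmark actually uses.

```python
# Hypothetical filter for ChEMBL-style activity records.
# All thresholds below are illustrative assumptions, not the
# rules used to build Uni-FEP-Benchmarks.
ALLOWED_TYPES = {"Ki", "Kd", "IC50"}


def keep_record(rec, allowed_types=ALLOWED_TYPES, min_confidence=9):
    """Return True if a record passes the example quality filters."""
    return (
        rec.get("standard_type") in allowed_types       # endpoint of interest
        and rec.get("standard_relation") == "="         # drop censored values (">", "<")
        and rec.get("standard_units") == "nM"           # unambiguous units only
        and rec.get("standard_value") is not None       # a numeric value is present
        and rec.get("assay_type") == "B"                # binding assays
        and (rec.get("confidence_score") or 0) >= min_confidence
    )


# Made-up records, only to show the behaviour of the filter.
records = [
    {"standard_type": "Ki", "standard_relation": "=", "standard_units": "nM",
     "standard_value": 12.0, "assay_type": "B", "confidence_score": 9},
    {"standard_type": "IC50", "standard_relation": ">", "standard_units": "nM",
     "standard_value": 10000.0, "assay_type": "B", "confidence_score": 9},
    {"standard_type": "EC50", "standard_relation": "=", "standard_units": "nM",
     "standard_value": 50.0, "assay_type": "F", "confidence_score": 8},
]

kept = [r for r in records if keep_record(r)]
```

In this sketch only the first record passes: the second is a censored value and the third is a functional EC50 with a lower confidence score. Knowing which of these conditions the real pipeline applies, and with what thresholds, would answer most of question 2.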
Thanks