Reproducible submission repository for the Mandrake Bio Retroviral Wall Challenge. The project predicts reverse transcriptase (RT) activity from sequence, protein-family, embedding, and structure-derived features under leave-one-family-out (LOFO) validation.
The repository is packaged so a reviewer can reproduce the submitted CSV and verify the reported score with two commands after environment setup.
Primary submission files:
- submission.csv
- results/calibrated_predictions_selected_features.csv
- WRITEUP.md
- requirements.txt
Required prediction schema:
rt_name,predicted_active,predicted_score
Final verified metrics on submission.csv:
PR-AUC: 0.9857
Weighted Spearman: 0.9652
CLS: 0.9754
The exact full-precision values recorded for the submitted artifact are:
PR-AUC = 0.985714
Weighted Spearman = 0.965244
CLS = 0.975372
From the repository root:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python experiments\materialize_final_submission.py
python evaluation\evaluate.py --predictions submission.csv
Expected output:
PR-AUC: 0.9857
Weighted Spearman: 0.9652
CLS: 0.9754
On macOS/Linux, use:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python experiments/materialize_final_submission.py
python evaluation/evaluate.py --predictions submission.csv
The final submission is reconstructed from LOFO out-of-fold predictions generated by the family-conditioned pipeline:
results/calibrated_predictions_selected_features.csv
experiments/materialize_final_submission.py reconstructs the final LOFO out-of-fold prediction vector from the preserved experiment outputs generated by the family-conditioned LOFO pipeline, applies the fixed binary decision rule, and writes:
submission.csv
results/calibrated_predictions_selected_features.csv
predicted_score is not changed during materialization; it is the primary ranking output used for CLS evaluation. predicted_active is derived from predicted_score using a fixed decision rule defined inside the pipeline.
Artifact hashes after materialization:
SHA256 submission.csv
B9CFADF6BF3AFED5BEA1E5D848D5E8F4EF510E322DB61373316A3D9EDC6CA84E
SHA256 results/calibrated_predictions_selected_features.csv
B9CFADF6BF3AFED5BEA1E5D848D5E8F4EF510E322DB61373316A3D9EDC6CA84E
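The hashes above can be checked without any platform-specific tool. The following is a minimal sketch using only the Python standard library; the helper names sha256_hex and matches_expected are illustrative and are not part of the repository.

```python
import hashlib
import os

# Expected digest copied from the artifact hashes listed above.
EXPECTED_SHA256 = "B9CFADF6BF3AFED5BEA1E5D848D5E8F4EF510E322DB61373316A3D9EDC6CA84E"

ARTIFACTS = (
    "submission.csv",
    os.path.join("results", "calibrated_predictions_selected_features.csv"),
)


def sha256_hex(path, chunk_size=65536):
    """Return the uppercase SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file digest equals the expected digest (case-insensitive)."""
    return sha256_hex(path) == expected.upper()


# Only report on artifacts that exist, so the sketch is safe to run anywhere.
for artifact in ARTIFACTS:
    if os.path.exists(artifact):
        status = "OK" if matches_expected(artifact) else "MISMATCH"
        print(f"{status}  {artifact}")
```

Run it from the repository root after materialization. A MISMATCH can also come from line-ending conversion (for example by a version-control checkout), so it does not by itself prove that the predictions differ.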
Check the submitted score:
python evaluation\evaluate.py --predictions submission.csv
Check the CSV schema:
python -c "import pandas as pd; p=pd.read_csv('submission.csv'); print(list(p.columns)); print(len(p)); print(sorted(map(int, p.predicted_active.unique()))); print(int(p.isna().sum().sum()))"
Expected schema check:
['rt_name', 'predicted_active', 'predicted_score']
57
[0, 1]
0
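The same four checks (column names, row count, binary predicted_active, no missing values) can be written as a small reusable function. This is a sketch that uses the standard-library csv module instead of pandas; check_schema and the toy rows rt_a, rt_b, rt_c are hypothetical and only illustrate the checks.

```python
import csv
import io
import math

EXPECTED_COLUMNS = ["rt_name", "predicted_active", "predicted_score"]
EXPECTED_ROWS = 57


def check_schema(handle, expected_rows=EXPECTED_ROWS):
    """Return a list of problems; an empty list means every check passed."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]

    problems = []
    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")

    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        # Short rows yield None and long rows yield a list; both are flagged.
        if any(not isinstance(v, str) or not v.strip() for v in row.values()):
            problems.append(f"line {line_no}: missing or extra value")
            continue
        try:
            active = float(row["predicted_active"])
            score = float(row["predicted_score"])
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if active not in (0.0, 1.0):
            problems.append(f"line {line_no}: predicted_active is not 0 or 1")
        if math.isnan(score):
            problems.append(f"line {line_no}: predicted_score is NaN")
    return problems


# Toy in-memory file with hypothetical names; the real file has 57 rows.
toy_csv = (
    "rt_name,predicted_active,predicted_score\n"
    "rt_a,1,0.91\n"
    "rt_b,0,0.12\n"
    "rt_c,1,0.77\n"
)
print(check_schema(io.StringIO(toy_csv), expected_rows=3))
```

To check the real file, pass an open handle instead, for example check_schema(open("submission.csv", newline="")).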
Generate a confusion matrix against the included labels:
python experiments\confusion_matrix_eval.py
Expected classification counts:
TP: 20
FP: 1
TN: 35
FN: 1
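As a quick arithmetic cross-check, the four counts above should sum to the 57 enzymes in the submission, and the usual binary summary rates follow directly from them. The sketch below derives those rates; binary_summary is an illustrative helper, not something confusion_matrix_eval.py is known to expose.

```python
def binary_summary(tp, fp, tn, fn):
    """Derive standard binary-classification rates from confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "n": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the expected classification output above.
summary = binary_summary(tp=20, fp=1, tn=35, fn=1)
for name, value in summary.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

These rates describe predicted_active only; the official ranking metrics use predicted_score.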
Run the shuffle-label leakage sanity check:
python experiments\shuffle_labels_test.py
All predictions were generated using strict leave-one-family-out cross-validation across the 7 RT families.
Each family was excluded from training before prediction, and CLS was computed on pooled out-of-fold predictions across all 57 enzymes.
The modeling objective was to predict RT activity while respecting evolutionary family boundaries. The strongest candidate was selected using LOFO-style evaluation, where each RT family is held out from training before predicting that family. This prevents inflated performance from close family-level similarity.
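The LOFO protocol described above can be sketched in a few lines: each family is held out in turn, a model is fit on the remaining families, and the held-out predictions are pooled into one out-of-fold vector. This is an illustration of the splitting logic only, not the repository's pipeline; the family names, labels, and the mean-label "model" are toy assumptions.

```python
from collections import defaultdict


def lofo_splits(families):
    """Yield (held_out_family, train_indices, test_indices) for each family."""
    by_family = defaultdict(list)
    for idx, family in enumerate(families):
        by_family[family].append(idx)
    for held_out in sorted(by_family):
        test_idx = by_family[held_out]
        train_idx = [i for i, fam in enumerate(families) if fam != held_out]
        yield held_out, train_idx, test_idx


def pooled_oof_scores(families, fit_predict):
    """Collect one out-of-fold score per sample across all LOFO folds.

    `fit_predict(train_idx, test_idx)` must train only on `train_idx` and
    return one score per entry of `test_idx`.
    """
    oof = [None] * len(families)
    for _, train_idx, test_idx in lofo_splits(families):
        for i, score in zip(test_idx, fit_predict(train_idx, test_idx)):
            oof[i] = score
    return oof


# Toy data (hypothetical): three families, binary activity labels.
toy_families = ["fam_A", "fam_A", "fam_B", "fam_C", "fam_C", "fam_C"]
toy_labels = [1, 1, 0, 0, 0, 1]


def mean_label_model(train_idx, test_idx):
    """Stand-in model: predict the training-set active rate for every test item."""
    rate = sum(toy_labels[i] for i in train_idx) / len(train_idx)
    return [rate] * len(test_idx)


print(pooled_oof_scores(toy_families, mean_label_model))
```

Because every sample is scored by a model that never saw its family, a metric computed on the pooled vector is not inflated by within-family similarity. Any feature selection or calibration would also need to happen inside each fold for that guarantee to hold.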
The feature set combines:
- sequence composition and charge features
- structural descriptors from predicted RT structures
- catalytic-site and cleft-geometry features
- secondary-structure summaries
- ESM2 embedding-derived similarity and ranking signals
- family-conditioned and residual rank-correction signals
The final artifact was selected for the best balance between active/inactive separation and efficiency-aware ranking under the official CLS metric:
CLS = harmonic_mean(PR-AUC, Weighted Spearman)
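As a worked check of this formula, the reported CLS can be recomputed from the two full-precision component values recorded for the submitted artifact. The sketch below is plain arithmetic; returning 0.0 when either component is non-positive is an assumption made here for safety and may differ from how evaluation/evaluate.py handles that case.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two values; 0.0 if either is non-positive (assumption)."""
    if a <= 0 or b <= 0:
        return 0.0
    return 2 * a * b / (a + b)


# Full-precision component metrics recorded for the submitted artifact.
pr_auc = 0.985714
weighted_spearman = 0.965244

cls_score = harmonic_mean(pr_auc, weighted_spearman)
print(f"CLS: {cls_score:.4f}")
```

The harmonic mean sits closer to the smaller of the two components, so a submission cannot reach a high CLS by doing well on only one of classification and ranking.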
Repository layout:
data/
  rt_sequences.csv                   Challenge labels, families, sequences, efficiency values
  family_splits.csv                  Family-level sample counts
  handcrafted_features.csv           Precomputed sequence/biophysical features
  structure_features.csv             Predicted-structure feature table
  topology_features.csv              Structural/topological feature table
  esm2_embeddings.npz                Precomputed ESM2 embeddings
  structures/                        Predicted protein structures
evaluation/
  evaluate.py                        Official-style CLS evaluator
experiments/
  materialize_final_submission.py    Rebuilds final submission CSVs
  family_conditioned_lofo_search.py  Main family-conditioned LOFO search lineage
  quantum_residual_ranker.py         Quantum-inspired residual ranking experiment
  lambdarank_residual_ranker.py      Active-subset LambdaRank-style residual experiment
  nested_residual_ranker.py          Strict nested-LOFO residual correction experiment
  confusion_matrix_eval.py           Binary-label sanity report
  shuffle_labels_test.py             Leakage sanity check
src/
  data_loader.py                     Data loading helpers
  feature_engineering.py             Main feature construction pipeline
  structure_features.py              Structure feature extraction utilities
  train_*.py                         Earlier model training scripts
results/
  calibrated_predictions_selected_features.csv
                                     Final LOFO prediction artifact
  *.csv, *.json, *.png               Candidate outputs, diagnostics, and plots
The shortest reproducible path is:
results/calibrated_predictions_selected_features.csv
-> experiments/materialize_final_submission.py
-> submission.csv
-> evaluation/evaluate.py
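Because the documented commands differ between Windows and macOS/Linux only in path separators and the interpreter name, the two-step path above can also be driven from one portable script. This is a minimal sketch, assuming both scripts are run from the repository root as the instructions above do; reproduction_commands and run_reproduction are hypothetical helpers that do not exist in the repository.

```python
import subprocess
import sys
from pathlib import Path

# Each step is a script path (relative to the repository root) plus its arguments.
STEPS = [
    [Path("experiments") / "materialize_final_submission.py"],
    [Path("evaluation") / "evaluate.py", "--predictions", "submission.csv"],
]


def reproduction_commands():
    """Build the command lines using the current interpreter and OS-native paths."""
    return [[sys.executable, str(script), *args] for script, *args in STEPS]


def run_reproduction(repo_root="."):
    """Run both steps in order from `repo_root`, stopping at the first failure."""
    for command in reproduction_commands():
        subprocess.run(command, cwd=repo_root, check=True)


# Show the commands without executing them.
for command in reproduction_commands():
    print(" ".join(command))
```

Call run_reproduction() from an activated environment to execute the steps. check=True raises an exception if either script exits with a non-zero status, so a failed materialization is never followed by an evaluation of a stale submission.csv.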
The main research/development scripts are:
- experiments/family_conditioned_lofo_search.py: strongest family-conditioned hybrid LOFO search.
- experiments/quantum_residual_ranker.py: local quantum-inspired residual ranking.
- experiments/lambdarank_residual_ranker.py: active-subset rank residual experiment.
- experiments/nested_residual_ranker.py: strict nested residual correction.
- src/feature_engineering.py: central construction of sequence, structure, and topology features.
Several late-stage scripts are retained for auditability. They are deterministic given the included data and fixed seeds, but they are exploratory and are not required to regenerate the submitted CSV.
Install all declared dependencies with:
python -m pip install -r requirements.txt
The final artifact reproduction path requires only the core scientific Python stack: numpy, pandas, scipy, and scikit-learn. The full requirements.txt also includes packages used by exploratory scripts, plots, structure utilities, ESM/embedding experiments, XGBoost baselines, and quantum-inspired ranking experiments.
Recommended interpreter:
Python 3.10-3.12 for broad package compatibility.
The final CSV materialization and evaluator were also verified in this workspace on:
Python 3.14.2
- Install dependencies from requirements.txt.
- Run python experiments\materialize_final_submission.py.
- Confirm submission.csv has exactly 57 rows and columns rt_name, predicted_active, predicted_score.
- Run python evaluation\evaluate.py --predictions submission.csv.
- Confirm the evaluator reports CLS: 0.9754.
- Submit submission.csv, WRITEUP.md, and this repository as the runnable code package.
- No external data download is required for the final reproduction path; all required inputs are included under data/ and results/.
- The submitted predicted_score values are continuous rank scores. Higher means more likely active and/or higher-efficiency.
- predicted_active is derived from predicted_score using a fixed pipeline rule and is included to satisfy the submission schema. The official ranking metrics use predicted_score.
- The deadline text in the challenge instructions should be checked against the active submission form before upload.