FloralSwitchFormer is an interpretable machine-learning workflow for detecting an M5-centered transcriptional switch during Arabidopsis thaliana shoot apical meristem floral transition.
The project reanalyzes an M1-M8 developmental RNA-seq time series as a gene-level trajectory-learning problem. Instead of relying only on differential expression or clustering, it identifies genes with M5-pulse-like behavior, trains trajectory-only machine-learning models, performs stage-ablation interpretation, and prioritizes hidden switch-like candidates missed by strict rule-based thresholds.
The central biological question is:
Can the floral transition switch be detected directly from gene expression trajectory shape?
FloralSwitchFormer models each gene as an eight-stage expression trajectory:
M1 -> M2 -> M3 -> M4 -> M5 -> M6 -> M7 -> M8
The target pattern is:
moderate or low before M5 -> M5-centered pulse -> decrease after M5
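The target pattern can be illustrated with a small scoring function. This is a minimal sketch, not the project's actual rule: the names `m5_pulse_score` and `is_m5_pulse` and the `min_score` threshold are hypothetical, and the input is assumed to be eight expression values (for example log-scaled counts) ordered M1 to M8.

```python
from statistics import mean

STAGES = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]
M5 = STAGES.index("M5")


def m5_pulse_score(trajectory):
    """Return how far M5 rises above the higher of its two flanks.

    The flanks are the mean of M1..M4 and the mean of M6..M8.
    Hypothetical score for illustration only.
    """
    if len(trajectory) != len(STAGES):
        raise ValueError("expected one value per stage M1..M8")
    before = mean(trajectory[:M5])
    after = mean(trajectory[M5 + 1:])
    return trajectory[M5] - max(before, after)


def is_m5_pulse(trajectory, min_score=1.0):
    """True when M5 is the peak stage and clears both flanks by min_score."""
    peak_stage = max(range(len(trajectory)), key=trajectory.__getitem__)
    return peak_stage == M5 and m5_pulse_score(trajectory) >= min_score


pulse_like = [2, 2, 2, 3, 6, 3, 2, 2]   # low, M5 peak, then decrease
late_rise = [1, 1, 1, 1, 2, 3, 5, 8]    # keeps rising after M5
print(is_m5_pulse(pulse_like), is_m5_pulse(late_rise))
```

A fixed threshold of this kind is what a strict rule-based filter looks like; trajectories that narrowly miss it are the cases a trajectory-learning model can still rank highly.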
- Clean nuclear gene universe: 17,267 genes
- Strict M5-switch genes: 892 genes
- Hidden trajectory-ML candidates: 35 genes
- Best trajectory-only model: GradientBoosting
- Trajectory-only ROC-AUC: 0.952
- Trajectory-only PR-AUC: 0.486
- CNN-attention ROC-AUC: 0.953
- CNN-attention PR-AUC: 0.482
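The PR-AUC values are easier to read against a chance-level baseline. The arithmetic below is a sketch under one assumption that the summary does not state explicitly: that the 892 strict M5-switch genes are the positive class and the 17,267-gene clean universe is the full evaluation set. Under that assumption, a model that ranks genes at random has an expected PR-AUC equal to the positive fraction.

```python
# Counts and metric taken from the summary list above.
n_genes = 17_267   # clean nuclear gene universe
n_strict = 892     # strict M5-switch genes (assumed positive class)
pr_auc = 0.486     # trajectory-only GradientBoosting

# Expected PR-AUC of a random ranking equals the fraction of positives.
baseline = n_strict / n_genes
lift = pr_auc / baseline

print(f"chance-level PR-AUC: {baseline:.3f}")
print(f"lift over chance:    {lift:.1f}x")
```

With these numbers the chance level is roughly 0.052, so a PR-AUC of 0.486 is about nine times better than random ranking even though it looks modest next to the ROC-AUC.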
The strongest interpretability result came from stage/window ablation. Shuffling the full M4-M5-M6 transition window caused the largest drop in model performance.
- Ablated window: M4_M5_M6
- ROC-AUC drop: 0.403
- PR-AUC drop: 0.431
This indicates that the model depends strongly on the M4-M5-M6 transition window, and in particular on the M5-centered pulse, rather than on an arbitrary expression artifact.
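The window-ablation idea can be sketched in a few lines. This is an illustration on synthetic data, not the project's pipeline: `toy_score` is a hypothetical stand-in for a trained model, the data are simulated, and shuffling the window jointly across genes is an assumed choice of permutation scheme.

```python
import random

STAGES = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]


def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shuffle_window(X, window, rng):
    """Swap the window's columns between genes, keeping each donor's window intact."""
    cols = [STAGES.index(s) for s in window]
    donors = list(range(len(X)))
    rng.shuffle(donors)
    ablated = [row[:] for row in X]
    for i, j in enumerate(donors):
        for c in cols:
            ablated[i][c] = X[j][c]
    return ablated


def ablation_drop(score_fn, X, y, window, seed=0):
    """ROC-AUC on intact data minus ROC-AUC after shuffling the window."""
    rng = random.Random(seed)
    intact = roc_auc(y, [score_fn(row) for row in X])
    ablated = roc_auc(y, [score_fn(row) for row in shuffle_window(X, window, rng)])
    return intact - ablated


# Synthetic demo: 20 genes with an added M5 pulse, 180 without.
rng = random.Random(1)
y = [1] * 20 + [0] * 180
X = []
for label in y:
    row = [rng.gauss(5.0, 1.0) for _ in STAGES]
    if label:
        row[STAGES.index("M5")] += 4.0
    X.append(row)


def toy_score(row):
    """Hypothetical model: M5 relative to the mean of M4 and M6."""
    return row[4] - 0.5 * (row[3] + row[5])


drop_window = ablation_drop(toy_score, X, y, ["M4", "M5", "M6"])
drop_early = ablation_drop(toy_score, X, y, ["M1", "M2", "M3"])
print(f"M4_M5_M6 drop: {drop_window:.3f}   M1_M2_M3 drop: {drop_early:.3f}")
```

Because the toy scorer reads only M4 to M6, shuffling M1 to M3 leaves its ROC-AUC unchanged, while shuffling the transition window removes most of its discriminative signal. A real model would show a graded version of the same contrast.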
Hidden candidate discovery
The final hidden candidate table was curated using g:Profiler annotations and manual biological correction. The strongest Tier 1 candidates are:
ATL63, UBC36, ERF070, ERF069, UBC14, ATG8I, TCP7
These candidates were not selected by strict rule-based thresholds, but were recovered by trajectory-only machine learning.
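The selection logic described here amounts to a set difference: genes the model scores highly minus genes the strict rule already selected. The sketch below assumes a per-gene model score between 0 and 1; the function name, the `threshold` value, and the gene identifiers are placeholders, not values from the project.

```python
def hidden_candidates(scores, strict_genes, threshold=0.9):
    """Genes scored at or above threshold that the strict rule did not select.

    Returns gene IDs sorted from highest to lowest model score.
    """
    picked = [
        gene
        for gene, score in scores.items()
        if score >= threshold and gene not in strict_genes
    ]
    return sorted(picked, key=lambda gene: scores[gene], reverse=True)


# Placeholder IDs and scores for illustration only.
scores = {"geneA": 0.97, "geneB": 0.95, "geneC": 0.40, "geneD": 0.92}
strict_genes = {"geneA"}

print(hidden_candidates(scores, strict_genes))
```

In this toy input, `geneA` is excluded because the strict rule already captured it and `geneC` is excluded because its score is low, leaving `geneB` and `geneD` as hidden candidates.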
Curated hidden candidate modules
- protein_turnover_proteasome: 7 genes
- transcriptional_regulation: 4 genes
- endomembrane_vesicle_trafficking: 4 genes
- redox_energy_signaling: 3 genes
- ribosome_translation: 3 genes
- cell_wall_glycosylation_metabolism: 3 genes
- low_annotation_candidate: 3 genes
- membrane_protein_processing: 2 genes
- rna_processing: 2 genes
- autophagy_cellular_remodeling: 1 gene
- cell_wall_peptide_metabolism: 1 gene
- chromatin_histone: 1 gene
- cytoskeleton_cell_cycle: 1 gene
The strict M5-switch gene set captures the broad M5 transcriptional pulse, including ribosome and growth-associated signals. The hidden ML-nominated candidates reveal smaller regulatory and remodeling modules, including protein turnover, transcriptional regulation, autophagy, endomembrane trafficking, RNA processing, redox signaling, chromatin, cytoskeleton, and cell-wall remodeling.
- phase5C_FINAL_curated_hidden_candidate_table.csv
- phase5C_curated_hidden_module_summary.csv
- phase5C_followup_tier_summary.csv
- phase3C_stage_ablation_interpreted.csv
- phase2B_trajectory_only_model_performance.csv
- phase3A_deep_model_metrics.csv
FloralSwitchFormer: interpretable machine learning identifies an M5-centered transcriptional switch during Arabidopsis floral transition