Saddeekhan/FloralSwitchFormer

FloralSwitchFormer

FloralSwitchFormer is an interpretable machine-learning workflow for detecting an M5-centered transcriptional switch during the floral transition of the Arabidopsis thaliana shoot apical meristem.

The project reanalyzes an M1-M8 developmental RNA-seq time series as a gene-level trajectory-learning problem. Instead of relying only on differential expression or clustering, it identifies genes with M5-pulse-like behavior, trains trajectory-only machine-learning models, performs stage-ablation interpretation, and prioritizes hidden switch-like candidates missed by strict rule-based thresholds.

Main idea

The central biological question is:

Can the floral transition switch be detected directly from gene expression trajectory shape?

FloralSwitchFormer models each gene as an eight-stage expression trajectory:

M1 -> M2 -> M3 -> M4 -> M5 -> M6 -> M7 -> M8

The target pattern is:

moderate or low before M5 -> M5-centered pulse -> decrease after M5
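
As a rough illustration of the target pattern, the following Python sketch flags a trajectory as M5-pulse-like when M5 is the maximum and exceeds the mean of both flanks by a fold threshold. The function name `is_m5_pulse` and the `min_fold` value are assumptions made for this example; they are not the project's actual strict rule or thresholds.

```python
from statistics import mean

STAGES = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]

def is_m5_pulse(trajectory, min_fold=1.5):
    """Return True if expression peaks at M5 and exceeds both flanks.

    `trajectory` holds eight non-negative expression values (M1..M8).
    `min_fold` is a hypothetical threshold chosen for illustration only.
    """
    if len(trajectory) != len(STAGES):
        raise ValueError("expected one value per stage M1..M8")
    m5 = trajectory[4]
    before = mean(trajectory[:4])   # M1-M4: moderate or low before M5
    after = mean(trajectory[5:])    # M6-M8: decrease after M5
    eps = 1e-9                      # keeps all-zero genes from passing
    peak_at_m5 = m5 == max(trajectory)
    return (peak_at_m5
            and m5 >= min_fold * (before + eps)
            and m5 >= min_fold * (after + eps))

pulse_like = is_m5_pulse([2, 2, 3, 4, 10, 4, 3, 2])
monotonic = is_m5_pulse([1, 2, 3, 4, 5, 6, 7, 8])
flat = is_m5_pulse([5, 5, 5, 5, 5, 5, 5, 5])
```

A fixed rule like this is easy to audit but brittle near the threshold, which is the gap the trajectory-learning models are meant to cover.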

Main results

  • Clean nuclear gene universe: 17,267 genes
  • Strict M5-switch genes: 892 genes
  • Hidden trajectory-ML candidates: 35 genes
  • Best trajectory-only model: GradientBoosting
  • Trajectory-only ROC-AUC: 0.952
  • Trajectory-only PR-AUC: 0.486
  • CNN-attention ROC-AUC: 0.953
  • CNN-attention PR-AUC: 0.482
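
To put the PR-AUC values in context, it helps to compare them with the positive-class prevalence, since a random ranker has an expected PR-AUC roughly equal to that prevalence. The sketch below assumes that the positives are the 892 strict M5-switch genes within the 17,267-gene universe and that the evaluation set has the same class balance; neither assumption is confirmed by this README.

```python
# Counts taken from the results list above.
n_universe = 17267
n_strict = 892          # assumed to be the positive class (not confirmed)
pr_auc_model = 0.486    # trajectory-only GradientBoosting

# Expected PR-AUC of a random ranker is approximately the prevalence.
prevalence = n_strict / n_universe

# Ratio of the reported PR-AUC to that chance-level baseline.
lift_over_chance = pr_auc_model / prevalence

print(f"prevalence = {prevalence:.4f}")
print(f"lift over chance = {lift_over_chance:.1f}x")
```

Under these assumptions, a PR-AUC near 0.49 is well above chance even though it looks modest next to the ROC-AUC, which is less sensitive to class imbalance.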

Stage-ablation evidence

The strongest interpretability result came from stage/window ablation. Shuffling the full M4-M5-M6 transition window caused the largest drop in model performance.

  • Ablated window: M4_M5_M6
  • ROC-AUC drop: 0.403
  • PR-AUC drop: 0.431

This indicates that the model's predictions are driven by the M4-M5-M6 transition window, and especially by the M5-centered pulse, rather than by an incidental expression artifact.
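
The window-ablation idea can be sketched in plain Python: permute the selected stage columns across genes, rescore, and measure the ROC-AUC drop. Everything here is an illustrative assumption, including the synthetic data, the `toy_score` stand-in for a trained model, and the helper names; it is not the project's code and does not reproduce its numbers.

```python
import random

def roc_auc(labels, scores):
    """ROC-AUC as P(positive outscores negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def shuffle_window(X, cols, rng):
    """Replace the given stage columns of each gene with a random donor's values."""
    donors = list(range(len(X)))
    rng.shuffle(donors)
    out = [list(row) for row in X]
    for i, d in enumerate(donors):
        for c in cols:
            out[i][c] = X[d][c]   # the whole window comes from one donor
    return out

def ablation_drop(score_fn, X, y, cols, rng):
    """ROC-AUC before shuffling minus ROC-AUC after shuffling `cols`."""
    base = roc_auc(y, [score_fn(r) for r in X])
    ablated = roc_auc(y, [score_fn(r) for r in shuffle_window(X, cols, rng)])
    return base - ablated

def toy_score(row):
    """Hypothetical stand-in for a trained model: M5 relative to its neighbors."""
    return row[4] - 0.5 * (row[3] + row[5])

# Synthetic data: 200 genes, the first 20 carry an M5 pulse (index 4).
rng = random.Random(0)
X, y = [], []
for i in range(200):
    row = [rng.random() for _ in range(8)]
    label = 1 if i < 20 else 0
    if label:
        row[4] += 5.0
    X.append(row)
    y.append(label)

drop_window = ablation_drop(toy_score, X, y, [3, 4, 5], random.Random(1))   # M4_M5_M6
drop_control = ablation_drop(toy_score, X, y, [0, 1], random.Random(1))     # M1_M2
```

Shuffling a window the scorer ignores leaves the ROC-AUC unchanged, while shuffling the window it relies on removes most of the signal. Comparing drops across windows is what makes the ablation interpretable.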

Hidden candidate discovery

The final hidden candidate table was curated using g:Profiler annotations and manual biological correction. The strongest Tier 1 candidates are:

ATL63, UBC36, ERF070, ERF069, UBC14, ATG8I, TCP7

These candidates were not selected by strict rule-based thresholds, but were recovered by trajectory-only machine learning.
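
One way to picture this selection is as a filter: keep genes with a high model score that are absent from the strict rule-based set. The sketch below uses placeholder gene IDs, a hypothetical `min_score` cutoff, and an assumed helper name `hidden_candidates`; the project's actual scoring and curation steps are not described at this level of detail.

```python
def hidden_candidates(scores, strict_genes, min_score=0.9):
    """Genes scored highly by the model but not in the strict rule-based set.

    `scores` maps gene ID to model probability; `min_score` is a
    hypothetical cutoff chosen for illustration.
    """
    picked = [g for g, s in scores.items()
              if s >= min_score and g not in strict_genes]
    # Highest-scoring candidates first.
    return sorted(picked, key=lambda g: -scores[g])

# Placeholder IDs, not real gene names from the project.
example_scores = {"geneA": 0.95, "geneB": 0.97, "geneC": 0.40, "geneD": 0.92}
example_strict = {"geneB"}
candidates = hidden_candidates(example_scores, example_strict)
```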

Curated hidden candidate modules

  • protein_turnover_proteasome: 7 genes
  • transcriptional_regulation: 4 genes
  • endomembrane_vesicle_trafficking: 4 genes
  • redox_energy_signaling: 3 genes
  • ribosome_translation: 3 genes
  • cell_wall_glycosylation_metabolism: 3 genes
  • low_annotation_candidate: 3 genes
  • membrane_protein_processing: 2 genes
  • rna_processing: 2 genes
  • autophagy_cellular_remodeling: 1 gene
  • cell_wall_peptide_metabolism: 1 gene
  • chromatin_histone: 1 gene
  • cytoskeleton_cell_cycle: 1 gene

Biological interpretation

The strict M5-switch gene set captures the broad M5 transcriptional pulse, including ribosome and growth-associated signals. The hidden ML-nominated candidates reveal smaller regulatory and remodeling modules, including protein turnover, transcriptional regulation, autophagy, endomembrane trafficking, RNA processing, redox signaling, chromatin, cytoskeleton, and cell-wall remodeling.

Key output files

  • phase5C_FINAL_curated_hidden_candidate_table.csv
  • phase5C_curated_hidden_module_summary.csv
  • phase5C_followup_tier_summary.csv
  • phase3C_stage_ablation_interpreted.csv
  • phase2B_trajectory_only_model_performance.csv
  • phase3A_deep_model_metrics.csv

Suggested project title

FloralSwitchFormer: interpretable machine learning identifies an M5-centered transcriptional switch during Arabidopsis floral transition
