Multimodal face liveness detection system that combines IMU sensor data with video embeddings to distinguish genuine face presentations from spoofing attacks (screen spoofs, cardboard spoofs, etc.) on mobile devices.
The core idea: distill knowledge from a video encoder (teacher) into a lightweight IMU encoder (student) using cross-modal contrastive learning, so that at inference time only IMU data from the device's accelerometer and gyroscope is needed.
- Video encoder (teacher): InternVideo2.5 or VideoMAE
- IMU encoder (student): Mantis-8M with a projection MLP head
- Loss: COMODO — cross-modal distillation with an instance queue of negative samples and forward KL divergence
- Evaluation metrics: AUC, EER, APCER/BPCER at multiple thresholds, TPR @ FPR=0
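
The loss described above can be sketched in a few lines. This is a minimal illustration, not the repository's `COMODOLoss`: it assumes the instance queue holds teacher (video) embeddings, that both the student and the teacher embedding are compared against that queue by cosine similarity, and that the forward KL divergence is taken from the teacher distribution to the student distribution. The function names are hypothetical, and plain Python is used so the arithmetic stays visible; a real implementation would use batched tensor operations.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(embedding, queue, temperature):
    """Softmax over cosine similarities between one embedding and every queue entry."""
    e = l2_normalize(embedding)
    sims = [sum(a * b for a, b in zip(e, l2_normalize(q))) for q in queue]
    return softmax([s / temperature for s in sims])


def distillation_loss(student_emb, teacher_emb, queue,
                      teacher_temp=0.1, student_temp=0.05):
    """Forward KL(teacher || student) over the queue (assumed formulation)."""
    p = similarity_distribution(teacher_emb, queue, teacher_temp)  # target
    q = similarity_distribution(student_emb, queue, student_temp)  # prediction
    eps = 1e-12  # guards against log(0)
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))
```

The loss is zero when the student reproduces the teacher's similarity distribution and grows as the two distributions diverge, which is what pushes the IMU embedding toward the video embedding's neighbourhood structure.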
├── train_vimo.py # Main ViMo training (IMU ↔ video distillation)
├── train_imu_encoder.py # Standalone IMU encoder fine-tuning
├── train_imu_encoder_mmea.py # IMU encoder training on MMEA dataset
├── train_eval_vimo_head.py # Train & evaluate SVM head on IMU embeddings
├── eval_imu_encoder.py # Evaluate fine-tuned IMU encoder
├── get_imu_embeds.py # Extract IMU embeddings for downstream tasks
├── configs/ # Training & evaluation configs (YAML)
├── utils/
│   ├── model_utils.py # IMUEncoder, VideoEncoder, VideoEncoderMAE
│   ├── dataset_utils.py # ViMoDataset, IMUDataset, S3 data loading
│   ├── loss.py # COMODOLoss
│   └── utils.py # Metrics, plotting, evaluation helpers
└── tools/
    ├── imu_viz.py # IMU trajectory visualization orchestrator
    ├── trajectory_calculator.py # Trajectory from quaternion + accelerometer
    ├── trajectory_visualizer.py # 3D trajectory rendering with video overlay
    └── web_motion_check.py # Motion statistics analysis
Requires Python 3.12+.

    uv sync

All training scripts use ClearML for experiment tracking, dataset management, and remote execution.

    uv run python train_vimo.py --config configs/vimo_train.yaml

Trains the IMU encoder to match video encoder embeddings using COMODO loss. Video embeddings are pre-computed and cached as .npy files.

    uv run python train_vimo_mmea.py --config configs/vimo_mmea_train.yaml --dataset_path /path/to/UESTC-MMEA-CL/

Trains ViMo distillation on the UESTC-MMEA-CL dataset. Expects train.txt and val.txt split files inside --dataset_path.

    uv run python train_imu_encoder.py --config configs/imu_encoder_train.yaml

Fine-tunes Mantis-8M for binary genuine/spoof classification with BCE or cross-entropy loss.

    uv run python train_eval_vimo_head.py --config configs/vimo_head_train_eval.yaml

Extracts IMU embeddings and trains an SVM classifier with grid search.

    uv run python eval_imu_encoder.py --config configs/imu_encoder_eval.yaml

Reports AUC, EER, accuracy, APCER/BPCER curves, and confusion matrix.
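
For readers unfamiliar with the presentation-attack metrics in that report, the sketch below shows how APCER, BPCER, and EER relate. It is an illustration only, not the code in utils/utils.py: it assumes label 1 means genuine and 0 means spoof, that a higher score means "more likely genuine", and that EER is approximated by scanning the observed scores as candidate thresholds. The function names are hypothetical.

```python
def apcer_bpcer(scores, labels, threshold):
    """Return (APCER, BPCER) when scores >= threshold are accepted as genuine.

    APCER: fraction of attack (spoof) samples wrongly accepted.
    BPCER: fraction of bona fide (genuine) samples wrongly rejected.
    """
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    apcer = sum(s >= threshold for s in attacks) / len(attacks)
    bpcer = sum(s < threshold for s in bona_fide) / len(bona_fide)
    return apcer, bpcer


def equal_error_rate(scores, labels):
    """Approximate the EER: the operating point where APCER and BPCER are closest.

    Returns (eer, threshold), with eer taken as the mean of the two rates there.
    """
    best = None
    for t in sorted(set(scores)):
        apcer, bpcer = apcer_bpcer(scores, labels, t)
        gap = abs(apcer - bpcer)
        if best is None or gap < best[0]:
            best = (gap, (apcer + bpcer) / 2, t)
    return best[1], best[2]
```

Sweeping the threshold and recording both rates at each step gives the APCER/BPCER curve; the EER is the single point on that curve where the two error types balance.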
Configs are in configs/ as YAML files. Key parameters:
| Parameter | Default | Description |
|---|---|---|
| train.lr | 3e-4 | Learning rate |
| train.epochs | 50 | Training epochs |
| train.batch_size | 128 | Batch size |
| train.mlp_hidden_dim | 2048 | Projection MLP hidden dim |
| train.mlp_output_dim | 128 | Embedding dimension |
| train.queue_size | 2048 | COMODO instance queue size |
| train.teacher_temp | 0.1 | Teacher softmax temperature |
| train.student_temp | 0.05 | Student softmax temperature |
| train.use_quat | true | Use quaternion data (7 ch) or accel+gyro only (6 ch) |
| train.num_of_frames | 16 | Video frames per sample |
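
As a rough guide to how these parameters might be consumed, the sketch below mirrors the table's defaults in a nested dictionary and resolves dotted names against it. It assumes the dotted names correspond to keys nested under a `train` section of the YAML file, which the table suggests but does not state; `DEFAULTS`, `get_param`, and `imu_channels` are hypothetical names, not part of the repository.

```python
from functools import reduce

# Defaults copied from the parameter table; the nesting is an assumption.
DEFAULTS = {
    "train": {
        "lr": 3e-4,
        "epochs": 50,
        "batch_size": 128,
        "mlp_hidden_dim": 2048,
        "mlp_output_dim": 128,
        "queue_size": 2048,
        "teacher_temp": 0.1,
        "student_temp": 0.05,
        "use_quat": True,
        "num_of_frames": 16,
    }
}


def get_param(config, dotted_key):
    """Resolve a dotted name such as 'train.lr' against a nested dict."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), config)


def imu_channels(config):
    """Input channel count implied by train.use_quat (7 with quaternions, else 6)."""
    return 7 if get_param(config, "train.use_quat") else 6
```

Changing `train.use_quat` changes the IMU encoder's input width, so a checkpoint trained with one setting would presumably not load cleanly under the other.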
Visualization and analysis utilities in tools/:

    # IMU trajectory visualization
    uv run python tools/imu_viz.py --imu_csv <path> --video <path> --config tools/config.yaml

Computes 3D trajectory from IMU data (Savitzky-Golay smoothing, gravity compensation, stationary detection) and renders it overlaid on video frames.
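
To make two of those steps concrete, the sketch below shows gravity compensation and stationary detection feeding a simple double integration. It is not the code in tools/trajectory_calculator.py. It assumes quaternions are unit-norm `(w, x, y, z)` rotations from the device frame to a world frame with z pointing up, that the accelerometer reports specific force in m/s² (so it reads about +9.81 on the up axis at rest), and that a sample counts as stationary when its magnitude is close to gravity. Savitzky-Golay smoothing is left out for brevity, and all names and thresholds are hypothetical.

```python
import math

GRAVITY = 9.81  # m/s^2, assumed gravity magnitude


def rotate_by_quaternion(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2 * (y * vz - z * vy)
    ty = 2 * (z * vx - x * vz)
    tz = 2 * (x * vy - y * vx)
    # v' = v + w * t + q_vec x t
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def integrate_trajectory(quats, accels, dt, stationary_tol=0.3):
    """Integrate gravity-compensated acceleration twice to get positions."""
    velocity = [0.0, 0.0, 0.0]
    position = [0.0, 0.0, 0.0]
    trajectory = [tuple(position)]
    for q, a in zip(quats, accels):
        magnitude = math.sqrt(sum(c * c for c in a))
        if abs(magnitude - GRAVITY) < stationary_tol:
            # Stationary: reset velocity so integration drift does not accumulate.
            velocity = [0.0, 0.0, 0.0]
        else:
            ax, ay, az = rotate_by_quaternion(q, a)
            linear = (ax, ay, az - GRAVITY)  # remove gravity in the world frame
            velocity = [v + c * dt for v, c in zip(velocity, linear)]
        position = [p + v * dt for p, v in zip(position, velocity)]
        trajectory.append(tuple(position))
    return trajectory
```

The velocity reset matters because double integration amplifies even small accelerometer bias; without it the reconstructed path drifts away within seconds.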