
feat(ml): test muon optimizer for b2b_vid_cls #846

Open
wr0124 wants to merge 1 commit into jolibrain:master from wr0124:muon_test_J



wr0124 (Collaborator) commented Jun 8, 2026

Summary

  • Add Muon optimizer support for b2b_vid_cls.
  • Allow Muon to be selected from the training options (--train_optim muon).
  • Update optimizer setup for Muon.
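For readers unfamiliar with Muon: in the public reference implementation by Keller Jordan, each 2D weight's gradient is accumulated into a momentum buffer, and that update is then replaced by an approximately orthogonal matrix computed with a quintic Newton-Schulz iteration. The NumPy sketch below illustrates that idea only. It is not the code added in this PR, and the function names, the Nesterov variant, the default lr/momentum and the shape-based scale factor are assumptions borrowed from that reference, not from joliGEN.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the public Muon reference
# implementation; they trade exact orthogonality for fast convergence.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximate U @ V.T from the SVD of a 2D matrix g.

    For reasonably conditioned inputs, singular values of the result end up
    roughly between 0.7 and 1.2 rather than exactly at 1; that is the intended
    behaviour of these coefficients.
    """
    a, b, c = NS_COEFFS
    # Frobenius normalisation bounds every singular value by 1.
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        # Work in the wide orientation so the Gram matrix x @ x.T is the smaller one.
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95, nesterov=True):
    """One hypothetical Muon update for a single 2D weight matrix (sketch, not this PR)."""
    momentum_buf = beta * momentum_buf + grad
    update = grad + beta * momentum_buf if nesterov else momentum_buf
    ortho = newton_schulz_orthogonalize(update)
    # Shape-dependent scale taken from the reference implementation (assumption here).
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    return weight - lr * scale * ortho, momentum_buf
```

In that reference setup, only hidden 2D weight matrices go through this step; biases, normalization scales and, typically, embedding and output layers are left to AdamW.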

Command line

python3 -W ignore::FutureWarning -W ignore::UserWarning train.py \
--dataroot path/to/dataset_vid \
--data_relative_paths \
--data_num_threads 4 \
--data_dataset_mode self_supervised_vid_labeled_mask_cls_online \
--data_load_size 256 \
--data_crop_size 256 \
--data_online_creation_crop_size_A 256 \
--data_online_creation_crop_delta_A 25 \
--data_online_creation_rand_mask_A \
--data_online_creation_mask_random_offset_A 0.1 0.1 \
--data_temporal_number_frames 2 \
--data_temporal_frame_step 1 \
--dataaug_flip both \
--dataaug_diff_aug_policy color \
--dataaug_diff_aug_proba 1.0 \
--model_type b2b \
--model_input_nc 3 \
--model_output_nc 3 \
--G_netG vit_vid \
--G_vit_variant "JiTVid-B/16" \
--G_vit_disable_bottleneck \
--gpu_ids 1 \
--name test_vid256_b2b_muon \
--train_batch_size 10 \
--train_iter_size 2 \
--train_n_epochs 60000 \
--train_save_epoch_freq 10 \
--train_G_ema \
--train_optim muon \
--train_G_lr 5e-4 \
--train_optim_weight_decay 0.0 \
--train_beta1 0.9 \
--train_beta2 0.95 \
--train_compute_metrics_test \
--train_metrics_list PSNR \
--train_metrics_every 8000 \
--test_batch_size 2 \
--with_amp \
--with_tf32 \
--with_torch_compile \
--output_print_freq 200 \
--output_display_freq 4000 \
--alg_b2b_denoise_timesteps 2 5 20 \
--alg_b2b_cfg_scale 1.0 \
--alg_b2b_disable_inference_clipping \
--alg_b2b_loss_masked_region_only \
--alg_b2b_perceptual_loss LPIPS DISTS \
--alg_b2b_lambda_perceptual 0.1 \
--alg_b2b_loss pseudo_huber \
--alg_b2b_autoregressive \
--alg_b2b_use_gt_prob 0.2 \
--alg_b2b_mask_as_channel \
--G_vit_num_classes 3 \
--f_s_semantic_nclasses 3
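The command selects Muon with --train_optim muon but still passes AdamW-style hyperparameters (--train_G_lr, --train_beta1, --train_beta2, --train_optim_weight_decay). The PR does not say how these are routed, so the following is only a hypothetical sketch of one common arrangement: hidden 2D weight matrices go to Muon, everything else goes to an AdamW fallback group. OptimArgs, ADAMW_NAME_HINTS and build_param_groups are illustrative names, not joliGEN code, and using beta1 as Muon's momentum is an assumption.

```python
from dataclasses import dataclass


@dataclass
class OptimArgs:
    """Hypothetical container mirroring the optimizer flags in the command above."""
    train_optim: str = "muon"
    train_G_lr: float = 5e-4
    train_beta1: float = 0.9
    train_beta2: float = 0.95
    train_optim_weight_decay: float = 0.0


# Hypothetical name fragments for layers that Muon setups usually leave to AdamW.
ADAMW_NAME_HINTS = ("embed", "head")


def build_param_groups(param_shapes, args):
    """Split parameter names into a Muon group and an AdamW fallback group.

    param_shapes maps parameter names to shape tuples; it stands in for
    model.named_parameters() so the sketch runs without PyTorch.
    """
    if args.train_optim != "muon":
        # Non-Muon optimizers: a single group with the AdamW-style settings.
        return [{"optimizer": args.train_optim, "params": list(param_shapes),
                 "lr": args.train_G_lr,
                 "betas": (args.train_beta1, args.train_beta2),
                 "weight_decay": args.train_optim_weight_decay}]
    muon_params, adamw_params = [], []
    for name, shape in param_shapes.items():
        # Only 2D hidden weights are orthogonalized; biases, conv patch
        # embeddings and embedding/head layers fall back to AdamW.
        is_hidden_matrix = len(shape) == 2 and not any(h in name for h in ADAMW_NAME_HINTS)
        (muon_params if is_hidden_matrix else adamw_params).append(name)
    return [
        {"optimizer": "muon", "params": muon_params, "lr": args.train_G_lr,
         # Assumption: beta1 doubles as Muon's momentum coefficient.
         "momentum": args.train_beta1,
         "weight_decay": args.train_optim_weight_decay},
        {"optimizer": "adamw", "params": adamw_params, "lr": args.train_G_lr,
         "betas": (args.train_beta1, args.train_beta2),
         "weight_decay": args.train_optim_weight_decay},
    ]
```

Keeping the AdamW group means --train_beta2 is still meaningful under Muon in this arrangement. Whether the PR actually does this is not stated.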

wr0124 (Collaborator, Author) commented Jun 23, 2026

Using the same configuration, I trained on the same lv_ring dataset for about 90 epochs, once with Muon and once with AdamW, and compared the two runs on the loss, the metrics, and the generated inference visuals.

Adam is the slight winner on the saved comparison metrics, but the margin is small and the side-by-side view does not show a clear visual advantage.

Metrics (based on the 2700 training images):

  • Adam: PSNR 25.8327, SSIM 0.89646, LPIPS 0.05656
  • Muon: PSNR 25.5687, SSIM 0.89429, LPIPS 0.05695

So Adam is better on all three global metrics (higher PSNR and SSIM, lower LPIPS), but only marginally.
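To make "marginally" concrete, the gaps can be computed directly from the reported numbers, taking into account that PSNR and SSIM are higher-is-better while LPIPS is lower-is-better. The helper names below are illustrative; the values are copied verbatim from the list above.

```python
# Values copied from the comment above (2700 training images).
ADAM = {"PSNR": 25.8327, "SSIM": 0.89646, "LPIPS": 0.05656}
MUON = {"PSNR": 25.5687, "SSIM": 0.89429, "LPIPS": 0.05695}
# PSNR and SSIM: higher is better; LPIPS: lower is better.
HIGHER_IS_BETTER = {"PSNR": True, "SSIM": True, "LPIPS": False}


def compare(a, b):
    """Per metric: (advantage of a over b, same advantage relative to b in %)."""
    out = {}
    for name, higher in HIGHER_IS_BETTER.items():
        diff = a[name] - b[name] if higher else b[name] - a[name]
        out[name] = (diff, 100.0 * diff / abs(b[name]))
    return out


def mse_ratio_from_psnr_gap(gap_db):
    """PSNR = 10*log10(MAX^2 / MSE), so a gap of d dB is an MSE ratio of 10**(d/10)."""
    return 10 ** (gap_db / 10)


for metric, (diff, rel) in compare(ADAM, MUON).items():
    print(f"{metric}: Adam ahead by {diff:.5f} ({rel:.2f}%)")
print(f"Muon MSE / Adam MSE ~ {mse_ratio_from_psnr_gap(ADAM['PSNR'] - MUON['PSNR']):.3f}")
```

With these numbers the PSNR gap is 0.264 dB. That corresponds to Muon having roughly 6% higher mean squared error (only approximately, since PSNR is averaged per image). The SSIM and LPIPS gaps are about 0.002 and 0.0004 in absolute terms.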

Visual conclusion (based on one marker video, /videos/w26_adamw_muon_marker.avi):

  • The generated frames from the two runs are very close.
  • The Adam and Muon outputs are not identical, but the difference is tiny and not obvious by eye.
  • There is no strong qualitative winner from the side-by-side output.
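To put a number on "not identical but tiny", the Adam and Muon outputs can be compared directly against each other rather than against ground truth. The following is a minimal NumPy sketch. It assumes both generated videos have already been decoded into (T, H, W, C) uint8 arrays, for example with OpenCV's cv2.VideoCapture (not shown). The function names and the default threshold of 8 gray levels are arbitrary, illustrative choices.

```python
import numpy as np


def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB between two equally shaped images (inf when identical)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def per_frame_psnr(frames_a: np.ndarray, frames_b: np.ndarray, max_val: float = 255.0):
    """Compare two (T, H, W, C) frame stacks frame by frame."""
    if frames_a.shape != frames_b.shape:
        raise ValueError(f"shape mismatch: {frames_a.shape} vs {frames_b.shape}")
    return [psnr(fa, fb, max_val) for fa, fb in zip(frames_a, frames_b)]


def fraction_changed_pixels(frames_a: np.ndarray, frames_b: np.ndarray, threshold: int = 8) -> float:
    """Share of pixels whose largest per-channel difference exceeds threshold (0-255 scale)."""
    # int16 avoids uint8 wrap-around when subtracting.
    diff = np.abs(frames_a.astype(np.int16) - frames_b.astype(np.int16)).max(axis=-1)
    return float((diff > threshold).mean())
```

A high Adam-vs-Muon PSNR together with a small changed-pixel fraction would support the visual conclusion above. A low value on particular frames would point to where the two runs actually diverge.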
