Fix inf grad_norm on Qwen3.5 at seq_len > 65536 without flash-attn #582
Open
danielhanchen wants to merge 1 commit into main