Testing Qwen3-Omni from the command line raises an error #1007

@qnxgy921

Hi, I want to test the ASR module of Qwen3-Omni from the shell with a command like:

python -m mlx_vlm generate \
  --model ../mlx_qwen3_omni_4bit \
  --audio ../AUD-20240613-WA0035.wav \
  --prompt "Transcribe this audio" \
  --max-tokens 500

It raises the following error:

Traceback (most recent call last):
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1076, in process_inputs_with_fallback
    return process_inputs(
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1062, in process_inputs
    return process_method(**args)
  File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/site-packages/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py", line 156, in __call__
    audio_inputs = self.feature_extractor(audio, **output_kwargs["audio_kwargs"])
  File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/site-packages/transformers/models/whisper/feature_extraction_whisper.py", line 284, in __call__
    raw_speech = np.asarray(raw_speech, dtype=np.float32)
ValueError: could not convert string to float: '../AUD-20240613-WA0035.wav'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/__main__.py", line 23, in <module>
    submodule.main()
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/generate.py", line 1637, in main
    result = generate(
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/generate.py", line 856, in generate
    for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/generate.py", line 635, in stream_generate
    inputs = prepare_inputs(
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1296, in prepare_inputs
    inputs = process_inputs_with_fallback(
  File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1088, in process_inputs_with_fallback
    raise ValueError(f"Failed to process inputs with error: {e}")
ValueError: Failed to process inputs with error: could not convert string to float: '../AUD-20240613-WA0035.wav'
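
The last frame of the inner traceback suggests that the file path string itself reached `np.asarray(..., dtype=np.float32)` instead of decoded audio samples. Below is a minimal sketch that reproduces that conversion error in isolation and shows the kind of float32 waveform the call would accept. It assumes a 16-bit PCM WAV file and leaves out resampling. `load_wav_as_float32` is a hypothetical helper written for illustration, not part of mlx_vlm or transformers, and the in-memory test tone only stands in for the real recording.

```python
import io
import wave

import numpy as np


def load_wav_as_float32(source):
    """Hypothetical helper: decode a 16-bit PCM WAV into a mono float32 array.

    `source` may be a file path or a binary file-like object.
    Returns (samples, sample_rate), with samples scaled to [-1.0, 1.0).
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Average interleaved channels down to mono.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, rate


# 1) Reproduce the failure: a path string is not numeric data.
try:
    np.asarray("../AUD-20240613-WA0035.wav", dtype=np.float32)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# 2) Build a one-second 440 Hz test tone in memory (stand-in for the real file).
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    t = np.arange(16000) / 16000.0
    tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype("<i2")
    wf.writeframes(tone.tobytes())
buf.seek(0)

# 3) Decoded samples convert to float32 without error.
waveform, rate = load_wav_as_float32(buf)
converted = np.asarray(waveform, dtype=np.float32)
```

If this reading is right, the path passed via `--audio` would need to be decoded to samples somewhere before the feature extractor is called. Whether that belongs in mlx_vlm's input preparation or in the processor is not something I can tell from the traceback alone.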

My transformers version is 5.5.3.

Is there a fix or workaround for this?
