Hi, I want to test the ASR (speech recognition) capability of Qwen3-Omni from the shell with the following command:
python -m mlx_vlm generate \
--model ../mlx_qwen3_omni_4bit \
--audio ../AUD-20240613-WA0035.wav \
--prompt "Transcribe this audio" \
--max-tokens 500
It fails with the following error:
Traceback (most recent call last):
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1076, in process_inputs_with_fallback
return process_inputs(
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1062, in process_inputs
return process_method(**args)
File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/site-packages/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py", line 156, in __call__
audio_inputs = self.feature_extractor(audio, **output_kwargs["audio_kwargs"])
File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/site-packages/transformers/models/whisper/feature_extraction_whisper.py", line 284, in __call__
raw_speech = np.asarray(raw_speech, dtype=np.float32)
ValueError: could not convert string to float: '../AUD-20240613-WA0035.wav'
Traceback (most recent call last):
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1076, in process_inputs_with_fallback
return process_inputs(
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1062, in process_inputs
return process_method(**args)
File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/site-packages/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py", line 156, in __call__
audio_inputs = self.feature_extractor(audio, **output_kwargs["audio_kwargs"])
File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/site-packages/transformers/models/whisper/feature_extraction_whisper.py", line 284, in __call__
raw_speech = np.asarray(raw_speech, dtype=np.float32)
ValueError: could not convert string to float: '../AUD-20240613-WA0035.wav'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/nxqian/miniforge3/envs/nxqian-mlx-vlm/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/__main__.py", line 23, in <module>
submodule.main()
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/generate.py", line 1637, in main
result = generate(
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/generate.py", line 856, in generate
for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/generate.py", line 635, in stream_generate
inputs = prepare_inputs(
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1296, in prepare_inputs
inputs = process_inputs_with_fallback(
File "/Users/nxqian/Documents/vllm/mlx-vlm/mlx_vlm/utils.py", line 1088, in process_inputs_with_fallback
raise ValueError(f"Failed to process inputs with error: {e}")
ValueError: Failed to process inputs with error: could not convert string to float: '../AUD-20240613-WA0035.wav'
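As far as I can tell from the innermost frame, the audio path reaches the Whisper feature extractor as a plain string instead of decoded samples. Below is a minimal sketch that reproduces only that last step in isolation. The helper name to_float32 is mine for illustration and is not part of mlx-vlm or transformers; the sample values are made up.

```python
import numpy as np


def to_float32(raw_speech):
    # Same conversion as the failing line in feature_extraction_whisper.py
    # (hypothetical wrapper, only for this reproduction).
    return np.asarray(raw_speech, dtype=np.float32)


# Passing the file path itself, as seems to happen in my run.
path = "../AUD-20240613-WA0035.wav"
try:
    to_float32(path)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Passing already-decoded samples (made-up values) converts without error.
samples = to_float32([0.0, 0.5, -0.5])
print(samples.dtype, samples.shape)
```

So my guess is that the file is never loaded into a waveform before the processor is called, but I have not confirmed where in mlx-vlm that should happen.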
My transformers version is 5.5.3.
Is there a fix or workaround for this?