Qwen3-VL quantization fails #4485

MNN version: 3.5.0-62-g8637da94
The whole process was:

```shell
cd ~/MNN/transformers/llm/export
pip install -r requirements.txt
python llmexport.py --path ~/Qwen3-VL-4B-Instruct --export mnn --hqq --visual_quant_bit 8
```
I quantized the model from https://www.modelscope.cn/models/Qwen/Qwen3-VL-4B-Instruct

The quantized result is here:
https://www.modelscope.cn/models/ghost7/Qwen3-VL-4B-Instruct-MNN_V-8bit/files

❓ When I test it in MNNChat, any input produces an endless stream of `FFFFFFFF...`. Which step did I get wrong?

I would like the language part to be close to this version:
https://www.modelscope.cn/models/MNN/Qwen3-VL-4B-Instruct-MNN/files
❓ But the vision part should be at least 8-bit or fp16. How should I set the parameters?

❓ Also, if I want to run inference on the GPU (Vulkan) or on an NPU, do I need to choose special parameters when quantizing?
