I tried to describe it in detail in jundot/omlx#1757.
I'm not sure where the bug(s) lie, but I think there are several, and combined they are quite deadly :)
- memory use explodes on load and is not released when the load fails
- the VLM->LLM fallback then uses even more memory
- a subsequent OOM when the model doesn't fit
- memory is released successfully when it does fit
I don't know whether vision is supposed to work in this quant, but it does work with gemma-4-26B-A4B-it-oQ8.