Thanks for the detailed write-up. This is very much the direction I want
Short answer:
For task profiles, there is already
NVFP4 / MXFP4 support is also on the roadmap. I have an issue for that here: #27. That work should include parsing those quant types, adding reasonable memory/speed/quality assumptions, and making sure
Contributions would be very welcome. The most useful things would be:
I think this probably breaks down into a few focused pieces:
If you are interested, your RTX 5090 Laptop data would be especially useful because consumer laptop GPUs are exactly the kind of hardware that is underrepresented in public benchmarks.
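As a rough illustration of what a "reasonable memory assumption" for NVFP4 / MXFP4 might look like, here is a minimal sketch. It is not how the tool does it: the names `QuantFormat`, `weight_gib`, `fits`, and the `reserve_gib` headroom for KV cache and activations are all hypothetical. The only facts it relies on are the block layouts of the two formats: NVFP4 stores 4-bit values with one 8-bit scale per 16 elements, and MXFP4 stores 4-bit values with one 8-bit scale per 32 elements. The small per-tensor scale NVFP4 also carries is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantFormat:
    """Block-scaled quant format (hypothetical helper, not part of the tool)."""
    name: str
    value_bits: int   # bits per stored weight
    scale_bits: int   # bits per shared block scale
    block_size: int   # number of weights sharing one scale

    @property
    def bits_per_weight(self) -> float:
        # Effective cost: the value itself plus its share of the block scale.
        return self.value_bits + self.scale_bits / self.block_size


NVFP4 = QuantFormat("NVFP4", value_bits=4, scale_bits=8, block_size=16)
MXFP4 = QuantFormat("MXFP4", value_bits=4, scale_bits=8, block_size=32)


def weight_gib(n_params: float, fmt: QuantFormat) -> float:
    """Approximate weight memory in GiB, assuming every parameter is quantized."""
    return n_params * fmt.bits_per_weight / 8 / 2**30


def fits(n_params: float, fmt: QuantFormat, vram_gib: float,
         reserve_gib: float = 4.0) -> bool:
    """Check weights plus an assumed headroom (KV cache, activations) against VRAM."""
    return weight_gib(n_params, fmt) + reserve_gib <= vram_gib


if __name__ == "__main__":
    for fmt in (NVFP4, MXFP4):
        size = weight_gib(32e9, fmt)
        print(f"{fmt.name}: {fmt.bits_per_weight} bits/weight, "
              f"32B params ~ {size:.1f} GiB, fits in 24 GiB: {fits(32e9, fmt, 24)}")
```

Real checkpoints usually keep some layers (embeddings, norms, sometimes the output head) in higher precision, so measured sizes will be somewhat larger than this estimate. That gap is where benchmark data from real hardware helps.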
Hi, this is an amazing tool. I have been going through a lot of documentation, blogs, and other resources, and running manual tests, to understand which models are best for agentic coding and openclaw usage given my hardware constraints. I have an RTX 5090 Laptop (24GB VRAM), so I can use NVFP4-quantized models, which is a big advantage for me. I have also been looking into other formats, other engines, the available benchmarks, and how much scores degrade when model weights and the KV cache are quantized. The goal is to pick the best models for use cases like opencode agentic coding with local models while still getting good accuracy, the longest possible context length, good speed, and in some cases energy efficiency. I would also like different serving or inference scripts depending on the need. The major hurdle has been comparing so many models, quants, use cases, and real benchmarks, because actual local usage on consumer hardware isn't benchmarked anywhere.
This is really cool, but does it support the vLLM engine, NVFP4 quants, and need-specific model evaluations as well? Is there a way I can contribute, so that the community doesn't run into the same difficulties when choosing open-source models for their needs and hardware?