Version: v2.3.9, repo HEAD 326a2b489411a20ed742ff13701be39ba00063c8, running via the repo's own Dockerfile.
Setup: default provider (nv_build) with its default model (deepseek-ai/deepseek-v4-flash), scanning a small 2-file test skill.
What happens: the model prefixed its JSON with stray text — the raw response was We{"findings":[]} — and the response parse raised a pydantic ValidationError that nothing catches. The whole scan exits with code 2 and produces no report at all, discarding all completed analyzer work. Reproduced 3/3 on the same input.
Why it matters: this is the out-of-box configuration — default provider, default model — so a first-time user's very first scan can hard-crash on a routine LLM formatting slip. Smaller/OpenAI-compatible models (a use case your README explicitly supports, e.g. Ollama) emit malformed JSON at a meaningful rate, so any long scan becomes a lottery: one bad response out of hundreds of calls voids the run.
Suggested fix: wrap the per-analyzer LLM-response parse in a guard — retry once, then degrade that component to its static findings and continue — so a scan always yields a (possibly partial, clearly annotated) report instead of nothing.
Happy to provide the full stderr captures if useful.
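As an illustration of the guarded parse suggested above (example code added here, not the project's own implementation): a helper that strips stray text around the model's JSON object, validates the `{"findings": [...]}` shape the report quotes, retries once on a parse failure, and otherwise degrades that analyzer to its static findings so the scan still produces an annotated report. The hand-rolled shape checks stand in for the project's pydantic model, and the `title`/`severity` keys, the `run_analyzer_guarded`/`scan` names, and the fake model replies are constructed for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class ResponseParseError(ValueError):
    """Raised when a model reply cannot be turned into a findings list."""


def extract_json_object(raw: str) -> str:
    """Drop stray text before the first '{' and after the last '}' (e.g. 'We{...}')."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ResponseParseError(f"no JSON object in response: {raw!r}")
    return raw[start:end + 1]


def parse_findings(raw: str) -> List[dict]:
    """Parse a reply into a list of finding dicts; stand-in for the pydantic model."""
    try:
        data = json.loads(extract_json_object(raw))
    except json.JSONDecodeError as exc:
        raise ResponseParseError(f"malformed JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("findings"), list):
        raise ResponseParseError(f"unexpected shape: {data!r}")
    for finding in data["findings"]:
        # Assumed required keys for a finding (constructed for this example).
        if not isinstance(finding, dict) or not {"title", "severity"} <= finding.keys():
            raise ResponseParseError(f"finding missing required keys: {finding!r}")
    return data["findings"]


@dataclass
class AnalyzerResult:
    name: str
    findings: List[dict]
    degraded: bool = False       # True when the LLM component was dropped
    note: str = ""               # why it was dropped, for the report


def run_analyzer_guarded(name: str, static_findings: List[dict],
                         call_llm: Callable[[], str], retries: int = 1) -> AnalyzerResult:
    """Try the LLM pass up to retries+1 times; fall back to static findings on failure."""
    errors: List[str] = []
    for _ in range(retries + 1):
        try:
            llm_findings = parse_findings(call_llm())
            return AnalyzerResult(name, static_findings + llm_findings)
        except ResponseParseError as exc:
            errors.append(str(exc))
    # All attempts failed: keep the analyzer's static work instead of aborting the scan.
    return AnalyzerResult(name, list(static_findings), degraded=True,
                          note="LLM pass skipped after parse failures: " + "; ".join(errors))


def scan(analyzers: Dict[str, Tuple[List[dict], Callable[[], str]]]) -> dict:
    """Run every analyzer under the guard and always return a report."""
    results = [run_analyzer_guarded(n, static, model) for n, (static, model) in analyzers.items()]
    return {
        "analyzers": {r.name: {"findings": r.findings, "degraded": r.degraded, "note": r.note}
                      for r in results},
        "degraded_components": [r.name for r in results if r.degraded],
    }


def make_model(replies: List[str]) -> Callable[[], str]:
    """Constructed fake LLM that returns the given replies in order."""
    it = iter(replies)
    return lambda: next(it)


if __name__ == "__main__":
    good = '{"findings":[{"title":"unpinned download","severity":"medium"}]}'
    report = scan({
        "deps": ([], make_model(['We{"findings":[]}'])),            # the reply from the report
        "scripts": ([{"title": "static rule hit", "severity": "low"}],
                    make_model(["Sure, here you go:", "not json"])),  # degrades to static
        "prompts": ([], make_model(["garbage", good])),              # succeeds on retry
    })
    print(json.dumps(report, indent=2))
```

The guard is deliberately narrow: it catches only parse/validation failures, which is the failure class in this report; transport errors from the provider would need their own handling, and the `degraded_components` list gives the report reader the annotation the suggested fix asks for.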