Text-only models (e.g. deepseek-v4-pro via Ollama) 400 on image content: synthetic models claim false vision support #251

Description

@craigamcw

App Version: 3.3.1

Platform: Both (provider-independent; affects any text-only model not in the pi-ai registry)

Bug Description

When using a text-only model that isn't in pi-ai's registry (e.g. deepseek-v4-pro via Ollama Cloud), any request that carries image content (screenshots from the GUI/computer-use tools, or pasted images) fails with a provider 400. The UI surfaces this as an opaque "请求被上游拒绝(400)… invalid message format" (in English: "Request rejected by upstream (400)… invalid message format"), or simply "invalid message format".

Steps to Reproduce

  1. Configure provider Ollama Cloud, model deepseek-v4-pro, base URL https://ollama.com/v1.
  2. Run a cowork task where the agent captures a screenshot (or otherwise include an image in the turn).
  3. The request fails with a 400.

Expected Behaviour

Images are dropped for a text-only model and the request succeeds as a text-only request.

Actual Behaviour

Hard 400 from the provider. Direct reproduction against the endpoint:

POST https://ollama.com/v1/chat/completions   (image_url content, model deepseek-v4-pro)
HTTP 400  {"error":"this model does not support image input"}

The same model/endpoint returns 200 for text-only requests (verified with system + user, tool round-trips, streaming, reasoning_effort, etc.).
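For anyone who wants to repeat the direct reproduction, here is a minimal sketch of the two request bodies being compared. It assumes the endpoint accepts the usual OpenAI-style chat completions shape with `image_url` content parts. `buildRequest` and the type names are hypothetical helpers written for this comment, not code from the app, and the image payload is a placeholder.

```typescript
// Hypothetical types modelling OpenAI-style content parts (assumption: the
// endpoint accepts this shape, as the image_url repro above suggests).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Hypothetical helper: builds a single-turn request, optionally with an image.
function buildRequest(model: string, prompt: string, imageUrl?: string): ChatRequest {
  const content: ContentPart[] = [{ type: "text", text: prompt }];
  if (imageUrl !== undefined) {
    content.push({ type: "image_url", image_url: { url: imageUrl } });
  }
  return { model, messages: [{ role: "user", content }] };
}

// Text-only body: the variant reported to return 200.
const textOnlyBody = buildRequest("deepseek-v4-pro", "Describe the task.");

// Body with an image part: the variant reported to return 400.
// The data URL is a placeholder, not a real image.
const imageBody = buildRequest(
  "deepseek-v4-pro",
  "Describe the screenshot.",
  "data:image/png;base64,PLACEHOLDER",
);

console.log(JSON.stringify(textOnlyBody));
console.log(JSON.stringify(imageBody));
```

Either body can be POSTed to the chat completions URL above with any HTTP client; the only difference between them is the extra `image_url` part.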

Root Cause

deepseek-v4-pro isn't in pi-ai's registry, so the app builds a synthetic model (log: [ClaudeAgentRunner] Model not in pi-ai registry, using synthetic model: deepseek-v4-pro → openai-completions). buildSyntheticPiModel hard-codes:

// src/main/claude/pi-model-resolution.ts
input: ['text', 'image'],

Because the synthetic model advertises image support, the openai-completions provider's convertMessages does not filter image content (it only filters when !model.input.includes("image")). So image blocks are sent to a text-only endpoint → 400. The app then shows its generic "invalid message format".
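To make the failure mode concrete, here is a simplified sketch of the kind of modality check described above. This is not pi-ai's actual `convertMessages`; the `Model`, `Message` and `Block` types and the `filterUnsupportedImages` name are assumptions made for illustration. The only behaviour taken from the issue is that images are filtered only when `!model.input.includes("image")`.

```typescript
// Illustrative types only; the real pi-ai types may differ.
type Modality = "text" | "image";
type Block =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface Model {
  id: string;
  input: Modality[];
}

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Hypothetical stand-in for the filtering step: image blocks are removed
// only when the model does not list "image" among its input modalities.
function filterUnsupportedImages(model: Model, messages: Message[]): Message[] {
  if (model.input.includes("image")) {
    return messages; // model claims vision support, so nothing is filtered
  }
  return messages.map((m) => ({
    ...m,
    content: m.content.filter((b) => b.type !== "image"),
  }));
}

const turn: Message[] = [
  {
    role: "user",
    content: [
      { type: "text", text: "What is on screen?" },
      { type: "image", data: "PLACEHOLDER", mimeType: "image/png" },
    ],
  },
];

// Current synthetic model: claims image support, so the image is forwarded.
const falseVision: Model = { id: "deepseek-v4-pro", input: ["text", "image"] };
// Text-only declaration: the image block is dropped before sending.
const textOnly: Model = { id: "deepseek-v4-pro", input: ["text"] };

console.log(filterUnsupportedImages(falseVision, turn).map((m) => m.content.length));
console.log(filterUnsupportedImages(textOnly, turn).map((m) => m.content.length));
```

With the `['text', 'image']` declaration the image block survives and reaches the text-only endpoint; with `['text']` only the text block is left.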

Suggested Fix

Default synthetic models to input: ['text']. We can't know whether an arbitrary model supports vision, and a false vision claim hard-fails the whole request, whereas text-only merely drops images gracefully. Vision-capable models resolved from the pi-ai registry keep their real modalities; only synthetic fallbacks change. A PR implementing this follows.
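A rough sketch of the proposed behaviour, assuming a registry lookup that falls back to a synthetic model. The real `buildSyntheticPiModel` signature and the pi-ai model type are not reproduced here; `resolveModel`, `PiModelSketch`, the `Map`-based registry and the `example-vision-model` entry are all hypothetical and exist only to show the default changing.

```typescript
type Modality = "text" | "image";

// Hypothetical, trimmed-down model shape; the real type has more fields.
interface PiModelSketch {
  id: string;
  api: string;
  input: Modality[];
  synthetic: boolean;
}

// Hypothetical resolver: registry models keep their declared modalities,
// unknown models fall back to a synthetic text-only model.
function resolveModel(
  id: string,
  registry: Map<string, PiModelSketch>,
): PiModelSketch {
  const known = registry.get(id);
  if (known !== undefined) {
    return known;
  }
  return {
    id,
    api: "openai-completions",
    input: ["text"], // was ['text', 'image']; vision cannot be assumed
    synthetic: true,
  };
}

// Made-up registry entry standing in for a vision-capable model.
const registry = new Map<string, PiModelSketch>([
  [
    "example-vision-model",
    { id: "example-vision-model", api: "openai-completions", input: ["text", "image"], synthetic: false },
  ],
]);

console.log(resolveModel("example-vision-model", registry).input);
console.log(resolveModel("deepseek-v4-pro", registry).input);
```

The trade-off is the one stated above: a vision-capable model that is missing from the registry would lose its images under this default, but the request still succeeds instead of failing with a 400.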

Found alongside #248 / #249.
