
fix: implement batch generation for OpenAI models#1846

Open
ianliuy wants to merge 1 commit into dottxt-ai:main from ianliuy:fix/issue-1391

Conversation


@ianliuy ianliuy commented Apr 14, 2026

What's broken?

Calling the generator with a list of prompts (vectorized/batched calls) crashes for OpenAI models. For example:

import outlines

model = outlines.models.openai("gpt-4o-mini")
generator = outlines.generate.json(model, MyModel)
results = generator(["prompt1", "prompt2"])  # crashes

Who is affected?

Any user trying to batch multiple prompts with OpenAI (or AsyncOpenAI) models. Single-prompt calls work fine. This was originally reported against v0.1.13, and while the codebase has been completely restructured for v1.0, the underlying user need (batched generation) is still unmet — generate_batch() raises NotImplementedError.

When does it trigger?

Every time model.batch([...]) or generator.batch([...]) is called with an OpenAI model. 100% reproducible.

Where is the bug?

  • outlines/models/openai.py lines 295-303: OpenAI.generate_batch() raises NotImplementedError
  • outlines/models/openai.py lines 445-453: AsyncOpenAI.generate_batch() raises NotImplementedError
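Before this change, both methods were stubs along these lines (a paraphrased sketch, not the exact source; the parameter names are assumptions):

```python
class OpenAI:
    def generate_batch(self, model_input, output_type=None, **inference_kwargs):
        # Prior to this PR, batch calls were unconditionally rejected.
        raise NotImplementedError(
            "Batch generation is not available for this model."
        )
```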

Why does it happen?

When batch support was added to the model hierarchy (commit c350ddc9), models with native batch APIs (Transformers, vLLM) got real implementations, while API-based models (OpenAI, Anthropic, etc.) got NotImplementedError stubs since their APIs don't support multi-prompt-in-one-call.

However, batch can be trivially implemented for API models by looping over individual generate() calls — each prompt becomes a separate API request. The async variant can use asyncio.gather() for concurrent execution.

How did we fix it?

Replaced the NotImplementedError stubs in OpenAI.generate_batch() and AsyncOpenAI.generate_batch() with implementations that:

  • Sync: Loop over inputs, calling self.generate() for each prompt
  • Async: Use asyncio.gather() to fire all prompts concurrently

This is a minimal, surgical change (~10 lines per method) that follows the existing generate() contract — input formatting, output type handling, and error handling are all delegated to the already-tested generate() method.
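The pattern can be sketched in isolation as follows. The stub classes below are hypothetical stand-ins for illustration, not Outlines' actual model classes, and the method signatures are assumptions:

```python
import asyncio


class StubAPIModel:
    """Hypothetical stand-in for an API-backed model with a per-prompt generate()."""

    def generate(self, prompt, output_type=None, **kwargs):
        # A real implementation would make one API request here.
        return f"response:{prompt}"

    def generate_batch(self, prompts, output_type=None, **kwargs):
        # Sync path: one generate() call (one API request) per prompt.
        return [self.generate(p, output_type, **kwargs) for p in prompts]


class StubAsyncAPIModel:
    async def generate(self, prompt, output_type=None, **kwargs):
        return f"response:{prompt}"

    async def generate_batch(self, prompts, output_type=None, **kwargs):
        # Async path: fire all requests concurrently and await them together.
        return await asyncio.gather(
            *(self.generate(p, output_type, **kwargs) for p in prompts)
        )


sync_results = StubAPIModel().generate_batch(["a", "b"])
async_results = asyncio.run(StubAsyncAPIModel().generate_batch(["a", "b"]))
```

Because each prompt maps to its own request, the async variant's total latency is roughly that of the slowest single request rather than the sum of all of them.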

The same pattern could be applied to other API models (Anthropic, Gemini, Mistral, etc.) as a follow-up.

How do we know it works?

  • Updated test_openai_batch and test_openai_async_batch from asserting NotImplementedError to verifying correct batch results using mock clients
  • Tests cover both sync and async paths, including duplicate-prompt edge case
  • All 10 non-API-call OpenAI tests pass (no regressions)
  • Pre-commit checks pass (mypy, ruff, merge conflicts, debug statements)
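The shape of those tests can be sketched with a mock in place of the real client. This is an illustrative reconstruction, not the PR's actual test code:

```python
from unittest.mock import MagicMock

# Mock model whose per-prompt generate() we can observe.
model = MagicMock()
model.generate.side_effect = lambda prompt: prompt.upper()
# Batch delegates to generate(), mirroring the sync implementation.
model.generate_batch = lambda prompts: [model.generate(p) for p in prompts]

# Duplicate-prompt edge case: each duplicate still gets its own call.
out = model.generate_batch(["hi", "hi", "yo"])
assert out == ["HI", "HI", "YO"]
assert model.generate.call_count == 3
```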

cc @RobinPicard @cpfiffer

Replace NotImplementedError stubs in OpenAI.generate_batch() and
AsyncOpenAI.generate_batch() with working implementations:
- Sync: loops over inputs calling self.generate() for each prompt
- Async: uses asyncio.gather() for concurrent execution

Update tests from asserting NotImplementedError to verifying correct
batch results using mock clients.

Fixes dottxt-ai#1391

Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
