
fix: implement batch generation for OpenAI models#1846

Open
ianliuy wants to merge 1 commit into dottxt-ai:main from ianliuy:fix/issue-1391

Conversation


@ianliuy ianliuy commented Apr 14, 2026

What's broken?

Calling the generator with a list of prompts (vectorized/batched calls) crashes for OpenAI models. For example:

import outlines

model = outlines.models.openai("gpt-4o-mini")
generator = outlines.generate.json(model, MyModel)
results = generator(["prompt1", "prompt2"])  # crashes

Who is affected?

Any user trying to batch multiple prompts with OpenAI (or AsyncOpenAI) models. Single-prompt calls work fine. This was originally reported against v0.1.13, and while the codebase has been completely restructured for v1.0, the underlying user need (batched generation) is still unmet — generate_batch() raises NotImplementedError.

When does it trigger?

Every time model.batch([...]) or generator.batch([...]) is called with an OpenAI model. 100% reproducible.

Where is the bug?

  • outlines/models/openai.py lines 295-303: OpenAI.generate_batch() raises NotImplementedError
  • outlines/models/openai.py lines 445-453: AsyncOpenAI.generate_batch() raises NotImplementedError
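Before this change, both methods were stubs along these lines (a paraphrased sketch, not the exact source; the parameter names are assumptions):

```python
class OpenAI:
    def generate_batch(self, model_input, output_type=None, **inference_kwargs):
        # Prior to this PR, batch calls were unconditionally rejected.
        raise NotImplementedError(
            "Batch generation is not available for this model."
        )
```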

Why does it happen?

When batch support was added to the model hierarchy (commit c350ddc9), models with native batch APIs (Transformers, vLLM) got real implementations, while API-based models (OpenAI, Anthropic, etc.) got NotImplementedError stubs since their APIs don't support multi-prompt-in-one-call.

However, batch can be trivially implemented for API models by looping over individual generate() calls — each prompt becomes a separate API request. The async variant can use asyncio.gather() for concurrent execution.

How did we fix it?

Replaced the NotImplementedError stubs in OpenAI.generate_batch() and AsyncOpenAI.generate_batch() with implementations that:

  • Sync: Loop over inputs, calling self.generate() for each prompt
  • Async: Use asyncio.gather() to fire all prompts concurrently

This is a minimal, surgical change (~10 lines per method) that follows the existing generate() contract — input formatting, output type handling, and error handling are all delegated to the already-tested generate() method.
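The pattern can be sketched in isolation as follows. The stub classes below are hypothetical stand-ins for illustration, not Outlines' actual model classes, and the method signatures are assumptions:

```python
import asyncio


class StubAPIModel:
    """Hypothetical stand-in for an API-backed model with a per-prompt generate()."""

    def generate(self, prompt, output_type=None, **kwargs):
        # A real implementation would make one API request here.
        return f"response:{prompt}"

    def generate_batch(self, prompts, output_type=None, **kwargs):
        # Sync path: one generate() call (one API request) per prompt.
        return [self.generate(p, output_type, **kwargs) for p in prompts]


class StubAsyncAPIModel:
    async def generate(self, prompt, output_type=None, **kwargs):
        return f"response:{prompt}"

    async def generate_batch(self, prompts, output_type=None, **kwargs):
        # Async path: fire all requests concurrently and await them together.
        return await asyncio.gather(
            *(self.generate(p, output_type, **kwargs) for p in prompts)
        )


sync_results = StubAPIModel().generate_batch(["a", "b"])
async_results = asyncio.run(StubAsyncAPIModel().generate_batch(["a", "b"]))
```

Because each prompt maps to its own request, the async variant's total latency is roughly that of the slowest single request rather than the sum of all of them.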

The same pattern could be applied to other API models (Anthropic, Gemini, Mistral, etc.) as a follow-up.

How do we know it works?

  • Updated test_openai_batch and test_openai_async_batch from asserting NotImplementedError to verifying correct batch results using mock clients
  • Tests cover both sync and async paths, including duplicate-prompt edge case
  • All 10 non-API-call OpenAI tests pass (no regressions)
  • Pre-commit checks pass (mypy, ruff, merge conflicts, debug statements)
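The shape of those tests can be sketched with a mock in place of the real client. This is an illustrative reconstruction, not the PR's actual test code:

```python
from unittest.mock import MagicMock

# Mock model whose per-prompt generate() we can observe.
model = MagicMock()
model.generate.side_effect = lambda prompt: prompt.upper()
# Batch delegates to generate(), mirroring the sync implementation.
model.generate_batch = lambda prompts: [model.generate(p) for p in prompts]

# Duplicate-prompt edge case: each duplicate still gets its own call.
out = model.generate_batch(["hi", "hi", "yo"])
assert out == ["HI", "HI", "YO"]
assert model.generate.call_count == 3
```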

cc @RobinPicard @cpfiffer

Replace NotImplementedError stubs in OpenAI.generate_batch() and
AsyncOpenAI.generate_batch() with working implementations:
- Sync: loops over inputs calling self.generate() for each prompt
- Async: uses asyncio.gather() for concurrent execution

Update tests from asserting NotImplementedError to verifying correct
batch results using mock clients.

Fixes dottxt-ai#1391

Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
