PaperBanana

Automated Academic Illustration for AI Scientists

Disclaimer: This is an unofficial, community-driven open-source implementation of the paper "PaperBanana: Automating Academic Illustration for AI Scientists" by Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, and Jinsung Yoon (arXiv:2601.23265). This project is not affiliated with or endorsed by the original authors or Google Research. The implementation is based on the publicly available paper and may differ from the original system.

An agentic framework for generating publication-quality academic diagrams and statistical plots from text descriptions. Supports OpenAI (GPT-5.2 + GPT-Image-1.5), Azure OpenAI / Foundry, and Google Gemini providers.

Two-phase multi-agent pipeline with iterative refinement
Multiple VLM and image generation providers (OpenAI, Azure, Gemini)
Input optimization layer for better generation quality
Auto-refine mode and run continuation with user feedback
CLI, Python API, and MCP server for IDE integration
Batch generation from a manifest file (YAML/JSON) for multiple diagrams in one run
Batch plots — paperbanana plot-batch runs many statistical plots from one manifest (CSV/JSON per item)
PDF inputs for methodology context (optional paperbanana[pdf] / PyMuPDF), with per-page selection
PaperBanana Studio — local Gradio web UI (paperbanana studio) for diagrams, plots, evaluation, batch, and run browser
Claude Code skills for /generate-diagram, /generate-plot, and /evaluate-diagram

Quick Start

Prerequisites

Python 3.10+
An OpenAI API key (platform.openai.com) or Azure OpenAI / Foundry endpoint
Or a Google Gemini API key (free, Google AI Studio)

Step 1: Install

pip install paperbanana

Or install from source for development:

git clone https://github.com/llmsresearch/paperbanana.git
cd paperbanana
pip install -e ".[dev,openai,google]"

Step 2: Get Your API Key

cp .env.example .env
# Edit .env and add your API key:
#   OPENAI_API_KEY=your-key-here
#   GOOGLE_API_KEY=your-key-here
#
# For Azure OpenAI / Foundry:
#   OPENAI_BASE_URL=https://<resource>.openai.azure.com/openai/v1
#
# Optional Gemini overrides:
#   GOOGLE_BASE_URL=https://your-gemini-proxy.example.com
#   GOOGLE_VLM_MODEL=gemini-2.0-flash
#   GOOGLE_IMAGE_MODEL=gemini-3-pro-image-preview

Or use the setup wizard for Gemini:

paperbanana setup

Step 3: Generate a Diagram

paperbanana generate \
  --input examples/sample_inputs/transformer_method.txt \
  --caption "Overview of our encoder-decoder architecture with sparse routing"

With input optimization and auto-refine:

paperbanana generate \
  --input my_method.txt \
  --caption "Overview of our encoder-decoder framework" \
  --optimize --auto

Output is saved to outputs/run_<timestamp>/final_output.png along with all intermediate iterations and metadata.

PaperBanana Studio (local web UI)

Install the optional Gradio dependency, then start the app:

pip install 'paperbanana[studio]'
paperbanana studio

Open the URL shown in the terminal (default http://127.0.0.1:7860/). The Studio exposes the same workflows as the CLI: methodology diagrams, statistical plots, comparative evaluation, continuing a prior run, batch manifests (methodology or plot batch via the Batch tab), and a simple browser for run_* / batch_* output folders. Use --host, --port, --config, and --output-dir as needed.

How It Works

PaperBanana implements a multi-agent pipeline with up to 7 specialized agents:

Phase 0 -- Input Optimization (optional, --optimize):

Input Optimizer runs two parallel VLM calls:
- Context Enricher structures raw methodology text into diagram-ready format (components, flows, groupings, I/O)
- Caption Sharpener transforms vague captions into precise visual specifications

Phase 1 -- Linear Planning:

Retriever selects the most relevant reference examples from a curated set of 13 methodology diagrams spanning agent/reasoning, vision/perception, generative/learning, and science/applications domains
Planner generates a detailed textual description of the target diagram via in-context learning from the retrieved examples
Stylist refines the description for visual aesthetics using NeurIPS-style guidelines (color palette, layout, typography)

Phase 2 -- Iterative Refinement:

Visualizer renders the description into an image
Critic evaluates the generated image against the source context and provides a revised description addressing any issues
Steps 4-5 repeat for a fixed number of iterations (default 3), or until the critic is satisfied (--auto)

Providers

PaperBanana supports multiple VLM and image generation providers:

Component	Provider	Model	Notes
VLM (planning, critique)	OpenAI	`gpt-5.2`	Default
Image Generation	OpenAI	`gpt-image-1.5`	Default
VLM	Google Gemini	`gemini-2.0-flash`	Free tier
Image Generation	Google Gemini	`gemini-3-pro-image-preview`	Free tier
VLM / Image	OpenRouter	Any supported model	Flexible routing

Azure OpenAI / Foundry endpoints are auto-detected — set OPENAI_BASE_URL to your endpoint. Gemini-compatible gateways are also supported — set GOOGLE_BASE_URL when needed.

CLI Reference

`paperbanana generate` -- Methodology Diagrams

# Basic generation
paperbanana generate \
  --input method.txt \
  --caption "Overview of our framework"

# With input optimization and auto-refine
paperbanana generate \
  --input method.txt \
  --caption "Overview of our framework" \
  --optimize --auto

# Continue the latest run with user feedback
paperbanana generate --continue \
  --feedback "Make arrows thicker and colors more distinct"

# Continue a specific run
paperbanana generate --continue-run run_20260218_125448_e7b876 \
  --iterations 3

# PDF as input (install PyMuPDF: pip install 'paperbanana[pdf]')
paperbanana generate \
  --input paper.pdf \
  --caption "Overview of our method" \
  --pdf-pages "3-8"

Flag	Short	Description
`--input`	`-i`	Path to methodology text file or PDF (required for new runs)
`--caption`	`-c`	Figure caption / communicative intent (required for new runs)
`--output`	`-o`	Output image path (default: auto-generated in `outputs/`)
`--iterations`	`-n`	Number of Visualizer-Critic refinement rounds (default: 3)
`--auto`		Loop until critic is satisfied (with `--max-iterations` safety cap)
`--max-iterations`		Safety cap for `--auto` mode (default: 30)
`--optimize`		Preprocess inputs with parallel context enrichment and caption sharpening
`--continue`		Continue from the latest run in `outputs/`
`--continue-run`		Continue from a specific run ID
`--feedback`		User feedback for the critic when continuing a run
`--pdf-pages`		PDF input only: 1-based pages (e.g. `1-5`, `2,4,6-8`; default: all)
`--vlm-provider`		VLM provider name (default: `openai`)
`--vlm-model`		VLM model name (default: `gpt-5.2`)
`--image-provider`		Image gen provider (default: `openai_imagen`)
`--image-model`		Image gen model (default: `gpt-image-1.5`)
`--format`	`-f`	Output format: `png`, `jpeg`, or `webp` (default: `png`)
`--config`		Path to YAML config file (see `configs/config.yaml`)
`--verbose`	`-v`	Show detailed agent progress and timing
`--progress-json`		Emit JSON progress events to stdout during generation

`paperbanana plot` -- Statistical Plots

paperbanana plot \
  --data results.csv \
  --intent "Bar chart comparing model accuracy across benchmarks"

Flag	Short	Description
`--data`	`-d`	Path to data file, CSV or JSON (required)
`--intent`		Communicative intent for the plot (required)
`--output`	`-o`	Output image path
`--iterations`	`-n`	Refinement iterations (default: 3)

`paperbanana batch` -- Batch Generation

Generate multiple methodology diagrams from a single manifest file (YAML or JSON). Each item runs the full pipeline; outputs are written under outputs/batch_<id>/run_<id>/ and a batch_report.json summarizes all runs.

paperbanana batch --manifest examples/batch_manifest.yaml --optimize

Manifest format (YAML or JSON with an items list):

items:
  - input: path/to/method1.txt
    caption: "Overview of our encoder-decoder"
    id: fig1
  - input: method2.txt
    caption: "Training pipeline"
    id: fig2
  - input: paper.pdf
    caption: "System overview"
    id: fig3
    pdf_pages: "4-9" # optional; PDF inputs only

Paths in the manifest are resolved relative to the manifest file's directory.

Generate a human-readable report from an existing batch run (Markdown or HTML):

paperbanana batch-report --batch-dir outputs/batch_20250109_123456_abc --format markdown
# or by batch ID (under default output dir)
paperbanana batch-report --batch-id batch_20250109_123456_abc --format html --output report.html

Diagram batch reports include batch_kind: methodology; plot batches use batch_kind: statistical_plot. Human-readable reports (paperbanana batch-report) show the batch kind when present.

Flag	Short	Description
`--manifest`	`-m`	Path to manifest file (required)
`--output-dir`	`-o`	Parent directory for batch run (default: outputs)
`--config`		Path to config YAML
`--iterations`	`-n`	Refinement iterations per item
`--optimize`		Preprocess inputs for each item
`--auto`		Loop until critic satisfied per item
`--format`	`-f`	Output image format (png, jpeg, webp)
`--auto-download-data`		Download expanded reference set if needed

`paperbanana plot-batch` -- Batch Statistical Plots

Generate multiple plots from a manifest (YAML or JSON). Each item specifies a data file (CSV or JSON) and an intent string, mirroring paperbanana plot. Outputs live under outputs/batch_<id>/run_<id>/ with the same batch_report.json and paperbanana batch-report workflow as diagram batches.

paperbanana plot-batch --manifest examples/plot_batch_manifest.yaml --optimize

Manifest format (items list):

items:
  - data: path/to/results.csv
    intent: "Bar chart comparing accuracy across models"
    id: fig_acc
  - data: other.json
    intent: "Scatter plot with trend line"
    aspect_ratio: "16:9"   # optional per item; CLI --aspect-ratio is the default when omitted

Paths are resolved relative to the manifest file’s directory.

Flag	Short	Description
`--manifest`	`-m`	Path to manifest (required)
`--output-dir`	`-o`	Parent directory for `batch_*` (default: outputs)
`--config`		Path to config YAML
`--vlm-provider`		VLM provider (default: gemini)
`--vlm-model`		VLM model override
`--image-provider`		Image gen provider
`--image-model`		Image gen model
`--iterations`	`-n`	Refinement iterations per item
`--auto`		Loop until critic satisfied per item
`--max-iterations`		Safety cap for `--auto`
`--optimize`		Input optimization per item
`--format`	`-f`	png, jpeg, or webp
`--save-prompts` / `--no-save-prompts`		Persist prompts (default: on, same as `plot`)
`--venue`		Venue style (neurips, icml, acl, ieee, custom)
`--aspect-ratio`	`-ar`	Default aspect ratio when not set in the manifest
`--verbose`	`-v`	Verbose logging

`paperbanana evaluate` -- Quality Assessment

Comparative evaluation of a generated diagram against a human reference using VLM-as-a-Judge:

paperbanana evaluate \
  --generated diagram.png \
  --reference human_diagram.png \
  --context method.txt \
  --caption "Overview of our framework"

Flag	Short	Description
`--generated`	`-g`	Path to generated image (required)
`--reference`	`-r`	Path to human reference image (required)
`--context`		Path to source context text file or PDF (required)
`--caption`	`-c`	Figure caption (required)
`--pdf-pages`		PDF context only: 1-based page selection (default: all)

Scores on 4 dimensions (hierarchical aggregation per the paper):

Primary: Faithfulness, Readability
Secondary: Conciseness, Aesthetics

`paperbanana studio` -- Local web UI

Requires pip install 'paperbanana[studio]' (Gradio).

paperbanana studio
paperbanana studio --port 8080 --output-dir ./my_outputs

Flag	Description
`--host`	Bind address (default `127.0.0.1`)
`--port`	Port (default `7860`)
`--share`	Create a temporary public Gradio link (do not use with sensitive data)
`--config`	Path to YAML config
`--output-dir` / `-o`	Default output directory for runs
`--root-path`	URL subpath when behind a reverse proxy

`paperbanana setup` -- First-Time Configuration

paperbanana setup

Interactive wizard that first asks whether to use the official Gemini API. If you choose official API, it follows the default AI Studio key flow; if not, it asks for a custom Gemini-compatible URL and API key.

Python API

import asyncio
from paperbanana import PaperBananaPipeline, GenerationInput, DiagramType
from paperbanana.core.config import Settings

settings = Settings(
    vlm_provider="openai",
    vlm_model="gpt-5.2",
    image_provider="openai_imagen",
    image_model="gpt-image-1.5",
    optimize_inputs=True,   # Enable input optimization
    auto_refine=True,       # Loop until critic is satisfied
)

pipeline = PaperBananaPipeline(settings=settings)

result = asyncio.run(pipeline.generate(
    GenerationInput(
        source_context="Our framework consists of...",
        communicative_intent="Overview of the proposed method.",
        diagram_type=DiagramType.METHODOLOGY,
    )
))

print(f"Output: {result.image_path}")

Progress callbacks: generate() and continue_run() accept an optional progress_callback argument. The pipeline invokes it with PipelineProgressEvent objects (stage, message, seconds, iteration, extra) at each step (optimizer, retriever, planner, stylist, visualizer, critic), so you can show progress in UIs or log timing without patching agents.

To continue a previous run:

from paperbanana.core.resume import load_resume_state

state = load_resume_state("outputs", "run_20260218_125448_e7b876")
result = asyncio.run(pipeline.continue_run(
    resume_state=state,
    additional_iterations=3,
    user_feedback="Make the encoder block more prominent",
))

See examples/generate_diagram.py and examples/generate_plot.py for complete working examples.

MCP Server

PaperBanana includes an MCP server for use with Claude Code, Cursor, or any MCP-compatible client. Add the following config to use it via uvx without a local clone:

{
  "mcpServers": {
    "paperbanana": {
      "command": "uvx",
      "args": ["--from", "paperbanana[mcp]", "paperbanana-mcp"],
      "env": { "GOOGLE_API_KEY": "your-google-api-key" }
    }
  }
}

Three MCP tools are exposed: generate_diagram, generate_plot, and evaluate_diagram.

The repo also ships with 3 Claude Code skills:

/generate-diagram <file> [caption] - generate a methodology diagram from a text file
/generate-plot <data-file> [intent] - generate a statistical plot from CSV/JSON data
/evaluate-diagram <generated> <reference> - evaluate a diagram against a human reference

See mcp_server/README.md for full setup details (Claude Code, Cursor, local development).

Configuration

Default settings are in configs/config.yaml. Override via CLI flags or a custom YAML:

paperbanana generate \
  --input method.txt \
  --caption "Overview" \
  --config my_config.yaml

Key settings:

vlm:
  provider: openai           # openai, gemini, or openrouter
  model: gpt-5.2

image:
  provider: openai_imagen    # openai_imagen, google_imagen, or openrouter_imagen
  model: gpt-image-1.5

pipeline:
  num_retrieval_examples: 10
  refinement_iterations: 3
  # auto_refine: true        # Loop until critic is satisfied
  # max_iterations: 30       # Safety cap for auto_refine mode
  # optimize_inputs: true    # Preprocess inputs for better generation
  output_resolution: "2k"

reference:
  path: data/reference_sets

output:
  dir: outputs
  save_iterations: true
  save_metadata: true

Environment variables (.env):

# OpenAI (default)
OPENAI_API_KEY=your-key
OPENAI_BASE_URL=https://api.openai.com/v1    # or Azure endpoint
OPENAI_VLM_MODEL=gpt-5.2                      # override model
OPENAI_IMAGE_MODEL=gpt-image-1.5              # override model

# Google Gemini (alternative, free)
GOOGLE_API_KEY=your-key
GOOGLE_BASE_URL=                            # optional custom Gemini-compatible endpoint
GOOGLE_VLM_MODEL=gemini-2.0-flash          # override Gemini VLM model
GOOGLE_IMAGE_MODEL=gemini-3-pro-image-preview  # override Gemini image model

Project Structure

paperbanana/
├── paperbanana/
│   ├── core/          # Pipeline orchestration, types, config, resume, utilities
│   ├── agents/        # Optimizer, Retriever, Planner, Stylist, Visualizer, Critic
│   ├── providers/     # VLM and image gen provider implementations
│   │   ├── vlm/       # OpenAI, Gemini, OpenRouter VLM providers
│   │   └── image_gen/ # OpenAI, Gemini, OpenRouter image gen providers
│   ├── reference/     # Reference set management (13 curated examples)
│   ├── guidelines/    # Style guidelines loader
│   └── evaluation/    # VLM-as-Judge evaluation system
├── configs/           # YAML configuration files
├── prompts/           # Prompt templates for all agents + evaluation
│   ├── diagram/       # context_enricher, caption_sharpener, retriever, planner, stylist, visualizer, critic
│   ├── plot/          # plot-specific prompt variants
│   └── evaluation/    # faithfulness, conciseness, readability, aesthetics
├── data/
│   ├── reference_sets/  # 13 verified methodology diagrams
│   └── guidelines/      # NeurIPS-style aesthetic guidelines
├── examples/          # Working example scripts + sample inputs
├── scripts/           # Data curation and build scripts
├── tests/             # Test suite
├── mcp_server/        # MCP server for IDE integration
└── .claude/skills/    # Claude Code skills (generate-diagram, generate-plot, evaluate-diagram)

Development

# Install with dev dependencies
pip install -e ".[dev,openai,google]"

# Run tests
pytest tests/ -v

# Lint
ruff check paperbanana/ mcp_server/ tests/ scripts/

# Format
ruff format paperbanana/ mcp_server/ tests/ scripts/

Citation

This is an unofficial implementation. If you use this work, please cite the original paper:

@article{zhu2026paperbanana,
  title={PaperBanana: Automating Academic Illustration for AI Scientists},
  author={Zhu, Dawei and Meng, Rui and Song, Yale and Wei, Xiyu
          and Li, Sujian and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2601.23265},
  year={2026}
}

Original paper: https://arxiv.org/abs/2601.23265

Disclaimer

This project is an independent open-source reimplementation based on the publicly available paper. It is not affiliated with, endorsed by, or connected to the original authors, Google Research, or Peking University in any way. The implementation may differ from the original system described in the paper. Use at your own discretion.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 189 Commits
.claude/skills		.claude/skills
.github		.github
assets/img		assets/img
configs		configs
data		data
docs		docs
examples		examples
mcp_server		mcp_server
paperbanana		paperbanana
prompts		prompts
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
debug_pipeline_run.py		debug_pipeline_run.py
pyproject.toml		pyproject.toml
server.json		server.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PaperBanana

Quick Start

Prerequisites

Step 1: Install

Step 2: Get Your API Key

Step 3: Generate a Diagram

PaperBanana Studio (local web UI)

How It Works

Providers

CLI Reference

`paperbanana generate` -- Methodology Diagrams

`paperbanana plot` -- Statistical Plots

`paperbanana batch` -- Batch Generation

`paperbanana plot-batch` -- Batch Statistical Plots

`paperbanana evaluate` -- Quality Assessment

`paperbanana studio` -- Local web UI

`paperbanana setup` -- First-Time Configuration

Python API

MCP Server

Configuration

Project Structure

Development

Citation

Disclaimer

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PaperBanana

Quick Start

Prerequisites

Step 1: Install

Step 2: Get Your API Key

Step 3: Generate a Diagram

PaperBanana Studio (local web UI)

How It Works

Providers

CLI Reference

paperbanana generate -- Methodology Diagrams

paperbanana plot -- Statistical Plots

paperbanana batch -- Batch Generation

paperbanana plot-batch -- Batch Statistical Plots

paperbanana evaluate -- Quality Assessment

paperbanana studio -- Local web UI

paperbanana setup -- First-Time Configuration

Python API

MCP Server

Configuration

Project Structure

Development

Citation

Disclaimer

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`paperbanana generate` -- Methodology Diagrams

`paperbanana plot` -- Statistical Plots

`paperbanana batch` -- Batch Generation

`paperbanana plot-batch` -- Batch Statistical Plots

`paperbanana evaluate` -- Quality Assessment

`paperbanana studio` -- Local web UI

`paperbanana setup` -- First-Time Configuration

Packages