CanvasXpress Generation System

Generate CanvasXpress visualizations from natural language descriptions using Large Language Models (LLMs)

A backend service that lets users create scientific visualizations by describing them in plain English. It is powered by LLMs and Retrieval-Augmented Generation (RAG).

🌟 Key Features

Natural Language Interface: Describe visualizations in plain English
Multi-LLM Support: Works with OpenAI GPT-4o, Google Gemini, AWS Bedrock, and Ollama
RAG-Enhanced Generation: Uses vector similarity search with BGE-M3 embeddings
High Accuracy: Achieves >97% accuracy through engineered prompts and few-shot examples
Backend Service Architecture: Designed as a service for CanvasXpress integration

🚀 Quick Start

Prerequisites

Docker
Make (for using Makefile commands)
Python 3.9+ Docker image (used by the containerized system)
Git

Setup & Run

Production Setup:

git clone https://github.com/buddyroo30/canvasxpress_gen.git
cd canvasxpress_gen
make build
make build_schema_context    # Generate schema information
make build_vector_db         # Create vector database for RAG
make run                     # Run as daemon (or 'make runi' for interactive) on port 5008

Development Setup:

git clone https://github.com/buddyroo30/canvasxpress_gen.git
cd canvasxpress_gen
# Build and run development environment
make build_dev
make build_schema_context_dev
make build_vector_db_dev
make run_dev  # Runs on port 5009

Fresh Environment Setup (for testing without cached dependencies):

git clone https://github.com/buddyroo30/canvasxpress_gen.git
cd canvasxpress_gen
make buildfresh              # Build without using Docker cache
make build_schema_context    # Generate schema information
make build_vector_db         # Create vector database for RAG
make run                     # Run as daemon on port 5008

Environment Variables (Optional)

Configure LLM API access and system behavior by setting environment variables. The complete list of available variables follows:

LLM API Configuration:

export OPENAI_API_TYPE="azure"                    # OpenAI API type (azure or openai)
export OPENAI_API_KEY="your-openai-key"          # OpenAI API key
export AZURE_OPENAI_API_KEY="your-azure-key"     # Azure OpenAI API key
export AZURE_OPENAI_ENDPOINT="your-endpoint"     # Azure OpenAI endpoint URL
export OPENAI_API_BASE="your-base-url"           # OpenAI API base URL
export OPENAI_API_VERSION="2023-05-15"           # OpenAI API version
export AZURE_OPENAI_API_VERSION="2024-02-01"     # Azure OpenAI API version
export GOOGLE_API_KEY="your-google-key"          # For Google Gemini models

SiteMinder Authentication (Corporate Environments):

export SMVAL="True"                               # Enable SiteMinder validation
export SMLOGIN="your-login-url"                  # SiteMinder login URL
export SMTARGET="your-target-url"                # SiteMinder target URL
export SMFAILREGEX=".*<html.*AUTHENTICATION.*"   # Login failure regex pattern
export SMFETCHFAILREGEX=".*<title>BMS.*"         # Fetch failure regex pattern

System Configuration:

export DEV="True"                                 # Enable development mode
export NUM_FEW_SHOTS=25                          # Number of RAG examples to retrieve
export PORT=5000                                 # Server port (cx_llm_service only)
export SERVICE_URL="your-service-url"            # Service URL (cx_llm_service only)
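
As a rough illustration of how the system configuration variables could be consumed, the sketch below reads DEV, NUM_FEW_SHOTS, and PORT from an environment-like mapping. The `Settings` class, the `load_settings` helper, and the fallback defaults are assumptions made for this example; the service's own parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the system configuration variables."""
    dev: bool
    num_few_shots: int
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Parse settings from an environment-like mapping.

    The fallbacks (DEV off, 25 few-shots, port 5000) mirror the example
    values listed above. They are assumptions, not documented defaults.
    """
    return Settings(
        dev=env.get("DEV", "False").strip().lower() == "true",
        num_few_shots=int(env.get("NUM_FEW_SHOTS", "25")),
        port=int(env.get("PORT", "5000")),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests, for example `load_settings({"DEV": "True"})`.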

Verify Setup

Open browser to http://localhost:5008 (or your domain if deployed online)
Upload a CSV/TSV data file with headers
Describe your visualization in plain English

Example with automotive data:

"Box plot of cty grouped by manufacturer"
"Scatter plot of hwy vs cty colored by drv"
"Area graph of hwy with title 'Highway MPG Distribution'"

Note: This web interface is a quick and easy way to see the system in action and confirm it's working. The production interface is integrated directly into CanvasXpress.

🔗 Connecting to CanvasXpress

Production Integration

Configure CanvasXpress to use your running service:

// In your CanvasXpress configuration
var config = {
    // Your existing CanvasXpress configuration
    graphType: "Bar",
    title: "My Visualization",
    
    // Add LLM service configuration
    llmServiceURL: "http://localhost:5008/ask"  // or your domain: "https://your-domain.com:5008/ask"
};

var cx = new CanvasXpress("canvasId", data, config);

API Usage

curl -X POST http://localhost:5008/ask \
  -F "prompt=Create a scatter plot of hwy vs cty colored by manufacturer" \
  -F "datafile_contents=[[\"manufacturer\",\"hwy\",\"cty\"],[\"toyota\",35,28],[\"ford\",30,25]]"
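
The same request can be sketched in Python using only the standard library. The example converts CSV text into the array-of-arrays form used by `datafile_contents` and builds a multipart/form-data POST like the one `curl -F` sends. The helper names (`csv_to_rows`, `build_ask_request`, `ask`) are hypothetical, and the conversion of numeric-looking cells to numbers is an assumption based on the unquoted numbers in the curl payload. The response schema is not described here, so the sketch returns the raw response body.

```python
import csv
import io
import json
import urllib.request
import uuid


def csv_to_rows(csv_text):
    """Convert CSV text (header row first) into a list of rows.

    Numeric-looking cells in data rows become int or float so the result
    resembles the payload in the curl example (an assumption).
    """
    def convert(cell):
        for cast in (int, float):
            try:
                return cast(cell)
            except ValueError:
                continue
        return cell

    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [header] + [[convert(c) for c in row] for row in body]


def build_ask_request(url, prompt, rows):
    """Build a multipart/form-data POST with the two fields from the curl call."""
    boundary = uuid.uuid4().hex
    fields = {"prompt": prompt, "datafile_contents": json.dumps(rows)}
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    chunks.append(f"--{boundary}--\r\n")
    return urllib.request.Request(
        url,
        data="".join(chunks).encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def ask(url, prompt, rows, timeout=60):
    """Send the request and return the raw response body as text."""
    request = build_ask_request(url, prompt, rows)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the service running locally, a call such as `ask("http://localhost:5008/ask", "Scatter plot of hwy vs cty colored by manufacturer", rows)` would mirror the curl command above.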

Public vs Private Deployment

Public: Use the publicly available CanvasXpress instance at canvasxpress.org
Private: Run your own instance for data security within corporate networks

📖 System Architecture

Backend Service Design

This system is designed as a backend service for CanvasXpress, not as a traditional Python library. Users interact with the system through:

CanvasXpress Integration: Primary intended usage via CanvasXpress UI
Direct API Calls: For custom integrations
Development Interface: For testing and verification

Core Components

LLM Integration: Multiple LLM providers through unified interface
RAG System: Vector database (Milvus) with BGE-M3 embeddings for semantic search
Guided Autocomplete: Automatic synthetic example generation (part of main CanvasXpress library)
Modular Architecture: Standard Python package layout under src/canvasxpress_gen/ (llm, rag, and utils subpackages)
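
To make the RAG step concrete, here is a minimal sketch of similarity-based few-shot retrieval. It uses toy three-dimensional vectors and an in-memory list in place of BGE-M3 embeddings and Milvus, so it shows only the ranking idea, not the project's implementation. The function names, the example data, and the config fragments are all invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_examples(query_vec, examples, k):
    """Return the k examples whose embeddings are most similar to query_vec.

    `examples` is a list of (embedding, prompt, config) tuples. In the real
    system the embeddings would come from BGE-M3 and be stored in Milvus.
    """
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vec, ex[0]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings and illustrative config fragments, invented for this sketch.
EXAMPLES = [
    ([0.9, 0.1, 0.0], "Box plot of cty grouped by manufacturer", {"graphType": "Boxplot"}),
    ([0.1, 0.9, 0.0], "Scatter plot of hwy vs cty", {"graphType": "Scatter2D"}),
    ([0.0, 0.2, 0.9], "Area graph of hwy", {"graphType": "Area"}),
]
```

The retrieved prompt and config pairs would then be placed in the LLM prompt as few-shot examples; NUM_FEW_SHOTS plays the role of `k` here.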

🛠️ Configuration

Environment Variables

Create a .env file. The commonly needed variables are shown below; the complete list appears under Environment Variables (Optional) in Quick Start:

# LLM API Keys (choose what you need)
GOOGLE_API_KEY=your_google_api_key_here
AZURE_OPENAI_API_KEY=your_azure_openai_key
AZURE_OPENAI_ENDPOINT=your_azure_endpoint

# RAG Configuration
NUM_FEW_SHOTS=25  # Number of examples to retrieve

# Optional: SiteMinder SSO for enterprise
SMVAL=False  # Set to True for enterprise SSO

LLM Models

Edit llm_models.json and copy to ~/.cache/:

{
  "gemini-1.5-flash": {
    "type": "google_gemini",
    "provider": "google"
  },
  "gpt-4o": {
    "type": "openai", 
    "provider": "openai"
  }
}
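
Before copying the file, it can help to sanity-check its contents. The sketch below parses the JSON and confirms that each entry has the two keys shown in the example. The `validate_models` helper is hypothetical, and whether the service requires additional fields is not stated here.

```python
import json

REQUIRED_KEYS = {"type", "provider"}  # the keys that appear in the example above


def validate_models(text):
    """Parse llm_models.json content and return {model_name: provider}.

    Raises ValueError if an entry lacks one of the keys used in the example.
    """
    models = json.loads(text)
    providers = {}
    for name, entry in models.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"{name}: missing keys {sorted(missing)}")
        providers[name] = entry["provider"]
    return providers
```

For instance, `validate_models(open("llm_models.json").read())` would return a mapping of model names to providers for a well-formed file.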

🧪 Testing

The system includes 85+ comprehensive automated tests covering all components:

# Run all tests (uses real APIs if configured, mocks otherwise)
python -m pytest

# Run with coverage
python -m pytest --cov=src/canvasxpress_gen --cov-report=term-missing

# Integration tests only
python -m pytest tests/test_integration.py -v

Test Features:

✅ Real API integration when keys available
✅ Graceful fallback to mocks when unavailable
✅ End-to-end RAG workflow validation
✅ Works in fresh environments

For detailed testing instructions, see TESTING.md.

Code Organization

The codebase follows Python packaging standards:

src/canvasxpress_gen/
├── llm/           # LLM service and model management
├── rag/           # RAG system with embeddings and retrieval
└── utils/         # JSON, text, file, and auth utilities

📚 Documentation

API Documentation: Complete API reference for service endpoints
Integration Guide: CanvasXpress integration details
Testing Guide: Comprehensive testing instructions
Contributing Guide: Development and contribution guidelines

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for detailed guidelines on:

Development setup and workflow
Code standards and testing requirements
Submission process

📄 License & Citation

MIT License - see LICENSE file.

If you use this software in research, please cite:

@article{smith2024canvasxpress,
  title={Generating Visualizations Conversationally using Guided Autocomplete and LLMs},
  author={Smith, Andrew K and Neuhaus, Isaac},
  year={2024}
}

🆘 Support

Issues: Report bugs and request features via GitHub Issues
Documentation: Check this README and linked guides
Questions: Contact maintainers or open a discussion

Ready to integrate natural language visualization generation? 🚀

Get Started | API Documentation | Contributing

