buddyroo30/canvasxpress_gen

CanvasXpress Generation System

Generate CanvasXpress visualizations from natural language descriptions using Large Language Models (LLMs)

A backend service that lets users create scientific visualizations by describing them in plain English. It combines LLMs with Retrieval-Augmented Generation (RAG).

🌟 Key Features

  • Natural Language Interface: Describe visualizations in plain English
  • Multi-LLM Support: Works with OpenAI GPT-4o, Google Gemini, AWS Bedrock, and Ollama
  • RAG-Enhanced Generation: Uses vector similarity search with BGE-M3 embeddings
  • High Accuracy: Achieves >97% accuracy through engineered prompts and few-shot examples
  • Backend Service Architecture: Designed as a service for CanvasXpress integration

🚀 Quick Start

Prerequisites

  • Docker
  • Make (for using Makefile commands)
  • Python 3.9+ Docker image (used by the containerized system)
  • Git

Setup & Run

Production Setup:

git clone https://github.com/buddyroo30/canvasxpress_gen.git
cd canvasxpress_gen
make build
make build_schema_context    # Generate schema information
make build_vector_db         # Create vector database for RAG
make run                     # Run as daemon (or 'make runi' for interactive) on port 5008

Development Setup:

git clone https://github.com/buddyroo30/canvasxpress_gen.git
cd canvasxpress_gen
# Build and run development environment
make build_dev
make build_schema_context_dev
make build_vector_db_dev
make run_dev  # Runs on port 5009

Fresh Environment Setup (for testing without cached dependencies):

git clone https://github.com/buddyroo30/canvasxpress_gen.git
cd canvasxpress_gen
make buildfresh              # Build without using Docker cache
make build_schema_context    # Generate schema information
make build_vector_db         # Create vector database for RAG
make run                     # Run as daemon on port 5008

Environment Variables (Optional)

Configure LLM API access and system behavior by setting environment variables. The complete list of available variables follows.

LLM API Configuration:

export OPENAI_API_TYPE="azure"                    # OpenAI API type (azure or openai)
export OPENAI_API_KEY="your-openai-key"          # OpenAI API key
export AZURE_OPENAI_API_KEY="your-azure-key"     # Azure OpenAI API key
export AZURE_OPENAI_ENDPOINT="your-endpoint"     # Azure OpenAI endpoint URL
export OPENAI_API_BASE="your-base-url"           # OpenAI API base URL
export OPENAI_API_VERSION="2023-05-15"           # OpenAI API version
export AZURE_OPENAI_API_VERSION="2024-02-01"     # Azure OpenAI API version
export GOOGLE_API_KEY="your-google-key"          # For Google Gemini models

SiteMinder Authentication (Corporate Environments):

export SMVAL="True"                               # Enable SiteMinder validation
export SMLOGIN="your-login-url"                  # SiteMinder login URL
export SMTARGET="your-target-url"                # SiteMinder target URL
export SMFAILREGEX=".*<html.*AUTHENTICATION.*"   # Login failure regex pattern
export SMFETCHFAILREGEX=".*<title>BMS.*"         # Fetch failure regex pattern

System Configuration:

export DEV="True"                                 # Enable development mode
export NUM_FEW_SHOTS=25                          # Number of RAG examples to retrieve
export PORT=5000                                 # Server port (cx_llm_service only)
export SERVICE_URL="your-service-url"            # Service URL (cx_llm_service only)
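
A service could read these system settings at startup roughly as follows. This is a minimal sketch, not the project's actual loader: the `Settings` class, the `load_settings` function, and the fallback values are assumptions for illustration (the fallbacks simply reuse the example values above).

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical container for the system configuration variables."""
    dev: bool
    num_few_shots: int
    port: int


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ).

    The fallback values are assumptions, not documented defaults.
    """
    env = os.environ if env is None else env
    return Settings(
        # Treat DEV as enabled only when it is the string "True" (case-insensitive).
        dev=env.get("DEV", "False").strip().lower() == "true",
        num_few_shots=int(env.get("NUM_FEW_SHOTS", "25")),
        port=int(env.get("PORT", "5000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.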

Verify Setup

  1. Open a browser at http://localhost:5008 (or your domain if deployed online)
  2. Upload a CSV/TSV data file with headers
  3. Describe your visualization in plain English

Example with automotive data:

  • "Box plot of cty grouped by manufacturer"
  • "Scatter plot of hwy vs cty colored by drv"
  • "Area graph of hwy with title 'Highway MPG Distribution'"

Note: This web interface is a quick and easy way to see the system in action and confirm it's working. The production interface is integrated directly into CanvasXpress.

🔗 Connecting to CanvasXpress

Production Integration

Configure CanvasXpress to use your running service:

// In your CanvasXpress configuration
var config = {
    // Your existing CanvasXpress configuration
    graphType: "Bar",
    title: "My Visualization",
    
    // Add LLM service configuration
    llmServiceURL: "http://localhost:5008/ask"  // or your domain: "https://your-domain.com:5008/ask"
};

var cx = new CanvasXpress("canvasId", data, config);

API Usage

curl -X POST http://localhost:5008/ask \
  -F "prompt=Create a scatter plot of hwy vs cty colored by manufacturer" \
  -F "datafile_contents=[[\"manufacturer\",\"hwy\",\"cty\"],[\"toyota\",35,28],[\"ford\",30,25]]"
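
The same request can be assembled from Python. The sketch below uses only the standard library and mirrors the two form fields from the curl call; the helper names `csv_to_rows` and `build_ask_request` are hypothetical, and the shape of the service's response is not described here, so it is left unparsed.

```python
import csv
import io
import json
import urllib.request
import uuid


def csv_to_rows(csv_text):
    """Parse CSV text into a list of rows, converting numeric cells to numbers."""
    def convert(cell):
        for cast in (int, float):
            try:
                return cast(cell)
            except ValueError:
                pass
        return cell  # non-numeric cells (such as headers) stay as strings

    return [[convert(c) for c in row] for row in csv.reader(io.StringIO(csv_text))]


def build_ask_request(prompt, rows, url="http://localhost:5008/ask"):
    """Build a multipart/form-data POST with the fields used by the curl example."""
    boundary = uuid.uuid4().hex
    fields = {"prompt": prompt, "datafile_contents": json.dumps(rows)}
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        for name, value in fields.items()
    ]
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Calling `urllib.request.urlopen(build_ask_request(prompt, rows))` sends the request to a running service; inspect the raw response before deciding how to parse it.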

Public vs Private Deployment

  • Public: Use the publicly available CanvasXpress instance at canvasxpress.org
  • Private: Run your own instance for data security within corporate networks

📖 System Architecture

Backend Service Design

This system is designed as a backend service for CanvasXpress, not as a traditional Python library. Users interact with the system through:

  1. CanvasXpress Integration: Primary intended usage via CanvasXpress UI
  2. Direct API Calls: For custom integrations
  3. Development Interface: For testing and verification

Core Components

  • LLM Integration: Multiple LLM providers through unified interface
  • RAG System: Vector database (Milvus) with BGE-M3 embeddings for semantic search
  • Guided Autocomplete: Automatic synthetic example generation (part of main CanvasXpress library)
  • Modular Architecture: Python package organized into llm, rag, and utils subpackages
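
The retrieval step of the RAG component can be illustrated with a toy example. This sketch is not the project's implementation: it swaps Milvus and BGE-M3 for plain Python lists and hand-written 2-dimensional vectors, and the function names and example records are hypothetical. It shows only the idea of ranking stored examples by cosine similarity and prepending the top matches to the prompt as few-shot examples.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_few_shots(query_vec, examples, k):
    """Return the k stored examples most similar to the query vector."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vec, ex["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, shots):
    """Prepend the retrieved examples to the user's question."""
    blocks = [f"Q: {s['question']}\nA: {s['answer']}" for s in shots]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Toy "embeddings"; a real embedding model produces much longer vectors.
EXAMPLES = [
    {"question": "box plot of cty by manufacturer",
     "answer": '{"graphType": "Boxplot"}', "vector": [1.0, 0.0]},
    {"question": "scatter plot of hwy vs cty",
     "answer": '{"graphType": "Scatter2D"}', "vector": [0.0, 1.0]},
    {"question": "boxplot of hwy grouped by drv",
     "answer": '{"graphType": "Boxplot"}', "vector": [0.9, 0.1]},
]
```

In the real system the value of `k` would correspond to the `NUM_FEW_SHOTS` setting, and the vectors would come from the embedding model rather than being written by hand.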

🛠️ Configuration

Environment Variables

Create a .env file. Commonly needed variables are shown below; see the Environment Variables section under Quick Start for the complete list:

# LLM API Keys (choose what you need)
GOOGLE_API_KEY=your_google_api_key_here
AZURE_OPENAI_API_KEY=your_azure_openai_key
AZURE_OPENAI_ENDPOINT=your_azure_endpoint

# RAG Configuration
NUM_FEW_SHOTS=25  # Number of examples to retrieve

# Optional: SiteMinder SSO for enterprise
SMVAL=False  # Set to True for enterprise SSO

LLM Models

Edit llm_models.json and copy it to ~/.cache/:

{
  "gemini-1.5-flash": {
    "type": "google_gemini",
    "provider": "google"
  },
  "gpt-4o": {
    "type": "openai", 
    "provider": "openai"
  }
}
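
Before starting the service, it can help to confirm that the copied file parses and that every entry carries both keys shown above. The following is a minimal sketch; `load_llm_models` is a hypothetical helper, and treating `type` and `provider` as required is an assumption based only on this example.

```python
import json
from pathlib import Path

# Assumed to be required because both appear in every entry of the example file.
REQUIRED_KEYS = {"type", "provider"}


def load_llm_models(path=Path.home() / ".cache" / "llm_models.json"):
    """Load the model registry and check that each entry has the expected keys."""
    with open(path, encoding="utf-8") as fh:
        models = json.load(fh)
    for name, spec in models.items():
        missing = REQUIRED_KEYS - set(spec)
        if missing:
            raise ValueError(f"model {name!r} is missing keys: {sorted(missing)}")
    return models
```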

🧪 Testing

The system includes 85+ comprehensive automated tests covering all components:

# Run all tests (uses real APIs if configured, mocks otherwise)
python -m pytest

# Run with coverage
python -m pytest --cov=src/canvasxpress_gen --cov-report=term-missing

# Integration tests only
python -m pytest tests/test_integration.py -v

Test Features:

  • ✅ Real API integration when keys available
  • ✅ Graceful fallback to mocks when unavailable
  • ✅ End-to-end RAG workflow validation
  • ✅ Works in fresh environments
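
The fallback behavior can be pictured as a small factory that returns a real client when a key is present and a mock otherwise. This is only a sketch of the pattern, not the project's test code: `make_llm_client`, the `generate` method, and the canned response are all hypothetical.

```python
import json
import os
from unittest.mock import MagicMock


def make_llm_client(real_factory, env=None, key_name="GOOGLE_API_KEY"):
    """Return (client, mode): a real client if the API key is set, else a mock.

    real_factory is any callable that builds a client from an API key.
    """
    env = os.environ if env is None else env
    api_key = env.get(key_name)
    if api_key:
        return real_factory(api_key), "real"
    # No key configured: return a stand-in with a canned response.
    mock = MagicMock()
    mock.generate.return_value = json.dumps({"graphType": "Boxplot"})
    return mock, "mock"
```

A test written against such a client runs unchanged in both modes, which is what lets the suite pass in a fresh environment with no keys configured.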

For detailed testing instructions, see TESTING.md.

Code Organization

The codebase follows Python packaging standards:

src/canvasxpress_gen/
├── llm/           # LLM service and model management
├── rag/           # RAG system with embeddings and retrieval
└── utils/         # JSON, text, file, and auth utilities

📚 Documentation

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for detailed guidelines on:

  • Development setup and workflow
  • Code standards and testing requirements
  • Submission process

📄 License & Citation

MIT License - see LICENSE file.

If you use this software in research, please cite:

@article{smith2024canvasxpress,
  title={Generating Visualizations Conversationally using Guided Autocomplete and LLMs},
  author={Smith, Andrew K and Neuhaus, Isaac},
  year={2024}
}

🆘 Support

  • Issues: Report bugs and request features via GitHub Issues
  • Documentation: Check this README and linked guides
  • Questions: Contact maintainers or open a discussion

Ready to integrate natural language visualization generation? 🚀

Get Started | API Documentation | Contributing
