|
11 | 11 | - [Quick Start](#-quick-start) |
12 | 12 | - [Advanced Setup](#-advanced-setup) |
13 | 13 | - [Development](#-development) |
| 14 | +- [Testing LLM Agents](#-testing-llm-agents) |
14 | 15 | - [Building](#%EF%B8%8F-building) |
15 | 16 | - [Credits](#-credits) |
16 | 17 | - [License](#-license) |
@@ -658,6 +659,147 @@ Run the command(s) in `frontend` folder: |
658 | 659 |
|
659 | 660 | Open your browser and visit the web app URL. |
660 | 661 |
|
| 662 | +## 🧪 Testing LLM Agents |
| 663 | + |
| 664 | +PentAGI includes a utility called `ctester` for testing and validating LLM agent capabilities. It verifies that your LLM provider configuration works correctly with each agent type, so you can optimize model selection for every agent role.
| 665 | + |
| 666 | +The utility features parallel testing of multiple agents, detailed reporting, and flexible configuration options. |
| 667 | + |
| 668 | +### Key Features |
| 669 | + |
| 670 | +- **Parallel Testing**: Tests multiple agents simultaneously for faster results |
| 671 | +- **Comprehensive Test Suite**: Evaluates basic completion, JSON responses, function calling, and more |
| 672 | +- **Detailed Reporting**: Generates markdown reports with success rates and performance metrics |
| 673 | +- **Flexible Configuration**: Test specific agents or test groups as needed |
| 674 | + |
| 675 | +### Usage Scenarios |
| 676 | + |
| 677 | +#### For Developers (with local Go environment) |
| 678 | + |
| 679 | +If you've cloned the repository and have Go installed: |
| 680 | + |
| 681 | +```bash |
| 682 | +# Default configuration with .env file |
| 683 | +cd backend |
| 684 | +go run cmd/ctester/*.go -verbose |
| 685 | + |
| 686 | +# Custom provider configuration |
| 687 | +go run cmd/ctester/*.go -config ../examples/configs/openrouter.provider.yml -verbose |
| 688 | + |
| 689 | +# Generate a report file |
| 690 | +go run cmd/ctester/*.go -config ../examples/configs/deepinfra.provider.yml -report ../test-report.md |
| 691 | + |
| 692 | +# Test specific agent types only |
| 693 | +go run cmd/ctester/*.go -agents simple,simple_json,agent -verbose |
| 694 | + |
| 695 | +# Test specific test groups only |
| 696 | +go run cmd/ctester/*.go -tests "Simple Completion,System User Prompts" -verbose |
| 697 | +``` |
| 698 | + |
| 699 | +#### For Users (using Docker image) |
| 700 | + |
| 701 | +If you prefer to use the pre-built Docker image without setting up a development environment: |
| 702 | + |
| 703 | +```bash |
| 704 | +# Using Docker to test with default environment |
| 705 | +docker run --rm -v $(pwd)/.env:/opt/pentagi/.env vxcontrol/pentagi /opt/pentagi/bin/ctester -verbose |
| 706 | + |
| 707 | +# Test with your custom provider configuration |
| 708 | +docker run --rm \ |
| 709 | + -v $(pwd)/.env:/opt/pentagi/.env \ |
| 710 | + -v $(pwd)/my-config.yml:/opt/pentagi/config.yml \ |
| 711 | + vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/config.yml -verbose |
| 712 | + |
| 713 | +# Generate a detailed report |
| 714 | +docker run --rm \ |
| 715 | + -v $(pwd)/.env:/opt/pentagi/.env \ |
| 716 | + -v $(pwd):/opt/pentagi/output \ |
| 717 | + vxcontrol/pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/output/report.md |
| 718 | +``` |
| 719 | + |
| 720 | +#### Using Pre-configured Providers |
| 721 | + |
| 722 | +The Docker image comes with pre-configured provider files for OpenRouter and DeepInfra:
| 723 | + |
| 724 | +```bash |
| 725 | +# Test with OpenRouter configuration |
| 726 | +docker run --rm \ |
| 727 | + -v $(pwd)/.env:/opt/pentagi/.env \ |
| 728 | + vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/openrouter.provider.yml |
| 729 | + |
| 730 | +# Test with DeepInfra configuration |
| 731 | +docker run --rm \ |
| 732 | + -v $(pwd)/.env:/opt/pentagi/.env \ |
| 733 | + vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/deepinfra.provider.yml |
| 734 | +``` |
| 735 | + |
| 736 | +To use these configurations, your `.env` file only needs to contain: |
| 737 | + |
| 738 | +``` |
| 739 | +LLM_SERVER_URL=https://openrouter.ai/api/v1 # or https://api.deepinfra.com/v1/openai |
| 740 | +LLM_SERVER_KEY=your_api_key |
| 741 | +LLM_SERVER_MODEL= # Leave empty, as models are specified in the config |
| 742 | +LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/openrouter.provider.yml # or deepinfra.provider.yml |
| 743 | +``` |
| 744 | + |
| 745 | +#### Running Tests in a Production Environment |
| 746 | + |
| 747 | +If you already have a running PentAGI container and want to test the current configuration: |
| 748 | + |
| 749 | +```bash |
| 750 | +# Run ctester in an existing container using current environment variables |
| 751 | +docker exec -it pentagi /opt/pentagi/bin/ctester -verbose |
| 752 | + |
| 753 | +# Generate a report file inside the container |
| 754 | +docker exec -it pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/data/agent-test-report.md |
| 755 | + |
| 756 | +# Access the report from the host |
| 757 | +docker cp pentagi:/opt/pentagi/data/agent-test-report.md ./ |
| 758 | +``` |
| 759 | + |
| 760 | +### Command-line Options |
| 761 | + |
| 762 | +The utility accepts several options: |
| 763 | + |
| 764 | +- `-env <path>` - Path to environment file (default: `.env`) |
| 765 | +- `-config <path>` - Path to custom provider config (default: from `LLM_SERVER_CONFIG_PATH` env variable) |
| 766 | +- `-report <path>` - Path to write the report file (optional) |
| 767 | +- `-agents <list>` - Comma-separated list of agent types to test (default: `all`) |
| 768 | +- `-tests <list>` - Comma-separated list of test groups to run (default: `all`) |
| 769 | +- `-verbose` - Enable verbose output with detailed test results for each agent |
| 770 | + |
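These options can be combined in a single run. As a sketch (paths and agent names follow the examples above and may need adjusting to your checkout layout), a focused run might test two agent types against a custom provider configuration, limit the test groups, and save a report:

```shell
# Combined example: test only two agent types with a custom provider config,
# run a single test group, and write a markdown report.
# The config and report paths here are illustrative.
cd backend
go run cmd/ctester/*.go \
  -env ../.env \
  -config ../examples/configs/openrouter.provider.yml \
  -agents simple,simple_json \
  -tests "Simple Completion" \
  -report ../subset-report.md \
  -verbose
```

Restricting agents and test groups like this keeps iteration fast while you tune a single agent role.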
| 771 | +### Example Provider Configuration |
| 772 | + |
| 773 | +Provider configuration defines which models to use for different agent types: |
| 774 | + |
| 775 | +```yaml |
| 776 | +simple: |
| 777 | + model: "provider/model-name" |
| 778 | + temperature: 0.7 |
| 779 | + top_p: 0.95 |
| 780 | + n: 1 |
| 781 | + max_tokens: 4000 |
| 782 | + |
| 783 | +simple_json: |
| 784 | + model: "provider/model-name" |
| 785 | + temperature: 0.7 |
| 786 | + top_p: 1.0 |
| 787 | + n: 1 |
| 788 | + max_tokens: 4000 |
| 789 | + json: true |
| 790 | + |
| 791 | +# ... other agent types ... |
| 792 | +``` |
| 793 | + |
| 794 | +### Optimization Workflow |
| 795 | + |
| 796 | +1. **Create a baseline**: Run tests with default configuration |
| 797 | +2. **Experiment**: Try different models for each agent type |
| 798 | +3. **Compare results**: Look for the best success rate and performance |
| 799 | +4. **Deploy optimal configuration**: Use in production with your optimized setup |
| 800 | + |
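Assuming a local Go toolchain and a directory of candidate provider configs (the `reports/` directory below is hypothetical), steps 1–3 can be sketched as a loop that produces one report per configuration for side-by-side comparison:

```shell
# Sketch of the experiment loop: generate one report per candidate config,
# then compare success rates across the reports to pick the best setup.
cd backend
mkdir -p ../reports
for cfg in ../examples/configs/*.provider.yml; do
  # Derive a report name from the config filename, e.g. "openrouter".
  name=$(basename "$cfg" .provider.yml)
  go run cmd/ctester/*.go -config "$cfg" -report "../reports/${name}.md"
done
```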
| 801 | +This tool helps ensure your AI agents are using the most effective models for their specific tasks, improving reliability while optimizing costs. |
| 802 | + |
661 | 803 | ## 🏗️ Building |
662 | 804 |
|
663 | 805 | ### Building Docker Image |
|