Commit cb41bf8

Merge pull request #17 from vxcontrol/14-enhancement-openrouter-configuration
Add ctester utility for testing LLM agents
2 parents 4848949 + 7da46ef commit cb41bf8

17 files changed: +2978 -37 lines changed
.vscode/launch.json

Lines changed: 18 additions & 0 deletions
@@ -45,5 +45,23 @@
 "url": "http://localhost:8000",
 "webRoot": "${workspaceFolder}/frontend/src"
 },
+{
+"type": "go",
+"request": "launch",
+"name": "Launch Agents Tests",
+"program": "${workspaceFolder}/backend/cmd/ctester/",
+"envFile": "${workspaceFolder}/.env",
+"cwd": "${workspaceFolder}",
+"env": {},
+"args": [
+// "--config", "${workspaceFolder}/examples/configs/openrouter.provider.yml",
+"--config", "${workspaceFolder}/examples/configs/deepinfra.provider.yml",
+// "--report", "${workspaceFolder}/examples/tests/openrouter-report.md",
+"--report", "${workspaceFolder}/examples/tests/deepinfra-report.md",
+"--agents", "all",
+"--tests", "all",
+"--verbose"
+]
+},
 ]
 }

Dockerfile

Lines changed: 10 additions & 1 deletion
@@ -61,6 +61,9 @@ RUN --mount=type=cache,target=/go/pkg/mod \
 # Build backend
 RUN go build -trimpath -o /pentagi ./cmd/pentagi
 
+# Build ctester utility
+RUN go build -trimpath -o /ctester ./cmd/ctester
+
 # STEP 3: Build the final image
 FROM alpine:3.21
 
@@ -83,11 +86,17 @@ RUN mkdir -p \
 /opt/pentagi/ssl \
 /opt/pentagi/fe \
 /opt/pentagi/logs \
-/opt/pentagi/data
+/opt/pentagi/data \
+/opt/pentagi/conf
 
 COPY --from=be-build /pentagi /opt/pentagi/bin/pentagi
+COPY --from=be-build /ctester /opt/pentagi/bin/ctester
 COPY --from=fe-build /frontend/dist /opt/pentagi/fe
 
+# Copy provider configuration files
+COPY examples/configs/openrouter.provider.yml /opt/pentagi/conf/
+COPY examples/configs/deepinfra.provider.yml /opt/pentagi/conf/
+
 COPY LICENSE /opt/pentagi/LICENSE
 COPY NOTICE /opt/pentagi/NOTICE
 COPY EULA.md /opt/pentagi/EULA

README.md

Lines changed: 142 additions & 0 deletions
@@ -11,6 +11,7 @@
 - [Quick Start](#-quick-start)
 - [Advanced Setup](#-advanced-setup)
 - [Development](#-development)
+- [Testing LLM Agents](#-testing-llm-agents)
 - [Building](#%EF%B8%8F-building)
 - [Credits](#-credits)
 - [License](#-license)
@@ -658,6 +659,147 @@ Run the command(s) in `frontend` folder:
 
 Open your browser and visit the web app URL.
 
+## 🧪 Testing LLM Agents
+
+PentAGI includes a powerful utility called `ctester` for testing and validating LLM agent capabilities. This tool helps ensure your LLM provider configurations work correctly with different agent types, allowing you to optimize model selection for each specific agent role.
+
+The utility features parallel testing of multiple agents, detailed reporting, and flexible configuration options.
+
+### Key Features
+
+- **Parallel Testing**: Tests multiple agents simultaneously for faster results
+- **Comprehensive Test Suite**: Evaluates basic completion, JSON responses, function calling, and more
+- **Detailed Reporting**: Generates markdown reports with success rates and performance metrics
+- **Flexible Configuration**: Test specific agents or test groups as needed
+
+### Usage Scenarios
+
+#### For Developers (with local Go environment)
+
+If you've cloned the repository and have Go installed:
+
+```bash
+# Default configuration with .env file
+cd backend
+go run cmd/ctester/*.go -verbose
+
+# Custom provider configuration
+go run cmd/ctester/*.go -config ../examples/configs/openrouter.provider.yml -verbose
+
+# Generate a report file
+go run cmd/ctester/*.go -config ../examples/configs/deepinfra.provider.yml -report ../test-report.md
+
+# Test specific agent types only
+go run cmd/ctester/*.go -agents simple,simple_json,agent -verbose
+
+# Test specific test groups only
+go run cmd/ctester/*.go -tests "Simple Completion,System User Prompts" -verbose
+```
+
+#### For Users (using Docker image)
+
+If you prefer to use the pre-built Docker image without setting up a development environment:
+
+```bash
+# Using Docker to test with default environment
+docker run --rm -v $(pwd)/.env:/opt/pentagi/.env vxcontrol/pentagi /opt/pentagi/bin/ctester -verbose
+
+# Test with your custom provider configuration
+docker run --rm \
+-v $(pwd)/.env:/opt/pentagi/.env \
+-v $(pwd)/my-config.yml:/opt/pentagi/config.yml \
+vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/config.yml -verbose
+
+# Generate a detailed report
+docker run --rm \
+-v $(pwd)/.env:/opt/pentagi/.env \
+-v $(pwd):/opt/pentagi/output \
+vxcontrol/pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/output/report.md
+```
+
+#### Using Pre-configured Providers
+
+The Docker image comes with pre-configured provider files for OpenRouter or DeepInfra:
+
+```bash
+# Test with OpenRouter configuration
+docker run --rm \
+-v $(pwd)/.env:/opt/pentagi/.env \
+vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/openrouter.provider.yml
+
+# Test with DeepInfra configuration
+docker run --rm \
+-v $(pwd)/.env:/opt/pentagi/.env \
+vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/deepinfra.provider.yml
+```
+
+To use these configurations, your `.env` file only needs to contain:
+
+```
+LLM_SERVER_URL=https://openrouter.ai/api/v1 # or https://api.deepinfra.com/v1/openai
+LLM_SERVER_KEY=your_api_key
+LLM_SERVER_MODEL= # Leave empty, as models are specified in the config
+LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/openrouter.provider.yml # or deepinfra.provider.yml
+```
+
+#### Running Tests in a Production Environment
+
+If you already have a running PentAGI container and want to test the current configuration:
+
+```bash
+# Run ctester in an existing container using current environment variables
+docker exec -it pentagi /opt/pentagi/bin/ctester -verbose
+
+# Generate a report file inside the container
+docker exec -it pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/data/agent-test-report.md
+
+# Access the report from the host
+docker cp pentagi:/opt/pentagi/data/agent-test-report.md ./
+```
+
+### Command-line Options
+
+The utility accepts several options:
+
+- `-env <path>` - Path to environment file (default: `.env`)
+- `-config <path>` - Path to custom provider config (default: from `LLM_SERVER_CONFIG_PATH` env variable)
+- `-report <path>` - Path to write the report file (optional)
+- `-agents <list>` - Comma-separated list of agent types to test (default: `all`)
+- `-tests <list>` - Comma-separated list of test groups to run (default: `all`)
+- `-verbose` - Enable verbose output with detailed test results for each agent
+
+### Example Provider Configuration
+
+Provider configuration defines which models to use for different agent types:
+
+```yaml
+simple:
+  model: "provider/model-name"
+  temperature: 0.7
+  top_p: 0.95
+  n: 1
+  max_tokens: 4000
+
+simple_json:
+  model: "provider/model-name"
+  temperature: 0.7
+  top_p: 1.0
+  n: 1
+  max_tokens: 4000
+  json: true
+
+# ... other agent types ...
+```
+
+### Optimization Workflow
+
+1. **Create a baseline**: Run tests with default configuration
+2. **Experiment**: Try different models for each agent type
+3. **Compare results**: Look for the best success rate and performance
+4. **Deploy optimal configuration**: Use in production with your optimized setup
+
+This tool helps ensure your AI agents are using the most effective models for their specific tasks, improving reliability while optimizing costs.
+
 ## 🏗️ Building
 
 ### Building Docker Image
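
Reviewer note: the README changes above document six `ctester` options, two of which (`-agents` and `-tests`) take comma-separated lists that default to `all`. The following is a minimal sketch of how such options could be parsed with Go's standard `flag` package. It is not the `ctester` source: the `options` struct, `splitList`, and `parseOptions` are hypothetical names, and treating `all` as "no filter" is an assumption. One detail that is certain: the `flag` package treats one and two leading dashes as equivalent, so a parser built on it would accept both the `--config` form used in `.vscode/launch.json` and the `-config` form used in the README.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// options mirrors the flags listed in the README diff.
// The struct and its field names are hypothetical.
type options struct {
	envPath    string
	configPath string
	reportPath string
	agents     []string // nil means "all" (assumed convention)
	tests      []string // nil means "all" (assumed convention)
	verbose    bool
}

// splitList turns a comma-separated value into trimmed, non-empty items.
// The documented default "all" yields nil, read here as "no filter".
func splitList(value string) []string {
	if strings.TrimSpace(value) == "all" {
		return nil
	}
	var items []string
	for _, part := range strings.Split(value, ",") {
		if item := strings.TrimSpace(part); item != "" {
			items = append(items, item)
		}
	}
	return items
}

// parseOptions parses args with a private FlagSet so it can be called
// repeatedly without touching the global flag state.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("ctester-sketch", flag.ContinueOnError)
	env := fs.String("env", ".env", "path to environment file")
	config := fs.String("config", os.Getenv("LLM_SERVER_CONFIG_PATH"), "path to custom provider config")
	report := fs.String("report", "", "path to write the report file")
	agents := fs.String("agents", "all", "comma-separated list of agent types")
	tests := fs.String("tests", "all", "comma-separated list of test groups")
	verbose := fs.Bool("verbose", false, "enable verbose output")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return options{
		envPath:    *env,
		configPath: *config,
		reportPath: *report,
		agents:     splitList(*agents),
		tests:      splitList(*tests),
		verbose:    *verbose,
	}, nil
}

func main() {
	// Same values as the README examples, passed in a single invocation.
	opts, err := parseOptions([]string{
		"-agents", "simple,simple_json,agent",
		"-tests", "Simple Completion,System User Prompts",
		"-verbose",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("agents=%q tests=%q verbose=%v\n", opts.agents, opts.tests, opts.verbose)
}
```

Splitting on commas and trimming each item keeps test group names that contain spaces, such as `Simple Completion`, intact as long as the whole list is quoted in the shell.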
